Struct regex_syntax::hir::Hir

pub struct Hir { /* private fields */ }

A high-level intermediate representation (HIR) for a regular expression.

The HIR of a regular expression represents an intermediate step between its abstract syntax (a structured description of the concrete syntax) and compiled byte codes. The purpose of HIR is to make regular expressions easier to analyze. In particular, the AST is much more complex than the HIR. For example, while an AST supports arbitrarily nested character classes, the HIR will flatten all nested classes into a single set. The HIR will also “compile away” every flag present in the concrete syntax. For example, users of HIR expressions never need to worry about case folding; it is handled automatically by the translator (e.g., by translating (?i)A to [aA]).
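The class flattening mentioned above can be pictured with a small standalone sketch. The AstClass type and the flatten function below are hypothetical names invented for this illustration; they are not part of regex-syntax, and the real translator may work quite differently. The sketch only shows how arbitrarily nested classes can collapse into one sorted set of non-overlapping ranges.

```rust
/// Toy model of a nested character class as it might appear in an AST.
/// (Hypothetical type for illustration; not part of regex-syntax.)
enum AstClass {
    /// An inclusive range of characters, e.g. `a-c`.
    Range(char, char),
    /// A union of nested classes, e.g. `[a-c[x-z]]`.
    Union(Vec<AstClass>),
}

/// Flattens a nested class into sorted, non-overlapping inclusive ranges.
fn flatten(class: &AstClass) -> Vec<(char, char)> {
    // Collect every leaf range using an explicit work list.
    let mut ranges = Vec::new();
    let mut stack = vec![class];
    while let Some(c) = stack.pop() {
        match c {
            AstClass::Range(lo, hi) => ranges.push((*lo, *hi)),
            AstClass::Union(items) => stack.extend(items.iter()),
        }
    }
    // Sort, then merge ranges that overlap or are directly adjacent.
    ranges.sort();
    let mut merged: Vec<(char, char)> = Vec::new();
    for (lo, hi) in ranges {
        if let Some(last) = merged.last_mut() {
            if (lo as u32) <= (last.1 as u32) + 1 {
                if hi > last.1 {
                    last.1 = hi;
                }
                continue;
            }
        }
        merged.push((lo, hi));
    }
    merged
}

fn main() {
    // [a-c[b-f[x-z]]] flattens to the two ranges a-f and x-z.
    let nested = AstClass::Union(vec![
        AstClass::Range('a', 'c'),
        AstClass::Union(vec![
            AstClass::Range('b', 'f'),
            AstClass::Union(vec![AstClass::Range('x', 'z')]),
        ]),
    ]);
    assert_eq!(flatten(&nested), vec![('a', 'f'), ('x', 'z')]);
}
```

A consumer of the flattened form only ever sees one list of ranges, which is the kind of simplification the HIR is meant to provide.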

If the HIR was produced by a translator that disallows invalid UTF-8, then the HIR is guaranteed to match UTF-8 exclusively.

This type defines its own destructor that uses constant stack space and heap space proportional to the size of the HIR.
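As a rough illustration of what a constant-stack destructor can look like, here is a minimal sketch on a toy tree. The Expr type, has_children, and nested_groups are assumptions made for this example and are not the crate's types; the crate's own Drop impl may differ in detail. The idea is to move children onto a heap-allocated work list so that dropping a deeply nested expression never recurses more than a fixed number of frames.

```rust
use std::mem;

/// Toy expression tree (hypothetical; not the crate's actual types).
enum Expr {
    Empty,
    Literal(char),
    Group(Box<Expr>),
    Concat(Vec<Expr>),
}

impl Expr {
    /// True if this node owns at least one sub-expression.
    fn has_children(&self) -> bool {
        match self {
            Expr::Empty | Expr::Literal(_) => false,
            Expr::Group(_) => true,
            Expr::Concat(items) => !items.is_empty(),
        }
    }
}

impl Drop for Expr {
    fn drop(&mut self) {
        // Fast path: nothing is nested deeply below this node, so the
        // default field-by-field drop is already shallow.
        match &*self {
            Expr::Empty | Expr::Literal(_) => return,
            Expr::Group(inner) if !inner.has_children() => return,
            Expr::Concat(items) if items.is_empty() => return,
            _ => {}
        }
        // Slow path: move the tree onto a heap-allocated work list and
        // detach children one level at a time.
        let mut stack = vec![mem::replace(self, Expr::Empty)];
        while let Some(mut expr) = stack.pop() {
            match &mut expr {
                Expr::Empty | Expr::Literal(_) => {}
                Expr::Group(inner) => {
                    stack.push(mem::replace(&mut **inner, Expr::Empty));
                }
                Expr::Concat(items) => stack.extend(items.drain(..)),
            }
            // `expr` is dropped here with its children detached, so the
            // nested call to `drop` takes the fast path above.
        }
    }
}

/// Builds `depth` nested groups around a single literal, iteratively.
fn nested_groups(depth: usize) -> Expr {
    let mut e = Expr::Literal('a');
    for _ in 0..depth {
        e = Expr::Group(Box::new(e));
    }
    e
}

fn main() {
    // With a naive recursive destructor, nesting this deep would risk
    // overflowing the stack.
    let deep = Expr::Concat(vec![nested_groups(1_000_000), Expr::Literal('b')]);
    assert!(deep.has_children());
    drop(deep);
}
```

The work list lives on the heap, so the extra memory used while dropping is paid there rather than on the call stack.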

The specific type of an HIR expression can be accessed via its kind or into_kind methods. This extra level of indirection exists for two reasons:

  1. Construction of an HIR expression must use the constructor methods on this Hir type instead of building the HirKind values directly. This permits construction to enforce invariants like “concatenations always consist of two or more sub-expressions.”
  2. Every HIR expression contains attributes that are defined inductively, and can be computed cheaply during the construction process. For example, one such attribute is whether the expression must match at the beginning of the text.
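The two reasons above can be sketched with a toy type. ToyHir, ToyKind, and the helpers word and anchored are hypothetical names for this illustration, and the attribute rules are deliberately simplified (for instance, a concatenation here is treated as anchored only when its first sub-expression is). The sketch shows constructors enforcing the "two or more sub-expressions" invariant and computing an anchored-at-start attribute once, at construction time.

```rust
/// Toy expression kinds (hypothetical; a small subset for illustration).
#[derive(Debug, PartialEq)]
enum ToyKind {
    Empty,
    Literal(char),
    StartText,
    Concat(Vec<ToyHir>),
    Alternation(Vec<ToyHir>),
}

/// A kind plus an attribute that is cached when the value is built.
#[derive(Debug, PartialEq)]
struct ToyHir {
    kind: ToyKind,
    anchored_start: bool,
}

impl ToyHir {
    fn empty() -> ToyHir {
        ToyHir { kind: ToyKind::Empty, anchored_start: false }
    }

    fn literal(c: char) -> ToyHir {
        ToyHir { kind: ToyKind::Literal(c), anchored_start: false }
    }

    fn start_text() -> ToyHir {
        ToyHir { kind: ToyKind::StartText, anchored_start: true }
    }

    /// Never produces a `Concat` with fewer than two sub-expressions.
    fn concat(mut exprs: Vec<ToyHir>) -> ToyHir {
        match exprs.len() {
            0 => ToyHir::empty(),
            1 => exprs.pop().unwrap(),
            _ => {
                // Simplified rule: anchored iff the first element is.
                let anchored_start = exprs[0].anchored_start;
                ToyHir { kind: ToyKind::Concat(exprs), anchored_start }
            }
        }
    }

    /// Never produces an `Alternation` with fewer than two branches.
    fn alternation(mut exprs: Vec<ToyHir>) -> ToyHir {
        match exprs.len() {
            0 => ToyHir::empty(),
            1 => exprs.pop().unwrap(),
            _ => {
                // Anchored iff every branch is anchored.
                let anchored_start = exprs.iter().all(|e| e.anchored_start);
                ToyHir { kind: ToyKind::Alternation(exprs), anchored_start }
            }
        }
    }

    fn kind(&self) -> &ToyKind {
        &self.kind
    }

    fn is_anchored_start(&self) -> bool {
        self.anchored_start
    }
}

/// Builds a literal word such as `foo`.
fn word(s: &str) -> ToyHir {
    ToyHir::concat(s.chars().map(ToyHir::literal).collect())
}

/// Builds `^` followed by a literal word, such as `^foo`.
fn anchored(s: &str) -> ToyHir {
    ToyHir::concat(vec![ToyHir::start_text(), word(s)])
}

fn main() {
    // A one-element concatenation collapses to the element itself.
    let single = ToyHir::concat(vec![ToyHir::literal('a')]);
    assert_eq!(single.kind(), &ToyKind::Literal('a'));

    // ^foo|^bar is anchored; ^foo|bar is not.
    assert!(ToyHir::alternation(vec![anchored("foo"), anchored("bar")]).is_anchored_start());
    assert!(!ToyHir::alternation(vec![anchored("foo"), word("bar")]).is_anchored_start());
}
```

Because the fields are private in a real design, callers can only obtain values through the constructors, which is what lets the invariant and the cached attribute stay trustworthy.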

Also, an Hir’s fmt::Display implementation prints an HIR as a regular expression pattern string, and uses constant stack space and heap space proportional to the size of the Hir.

Implementations

impl Hir

pub fn kind(&self) -> &HirKind

Returns a reference to the underlying HIR kind.

pub fn into_kind(self) -> HirKind

Consumes ownership of this HIR expression and returns its underlying HirKind.

pub fn empty() -> Hir

Returns an empty HIR expression.

An empty HIR expression always matches, including the empty string.

pub fn literal(lit: Literal) -> Hir

Creates a literal HIR expression.

If the given literal is a Byte variant holding an ASCII byte, then this method panics. This enforces the invariant that Byte variants are only used to express matching of invalid UTF-8.

pub fn class(class: Class) -> Hir

Creates a class HIR expression.

pub fn anchor(anchor: Anchor) -> Hir

Creates an anchor assertion HIR expression.

pub fn word_boundary(word_boundary: WordBoundary) -> Hir

Creates a word boundary assertion HIR expression.

pub fn repetition(rep: Repetition) -> Hir

Creates a repetition HIR expression.

pub fn group(group: Group) -> Hir

Creates a group HIR expression.

pub fn concat(exprs: Vec<Hir>) -> Hir

Returns the concatenation of the given expressions.

This flattens the concatenation as appropriate.

pub fn alternation(exprs: Vec<Hir>) -> Hir

Returns the alternation of the given expressions.

This flattens the alternation as appropriate.

pub fn dot(bytes: bool) -> Hir

Builds an HIR expression for the . pattern.

A . expression matches any character except for \n. To build an expression that matches any character, including \n, use the any method.

If bytes is true, then this assumes characters are limited to a single byte.
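To make the effect of the bytes flag concrete, here is a minimal sketch that models the set matched by . as inclusive numeric ranges. The dot_ranges and contains functions are hypothetical helpers for this illustration, and using plain u32 values is a simplification; the crate builds proper Unicode or byte classes, and its exact construction may differ.

```rust
/// Returns inclusive ranges covering every value except `\n`.
/// When `bytes` is true the upper bound is a single byte (0xFF);
/// otherwise it is the largest Unicode scalar value (0x10FFFF).
/// (Hypothetical helper; a simplified numeric model.)
fn dot_ranges(bytes: bool) -> Vec<(u32, u32)> {
    let max = if bytes { 0xFF } else { 0x10FFFF };
    let newline = '\n' as u32; // 0x0A
    vec![(0, newline - 1), (newline + 1, max)]
}

/// Checks whether `value` falls inside any of the given ranges.
fn contains(ranges: &[(u32, u32)], value: u32) -> bool {
    ranges.iter().any(|&(lo, hi)| lo <= value && value <= hi)
}

fn main() {
    let unicode = dot_ranges(false);
    let bytes = dot_ranges(true);

    assert!(contains(&unicode, 'a' as u32));
    assert!(!contains(&unicode, '\n' as u32));

    // A snowman (U+2603) fits in the Unicode model but not in one byte.
    assert!(contains(&unicode, '☃' as u32));
    assert!(!contains(&bytes, '☃' as u32));
    assert!(contains(&bytes, 0xFF));
}
```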

pub fn any(bytes: bool) -> Hir

Builds an HIR expression for the (?s). pattern.

A (?s). expression matches any character, including \n. To build an expression that matches any character except for \n, use the dot method.

If bytes is true, then this assumes characters are limited to a single byte.

pub fn is_always_utf8(&self) -> bool

Return true if and only if this HIR will always match valid UTF-8.

When this returns false, then it is possible for this HIR expression to match invalid UTF-8.

pub fn is_all_assertions(&self) -> bool

Returns true if and only if this entire HIR expression is made up of zero-width assertions.

This includes expressions like ^$\b\A\z and even ((\b)+())*^, but not ^a.

pub fn is_anchored_start(&self) -> bool

Return true if and only if this HIR is required to match from the beginning of text. This includes expressions like ^foo, ^(foo|bar), and ^foo|^bar, but not ^foo|bar.

pub fn is_anchored_end(&self) -> bool

Return true if and only if this HIR is required to match at the end of text. This includes expressions like foo$, (foo|bar)$, and foo$|bar$, but not foo$|bar.

pub fn is_line_anchored_start(&self) -> bool

Return true if and only if this HIR is required to match from the beginning of text or the beginning of a line. This includes expressions like ^foo, (?m)^foo, ^(foo|bar), and (?m)^foo|^bar, but not ^foo|bar or (?m)^foo|bar.

Note that if is_anchored_start is true, then is_line_anchored_start will also be true. The reverse implication is not true. For example, (?m)^foo is line anchored, but is_anchored_start returns false for it.

pub fn is_line_anchored_end(&self) -> bool

Return true if and only if this HIR is required to match at the end of text or the end of a line. This includes expressions like foo$, (?m)foo$, (foo|bar)$, (?m)(foo|bar)$, and foo$|bar$, but not foo$|bar or (?m)foo$|bar.

Note that if is_anchored_end is true, then is_line_anchored_end will also be true. The reverse implication is not true. For example, (?m)foo$ is line anchored, but is_anchored_end returns false for it.

pub fn is_any_anchored_start(&self) -> bool

Return true if and only if this HIR contains any sub-expression that is required to match at the beginning of text. Specifically, this returns true if the ^ symbol (when multiline mode is disabled) or the \A escape appears anywhere in the regex.

pub fn is_any_anchored_end(&self) -> bool

Return true if and only if this HIR contains any sub-expression that is required to match at the end of text. Specifically, this returns true if the $ symbol (when multiline mode is disabled) or the \z escape appears anywhere in the regex.

pub fn is_match_empty(&self) -> bool

Return true if and only if the empty string is part of the language matched by this regular expression.

This includes a*, a?b*, a{0}, (), ()+, ^$, a|b?, \b and \B, but not a or a+.
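The examples listed above follow from a simple inductive rule, sketched here on a toy syntax tree. The Re type and matches_empty function are hypothetical and exist only for this illustration; the crate computes its attribute during construction rather than by a recursive walk like this one. Zero-width assertions are treated as matching the empty string, in line with the examples given.

```rust
/// Toy regex tree (hypothetical; for illustration only).
enum Re {
    /// The empty expression, as in `()`.
    Empty,
    /// A single literal character.
    Literal(char),
    /// Any zero-width assertion such as `^`, `$`, `\b` or `\B`.
    Assertion,
    /// A repetition with a minimum count, e.g. `a*` (0) or `a+` (1).
    Repeat { sub: Box<Re>, min: u32 },
    Concat(Vec<Re>),
    Alt(Vec<Re>),
}

/// Returns true if the empty string is in the language of `re`.
fn matches_empty(re: &Re) -> bool {
    match re {
        Re::Empty | Re::Assertion => true,
        Re::Literal(_) => false,
        // Zero repetitions always match empty; otherwise it depends on
        // whether the repeated sub-expression can match empty.
        Re::Repeat { sub, min } => *min == 0 || matches_empty(sub),
        // Every part of a concatenation must be able to match empty.
        Re::Concat(subs) => subs.iter().all(matches_empty),
        // One branch matching empty is enough for an alternation.
        Re::Alt(subs) => subs.iter().any(matches_empty),
    }
}

fn lit(c: char) -> Re {
    Re::Literal(c)
}

fn rep(sub: Re, min: u32) -> Re {
    Re::Repeat { sub: Box::new(sub), min }
}

fn main() {
    assert!(matches_empty(&rep(lit('a'), 0))); // a*
    assert!(matches_empty(&Re::Concat(vec![rep(lit('a'), 0), rep(lit('b'), 0)]))); // a?b*
    assert!(matches_empty(&rep(Re::Empty, 1))); // ()+
    assert!(matches_empty(&Re::Concat(vec![Re::Assertion, Re::Assertion]))); // ^$
    assert!(matches_empty(&Re::Alt(vec![lit('a'), rep(lit('b'), 0)]))); // a|b?
    assert!(!matches_empty(&lit('a'))); // a
    assert!(!matches_empty(&rep(lit('a'), 1))); // a+
}
```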

pub fn is_literal(&self) -> bool

Return true if and only if this HIR is a simple literal. This is only true when this HIR expression is either itself a Literal or a concatenation of only Literals.

For example, f and foo are literals, but f+, (foo), foo(), and the empty pattern are not (even though some of them contain sub-expressions that are literals).

pub fn is_alternation_literal(&self) -> bool

Return true if and only if this HIR is either a simple literal or an alternation of simple literals. This is only true when this HIR expression is itself a Literal, a concatenation of only Literals, or an alternation of only Literals.

For example, f, foo, a|b|c, and foo|bar|baz are alternation literals, but f+, (foo), foo(), and the empty pattern are not (even though some of them contain sub-expressions that are literals).
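The distinction between the two literal predicates can be sketched on a toy tree. The Re type and both functions below are hypothetical stand-ins written for this illustration; they follow the prose rules on this page rather than the crate's internal bookkeeping.

```rust
/// Toy regex tree (hypothetical; for illustration only).
enum Re {
    Empty,
    Literal(char),
    Group(Box<Re>),
    Plus(Box<Re>),
    Concat(Vec<Re>),
    Alt(Vec<Re>),
}

/// A simple literal: one literal, or a concatenation of only literals.
fn is_literal(re: &Re) -> bool {
    match re {
        Re::Literal(_) => true,
        Re::Concat(subs) => subs.iter().all(|s| matches!(s, Re::Literal(_))),
        _ => false,
    }
}

/// A simple literal, or an alternation whose branches are all simple
/// literals.
fn is_alternation_literal(re: &Re) -> bool {
    match re {
        Re::Alt(subs) => subs.iter().all(is_literal),
        other => is_literal(other),
    }
}

/// Builds a literal word such as `foo`.
fn word(s: &str) -> Re {
    Re::Concat(s.chars().map(Re::Literal).collect())
}

fn main() {
    assert!(is_literal(&Re::Literal('f'))); // f
    assert!(is_literal(&word("foo"))); // foo
    assert!(!is_literal(&Re::Plus(Box::new(Re::Literal('f'))))); // f+
    assert!(!is_literal(&Re::Group(Box::new(word("foo"))))); // (foo)
    assert!(!is_literal(&Re::Empty)); // empty pattern

    let alt = Re::Alt(vec![word("foo"), word("bar"), word("baz")]);
    assert!(!is_literal(&alt)); // foo|bar|baz is not a simple literal
    assert!(is_alternation_literal(&alt)); // but it is an alternation literal
}
```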

Trait Implementations

impl Clone for Hir

fn clone(&self) -> Hir

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Hir

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Display for Hir

Print a display representation of this Hir.

The result of this is a valid regular expression pattern string.

This implementation uses constant stack space and heap space proportional to the size of the Hir.

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Drop for Hir

A custom Drop impl is used for HirKind such that it uses constant stack space but heap space proportional to the depth of the total Hir.

fn drop(&mut self)

Executes the destructor for this type.

impl PartialEq for Hir

fn eq(&self, other: &Hir) -> bool

This method tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

This method tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Eq for Hir

impl StructuralPartialEq for Hir

Auto Trait Implementations

impl Freeze for Hir

impl RefUnwindSafe for Hir

impl Send for Hir

impl Sync for Hir

impl Unpin for Hir

impl UnwindSafe for Hir

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T> ToString for T
where T: Display + ?Sized,

default fn to_string(&self) -> String

Converts the given value to a String.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.