2Defines a Thompson NFA and provides the [`PikeVM`](pikevm::PikeVM) and
3[`BoundedBacktracker`](backtrack::BoundedBacktracker) regex engines.
5A Thompson NFA (non-deterministic finite automaton) is arguably _the_ central
6data type in this library. It is the result of what is commonly referred to as
7"regex compilation." That is, turning a regex pattern from its concrete syntax
8string into something that can run a search looks roughly like this:
10* A `&str` is parsed into a [`regex-syntax::ast::Ast`](regex_syntax::ast::Ast).
11* An `Ast` is translated into a [`regex-syntax::hir::Hir`](regex_syntax::hir::Hir).
12* An `Hir` is compiled into a [`NFA`].
13* The `NFA` is then used to build one of a few different regex engines:
14 * An `NFA` is used directly in the `PikeVM` and `BoundedBacktracker` engines.
15 * An `NFA` is used by a [hybrid NFA/DFA](crate::hybrid) to build out a DFA's
16 transition table at search time.
17 * An `NFA`, assuming it is one-pass, is used to build a full
18 [one-pass DFA](crate::dfa::onepass) ahead of time.
19 * An `NFA` is used to build a [full DFA](crate::dfa) ahead of time.
21The [`meta`](crate::meta) regex engine makes all of these choices for you based
22on various criteria. However, if you have a lower level use case, _you_ can
23build any of the above regex engines and use them directly. But you must start
24here by building an `NFA`.
26# Details
28It is perhaps worth expanding a bit more on what it means to go through the
29`&str`->`Ast`->`Hir`->`NFA` process.
31* Parsing a string into an `Ast` gives it a structured representation.
32Crucially, the size and amount of work done in this step is proportional to the
33size of the original string. No optimization or Unicode handling is done at
34this point. This means that parsing into an `Ast` has very predictable costs.
35Moreover, an `Ast` can be roundtripped back to its original pattern string as
37* Translating an `Ast` into an `Hir` is a process by which the structured
38representation is simplified down to its most fundamental components.
39Translation deals with flags such as case insensitivity by converting things
40like `(?i:a)` to `[Aa]`. Translation is also where Unicode tables are consulted
41to resolve things like `\p{Emoji}` and `\p{Greek}`. It also flattens each
42character class, regardless of how deeply nested it is, into a single sequence
43of non-overlapping ranges. All the various literal forms are thrown out in
44favor of one common representation. Overall, the `Hir` is small enough to fit
45into your head and makes analysis and other tasks much simpler.
46* Compiling an `Hir` into an `NFA` formulates the regex into a finite state
47machine whose transitions are defined over bytes. For example, an `Hir` might
48have a Unicode character class corresponding to a sequence of ranges defined
49in terms of `char`. Compilation is then responsible for turning those ranges
50into a UTF-8 automaton. That is, an automaton that matches the UTF-8 encoding
51of just the codepoints specified by those ranges. Otherwise, the main job of
52an `NFA` is to serve as a byte-code of sorts for a virtual machine. It can be
53seen as a sequence of instructions for how to match a regex.
56#[cfg(feature = "nfa-backtrack")]
57pub mod backtrack;
58mod builder;
59#[cfg(feature = "syntax")]
60mod compiler;
61mod error;
62#[cfg(feature = "syntax")]
63mod literal_trie;
64#[cfg(feature = "syntax")]
65mod map;
66mod nfa;
67#[cfg(feature = "nfa-pikevm")]
68pub mod pikevm;
69#[cfg(feature = "syntax")]
70mod range_trie;
72pub use self::{
73 builder::Builder,
74 error::BuildError,
75 nfa::{
76 DenseTransitions, PatternIter, SparseTransitions, State, Transition,
77 NFA,
78 },
80#[cfg(feature = "syntax")]
81pub use compiler::{Compiler, Config, WhichCaptures};