| 1 | /*! |
| 2 | Defines a Thompson NFA and provides the [`PikeVM`](pikevm::PikeVM) and |
| 3 | [`BoundedBacktracker`](backtrack::BoundedBacktracker) regex engines. |
| 4 | |
| 5 | A Thompson NFA (non-deterministic finite automaton) is arguably _the_ central |
| 6 | data type in this library. It is the result of what is commonly referred to as |
| 7 | "regex compilation." That is, turning a regex pattern from its concrete syntax |
| 8 | string into something that can run a search looks roughly like this: |
| 9 | |
| 10 | * A `&str` is parsed into a [`regex-syntax::ast::Ast`](regex_syntax::ast::Ast). |
| 11 | * An `Ast` is translated into a [`regex-syntax::hir::Hir`](regex_syntax::hir::Hir). |
| 12 | * An `Hir` is compiled into a [`NFA`]. |
| 13 | * The `NFA` is then used to build one of a few different regex engines: |
| 14 | * An `NFA` is used directly in the `PikeVM` and `BoundedBacktracker` engines. |
| 15 | * An `NFA` is used by a [hybrid NFA/DFA](crate::hybrid) to build out a DFA's |
| 16 | transition table at search time. |
| 17 | * An `NFA`, assuming it is one-pass, is used to build a full |
| 18 | [one-pass DFA](crate::dfa::onepass) ahead of time. |
| 19 | * An `NFA` is used to build a [full DFA](crate::dfa) ahead of time. |
| 20 | |
| 21 | The [`meta`](crate::meta) regex engine makes all of these choices for you based |
| 22 | on various criteria. However, if you have a lower level use case, _you_ can |
| 23 | build any of the above regex engines and use them directly. But you must start |
| 24 | here by building an `NFA`. |
| 25 | |
| 26 | # Details |
| 27 | |
| 28 | It is perhaps worth expanding a bit more on what it means to go through the |
| 29 | `&str`->`Ast`->`Hir`->`NFA` process. |
| 30 | |
| 31 | * Parsing a string into an `Ast` gives it a structured representation. |
| 32 | Crucially, the size and amount of work done in this step is proportional to the |
| 33 | size of the original string. No optimization or Unicode handling is done at |
| 34 | this point. This means that parsing into an `Ast` has very predictable costs. |
| 35 | Moreover, an `Ast` can be roundtripped back to its original pattern string as |
| 36 | written. |
| 37 | * Translating an `Ast` into an `Hir` is a process by which the structured |
| 38 | representation is simplified down to its most fundamental components. |
| 39 | Translation deals with flags such as case insensitivity by converting things |
| 40 | like `(?i:a)` to `[Aa]`. Translation is also where Unicode tables are consulted |
| 41 | to resolve things like `\p{Emoji}` and `\p{Greek}`. It also flattens each |
| 42 | character class, regardless of how deeply nested it is, into a single sequence |
| 43 | of non-overlapping ranges. All the various literal forms are thrown out in |
| 44 | favor of one common representation. Overall, the `Hir` is small enough to fit |
| 45 | into your head and makes analysis and other tasks much simpler. |
| 46 | * Compiling an `Hir` into an `NFA` formulates the regex into a finite state |
| 47 | machine whose transitions are defined over bytes. For example, an `Hir` might |
| 48 | have a Unicode character class corresponding to a sequence of ranges defined |
| 49 | in terms of `char`. Compilation is then responsible for turning those ranges |
| 50 | into a UTF-8 automaton. That is, an automaton that matches the UTF-8 encoding |
| 51 | of just the codepoints specified by those ranges. Otherwise, the main job of |
| 52 | an `NFA` is to serve as a byte-code of sorts for a virtual machine. It can be |
| 53 | seen as a sequence of instructions for how to match a regex. |
| 54 | */ |
| 55 | |
| 56 | #[cfg (feature = "nfa-backtrack" )] |
| 57 | pub mod backtrack; |
| 58 | mod builder; |
| 59 | #[cfg (feature = "syntax" )] |
| 60 | mod compiler; |
| 61 | mod error; |
| 62 | #[cfg (feature = "syntax" )] |
| 63 | mod literal_trie; |
| 64 | #[cfg (feature = "syntax" )] |
| 65 | mod map; |
| 66 | mod nfa; |
| 67 | #[cfg (feature = "nfa-pikevm" )] |
| 68 | pub mod pikevm; |
| 69 | #[cfg (feature = "syntax" )] |
| 70 | mod range_trie; |
| 71 | |
| 72 | pub use self::{ |
| 73 | builder::Builder, |
| 74 | error::BuildError, |
| 75 | nfa::{ |
| 76 | DenseTransitions, PatternIter, SparseTransitions, State, Transition, |
| 77 | NFA, |
| 78 | }, |
| 79 | }; |
| 80 | #[cfg (feature = "syntax" )] |
| 81 | pub use compiler::{Compiler, Config, WhichCaptures}; |
| 82 | |