1 | /*! |
2 | Defines a Thompson NFA and provides the [`PikeVM`](pikevm::PikeVM) and |
3 | [`BoundedBacktracker`](backtrack::BoundedBacktracker) regex engines. |
4 | |
5 | A Thompson NFA (non-deterministic finite automaton) is arguably _the_ central |
6 | data type in this library. It is the result of what is commonly referred to as |
7 | "regex compilation." That is, turning a regex pattern from its concrete syntax |
8 | string into something that can run a search looks roughly like this: |
9 | |
10 | * A `&str` is parsed into a [`regex-syntax::ast::Ast`](regex_syntax::ast::Ast). |
11 | * An `Ast` is translated into a [`regex-syntax::hir::Hir`](regex_syntax::hir::Hir). |
12 | * An `Hir` is compiled into a [`NFA`]. |
13 | * The `NFA` is then used to build one of a few different regex engines: |
14 | * An `NFA` is used directly in the `PikeVM` and `BoundedBacktracker` engines. |
15 | * An `NFA` is used by a [hybrid NFA/DFA](crate::hybrid) to build out a DFA's |
16 | transition table at search time. |
17 | * An `NFA`, assuming it is one-pass, is used to build a full |
18 | [one-pass DFA](crate::dfa::onepass) ahead of time. |
19 | * An `NFA` is used to build a [full DFA](crate::dfa) ahead of time. |
20 | |
21 | The [`meta`](crate::meta) regex engine makes all of these choices for you based |
22 | on various criteria. However, if you have a lower level use case, _you_ can |
23 | build any of the above regex engines and use them directly. But you must start |
24 | here by building an `NFA`. |
25 | |
26 | # Details |
27 | |
28 | It is perhaps worth expanding a bit more on what it means to go through the |
29 | `&str`->`Ast`->`Hir`->`NFA` process. |
30 | |
31 | * Parsing a string into an `Ast` gives it a structured representation. |
32 | Crucially, the size and amount of work done in this step is proportional to the |
33 | size of the original string. No optimization or Unicode handling is done at |
34 | this point. This means that parsing into an `Ast` has very predictable costs. |
35 | Moreover, an `Ast` can be roundtripped back to its original pattern string as |
36 | written. |
37 | * Translating an `Ast` into an `Hir` is a process by which the structured |
38 | representation is simplified down to its most fundamental components. |
39 | Translation deals with flags such as case insensitivity by converting things |
40 | like `(?i:a)` to `[Aa]`. Translation is also where Unicode tables are consulted |
41 | to resolve things like `\p{Emoji}` and `\p{Greek}`. It also flattens each |
42 | character class, regardless of how deeply nested it is, into a single sequence |
43 | of non-overlapping ranges. All the various literal forms are thrown out in |
44 | favor of one common representation. Overall, the `Hir` is small enough to fit |
45 | into your head and makes analysis and other tasks much simpler. |
46 | * Compiling an `Hir` into an `NFA` formulates the regex into a finite state |
47 | machine whose transitions are defined over bytes. For example, an `Hir` might |
48 | have a Unicode character class corresponding to a sequence of ranges defined |
49 | in terms of `char`. Compilation is then responsible for turning those ranges |
50 | into a UTF-8 automaton. That is, an automaton that matches the UTF-8 encoding |
51 | of just the codepoints specified by those ranges. Otherwise, the main job of |
52 | an `NFA` is to serve as a byte-code of sorts for a virtual machine. It can be |
53 | seen as a sequence of instructions for how to match a regex. |
54 | */ |
55 | |
56 | #[cfg (feature = "nfa-backtrack" )] |
57 | pub mod backtrack; |
58 | mod builder; |
59 | #[cfg (feature = "syntax" )] |
60 | mod compiler; |
61 | mod error; |
62 | #[cfg (feature = "syntax" )] |
63 | mod literal_trie; |
64 | #[cfg (feature = "syntax" )] |
65 | mod map; |
66 | mod nfa; |
67 | #[cfg (feature = "nfa-pikevm" )] |
68 | pub mod pikevm; |
69 | #[cfg (feature = "syntax" )] |
70 | mod range_trie; |
71 | |
72 | pub use self::{ |
73 | builder::Builder, |
74 | error::BuildError, |
75 | nfa::{ |
76 | DenseTransitions, PatternIter, SparseTransitions, State, Transition, |
77 | NFA, |
78 | }, |
79 | }; |
80 | #[cfg (feature = "syntax" )] |
81 | pub use compiler::{Compiler, Config, WhichCaptures}; |
82 | |