| 1 | /*! |
| 2 | Provides non-deterministic finite automata (NFA) and regex engines that use |
| 3 | them. |
| 4 | |
| 5 | While NFAs and DFAs (deterministic finite automata) have equivalent *theoretical* |
| 6 | power, their usage in practice tends to result in different engineering trade |
| 7 | offs. While this isn't meant to be a comprehensive treatment of the topic, here |
| 8 | are a few key trade offs that are, at minimum, true for this crate: |
| 9 | |
| 10 | * NFAs tend to be represented sparsely where as DFAs are represented densely. |
| 11 | Sparse representations use less memory, but are slower to traverse. Conversely, |
| 12 | dense representations use more memory, but are faster to traverse. (Sometimes |
| 13 | these lines are blurred. For example, an `NFA` might choose to represent a |
| 14 | particular state in a dense fashion, and a DFA can be built using a sparse |
| 15 | representation via [`sparse::DFA`](crate::dfa::sparse::DFA). |
| 16 | * NFAs have espilon transitions and DFAs don't. In practice, this means that |
| 17 | handling a single byte in a haystack with an NFA at search time may require |
| 18 | visiting multiple NFA states. In a DFA, each byte only requires visiting |
| 19 | a single state. Stated differently, NFAs require a variable number of CPU |
| 20 | instructions to process one byte in a haystack where as a DFA uses a constant |
| 21 | number of CPU instructions to process one byte. |
| 22 | * NFAs are generally easier to amend with secondary storage. For example, the |
| 23 | [`thompson::pikevm::PikeVM`] uses an NFA to match, but also uses additional |
| 24 | memory beyond the model of a finite state machine to track offsets for matching |
| 25 | capturing groups. Conversely, the most a DFA can do is report the offset (and |
| 26 | pattern ID) at which a match occurred. This is generally why we also compile |
| 27 | DFAs in reverse, so that we can run them after finding the end of a match to |
| 28 | also find the start of a match. |
| 29 | * NFAs take worst case linear time to build, but DFAs take worst case |
| 30 | exponential time to build. The [hybrid NFA/DFA](crate::hybrid) mitigates this |
| 31 | challenge for DFAs in many practical cases. |
| 32 | |
| 33 | There are likely other differences, but the bottom line is that NFAs tend to be |
| 34 | more memory efficient and give easier opportunities for increasing expressive |
| 35 | power, where as DFAs are faster to search with. |
| 36 | |
| 37 | # Why only a Thompson NFA? |
| 38 | |
| 39 | Currently, the only kind of NFA we support in this crate is a [Thompson |
| 40 | NFA](https://en.wikipedia.org/wiki/Thompson%27s_construction). This refers |
| 41 | to a specific construction algorithm that takes the syntax of a regex |
| 42 | pattern and converts it to an NFA. Specifically, it makes gratuitous use of |
| 43 | epsilon transitions in order to keep its structure simple. In exchange, its |
| 44 | construction time is linear in the size of the regex. A Thompson NFA also makes |
| 45 | the guarantee that given any state and a character in a haystack, there is at |
| 46 | most one transition defined for it. (Although there may be many epsilon |
| 47 | transitions.) |
| 48 | |
| 49 | It possible that other types of NFAs will be added in the future, such as a |
| 50 | [Glushkov NFA](https://en.wikipedia.org/wiki/Glushkov%27s_construction_algorithm). |
| 51 | But currently, this crate only provides a Thompson NFA. |
| 52 | */ |
| 53 | |
| 54 | #[cfg (feature = "nfa-thompson" )] |
| 55 | pub mod thompson; |
| 56 | |