/*!
Provides non-deterministic finite automata (NFA) and regex engines that use
them.

While NFAs and DFAs (deterministic finite automata) have equivalent *theoretical*
power, their usage in practice tends to result in different engineering
trade-offs. While this isn't meant to be a comprehensive treatment of the topic,
here are a few key trade-offs that are, at minimum, true for this crate:

* NFAs tend to be represented sparsely whereas DFAs are represented densely.
Sparse representations use less memory, but are slower to traverse. Conversely,
dense representations use more memory, but are faster to traverse. (Sometimes
these lines are blurred. For example, an `NFA` might choose to represent a
particular state in a dense fashion, and a DFA can be built using a sparse
representation via [`sparse::DFA`](crate::dfa::sparse::DFA).)
* NFAs have epsilon transitions and DFAs don't. In practice, this means that
handling a single byte in a haystack with an NFA at search time may require
visiting multiple NFA states. In a DFA, each byte only requires visiting
a single state. Stated differently, NFAs require a variable number of CPU
instructions to process one byte in a haystack whereas a DFA uses a constant
number of CPU instructions to process one byte.
* NFAs are generally easier to augment with secondary storage. For example, the
[`thompson::pikevm::PikeVM`] uses an NFA to match, but also uses additional
memory beyond the model of a finite state machine to track offsets for matching
capturing groups. Conversely, the most a DFA can do is report the offset (and
pattern ID) at which a match occurred. This is generally why we also compile
DFAs in reverse, so that we can run them after finding the end of a match to
also find the start of a match.
* NFAs take worst-case linear time to build, but DFAs take worst-case
exponential time to build. The [hybrid NFA/DFA](crate::hybrid) mitigates this
challenge for DFAs in many practical cases.

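To make the first two trade-offs concrete, here is a minimal sketch (not this
crate's implementation) that matches the anchored pattern `a*b` with a toy
sparse NFA and a toy dense DFA. Every name in it (`ToyState`, `ToyDfa`,
`toy_nfa_is_match` and so on) is hypothetical and exists only for illustration.
The NFA search also returns how many states it inspected, which grows with the
size of the active state set, while the DFA search does exactly one table
lookup per byte at the cost of a 256-entry row per state.

```rust
/// A toy sparse NFA state (hypothetical; not this crate's state type).
enum ToyState {
    /// Consume one byte in `lo..=hi`, then move to `next`.
    ByteRange { lo: u8, hi: u8, next: usize },
    /// Epsilon transitions: follow every alternative without consuming input.
    Union(Vec<usize>),
    /// A match state.
    Match,
}

/// Adds `id`, and every non-epsilon state reachable from it through epsilon
/// transitions alone, to `set`.
fn epsilon_closure(nfa: &[ToyState], id: usize, set: &mut Vec<usize>, seen: &mut [bool]) {
    let mut stack = vec![id];
    while let Some(id) = stack.pop() {
        if seen[id] {
            continue;
        }
        seen[id] = true;
        match &nfa[id] {
            ToyState::Union(alts) => stack.extend(alts.iter().rev().copied()),
            _ => set.push(id),
        }
    }
}

/// Anchored search over the whole haystack. Returns whether it matched and
/// how many NFA states were inspected while consuming bytes.
fn toy_nfa_is_match(nfa: &[ToyState], start: usize, haystack: &[u8]) -> (bool, usize) {
    let mut visits = 0;
    let mut current = Vec::new();
    epsilon_closure(nfa, start, &mut current, &mut vec![false; nfa.len()]);
    for &byte in haystack {
        let mut next = Vec::new();
        let mut seen = vec![false; nfa.len()];
        // One byte may require inspecting several active states.
        for &id in &current {
            visits += 1;
            if let ToyState::ByteRange { lo, hi, next: target } = &nfa[id] {
                if (*lo..=*hi).contains(&byte) {
                    epsilon_closure(nfa, *target, &mut next, &mut seen);
                }
            }
        }
        current = next;
    }
    let matched = current.iter().any(|&id| matches!(nfa[id], ToyState::Match));
    (matched, visits)
}

/// A toy dense DFA: `table[state][byte]` is the next state.
struct ToyDfa {
    table: Vec<[usize; 256]>,
    start: usize,
    is_match: Vec<bool>,
}

/// Anchored search over the whole haystack: one table lookup per byte.
fn toy_dfa_is_match(dfa: &ToyDfa, haystack: &[u8]) -> bool {
    let mut state = dfa.start;
    for &byte in haystack {
        state = dfa.table[state][usize::from(byte)];
    }
    dfa.is_match[state]
}

/// Hand-built toy NFA for `a*b`; state 0 is the start state.
fn toy_nfa_a_star_b() -> Vec<ToyState> {
    vec![
        ToyState::Union(vec![1, 2]),
        ToyState::ByteRange { lo: b'a', hi: b'a', next: 0 },
        ToyState::ByteRange { lo: b'b', hi: b'b', next: 3 },
        ToyState::Match,
    ]
}

/// Hand-built toy DFA for `a*b`; state 2 is a dead state.
fn toy_dfa_a_star_b() -> ToyDfa {
    const DEAD: usize = 2;
    let mut table = vec![[DEAD; 256]; 3];
    table[0][usize::from(b'a')] = 0;
    table[0][usize::from(b'b')] = 1;
    ToyDfa { table, start: 0, is_match: vec![false, true, false] }
}
```
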
There are likely other differences, but the bottom line is that NFAs tend to be
more memory efficient and give easier opportunities for increasing expressive
power, whereas DFAs are faster to search with.

# Why only a Thompson NFA?

Currently, the only kind of NFA we support in this crate is a [Thompson
NFA](https://en.wikipedia.org/wiki/Thompson%27s_construction). This refers
to a specific construction algorithm that takes the syntax of a regex
pattern and converts it to an NFA. Specifically, it makes liberal use of
epsilon transitions in order to keep its structure simple. In exchange, its
construction time is linear in the size of the regex. A Thompson NFA also makes
the guarantee that given any state and a character in a haystack, there is at
most one transition defined for it. (Although there may be many epsilon
transitions.)

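As a rough illustration of why construction is linear, the following sketch
compiles a tiny syntax tree into NFA states in a single pass. It is a compact
Thompson-style variant written for this example, not the compiler used by this
crate, and the names `Ast`, `SketchState`, `compile` and `thompson_sketch` are
all hypothetical. Each syntax node adds at most one state, and each state has
either one byte transition or only epsilon transitions.

```rust
/// A tiny regex syntax tree (hypothetical; far simpler than a real parser's output).
enum Ast {
    Byte(u8),
    Concat(Box<Ast>, Box<Ast>),
    Alt(Box<Ast>, Box<Ast>),
    Star(Box<Ast>),
}

/// A state in the sketched NFA.
#[derive(Debug, PartialEq)]
enum SketchState {
    /// The only kind of state that consumes input: exactly one byte transition.
    Byte { byte: u8, next: usize },
    /// Epsilon transitions to each listed state.
    Epsilon(Vec<usize>),
    /// A match state.
    Match,
}

/// Compiles `ast` so that, once it has matched, control continues at state
/// `next`. Returns the id of the entry state for the compiled fragment.
fn compile(ast: &Ast, next: usize, states: &mut Vec<SketchState>) -> usize {
    match ast {
        Ast::Byte(b) => {
            states.push(SketchState::Byte { byte: *b, next });
            states.len() - 1
        }
        Ast::Concat(left, right) => {
            // Compile the right side first so the left side knows where to go.
            let right_start = compile(right, next, states);
            compile(left, right_start, states)
        }
        Ast::Alt(left, right) => {
            let left_start = compile(left, next, states);
            let right_start = compile(right, next, states);
            states.push(SketchState::Epsilon(vec![left_start, right_start]));
            states.len() - 1
        }
        Ast::Star(inner) => {
            // Reserve the loop head, then patch it once the body is known.
            let head = states.len();
            states.push(SketchState::Epsilon(vec![]));
            let body_start = compile(inner, head, states);
            states[head] = SketchState::Epsilon(vec![body_start, next]);
            head
        }
    }
}

/// Builds the state list for `ast` and returns it with the start state id.
/// State 0 is always the match state.
fn thompson_sketch(ast: &Ast) -> (Vec<SketchState>, usize) {
    let mut states = vec![SketchState::Match];
    let start = compile(ast, 0, &mut states);
    (states, start)
}
```
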
It is possible that other types of NFAs will be added in the future, such as a
[Glushkov NFA](https://en.wikipedia.org/wiki/Glushkov%27s_construction_algorithm).
But currently, this crate only provides a Thompson NFA.
*/

#[cfg(feature = "nfa-thompson")]
pub mod thompson;