2Provides non-deterministic finite automata (NFA) and regex engines that use
5While NFAs and DFAs (deterministic finite automata) have equivalent *theoretical*
6power, their usage in practice tends to result in different engineering trade
7offs. While this isn't meant to be a comprehensive treatment of the topic, here
8are a few key trade offs that are, at minimum, true for this crate:
10* NFAs tend to be represented sparsely where as DFAs are represented densely.
11Sparse representations use less memory, but are slower to traverse. Conversely,
12dense representations use more memory, but are faster to traverse. (Sometimes
13these lines are blurred. For example, an `NFA` might choose to represent a
14particular state in a dense fashion, and a DFA can be built using a sparse
15representation via [`sparse::DFA`](crate::dfa::sparse::DFA).
16* NFAs have espilon transitions and DFAs don't. In practice, this means that
17handling a single byte in a haystack with an NFA at search time may require
18visiting multiple NFA states. In a DFA, each byte only requires visiting
19a single state. Stated differently, NFAs require a variable number of CPU
20instructions to process one byte in a haystack where as a DFA uses a constant
21number of CPU instructions to process one byte.
22* NFAs are generally easier to amend with secondary storage. For example, the
23[`thompson::pikevm::PikeVM`] uses an NFA to match, but also uses additional
24memory beyond the model of a finite state machine to track offsets for matching
25capturing groups. Conversely, the most a DFA can do is report the offset (and
26pattern ID) at which a match occurred. This is generally why we also compile
27DFAs in reverse, so that we can run them after finding the end of a match to
28also find the start of a match.
29* NFAs take worst case linear time to build, but DFAs take worst case
30exponential time to build. The [hybrid NFA/DFA](crate::hybrid) mitigates this
31challenge for DFAs in many practical cases.
33There are likely other differences, but the bottom line is that NFAs tend to be
34more memory efficient and give easier opportunities for increasing expressive
35power, where as DFAs are faster to search with.
37# Why only a Thompson NFA?
39Currently, the only kind of NFA we support in this crate is a [Thompson
40NFA](https://en.wikipedia.org/wiki/Thompson%27s_construction). This refers
41to a specific construction algorithm that takes the syntax of a regex
42pattern and converts it to an NFA. Specifically, it makes gratuitous use of
43epsilon transitions in order to keep its structure simple. In exchange, its
44construction time is linear in the size of the regex. A Thompson NFA also makes
45the guarantee that given any state and a character in a haystack, there is at
46most one transition defined for it. (Although there may be many epsilon
49It possible that other types of NFAs will be added in the future, such as a
50[Glushkov NFA](https://en.wikipedia.org/wiki/Glushkov%27s_construction_algorithm).
51But currently, this crate only provides a Thompson NFA.
54#[cfg(feature = "nfa-thompson")]
55pub mod thompson;