1 | /*! |
2 | Provides direct access to NFA implementations of Aho-Corasick. |
3 | |
4 | The principle characteristic of an NFA in this crate is that it may |
5 | transition through multiple states per byte of haystack. In Aho-Corasick |
6 | parlance, NFAs follow failure transitions during a search. In contrast, |
7 | a [`DFA`](crate::dfa::DFA) pre-computes all failure transitions during |
8 | compilation at the expense of a much bigger memory footprint. |
9 | |
10 | Currently, there are two NFA implementations provided: noncontiguous and |
11 | contiguous. The names reflect their internal representation, and consequently, |
12 | the trade offs associated with them: |
13 | |
14 | * A [`noncontiguous::NFA`] uses a separate allocation for every NFA state to |
15 | represent its transitions in a sparse format. This is ideal for building an |
16 | NFA, since it cheaply permits different states to have a different number of |
17 | transitions. A noncontiguous NFA is where the main Aho-Corasick construction |
18 | algorithm is implemented. All other Aho-Corasick implementations are built by |
19 | first constructing a noncontiguous NFA. |
20 | * A [`contiguous::NFA`] is uses a single allocation to represent all states, |
21 | while still encoding most states as sparse states but permitting states near |
22 | the starting state to have a dense representation. The dense representation |
23 | uses more memory, but permits computing transitions during a search more |
24 | quickly. By only making the most active states dense (the states near the |
25 | starting state), a contiguous NFA better balances memory usage with search |
26 | speed. The single contiguous allocation also uses less overhead per state and |
27 | enables compression tricks where most states only use 8 bytes of heap memory. |
28 | |
29 | When given the choice between these two, you almost always want to pick a |
30 | contiguous NFA. It takes only a little longer to build, but both its memory |
31 | usage and search speed are typically much better than a noncontiguous NFA. A |
32 | noncontiguous NFA is useful when prioritizing build times, or when there are |
33 | so many patterns that a contiguous NFA could not be built. (Currently, because |
34 | of both memory and search speed improvements, a contiguous NFA has a smaller |
35 | internal limit on the total number of NFA states it can represent. But you |
36 | would likely need to have hundreds of thousands or even millions of patterns |
37 | before you hit this limit.) |
38 | */ |
39 | pub mod contiguous; |
40 | pub mod noncontiguous; |
41 | |