| 1 | /*! |
| 2 | Provides direct access to NFA implementations of Aho-Corasick. |
| 3 | |
| 4 | The principle characteristic of an NFA in this crate is that it may |
| 5 | transition through multiple states per byte of haystack. In Aho-Corasick |
| 6 | parlance, NFAs follow failure transitions during a search. In contrast, |
| 7 | a [`DFA`](crate::dfa::DFA) pre-computes all failure transitions during |
| 8 | compilation at the expense of a much bigger memory footprint. |
| 9 | |
| 10 | Currently, there are two NFA implementations provided: noncontiguous and |
| 11 | contiguous. The names reflect their internal representation, and consequently, |
| 12 | the trade offs associated with them: |
| 13 | |
| 14 | * A [`noncontiguous::NFA`] uses a separate allocation for every NFA state to |
| 15 | represent its transitions in a sparse format. This is ideal for building an |
| 16 | NFA, since it cheaply permits different states to have a different number of |
| 17 | transitions. A noncontiguous NFA is where the main Aho-Corasick construction |
| 18 | algorithm is implemented. All other Aho-Corasick implementations are built by |
| 19 | first constructing a noncontiguous NFA. |
| 20 | * A [`contiguous::NFA`] is uses a single allocation to represent all states, |
| 21 | while still encoding most states as sparse states but permitting states near |
| 22 | the starting state to have a dense representation. The dense representation |
| 23 | uses more memory, but permits computing transitions during a search more |
| 24 | quickly. By only making the most active states dense (the states near the |
| 25 | starting state), a contiguous NFA better balances memory usage with search |
| 26 | speed. The single contiguous allocation also uses less overhead per state and |
| 27 | enables compression tricks where most states only use 8 bytes of heap memory. |
| 28 | |
| 29 | When given the choice between these two, you almost always want to pick a |
| 30 | contiguous NFA. It takes only a little longer to build, but both its memory |
| 31 | usage and search speed are typically much better than a noncontiguous NFA. A |
| 32 | noncontiguous NFA is useful when prioritizing build times, or when there are |
| 33 | so many patterns that a contiguous NFA could not be built. (Currently, because |
| 34 | of both memory and search speed improvements, a contiguous NFA has a smaller |
| 35 | internal limit on the total number of NFA states it can represent. But you |
| 36 | would likely need to have hundreds of thousands or even millions of patterns |
| 37 | before you hit this limit.) |
| 38 | */ |
| 39 | pub mod contiguous; |
| 40 | pub mod noncontiguous; |
| 41 | |