1 | /*! |
---|---|

2 | Provides non-deterministic finite automata (NFA) and regex engines that use |

3 | them. |

4 | |

5 | While NFAs and DFAs (deterministic finite automata) have equivalent *theoretical* |

6 | power, their usage in practice tends to result in different engineering trade |

7 | offs. While this isn't meant to be a comprehensive treatment of the topic, here |

8 | are a few key trade offs that are, at minimum, true for this crate: |

9 | |

10 | * NFAs tend to be represented sparsely where as DFAs are represented densely. |

11 | Sparse representations use less memory, but are slower to traverse. Conversely, |

12 | dense representations use more memory, but are faster to traverse. (Sometimes |

13 | these lines are blurred. For example, an `NFA` might choose to represent a |

14 | particular state in a dense fashion, and a DFA can be built using a sparse |

15 | representation via [`sparse::DFA`](crate::dfa::sparse::DFA). |

16 | * NFAs have espilon transitions and DFAs don't. In practice, this means that |

17 | handling a single byte in a haystack with an NFA at search time may require |

18 | visiting multiple NFA states. In a DFA, each byte only requires visiting |

19 | a single state. Stated differently, NFAs require a variable number of CPU |

20 | instructions to process one byte in a haystack where as a DFA uses a constant |

21 | number of CPU instructions to process one byte. |

22 | * NFAs are generally easier to amend with secondary storage. For example, the |

23 | [`thompson::pikevm::PikeVM`] uses an NFA to match, but also uses additional |

24 | memory beyond the model of a finite state machine to track offsets for matching |

25 | capturing groups. Conversely, the most a DFA can do is report the offset (and |

26 | pattern ID) at which a match occurred. This is generally why we also compile |

27 | DFAs in reverse, so that we can run them after finding the end of a match to |

28 | also find the start of a match. |

29 | * NFAs take worst case linear time to build, but DFAs take worst case |

30 | exponential time to build. The [hybrid NFA/DFA](crate::hybrid) mitigates this |

31 | challenge for DFAs in many practical cases. |

32 | |

33 | There are likely other differences, but the bottom line is that NFAs tend to be |

34 | more memory efficient and give easier opportunities for increasing expressive |

35 | power, where as DFAs are faster to search with. |

36 | |

37 | # Why only a Thompson NFA? |

38 | |

39 | Currently, the only kind of NFA we support in this crate is a [Thompson |

40 | NFA](https://en.wikipedia.org/wiki/Thompson%27s_construction). This refers |

41 | to a specific construction algorithm that takes the syntax of a regex |

42 | pattern and converts it to an NFA. Specifically, it makes gratuitous use of |

43 | epsilon transitions in order to keep its structure simple. In exchange, its |

44 | construction time is linear in the size of the regex. A Thompson NFA also makes |

45 | the guarantee that given any state and a character in a haystack, there is at |

46 | most one transition defined for it. (Although there may be many epsilon |

47 | transitions.) |

48 | |

49 | It possible that other types of NFAs will be added in the future, such as a |

50 | [Glushkov NFA](https://en.wikipedia.org/wiki/Glushkov%27s_construction_algorithm). |

51 | But currently, this crate only provides a Thompson NFA. |

52 | */ |

53 | |

54 | #[cfg(feature = "nfa-thompson")] |

55 | pub mod thompson; |

56 |