1/*!
2A module for building and searching with lazy deterministic finite automata
3(DFAs).
4
5Like other modules in this crate, lazy DFAs support a rich regex syntax with
6Unicode features. The key feature of a lazy DFA is that it builds itself
7incrementally during search, and never uses more than a configured capacity of
8memory. Thus, when searching with a lazy DFA, one must supply a mutable "cache"
9in which the actual DFA's transition table is stored.
10
11If you're looking for fully compiled DFAs, then please see the top-level
12[`dfa` module](crate::dfa).
13
14# Overview
15
16This section gives a brief overview of the primary types in this module:
17
18* A [`regex::Regex`] provides a way to search for matches of a regular
19expression using lazy DFAs. This includes iterating over matches with both the
20start and end positions of each match.
21* A [`dfa::DFA`] provides direct low level access to a lazy DFA.
22
23# Example: basic regex searching
24
25This example shows how to compile a regex using the default configuration
26and then use it to find matches in a byte string:
27
28```
29use regex_automata::{hybrid::regex::Regex, Match};
30
31let re = Regex::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")?;
32let mut cache = re.create_cache();
33
34let haystack = "2018-12-24 2016-10-08";
35let matches: Vec<Match> = re.find_iter(&mut cache, haystack).collect();
36assert_eq!(matches, vec![
37 Match::must(0, 0..10),
38 Match::must(0, 11..21),
39]);
40# Ok::<(), Box<dyn std::error::Error>>(())
41```
42
43# Example: searching with multiple regexes
44
45The lazy DFAs in this module all fully support searching with multiple regexes
46simultaneously. You can use this support with standard leftmost-first style
47searching to find non-overlapping matches:
48
49```
50# if cfg!(miri) { return Ok(()); } // miri takes too long
51use regex_automata::{hybrid::regex::Regex, Match};
52
53let re = Regex::new_many(&[r"\w+", r"\S+"])?;
54let mut cache = re.create_cache();
55
56let haystack = "@foo bar";
57let matches: Vec<Match> = re.find_iter(&mut cache, haystack).collect();
58assert_eq!(matches, vec![
59 Match::must(1, 0..4),
60 Match::must(0, 5..8),
61]);
62# Ok::<(), Box<dyn std::error::Error>>(())
63```
64
65# When should I use this?
66
67Generally speaking, if you can abide the use of mutable state during search,
68and you don't need things like capturing groups or Unicode word boundary
69support in non-ASCII text, then a lazy DFA is likely a robust choice with
70respect to both search speed and memory usage. Note however that its speed
71may be worse than a general purpose regex engine if you don't select a good
72[prefilter](crate::util::prefilter).
73
74If you know ahead of time that your pattern would result in a very large DFA
75if it was fully compiled, it may be better to use an NFA simulation instead
76of a lazy DFA. Either that, or increase the cache capacity of your lazy DFA
77to something that is big enough to hold the state machine (likely through
78experimentation). The issue here is that if the cache is too small, then it
79could wind up being reset too frequently and this might decrease searching
80speed significantly.
81
82# Differences with fully compiled DFAs
83
84A [`hybrid::regex::Regex`](crate::hybrid::regex::Regex) and a
85[`dfa::regex::Regex`](crate::dfa::regex::Regex) both have the same capabilities
86(and similarly for their underlying DFAs), but they achieve them through
87different means. The main difference is that a hybrid or "lazy" regex builds
88its DFA lazily during search, where as a fully compiled regex will build its
89DFA at construction time. While building a DFA at search time might sound like
90it's slow, it tends to work out where most bytes seen during a search will
91reuse pre-built parts of the DFA and thus can be almost as fast as a fully
92compiled DFA. The main downside is that searching requires mutable space to
93store the DFA, and, in the worst case, a search can result in a new state being
94created for each byte seen, which would make searching quite a bit slower.
95
96A fully compiled DFA never has to worry about searches being slower once
97it's built. (Aside from, say, the transition table being so large that it
98is subject to harsh CPU cache effects.) However, of course, building a full
99DFA can be quite time consuming and memory hungry. Particularly when large
100Unicode character classes are used, which tend to translate into very large
101DFAs.
102
103A lazy DFA strikes a nice balance _in practice_, particularly in the
104presence of Unicode mode, by only building what is needed. It avoids the
105worst case exponential time complexity of DFA compilation by guaranteeing that
106it will only build at most one state per byte searched. While the worst
107case here can lead to a very high constant, it will never be exponential.
108
109# Syntax
110
111This module supports the same syntax as the `regex` crate, since they share the
112same parser. You can find an exhaustive list of supported syntax in the
113[documentation for the `regex` crate](https://docs.rs/regex/1/regex/#syntax).
114
115There are two things that are not supported by the lazy DFAs in this module:
116
117* Capturing groups. The DFAs (and [`Regex`](regex::Regex)es built on top
118of them) can only find the offsets of an entire match, but cannot resolve
119the offsets of each capturing group. This is because DFAs do not have the
120expressive power necessary. Note that it is okay to build a lazy DFA from an
121NFA that contains capture groups. The capture groups will simply be ignored.
122* Unicode word boundaries. These present particularly difficult challenges for
123DFA construction and would result in an explosion in the number of states.
124One can enable [`dfa::Config::unicode_word_boundary`] though, which provides
125heuristic support for Unicode word boundaries that only works on ASCII text.
126Otherwise, one can use `(?-u:\b)` for an ASCII word boundary, which will work
127on any input.
128
129There are no plans to lift either of these limitations.
130
131Note that these restrictions are identical to the restrictions on fully
132compiled DFAs.
133*/
134
135pub use self::{
136 error::{BuildError, CacheError, StartError},
137 id::LazyStateID,
138};
139
140pub mod dfa;
141mod error;
142mod id;
143pub mod regex;
144mod search;
145