/*!
This crate exposes a variety of regex engines used by the `regex` crate.
It provides a vast, sprawling and "expert" level API to each regex engine.
The regex engines provided by this crate focus heavily on finite automata
implementations and specifically guarantee worst case `O(m * n)` time
complexity for all searches. (Where `m ~ len(regex)` and `n ~ len(haystack)`.)

The primary goal of this crate is to serve as an implementation detail for the
`regex` crate. A secondary goal is to make its internals available for use by
others.

# Table of contents

* [Should I be using this crate?](#should-i-be-using-this-crate) gives some
reasons for and against using this crate.
* [Examples](#examples) provides a small selection of things you can do with
this crate.
* [Available regex engines](#available-regex-engines) provides a hyperlinked
list of all regex engines in this crate.
* [API themes](#api-themes) discusses common elements used throughout this
crate.
* [Crate features](#crate-features) documents the extensive list of Cargo
features available.

# Should I be using this crate?

If you find yourself here because you just want to use regexes, then you should
first check out whether the [`regex` crate](https://docs.rs/regex) meets
your needs. It provides a streamlined and difficult-to-misuse API for regex
searching.

If you're here because there is something specific you want to do that can't
be easily done with the `regex` crate, then you are perhaps in the right place.
Most likely, your first stop should be to explore the
[`meta` regex APIs](meta). Namely, the `regex` crate is just a light wrapper
over a [`meta::Regex`], so its API will probably be the easiest to transition
to. In contrast to the `regex` crate, the `meta::Regex` API supports more
search parameters and does multi-pattern searches. However, it isn't quite as
ergonomic.

Otherwise, the following is a non-exhaustive list of reasons to use this crate:

* You want to analyze or use a [Thompson `NFA`](nfa::thompson::NFA) directly.
* You want more powerful multi-pattern search than what is provided by
`RegexSet` in the `regex` crate. All regex engines in this crate support
multi-pattern searches.
* You want to use one of the `regex` crate's internal engines directly because
of some interesting configuration that isn't possible via the `regex` crate.
For example, a [lazy DFA's configuration](hybrid::dfa::Config) exposes a
dizzying number of options for controlling its execution.
* You want to use the lower level search APIs. For example, both the [lazy
DFA](hybrid::dfa) and [fully compiled DFAs](dfa) support searching by exploring
the automaton one state at a time. This might be useful, for example, for
stream searches or searches of strings stored non-contiguously in memory.
* You want to build a fully compiled DFA and then [use zero-copy
deserialization](dfa::dense::DFA::from_bytes) to load it into memory and use
it for searching. This use case is supported in core-only no-std/no-alloc
environments.
* You want to run [anchored searches](Input::anchored) without using the `^`
anchor in your regex pattern.
* You need to work around contention issues with
sharing a regex across multiple threads. The
[`meta::Regex::search_with`](meta::Regex::search_with) API permits bypassing
any kind of synchronization at all by requiring the caller to provide the
mutable scratch space needed during a search.
* You want to build your own regex engine on top of the `regex` crate's
infrastructure.

# Examples

This section tries to identify a few interesting things you can do with this
crate and demonstrates them.

### Multi-pattern searches with capture groups

One of the more frustrating limitations of `RegexSet` in the `regex` crate
(at the time of writing) is that it doesn't report match positions. With this
crate, multi-pattern support was intentionally designed in from the beginning,
which means it works in all regex engines and even works for capture groups.

This example shows how to search for matches of multiple regexes, where each
regex uses the same capture group names to parse different key-value formats.

```
use regex_automata::{meta::Regex, PatternID};

let re = Regex::new_many(&[
    r#"(?m)^(?<key>[[:word:]]+)=(?<val>[[:word:]]+)$"#,
    r#"(?m)^(?<key>[[:word:]]+)="(?<val>[^"]+)"$"#,
    r#"(?m)^(?<key>[[:word:]]+)='(?<val>[^']+)'$"#,
    r#"(?m)^(?<key>[[:word:]]+):\s*(?<val>[[:word:]]+)$"#,
])?;
let hay = r#"
best_album="Blow Your Face Out"
best_quote='"then as it was, then again it will be"'
best_year=1973
best_simpsons_episode: HOMR
"#;
let mut kvs = vec![];
for caps in re.captures_iter(hay) {
    // N.B. One could use capture indices '1' and '2' here
    // as well. Capture indices are local to each pattern.
    // (Just like names are.)
    let key = &hay[caps.get_group_by_name("key").unwrap()];
    let val = &hay[caps.get_group_by_name("val").unwrap()];
    kvs.push((key, val));
}
assert_eq!(kvs, vec![
    ("best_album", "Blow Your Face Out"),
    ("best_quote", "\"then as it was, then again it will be\""),
    ("best_year", "1973"),
    ("best_simpsons_episode", "HOMR"),
]);

# Ok::<(), Box<dyn std::error::Error>>(())
```

### Build a full DFA and walk it manually

One of the regex engines in this crate is a fully compiled DFA. It takes worst
case exponential time to build, but once built, it can be easily explored and
used for searches. Here's a simple example that uses its lower level APIs to
implement a simple anchored search by hand.

```
use regex_automata::{dfa::{Automaton, dense}, Input};

let dfa = dense::DFA::new(r"(?-u)\b[A-Z]\w+z\b")?;
let haystack = "Quartz";

// The start state is determined by inspecting the position and the
// initial bytes of the haystack.
let mut state = dfa.start_state_forward(&Input::new(haystack))?;
// Walk all the bytes in the haystack.
for &b in haystack.as_bytes().iter() {
    state = dfa.next_state(state, b);
}
// DFAs in this crate require an explicit
// end-of-input transition if a search reaches
// the end of a haystack.
state = dfa.next_eoi_state(state);
assert!(dfa.is_match_state(state));

# Ok::<(), Box<dyn std::error::Error>>(())
```

Or do the same with a lazy DFA that avoids exponential worst case compile time,
but requires mutable scratch space to lazily build the DFA during the search.

```
use regex_automata::{hybrid::dfa::DFA, Input};

let dfa = DFA::new(r"(?-u)\b[A-Z]\w+z\b")?;
let mut cache = dfa.create_cache();
let hay = "Quartz";

// The start state is determined by inspecting the position and the
// initial bytes of the haystack.
let mut state = dfa.start_state_forward(&mut cache, &Input::new(hay))?;
// Walk all the bytes in the haystack.
for &b in hay.as_bytes().iter() {
    state = dfa.next_state(&mut cache, state, b)?;
}
// DFAs in this crate require an explicit
// end-of-input transition if a search reaches
// the end of a haystack.
state = dfa.next_eoi_state(&mut cache, state)?;
assert!(state.is_match());

# Ok::<(), Box<dyn std::error::Error>>(())
```
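
Both walks above carry only a single state from one byte to the next. Because of that, the haystack does not need to be contiguous in memory, which is what makes these lower level APIs useful for stream searches. The std-only sketch below illustrates the idea with a tiny hand-written transition table. The table and the `next_state` and `is_match_in_chunks` functions are hypothetical illustrations, not this crate's API. Unlike the DFAs in this crate, the toy has no separate end-of-input transition; its match state is simply sticky.

```rust
// A hand-written toy DFA for the unanchored pattern `ab`. This is a
// hypothetical illustration, not an API of this crate.
// State 0: nothing useful seen yet.
// State 1: the previous byte was `a`.
// State 2: `ab` has been seen (a sticky match state).
const START: usize = 0;
const MATCH: usize = 2;

fn next_state(state: usize, byte: u8) -> usize {
    match (state, byte) {
        (MATCH, _) => MATCH,
        (_, b'a') => 1,
        (1, b'b') => MATCH,
        _ => START,
    }
}

/// Walks the toy DFA over a sequence of chunks, carrying only the current
/// state across chunk boundaries.
fn is_match_in_chunks(chunks: &[&str]) -> bool {
    let mut state = START;
    for chunk in chunks {
        for b in chunk.bytes() {
            state = next_state(state, b);
        }
    }
    state == MATCH
}

fn main() {
    // The match straddles the boundary between the two chunks.
    assert!(is_match_in_chunks(&["xxa", "byy"]));
    assert!(!is_match_in_chunks(&["xxa", "ayy"]));
    assert!(!is_match_in_chunks(&["ba", "ca"]));
}
```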

### Find all overlapping matches

This example shows how to build a DFA and use it to find all possible matches,
including overlapping matches. A similar example will work with a lazy DFA as
well. This also works with multiple patterns and will report all matches at the
same position where multiple patterns match.

```
use regex_automata::{
    dfa::{dense, Automaton, OverlappingState},
    Input, MatchKind,
};

let dfa = dense::DFA::builder()
    .configure(dense::DFA::config().match_kind(MatchKind::All))
    .build(r"(?-u)\w{3,}")?;
let input = Input::new("homer marge bart lisa maggie");
let mut state = OverlappingState::start();

let mut matches = vec![];
while let Some(hm) = {
    dfa.try_search_overlapping_fwd(&input, &mut state)?;
    state.get_match()
} {
    matches.push(hm.offset());
}
assert_eq!(matches, vec![
    3, 4, 5,        // hom, home, homer
    9, 10, 11,      // mar, marg, marge
    15, 16,         // bar, bart
    20, 21,         // lis, lisa
    25, 26, 27, 28, // mag, magg, maggi, maggie
]);

# Ok::<(), Box<dyn std::error::Error>>(())
```

# Available regex engines

The following is a complete list of all regex engines provided by this crate,
along with a very brief description of each and why you might want to use it.

* [`dfa::regex::Regex`] is a regex engine that works on top of either
[dense](dfa::dense) or [sparse](dfa::sparse) fully compiled DFAs. You might
use a DFA if you need the fastest possible regex engine in this crate and can
afford the exorbitant memory usage usually required by DFAs. Low level APIs on
fully compiled DFAs are provided by the [`Automaton` trait](dfa::Automaton).
Fully compiled dense DFAs can handle all regexes except for searching a regex
with a Unicode word boundary on non-ASCII haystacks. A fully compiled DFA based
regex can only report the start and end of each match.
* [`hybrid::regex::Regex`] is a regex engine that works on top of a lazily
built DFA. Its performance profile is very similar to that of fully compiled
DFAs, but can be slower in some pathological cases. Fully compiled DFAs are
also amenable to more optimizations, such as state acceleration, that aren't
available in a lazy DFA. You might use this lazy DFA if you can't abide the
worst case exponential compile time of a full DFA, but still want the DFA
search performance in the vast majority of cases. A lazy DFA based regex can
only report the start and end of each match.
* [`dfa::onepass::DFA`] is a regex engine that is implemented as a DFA, but
can report the matches of each capture group in addition to the start and end
of each match. The catch is that it only works on a somewhat small subset of
regexes known as "one-pass." You'll want to use this for cases when you need
capture group matches and the regex is one-pass, since it is likely to be
faster than any alternative. Within that subset, a one-pass DFA can handle all
types of regexes (including Unicode word boundaries on non-ASCII haystacks,
which the other DFAs cannot), but it does have some reasonable limits on the
number of capture groups it can handle.
* [`nfa::thompson::backtrack::BoundedBacktracker`] is a regex engine that uses
backtracking, but keeps track of the work it has done to avoid catastrophic
backtracking. Like the one-pass DFA, it provides the matches of each capture
group. It retains the `O(m * n)` worst case time bound. This tends to be slower
than the one-pass DFA regex engine, but faster than the PikeVM. It can handle
all types of regexes, but usually only works well with small haystacks and
small regexes due to the memory required to avoid redoing work.
* [`nfa::thompson::pikevm::PikeVM`] is a regex engine that can handle all
regexes of all sizes and provides capture group matches. It tends to be a tool
of last resort because it is also usually the slowest regex engine.
* [`meta::Regex`] is the meta regex engine that combines *all* of the above
engines into one. The reason for this is that each of the engines above has
its own caveats such as, "only handles a subset of regexes" or "is generally
slow." The meta regex engine accounts for all of these caveats and composes
the engines in a way that attempts to mitigate each engine's weaknesses while
emphasizing its strengths. For example, it will attempt to run a lazy DFA even
if it might fail. If the lazy DFA does fail, the meta regex engine restarts
the search with a likely slower but more capable regex engine. The meta regex
engine is what you should default to. Use one of the above engines directly
only if you have a specific reason to.

# API themes

While each regex engine has its own APIs and configuration options, there are
some general themes followed by all of them.

### The `Input` abstraction

Most search routines in this crate accept anything that implements
`Into<Input>`. Both `&str` and `&[u8]` haystacks satisfy this constraint, which
means that things like `engine.search("foo")` will work as you would expect.

By virtue of accepting an `Into<Input>` though, callers can provide more than
just a haystack. The [`Input`] type documentation has more details, but
briefly, callers can use it to configure various aspects of the search:

* The span of the haystack to search via [`Input::span`] or [`Input::range`],
which might be a substring of the haystack.
* Whether to run an anchored search or not via [`Input::anchored`]. This
permits one to require matches to start at the same offset that the search
started.
* Whether to ask the regex engine to stop as soon as a match is seen via
[`Input::earliest`]. This can be used to find the offset of a match as soon
as it is known without waiting for the full leftmost-first match to be found.
This can also be used to avoid the worst case `O(m * n^2)` time complexity
of iteration.

Some lower level search routines accept an `&Input` for performance reasons.
In such cases, `&Input::new("haystack")` can be used for a simple search.
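
To make the three options above concrete, here is a std-only sketch of a search driven by an `Input`-like value. `ToyInput` and `find_a_run` are hypothetical names, not this crate's API. The toy only searches for the pattern `a+` (greedily) and represents spans as plain `Range<usize>` values. It treats `earliest` as "report the match as soon as its first byte is seen".

```rust
use std::ops::Range;

/// A hypothetical, simplified stand-in for this crate's `Input` type.
struct ToyInput<'h> {
    haystack: &'h str,
    span: Range<usize>,
    anchored: bool,
    earliest: bool,
}

impl<'h> ToyInput<'h> {
    fn new(haystack: &'h str) -> ToyInput<'h> {
        ToyInput { haystack, span: 0..haystack.len(), anchored: false, earliest: false }
    }
    fn span(mut self, span: Range<usize>) -> ToyInput<'h> {
        self.span = span;
        self
    }
    fn anchored(mut self, yes: bool) -> ToyInput<'h> {
        self.anchored = yes;
        self
    }
    fn earliest(mut self, yes: bool) -> ToyInput<'h> {
        self.earliest = yes;
        self
    }
}

/// Finds a match of the toy pattern `a+` inside `input.span`.
fn find_a_run(input: &ToyInput<'_>) -> Option<Range<usize>> {
    let bytes = input.haystack.as_bytes();
    let (start, end) = (input.span.start, input.span.end);
    // An anchored search only considers a match beginning at the start of
    // the span; an unanchored one looks for the leftmost `a` in the span.
    let first = if input.anchored {
        if start < end && bytes[start] == b'a' {
            start
        } else {
            return None;
        }
    } else {
        start + bytes[start..end].iter().position(|&b| b == b'a')?
    };
    if input.earliest {
        // The match is known as soon as one `a` has been seen.
        return Some(first..first + 1);
    }
    // Otherwise extend the match greedily, but never past the span's end.
    let len = bytes[first..end].iter().take_while(|&&b| b == b'a').count();
    Some(first..first + len)
}

fn main() {
    let hay = "xaaa yaa";
    assert_eq!(find_a_run(&ToyInput::new(hay)), Some(1..4));
    assert_eq!(find_a_run(&ToyInput::new(hay).span(4..8)), Some(6..8));
    assert_eq!(find_a_run(&ToyInput::new(hay).anchored(true)), None);
    assert_eq!(find_a_run(&ToyInput::new(hay).span(2..8).anchored(true)), Some(2..4));
    assert_eq!(find_a_run(&ToyInput::new(hay).earliest(true)), Some(1..2));
}
```

Note how `earliest` shortens the reported match rather than moving its start. An anchored search fails outright when the span does not begin with an `a`.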

### Error reporting

Most, but not all, regex engines in this crate can fail to execute a search.
When a search fails, callers cannot determine whether or not a match exists.
That is, the result is indeterminate.

Search failure, in all cases in this crate, is represented by a [`MatchError`].
Routines that can fail start with the `try_` prefix in their name. For example,
[`hybrid::regex::Regex::try_search`] can fail for a number of reasons.
Conversely, routines that either can't fail or can panic on failure lack the
`try_` prefix. For example, [`hybrid::regex::Regex::find`] will panic in
cases where [`hybrid::regex::Regex::try_search`] would return an error, and
[`meta::Regex::find`] will never panic. Therefore, callers need to pay close
attention to the panicking conditions in the documentation.
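
The naming convention can be sketched in std-only Rust. Everything here is hypothetical: `ToyError`, `MAX_LEN`, `try_find` and `find` are illustrative names, not this crate's API. The fallible routine returns `Ok(None)` for "no match" and `Err` for "unknown". The routine without the `try_` prefix is a thin wrapper that panics on error.

```rust
/// A hypothetical stand-in for `MatchError`. When it is returned, the search
/// gave up, so whether a match exists is unknown.
#[derive(Debug, PartialEq)]
struct ToyError;

/// Toy limit on haystack length, chosen arbitrarily for illustration.
const MAX_LEN: usize = 16;

/// The fallible routine: `Ok(None)` means "no match", `Err` means "unknown".
fn try_find(haystack: &str, needle: &str) -> Result<Option<usize>, ToyError> {
    if haystack.len() > MAX_LEN {
        return Err(ToyError);
    }
    Ok(haystack.find(needle))
}

/// The routine without the `try_` prefix: the same search, but it panics on
/// failure, so its documentation has to spell out when that can happen.
fn find(haystack: &str, needle: &str) -> Option<usize> {
    match try_find(haystack, needle) {
        Ok(m) => m,
        Err(err) => panic!("search failed: {:?}", err),
    }
}

fn main() {
    assert_eq!(try_find("foo bar", "bar"), Ok(Some(4)));
    assert_eq!(try_find("foo bar", "baz"), Ok(None));
    assert_eq!(try_find("a haystack that is too long", "long"), Err(ToyError));
    assert_eq!(find("foo bar", "bar"), Some(4));
}
```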

In most cases, the reasons that a search fails are either predictable or
configurable, albeit at some additional cost.

An example of predictable failure is
[`BoundedBacktracker::try_search`](nfa::thompson::backtrack::BoundedBacktracker::try_search).
Namely, it fails whenever the product of the haystack length, the size of the
regex and some constant exceeds the
[configured visited capacity](nfa::thompson::backtrack::Config::visited_capacity).
Callers can predict the failure in terms of haystack length via the
[`BoundedBacktracker::max_haystack_len`](nfa::thompson::backtrack::BoundedBacktracker::max_haystack_len)
method. While this form of failure is technically avoidable by increasing the
visited capacity, doing so isn't feasible for all inputs because the memory
required for larger haystacks quickly becomes impractically large. So in
practice, users of the bounded backtracker really do have to deal with the
failure.
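
A back-of-the-envelope version of that prediction is sketched below. As a simplification, it assumes the visited set is a bitset with one bit per pair of NFA state and haystack offset, where a haystack of length `n` has `n + 1` offsets. The real `max_haystack_len` computation may differ in details such as rounding. The capacities and state counts used here are made up for illustration, and both functions are hypothetical.

```rust
/// Hypothetical estimate of the longest haystack a bounded backtracker can
/// search, assuming one visited bit per (NFA state, haystack offset) pair.
fn approx_max_haystack_len(visited_capacity_bytes: usize, nfa_states: usize) -> usize {
    let bits = visited_capacity_bytes.saturating_mul(8);
    // A haystack of length `n` has `n + 1` offsets, hence the `- 1`.
    (bits / nfa_states).saturating_sub(1)
}

/// The inverse estimate: roughly how many bytes of visited capacity are
/// needed to search a haystack of `haystack_len` bytes without failing.
fn approx_visited_bytes(haystack_len: usize, nfa_states: usize) -> usize {
    (haystack_len + 1) * nfa_states / 8
}

fn main() {
    // 256 KiB of capacity and a 1,000-state NFA (both made-up numbers).
    assert_eq!(approx_max_haystack_len(256 * 1024, 1_000), 2_096);
    // Doubling the capacity only roughly doubles the limit...
    assert_eq!(approx_max_haystack_len(512 * 1024, 1_000), 4_193);
    // ...while a 1 MiB haystack would already need about 131 MB.
    assert_eq!(approx_visited_bytes(1 << 20, 1_000), 131_072_125);
}
```

Under this model, the required memory grows with the product of haystack length and NFA size. That is why raising the capacity does not scale to arbitrary inputs.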

An example of configurable failure happens when one enables heuristic support
for Unicode word boundaries in a DFA. Namely, since the DFAs in this crate
(except for the one-pass DFA) do not support Unicode word boundaries on
non-ASCII haystacks, building a DFA from an NFA that contains a Unicode word
boundary will itself fail. However, one can configure DFAs to still be built in
this case by
[configuring heuristic support for Unicode word boundaries](hybrid::dfa::Config::unicode_word_boundary).
If the NFA the DFA is built from contains a Unicode word boundary, then the
DFA will still be built, but special transitions will be added to every state
that cause the DFA to fail if any non-ASCII byte is seen. This failure happens
at search time, and only if the caller has opted into this heuristic.
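
The quitting behavior can be sketched in std-only Rust. In this hypothetical toy, `Quit` and `try_word_boundaries` are illustrative names, not this crate's API. Word boundaries are only computed correctly for ASCII, so instead of guessing, the search returns an error carrying the offset of the first non-ASCII byte it encounters. The real DFAs get the same effect through special transitions rather than an explicit check.

```rust
/// A hypothetical error: the search quit at `offset` because it saw a
/// non-ASCII byte, so the result is unknown.
#[derive(Debug, PartialEq)]
struct Quit {
    offset: usize,
}

fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Returns every offset at which a word boundary occurs. On ASCII input the
/// ASCII and Unicode definitions of a word byte agree. On a non-ASCII byte,
/// where they could differ, the search quits instead of guessing.
fn try_word_boundaries(haystack: &[u8]) -> Result<Vec<usize>, Quit> {
    let mut boundaries = vec![];
    // The position before the start of the haystack counts as non-word.
    let mut prev_is_word = false;
    for (i, &b) in haystack.iter().enumerate() {
        if !b.is_ascii() {
            return Err(Quit { offset: i });
        }
        let is_word = is_word_byte(b);
        if is_word != prev_is_word {
            boundaries.push(i);
        }
        prev_is_word = is_word;
    }
    // The end of the haystack is a boundary if the last byte was a word byte.
    if prev_is_word {
        boundaries.push(haystack.len());
    }
    Ok(boundaries)
}

fn main() {
    assert_eq!(try_word_boundaries(b"hi there"), Ok(vec![0, 2, 3, 8]));
    // "é" is encoded as the two bytes 0xC3 0xA9, starting at offset 3.
    assert_eq!(try_word_boundaries("hi é".as_bytes()), Err(Quit { offset: 3 }));
}
```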

There are other ways for regex engines to fail in this crate, but the above
two should represent the general theme of failures one can find. Dealing
with these failures is, in part, one of the responsibilities of the [meta regex
engine](meta). Notice, for example, that the meta regex engine exposes an API
that never returns an error nor panics. It carefully manages all of the ways
in which the regex engines can fail and either avoids the predictable ones
entirely (e.g., the bounded backtracker) or reacts to configured failures by
falling back to a different engine (e.g., the lazy DFA quitting because it saw
a non-ASCII byte).
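
The shape of that strategy can be sketched as a small dispatch function. The three "engines" below are hypothetical toys built on `str::find` with artificial failure conditions. A length limit stands in for the bounded backtracker's visited capacity. Quitting on non-ASCII input stands in for a lazy DFA with heuristic Unicode word boundary support. The real meta regex engine's strategy is considerably more involved.

```rust
/// Toy limit standing in for a bounded backtracker's visited capacity.
const MAX_LEN: usize = 8;

/// Hypothetical "bounded backtracker": fails predictably on long haystacks.
fn backtrack(hay: &str, needle: &str) -> Result<Option<usize>, ()> {
    if hay.len() > MAX_LEN {
        Err(())
    } else {
        Ok(hay.find(needle))
    }
}

/// Hypothetical "lazy DFA": quits if the haystack contains non-ASCII bytes.
fn lazy_dfa(hay: &str, needle: &str) -> Result<Option<usize>, ()> {
    if !hay.is_ascii() {
        Err(())
    } else {
        Ok(hay.find(needle))
    }
}

/// Hypothetical "PikeVM": slower in a real engine, but it never fails.
fn pikevm(hay: &str, needle: &str) -> Option<usize> {
    hay.find(needle)
}

/// A toy "meta" search that never fails or panics on valid input. It returns
/// the match together with the name of the engine that produced it.
fn meta_find(hay: &str, needle: &str) -> (Option<usize>, &'static str) {
    // Predictable failure: check the limit first, so the backtracker is
    // only used when it cannot fail.
    if hay.len() <= MAX_LEN {
        let m = backtrack(hay, needle).expect("haystack length was checked");
        return (m, "backtrack");
    }
    // Configured failure: try the fast engine, and if it quits, restart
    // the search with an engine that cannot fail.
    match lazy_dfa(hay, needle) {
        Ok(m) => (m, "lazy DFA"),
        Err(()) => (pikevm(hay, needle), "PikeVM"),
    }
}

fn main() {
    assert_eq!(meta_find("foo bar", "bar"), (Some(4), "backtrack"));
    assert_eq!(meta_find("a longer haystack", "hay"), (Some(9), "lazy DFA"));
    assert_eq!(meta_find("a longer haystack, café", "caf"), (Some(19), "PikeVM"));
}
```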

### Configuration and Builders

Most of the regex engines in this crate come with two types to facilitate
building the regex engine: a `Config` and a `Builder`. A `Config` is usually
specific to that particular regex engine, but other objects such as parsing and
NFA compilation have `Config` types too. A `Builder` is the thing responsible
for taking inputs (either pattern strings or already-parsed patterns or even
NFAs directly) and turning them into an actual regex engine that can be used
for searching.

The main reason building a regex engine is a bit complicated is the desire to
permit composition with decoupled components. For example,
you might want to [manually construct a Thompson NFA](nfa::thompson::Builder)
and then build a regex engine from it without ever using a regex parser
at all. On the other hand, you might also want to build a regex engine directly
from the concrete syntax. This demonstrates why regex engine construction is
so flexible: it needs to support not just convenient construction, but also
construction from parts built elsewhere.

This is also in turn why there are many different `Config` structs in this
crate. Let's look more closely at an example: [`hybrid::regex::Builder`]. It
accepts three different `Config` types for configuring construction of a lazy
DFA regex:

* [`hybrid::regex::Builder::syntax`] accepts a
[`util::syntax::Config`] for configuring the options found in the
[`regex-syntax`](regex_syntax) crate. For example, whether to match
case insensitively.
* [`hybrid::regex::Builder::thompson`] accepts a [`nfa::thompson::Config`] for
configuring construction of a [Thompson NFA](nfa::thompson::NFA). For example,
whether to build an NFA that matches the reverse language described by the
regex.
* [`hybrid::regex::Builder::dfa`] accepts a [`hybrid::dfa::Config`] for
configuring construction of the pair of underlying lazy DFAs that make up the
lazy DFA regex engine. For example, changing the capacity of the cache used to
store the transition table.

The lazy DFA regex engine uses all three of those configuration objects for
methods like [`hybrid::regex::Builder::build`], which accepts a pattern
string containing the concrete syntax of your regex. It uses the syntax
configuration to parse the pattern into an AST and translate it into an HIR,
then the NFA configuration to compile the HIR into an NFA, and finally the DFA
configuration to lazily determinize the NFA into a DFA.
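
The following std-only sketch mirrors that layering with hypothetical types. `SyntaxConfig`, `NfaConfig`, `DfaConfig` and `ToyBuilder` are illustrative names, not this crate's types. Each stage owns a small config, and the builder hands each config to the stage that needs it. The `build` method here only records which options each stage would see.

```rust
/// Hypothetical per-stage configs, chained by consuming and returning `self`.
#[derive(Clone, Debug, Default)]
struct SyntaxConfig {
    case_insensitive: bool,
}

impl SyntaxConfig {
    fn case_insensitive(mut self, yes: bool) -> SyntaxConfig {
        self.case_insensitive = yes;
        self
    }
}

#[derive(Clone, Debug, Default)]
struct NfaConfig {
    reverse: bool,
}

#[derive(Clone, Debug)]
struct DfaConfig {
    cache_capacity: usize,
}

impl Default for DfaConfig {
    fn default() -> DfaConfig {
        // An arbitrary toy default.
        DfaConfig { cache_capacity: 1 << 20 }
    }
}

/// A hypothetical builder that holds one config per stage.
#[derive(Debug, Default)]
struct ToyBuilder {
    syntax: SyntaxConfig,
    nfa: NfaConfig,
    dfa: DfaConfig,
}

impl ToyBuilder {
    fn syntax(&mut self, config: SyntaxConfig) -> &mut ToyBuilder {
        self.syntax = config;
        self
    }
    fn thompson(&mut self, config: NfaConfig) -> &mut ToyBuilder {
        self.nfa = config;
        self
    }
    fn dfa(&mut self, config: DfaConfig) -> &mut ToyBuilder {
        self.dfa = config;
        self
    }
    /// Stands in for "parse, compile an NFA, then determinize lazily" by
    /// describing which configuration each stage receives.
    fn build(&self, pattern: &str) -> Vec<String> {
        vec![
            format!("parse {:?} (case_insensitive={})", pattern, self.syntax.case_insensitive),
            format!("compile NFA (reverse={})", self.nfa.reverse),
            format!("determinize lazily (cache_capacity={})", self.dfa.cache_capacity),
        ]
    }
}

fn main() {
    let steps = ToyBuilder::default()
        .syntax(SyntaxConfig::default().case_insensitive(true))
        .dfa(DfaConfig { cache_capacity: 4096 })
        .build("foo");
    assert_eq!(steps, vec![
        "parse \"foo\" (case_insensitive=true)".to_string(),
        "compile NFA (reverse=false)".to_string(),
        "determinize lazily (cache_capacity=4096)".to_string(),
    ]);
}
```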

Notice though that the builder also has a
[`hybrid::regex::Builder::build_from_dfas`] constructor. This permits callers
to build the underlying pair of lazy DFAs themselves (one for the forward
searching to find the end of a match and one for the reverse searching to find
the start of a match), and then build the regex engine from them. The lazy
DFAs, in turn, have their own builder that permits [construction directly from
a Thompson NFA](hybrid::dfa::Builder::build_from_nfa). Continuing down the
rabbit hole, a Thompson NFA has its own compiler that permits [construction
directly from an HIR](nfa::thompson::Compiler::build_from_hir). The lazy DFA
regex engine builder lets you follow this rabbit hole all the way down, but
also provides convenience routines that do it for you when you don't need
precise control over every component.

The [meta regex engine](meta) is a good example of something that utilizes the
full flexibility of these builders. It often needs not only precise control
over each component, but also the ability to share them across multiple regex
engines. (Most sharing is done by internal reference counting. For example, an
[`NFA`](nfa::thompson::NFA) is reference counted internally which makes cloning
cheap.)

### Size limits

Unlike the `regex` crate, the `regex-automata` crate specifically does not
enable any size limits by default. That means users of this crate need to
be quite careful when using untrusted patterns. Namely, because bounded
repetitions can grow exponentially by stacking them, it is possible to build a
very large internal regex object from just a small pattern string. For example,
the NFA built from the pattern `a{10}{10}{10}{10}{10}{10}{10}` is over 240MB.
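
The growth is multiplicative: each stacked `{10}` repeats everything before it ten times, so seven of them expand to 10^7 copies of `a`. The std-only sketch below does that arithmetic. `expanded_copies` is a hypothetical helper, and the last step merely divides the figure quoted above by the number of copies (reading "240MB" as 240 million bytes) rather than measuring anything.

```rust
/// Hypothetical helper: the number of copies of the innermost atom produced
/// by a stack of bounded repetitions, e.g. `a{10}{10}` means `(a{10}){10}`.
fn expanded_copies(counts: &[u64]) -> u64 {
    counts.iter().product()
}

fn main() {
    let pattern = "a{10}{10}{10}{10}{10}{10}{10}";
    // A pattern of just 29 bytes...
    assert_eq!(pattern.len(), 29);
    // ...with seven stacked `{10}` repetitions expands to ten million `a`s.
    let copies = expanded_copies(&[10; 7]);
    assert_eq!(copies, 10_000_000);
    // Reading "240MB" as 240 million bytes, that is an average of at least
    // 24 bytes of NFA per copy of `a`.
    assert_eq!(240_000_000 / copies, 24);
}
```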

There are multiple size limit options in this crate. If one or more size limits
are relevant for the object you're building, they will be configurable via
methods on a corresponding `Config` type.

# Crate features

This crate has a dizzying number of features. The main idea is to be able to
control how much stuff you pull in for your specific use case, since the full
crate is quite large and can dramatically increase compile times and binary
size.

The most barebones but useful configuration is to disable all default features
and enable only `dfa-search`. This will bring in just the DFA deserialization
and search routines without any dependency on `std` or `alloc`. This does
require generating and serializing a DFA, and then storing it somewhere, but
it permits regex searches in freestanding or embedded environments.

Because there are so many features, they are split into a few groups.

The default set of features is: `std`, `syntax`, `perf`, `unicode`, `meta`,
`nfa`, `dfa` and `hybrid`. Basically, the default is to enable everything
except for development related features like `logging`.

### Ecosystem features

* std - Enables use of the standard library. In terms of APIs, this usually
just means that error types implement the `std::error::Error` trait. Otherwise,
`std` sometimes enables the code to be faster, for example, using a `HashMap`
instead of a `BTreeMap`. (The `std` feature matters more for dependencies like
`aho-corasick` and `memchr`, where `std` is required to enable certain classes
of SIMD optimizations.) Enabling `std` automatically enables `alloc`.
* alloc - Enables use of the `alloc` library. This is required for most
APIs in this crate. The main exception is deserializing and searching with
fully compiled DFAs.
* logging - Adds a dependency on the `log` crate and makes this crate emit
log messages of varying degrees of utility. The log messages are especially
useful in trying to understand what the meta regex engine is doing.

### Performance features

* perf - Enables all of the below features.
* perf-inline - When enabled, `inline(always)` is used in (many) strategic
locations to help performance at the expense of longer compile times and
increased binary size.
* perf-literal - Enables all literal related optimizations.
* perf-literal-substring - Enables all single substring literal
optimizations. This includes adding a dependency on the `memchr` crate.
* perf-literal-multisubstring - Enables all multiple substring literal
optimizations. This includes adding a dependency on the `aho-corasick`
crate.

### Unicode features

* unicode -
Enables all Unicode features. This feature is enabled by default, and will
always cover all Unicode features, even if more are added in the future.
* unicode-age -
Provide the data for the
[Unicode `Age` property](https://www.unicode.org/reports/tr44/tr44-24.html#Character_Age).
This makes it possible to use classes like `\p{Age:6.0}` to refer to all
codepoints first introduced in Unicode 6.0.
* unicode-bool -
Provide the data for numerous Unicode boolean properties. The full list
is not included here, but contains properties like `Alphabetic`, `Emoji`,
`Lowercase`, `Math`, `Uppercase` and `White_Space`.
* unicode-case -
Provide the data for case insensitive matching using
[Unicode's "simple loose matches" specification](https://www.unicode.org/reports/tr18/#Simple_Loose_Matches).
* unicode-gencat -
Provide the data for
[Unicode general categories](https://www.unicode.org/reports/tr44/tr44-24.html#General_Category_Values).
This includes, but is not limited to, `Decimal_Number`, `Letter`,
`Math_Symbol`, `Number` and `Punctuation`.
* unicode-perl -
Provide the data for supporting the Unicode-aware Perl character classes,
corresponding to `\w`, `\s` and `\d`. This is also necessary for using
Unicode-aware word boundary assertions. Note that if this feature is
disabled, the `\s` and `\d` character classes are still available if the
`unicode-bool` and `unicode-gencat` features are enabled, respectively.
* unicode-script -
Provide the data for
[Unicode scripts and script extensions](https://www.unicode.org/reports/tr24/).
This includes, but is not limited to, `Arabic`, `Cyrillic`, `Hebrew`,
`Latin` and `Thai`.
* unicode-segment -
Provide the data necessary to provide the properties used to implement the
[Unicode text segmentation algorithms](https://www.unicode.org/reports/tr29/).
This enables using classes like `\p{gcb=Extend}`, `\p{wb=Katakana}` and
`\p{sb=ATerm}`.
* unicode-word-boundary -
Enables support for Unicode word boundaries, i.e., `\b`, in regexes. When
this and `unicode-perl` are enabled, then data tables from `regex-syntax` are
used to implement Unicode word boundaries. However, if `regex-syntax` isn't
enabled as a dependency then one can still enable this feature. It will
cause `regex-automata` to bundle its own data table that would otherwise be
redundant with `regex-syntax`'s table.

### Regex engine features

* syntax - Enables a dependency on `regex-syntax`. This makes APIs
for building regex engines from pattern strings available. Without the
`regex-syntax` dependency, the only way to build a regex engine is generally
to deserialize a previously built DFA or to hand assemble an NFA using its
[builder API](nfa::thompson::Builder). Once you have an NFA, you can build any
of the regex engines in this crate. The `syntax` feature also enables `alloc`.
* meta - Enables the meta regex engine. This also enables the `syntax` and
`nfa-pikevm` features, as both are the minimal requirements needed. The meta
regex engine benefits from enabling any of the other regex engines and will
use them automatically when appropriate.
* nfa - Enables all NFA related features below.
* nfa-thompson - Enables the Thompson NFA APIs. This enables `alloc`.
* nfa-pikevm - Enables the PikeVM regex engine. This enables
`nfa-thompson`.
* nfa-backtrack - Enables the bounded backtracker regex engine. This
enables `nfa-thompson`.
* dfa - Enables all DFA related features below.
* dfa-build - Enables APIs for determinizing DFAs from NFAs. This
enables `nfa-thompson` and `dfa-search`.
* dfa-search - Enables APIs for searching with DFAs.
* dfa-onepass - Enables the one-pass DFA API. This enables
`nfa-thompson`.
* hybrid - Enables the hybrid NFA/DFA or "lazy DFA" regex engine. This
enables `alloc` and `nfa-thompson`.

*/

// We are no_std.
#![no_std]
// All APIs need docs!
#![deny(missing_docs)]
// Some intra-doc links are broken when certain features are disabled, so we
// only bleat about it when most (all?) features are enabled. But when we do,
// we block the build. Links need to work.
#![cfg_attr(
    all(
        feature = "std",
        feature = "nfa",
        feature = "dfa",
        feature = "hybrid"
    ),
    deny(rustdoc::broken_intra_doc_links)
)]
// Broken rustdoc links are very easy to come by when you start disabling
// features. Namely, features tend to change imports, and imports change what's
// available to link to.
//
// Basically, we just don't support rustdoc for anything other than the maximal
// feature configuration. Other configurations will work, they just won't be
// perfect.
//
// So here, we specifically allow them so we don't even get warned about them.
#![cfg_attr(
    not(all(
        feature = "std",
        feature = "nfa",
        feature = "dfa",
        feature = "hybrid"
    )),
    allow(rustdoc::broken_intra_doc_links)
)]
// Kinda similar, but eliminating all of the dead code and unused import
// warnings for every feature combo is a fool's errand. Instead, we just
// suppress those, but still let them through in a common configuration when we
// build most of everything.
//
// This does actually suggest that when features are disabled, we are actually
// compiling more code than we need to be. And this is perhaps not so great
// because disabling features is usually done in order to reduce compile times
// by reducing the amount of code one compiles... However, usually, most of the
// time this dead code is a relatively small amount from the 'util' module.
// But... I confess... There isn't a ton of visibility on this.
//
// I'm happy to try to address this in a different way, but "let's annotate
// every function in 'util' with some non-local combination of features" just
// cannot be the way forward.
#![cfg_attr(
    not(all(
        feature = "std",
        feature = "nfa",
        feature = "dfa",
        feature = "hybrid",
        feature = "perf-literal-substring",
        feature = "perf-literal-multisubstring",
    )),
    allow(dead_code, unused_imports, unused_variables)
)]
// We generally want all types to impl Debug.
#![warn(missing_debug_implementations)]
// No clue why this thing is still unstable because it's pretty amazing. This
// adds Cargo feature annotations to items in the rustdoc output. Which is
// sadly hugely beneficial for this crate due to the number of features.
#![cfg_attr(docsrs, feature(doc_auto_cfg))]

// I have literally never tested this crate on 16-bit, so it is quite
// suspicious to advertise support for it. But... the regex crate, at time
// of writing, at least claims to support it by not doing any conditional
// compilation based on the target pointer width. So I guess I remain
// consistent with that here.
//
// If you are here because you're on a 16-bit system and you were somehow using
// the regex crate previously, please file an issue. Please be prepared to
// provide some kind of reproduction or carve out some path to getting 16-bit
// working in CI. (Via qemu?)
#[cfg(not(any(
    target_pointer_width = "16",
    target_pointer_width = "32",
    target_pointer_width = "64"
)))]
compile_error!("not supported on non-{16,32,64}, please file an issue");

#[cfg(any(test, feature = "std"))]
extern crate std;

#[cfg(feature = "alloc")]
extern crate alloc;

#[cfg(doctest)]
doc_comment::doctest!("../README.md");

#[doc(inline)]
pub use crate::util::primitives::PatternID;
pub use crate::util::search::*;

#[macro_use]
mod macros;

#[cfg(any(feature = "dfa-search", feature = "dfa-onepass"))]
pub mod dfa;
#[cfg(feature = "hybrid")]
pub mod hybrid;
#[cfg(feature = "meta")]
pub mod meta;
#[cfg(feature = "nfa-thompson")]
pub mod nfa;
pub mod util;

