1 | /*! |
2 | Provides packed multiple substring search, principally for a small number of |
3 | patterns. |
4 | |
5 | This sub-module provides vectorized routines for quickly finding |
6 | matches of a small number of patterns. In general, users of this crate |
7 | shouldn't need to interface with this module directly, as the primary |
8 | [`AhoCorasick`](crate::AhoCorasick) searcher will use these routines |
9 | automatically as a prefilter when applicable. However, in some cases, callers |
10 | may want to bypass the Aho-Corasick machinery entirely and use this vectorized |
11 | searcher directly. |
12 | |
13 | # Overview |
14 | |
15 | The primary types in this sub-module are: |
16 | |
17 | * [`Searcher`] executes the actual search algorithm to report matches in a |
18 | haystack. |
19 | * [`Builder`] accumulates patterns incrementally and can construct a |
20 | `Searcher`. |
21 | * [`Config`] permits tuning the searcher, and itself will produce a `Builder` |
22 | (which can then be used to build a `Searcher`). Currently, the only tuneable |
23 | knob are the match semantics, but this may be expanded in the future. |
24 | |
25 | # Examples |
26 | |
27 | This example shows how to create a searcher from an iterator of patterns. |
28 | By default, leftmost-first match semantics are used. (See the top-level |
29 | [`MatchKind`] type for more details about match semantics, which apply |
30 | similarly to packed substring search.) |
31 | |
32 | ``` |
33 | use aho_corasick::{packed::{MatchKind, Searcher}, PatternID}; |
34 | |
35 | # fn example() -> Option<()> { |
36 | let searcher = Searcher::new(["foobar" , "foo" ].iter().cloned())?; |
37 | let matches: Vec<PatternID> = searcher |
38 | .find_iter("foobar" ) |
39 | .map(|mat| mat.pattern()) |
40 | .collect(); |
41 | assert_eq!(vec![PatternID::ZERO], matches); |
42 | # Some(()) } |
43 | # if cfg!(all(feature = "std" , any( |
44 | # target_arch = "x86_64" , target_arch = "aarch64" , |
45 | # ))) { |
46 | # example().unwrap() |
47 | # } else { |
48 | # assert!(example().is_none()); |
49 | # } |
50 | ``` |
51 | |
52 | This example shows how to use [`Config`] to change the match semantics to |
53 | leftmost-longest: |
54 | |
55 | ``` |
56 | use aho_corasick::{packed::{Config, MatchKind}, PatternID}; |
57 | |
58 | # fn example() -> Option<()> { |
59 | let searcher = Config::new() |
60 | .match_kind(MatchKind::LeftmostLongest) |
61 | .builder() |
62 | .add("foo" ) |
63 | .add("foobar" ) |
64 | .build()?; |
65 | let matches: Vec<PatternID> = searcher |
66 | .find_iter("foobar" ) |
67 | .map(|mat| mat.pattern()) |
68 | .collect(); |
69 | assert_eq!(vec![PatternID::must(1)], matches); |
70 | # Some(()) } |
71 | # if cfg!(all(feature = "std" , any( |
72 | # target_arch = "x86_64" , target_arch = "aarch64" , |
73 | # ))) { |
74 | # example().unwrap() |
75 | # } else { |
76 | # assert!(example().is_none()); |
77 | # } |
78 | ``` |
79 | |
80 | # Packed substring searching |
81 | |
82 | Packed substring searching refers to the use of SIMD (Single Instruction, |
83 | Multiple Data) to accelerate the detection of matches in a haystack. Unlike |
84 | conventional algorithms, such as Aho-Corasick, SIMD algorithms for substring |
85 | search tend to do better with a small number of patterns, where as Aho-Corasick |
86 | generally maintains reasonably consistent performance regardless of the number |
87 | of patterns you give it. Because of this, the vectorized searcher in this |
88 | sub-module cannot be used as a general purpose searcher, since building the |
89 | searcher may fail even when given a small number of patterns. However, in |
90 | exchange, when searching for a small number of patterns, searching can be quite |
91 | a bit faster than Aho-Corasick (sometimes by an order of magnitude). |
92 | |
93 | The key take away here is that constructing a searcher from a list of patterns |
94 | is a fallible operation with no clear rules for when it will fail. While the |
95 | precise conditions under which building a searcher can fail is specifically an |
96 | implementation detail, here are some common reasons: |
97 | |
98 | * Too many patterns were given. Typically, the limit is on the order of 100 or |
99 | so, but this limit may fluctuate based on available CPU features. |
100 | * The available packed algorithms require CPU features that aren't available. |
101 | For example, currently, this crate only provides packed algorithms for |
102 | `x86_64` and `aarch64`. Therefore, constructing a packed searcher on any |
103 | other target will always fail. |
104 | * Zero patterns were given, or one of the patterns given was empty. Packed |
105 | searchers require at least one pattern and that all patterns are non-empty. |
106 | * Something else about the nature of the patterns (typically based on |
107 | heuristics) suggests that a packed searcher would perform very poorly, so |
108 | no searcher is built. |
109 | */ |
110 | |
111 | pub use crate::packed::api::{Builder, Config, FindIter, MatchKind, Searcher}; |
112 | |
113 | mod api; |
114 | mod ext; |
115 | mod pattern; |
116 | mod rabinkarp; |
117 | mod teddy; |
118 | #[cfg (all(feature = "std" , test))] |
119 | mod tests; |
120 | mod vector; |
121 | |