1 | /*! |
2 | Provides packed multiple substring search, principally for a small number of |
3 | patterns. |
4 | |
5 | This sub-module provides vectorized routines for quickly finding |
6 | matches of a small number of patterns. In general, users of this crate |
7 | shouldn't need to interface with this module directly, as the primary |
8 | [`AhoCorasick`](crate::AhoCorasick) searcher will use these routines |
9 | automatically as a prefilter when applicable. However, in some cases, callers |
10 | may want to bypass the Aho-Corasick machinery entirely and use this vectorized |
11 | searcher directly. |
12 | |
13 | # Overview |
14 | |
15 | The primary types in this sub-module are: |
16 | |
17 | * [`Searcher`] executes the actual search algorithm to report matches in a |
18 | haystack. |
19 | * [`Builder`] accumulates patterns incrementally and can construct a |
20 | `Searcher`. |
21 | * [`Config`] permits tuning the searcher, and itself will produce a `Builder` |
22 | (which can then be used to build a `Searcher`). Currently, the only tuneable |
23 | knob are the match semantics, but this may be expanded in the future. |
24 | |
25 | # Examples |
26 | |
27 | This example shows how to create a searcher from an iterator of patterns. |
28 | By default, leftmost-first match semantics are used. (See the top-level |
29 | [`MatchKind`] type for more details about match semantics, which apply |
30 | similarly to packed substring search.) |
31 | |
32 | ``` |
33 | use aho_corasick::{packed::{MatchKind, Searcher}, PatternID}; |
34 | |
35 | # fn example() -> Option<()> { |
36 | let searcher = Searcher::new(["foobar" , "foo" ].iter().cloned())?; |
37 | let matches: Vec<PatternID> = searcher |
38 | .find_iter("foobar" ) |
39 | .map(|mat| mat.pattern()) |
40 | .collect(); |
41 | assert_eq!(vec![PatternID::ZERO], matches); |
42 | # Some(()) } |
43 | # if cfg!(all(feature = "std" , target_arch = "x86_64" )) { |
44 | # example().unwrap() |
45 | # } else { |
46 | # assert!(example().is_none()); |
47 | # } |
48 | ``` |
49 | |
50 | This example shows how to use [`Config`] to change the match semantics to |
51 | leftmost-longest: |
52 | |
53 | ``` |
54 | use aho_corasick::{packed::{Config, MatchKind}, PatternID}; |
55 | |
56 | # fn example() -> Option<()> { |
57 | let searcher = Config::new() |
58 | .match_kind(MatchKind::LeftmostLongest) |
59 | .builder() |
60 | .add("foo" ) |
61 | .add("foobar" ) |
62 | .build()?; |
63 | let matches: Vec<PatternID> = searcher |
64 | .find_iter("foobar" ) |
65 | .map(|mat| mat.pattern()) |
66 | .collect(); |
67 | assert_eq!(vec![PatternID::must(1)], matches); |
68 | # Some(()) } |
69 | # if cfg!(all(feature = "std" , target_arch = "x86_64" )) { |
70 | # example().unwrap() |
71 | # } else { |
72 | # assert!(example().is_none()); |
73 | # } |
74 | ``` |
75 | |
76 | # Packed substring searching |
77 | |
78 | Packed substring searching refers to the use of SIMD (Single Instruction, |
79 | Multiple Data) to accelerate the detection of matches in a haystack. Unlike |
80 | conventional algorithms, such as Aho-Corasick, SIMD algorithms for substring |
81 | search tend to do better with a small number of patterns, where as Aho-Corasick |
82 | generally maintains reasonably consistent performance regardless of the number |
83 | of patterns you give it. Because of this, the vectorized searcher in this |
84 | sub-module cannot be used as a general purpose searcher, since building the |
85 | searcher may fail even when given a small number of patterns. However, in |
86 | exchange, when searching for a small number of patterns, searching can be quite |
87 | a bit faster than Aho-Corasick (sometimes by an order of magnitude). |
88 | |
89 | The key take away here is that constructing a searcher from a list of patterns |
90 | is a fallible operation with no clear rules for when it will fail. While the |
91 | precise conditions under which building a searcher can fail is specifically an |
92 | implementation detail, here are some common reasons: |
93 | |
94 | * Too many patterns were given. Typically, the limit is on the order of 100 or |
95 | so, but this limit may fluctuate based on available CPU features. |
96 | * The available packed algorithms require CPU features that aren't available. |
97 | For example, currently, this crate only provides packed algorithms for |
98 | `x86_64`. Therefore, constructing a packed searcher on any other target |
99 | (e.g., ARM) will always fail. |
100 | * Zero patterns were given, or one of the patterns given was empty. Packed |
101 | searchers require at least one pattern and that all patterns are non-empty. |
102 | * Something else about the nature of the patterns (typically based on |
103 | heuristics) suggests that a packed searcher would perform very poorly, so |
104 | no searcher is built. |
105 | */ |
106 | |
107 | pub use crate::packed::api::{Builder, Config, FindIter, MatchKind, Searcher}; |
108 | |
109 | mod api; |
110 | mod pattern; |
111 | mod rabinkarp; |
112 | mod teddy; |
113 | #[cfg (all(feature = "std" , test))] |
114 | mod tests; |
115 | #[cfg (all(feature = "std" , target_arch = "x86_64" ))] |
116 | mod vector; |
117 | |