1 | /*! |
2 | A lower level API for packed multiple substring search, principally for a small |
3 | number of patterns. |
4 | |
5 | This sub-module provides vectorized routines for quickly finding matches of a |
6 | small number of patterns. In general, users of this crate shouldn't need to |
7 | interface with this module directly, as the primary |
8 | [`AhoCorasick`](../struct.AhoCorasick.html) |
9 | searcher will use these routines automatically as a prefilter when applicable. |
10 | However, in some cases, callers may want to bypass the Aho-Corasick machinery |
11 | entirely and use this vectorized searcher directly. |
12 | |
13 | # Overview |
14 | |
15 | The primary types in this sub-module are: |
16 | |
17 | * [`Searcher`](struct.Searcher.html) executes the actual search algorithm to |
18 | report matches in a haystack. |
19 | * [`Builder`](struct.Builder.html) accumulates patterns incrementally and can |
20 | construct a `Searcher`. |
21 | * [`Config`](struct.Config.html) permits tuning the searcher, and itself will |
22 | produce a `Builder` (which can then be used to build a `Searcher`). |
23 | Currently, the only tuneable knob are the match semantics, but this may be |
24 | expanded in the future. |
25 | |
26 | # Examples |
27 | |
28 | This example shows how to create a searcher from an iterator of patterns. |
29 | By default, leftmost-first match semantics are used. (See the top-level |
30 | [`MatchKind`](../enum.MatchKind.html) type for more details about match |
31 | semantics, which apply similarly to packed substring search.) |
32 | |
33 | ``` |
34 | use aho_corasick::packed::{MatchKind, Searcher}; |
35 | |
36 | # fn example() -> Option<()> { |
37 | let searcher = Searcher::new(["foobar" , "foo" ].iter().cloned())?; |
38 | let matches: Vec<usize> = searcher |
39 | .find_iter("foobar" ) |
40 | .map(|mat| mat.pattern()) |
41 | .collect(); |
42 | assert_eq!(vec![0], matches); |
43 | # Some(()) } |
44 | # if cfg!(target_arch = "x86_64" ) { |
45 | # example().unwrap() |
46 | # } else { |
47 | # assert!(example().is_none()); |
48 | # } |
49 | ``` |
50 | |
51 | This example shows how to use [`Config`](struct.Config.html) to change the |
52 | match semantics to leftmost-longest: |
53 | |
54 | ``` |
55 | use aho_corasick::packed::{Config, MatchKind}; |
56 | |
57 | # fn example() -> Option<()> { |
58 | let searcher = Config::new() |
59 | .match_kind(MatchKind::LeftmostLongest) |
60 | .builder() |
61 | .add("foo" ) |
62 | .add("foobar" ) |
63 | .build()?; |
64 | let matches: Vec<usize> = searcher |
65 | .find_iter("foobar" ) |
66 | .map(|mat| mat.pattern()) |
67 | .collect(); |
68 | assert_eq!(vec![1], matches); |
69 | # Some(()) } |
70 | # if cfg!(target_arch = "x86_64" ) { |
71 | # example().unwrap() |
72 | # } else { |
73 | # assert!(example().is_none()); |
74 | # } |
75 | ``` |
76 | |
77 | # Packed substring searching |
78 | |
79 | Packed substring searching refers to the use of SIMD (Single Instruction, |
80 | Multiple Data) to accelerate the detection of matches in a haystack. Unlike |
81 | conventional algorithms, such as Aho-Corasick, SIMD algorithms for substring |
82 | search tend to do better with a small number of patterns, where as Aho-Corasick |
83 | generally maintains reasonably consistent performance regardless of the number |
84 | of patterns you give it. Because of this, the vectorized searcher in this |
85 | sub-module cannot be used as a general purpose searcher, since building the |
86 | searcher may fail. However, in exchange, when searching for a small number of |
87 | patterns, searching can be quite a bit faster than Aho-Corasick (sometimes by |
88 | an order of magnitude). |
89 | |
90 | The key take away here is that constructing a searcher from a list of patterns |
91 | is a fallible operation. While the precise conditions under which building a |
92 | searcher can fail is specifically an implementation detail, here are some |
93 | common reasons: |
94 | |
95 | * Too many patterns were given. Typically, the limit is on the order of 100 or |
96 | so, but this limit may fluctuate based on available CPU features. |
97 | * The available packed algorithms require CPU features that aren't available. |
98 | For example, currently, this crate only provides packed algorithms for |
99 | `x86_64`. Therefore, constructing a packed searcher on any other target |
100 | (e.g., ARM) will always fail. |
101 | * Zero patterns were given, or one of the patterns given was empty. Packed |
102 | searchers require at least one pattern and that all patterns are non-empty. |
103 | * Something else about the nature of the patterns (typically based on |
104 | heuristics) suggests that a packed searcher would perform very poorly, so |
105 | no searcher is built. |
106 | */ |
107 | |
108 | pub use crate::packed::api::{Builder, Config, FindIter, MatchKind, Searcher}; |
109 | |
110 | mod api; |
111 | mod pattern; |
112 | mod rabinkarp; |
113 | mod teddy; |
114 | #[cfg (test)] |
115 | mod tests; |
116 | #[cfg (target_arch = "x86_64" )] |
117 | mod vector; |
118 | |