| 1 | /*! |
| 2 | Provides packed multiple substring search, principally for a small number of |
| 3 | patterns. |
| 4 | |
| 5 | This sub-module provides vectorized routines for quickly finding |
| 6 | matches of a small number of patterns. In general, users of this crate |
| 7 | shouldn't need to interface with this module directly, as the primary |
| 8 | [`AhoCorasick`](crate::AhoCorasick) searcher will use these routines |
| 9 | automatically as a prefilter when applicable. However, in some cases, callers |
| 10 | may want to bypass the Aho-Corasick machinery entirely and use this vectorized |
| 11 | searcher directly. |
| 12 | |
| 13 | # Overview |
| 14 | |
| 15 | The primary types in this sub-module are: |
| 16 | |
| 17 | * [`Searcher`] executes the actual search algorithm to report matches in a |
| 18 | haystack. |
| 19 | * [`Builder`] accumulates patterns incrementally and can construct a |
| 20 | `Searcher`. |
| 21 | * [`Config`] permits tuning the searcher, and itself will produce a `Builder` |
| 22 | (which can then be used to build a `Searcher`). Currently, the only tuneable |
| 23 | knob are the match semantics, but this may be expanded in the future. |
| 24 | |
| 25 | # Examples |
| 26 | |
| 27 | This example shows how to create a searcher from an iterator of patterns. |
| 28 | By default, leftmost-first match semantics are used. (See the top-level |
| 29 | [`MatchKind`] type for more details about match semantics, which apply |
| 30 | similarly to packed substring search.) |
| 31 | |
| 32 | ``` |
| 33 | use aho_corasick::{packed::{MatchKind, Searcher}, PatternID}; |
| 34 | |
| 35 | # fn example() -> Option<()> { |
| 36 | let searcher = Searcher::new(["foobar" , "foo" ].iter().cloned())?; |
| 37 | let matches: Vec<PatternID> = searcher |
| 38 | .find_iter("foobar" ) |
| 39 | .map(|mat| mat.pattern()) |
| 40 | .collect(); |
| 41 | assert_eq!(vec![PatternID::ZERO], matches); |
| 42 | # Some(()) } |
| 43 | # if cfg!(all(feature = "std" , any( |
| 44 | # target_arch = "x86_64" , target_arch = "aarch64" , |
| 45 | # ))) { |
| 46 | # example().unwrap() |
| 47 | # } else { |
| 48 | # assert!(example().is_none()); |
| 49 | # } |
| 50 | ``` |
| 51 | |
| 52 | This example shows how to use [`Config`] to change the match semantics to |
| 53 | leftmost-longest: |
| 54 | |
| 55 | ``` |
| 56 | use aho_corasick::{packed::{Config, MatchKind}, PatternID}; |
| 57 | |
| 58 | # fn example() -> Option<()> { |
| 59 | let searcher = Config::new() |
| 60 | .match_kind(MatchKind::LeftmostLongest) |
| 61 | .builder() |
| 62 | .add("foo" ) |
| 63 | .add("foobar" ) |
| 64 | .build()?; |
| 65 | let matches: Vec<PatternID> = searcher |
| 66 | .find_iter("foobar" ) |
| 67 | .map(|mat| mat.pattern()) |
| 68 | .collect(); |
| 69 | assert_eq!(vec![PatternID::must(1)], matches); |
| 70 | # Some(()) } |
| 71 | # if cfg!(all(feature = "std" , any( |
| 72 | # target_arch = "x86_64" , target_arch = "aarch64" , |
| 73 | # ))) { |
| 74 | # example().unwrap() |
| 75 | # } else { |
| 76 | # assert!(example().is_none()); |
| 77 | # } |
| 78 | ``` |
| 79 | |
| 80 | # Packed substring searching |
| 81 | |
| 82 | Packed substring searching refers to the use of SIMD (Single Instruction, |
| 83 | Multiple Data) to accelerate the detection of matches in a haystack. Unlike |
| 84 | conventional algorithms, such as Aho-Corasick, SIMD algorithms for substring |
| 85 | search tend to do better with a small number of patterns, where as Aho-Corasick |
| 86 | generally maintains reasonably consistent performance regardless of the number |
| 87 | of patterns you give it. Because of this, the vectorized searcher in this |
| 88 | sub-module cannot be used as a general purpose searcher, since building the |
| 89 | searcher may fail even when given a small number of patterns. However, in |
| 90 | exchange, when searching for a small number of patterns, searching can be quite |
| 91 | a bit faster than Aho-Corasick (sometimes by an order of magnitude). |
| 92 | |
| 93 | The key take away here is that constructing a searcher from a list of patterns |
| 94 | is a fallible operation with no clear rules for when it will fail. While the |
| 95 | precise conditions under which building a searcher can fail is specifically an |
| 96 | implementation detail, here are some common reasons: |
| 97 | |
| 98 | * Too many patterns were given. Typically, the limit is on the order of 100 or |
| 99 | so, but this limit may fluctuate based on available CPU features. |
| 100 | * The available packed algorithms require CPU features that aren't available. |
| 101 | For example, currently, this crate only provides packed algorithms for |
| 102 | `x86_64` and `aarch64`. Therefore, constructing a packed searcher on any |
| 103 | other target will always fail. |
| 104 | * Zero patterns were given, or one of the patterns given was empty. Packed |
| 105 | searchers require at least one pattern and that all patterns are non-empty. |
| 106 | * Something else about the nature of the patterns (typically based on |
| 107 | heuristics) suggests that a packed searcher would perform very poorly, so |
| 108 | no searcher is built. |
| 109 | */ |
| 110 | |
| 111 | pub use crate::packed::api::{Builder, Config, FindIter, MatchKind, Searcher}; |
| 112 | |
| 113 | mod api; |
| 114 | mod ext; |
| 115 | mod pattern; |
| 116 | mod rabinkarp; |
| 117 | mod teddy; |
| 118 | #[cfg (all(feature = "std" , test))] |
| 119 | mod tests; |
| 120 | mod vector; |
| 121 | |