/*!
This library provides heavily optimized routines for string search primitives.

# Overview

This section gives a brief high level overview of what this crate offers.

* The top-level module provides routines for searching for 1, 2 or 3 bytes
  in the forward or reverse direction. When searching for more than one byte,
  positions are considered a match if the byte at that position matches any
  of the bytes.
* The [`memmem`] sub-module provides forward and reverse substring search
  routines.

In all such cases, routines operate on `&[u8]` without regard to encoding. This
is exactly what you want when searching either UTF-8 or arbitrary bytes.

# Example: using `memchr`

This example shows how to use `memchr` to find the first occurrence of `z` in
a haystack:

```
use memchr::memchr;

let haystack = b"foo bar baz quuz";
assert_eq!(Some(10), memchr(b'z', haystack));
```

# Example: matching one of three possible bytes

This example shows how to use `memchr3_iter` to find occurrences of `a`, `b`
or `c`, starting at the end of the haystack.

```
use memchr::memchr3_iter;

let haystack = b"xyzaxyzbxyzc";

let mut it = memchr3_iter(b'a', b'b', b'c', haystack).rev();
assert_eq!(Some(11), it.next());
assert_eq!(Some(7), it.next());
assert_eq!(Some(3), it.next());
assert_eq!(None, it.next());
```

# Example: iterating over substring matches

This example shows how to use the [`memmem`] sub-module to find occurrences of
a substring in a haystack.

```
use memchr::memmem;

let haystack = b"foo bar foo baz foo";

let mut it = memmem::find_iter(haystack, "foo");
assert_eq!(Some(0), it.next());
assert_eq!(Some(8), it.next());
assert_eq!(Some(16), it.next());
assert_eq!(None, it.next());
```

# Example: repeating a search for the same needle

It may be possible for the overhead of constructing a substring searcher to be
measurable in some workloads. In cases where the same needle is used to search
many haystacks, construction can be done once and amortized across all
subsequent searches. This is what a [`memmem::Finder`] is for:

```
use memchr::memmem;

let finder = memmem::Finder::new("foo");

assert_eq!(Some(4), finder.find(b"baz foo quux"));
assert_eq!(None, finder.find(b"quux baz bar"));
```

# Why use this crate?

At first glance, the APIs provided by this crate might seem weird. Why provide
a dedicated routine like `memchr` for something that could be implemented
clearly and trivially in one line:

```
fn memchr(needle: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}
```

Or similarly, why does this crate provide substring search routines when Rust's
core library already provides them?

```
fn search(haystack: &str, needle: &str) -> Option<usize> {
    haystack.find(needle)
}
```

The primary reason for both to exist is performance. At a high level, there
are two primary ways to look at performance:

* **Throughput**: For this, think of it as, "given some very large haystack
  and a byte that never occurs in that haystack, how long does it take to
  search through it and determine that the byte, in fact, does not occur?"
* **Latency**: For this, think of it as, "given a tiny haystack---just a
  few bytes---how long does it take to determine whether a byte is in it?"

The `memchr` routine in this crate has _slightly_ worse latency than the
solution presented above, however, its throughput can easily be over an
order of magnitude faster. This is a good general purpose trade off to make.
You rarely lose, but often gain big.

**NOTE:** The name `memchr` comes from the corresponding routine in libc. A key
advantage of using this library is that its performance is not tied to the
quality of the `memchr` implementation in whichever libc you happen to be
using, which can vary greatly from platform to platform.

But what about substring search? This one is a bit more complicated. The
primary reason for its existence is still indeed performance, but it's also
useful because Rust's core library doesn't actually expose any substring
search routine on arbitrary bytes. The only substring search routine that
exists works exclusively on valid UTF-8.

So if you have valid UTF-8, is there a reason to use this over the standard
library substring search routine? Yes. This routine is faster on almost every
metric, including latency. The natural question then is: why isn't this
implementation in the standard library, even if only for searching on UTF-8?
The reason is that the implementation details for using SIMD in the standard
library haven't quite been worked out yet.

**NOTE:** Currently, only `x86_64` targets have highly accelerated
implementations of substring search. For `memchr`, all targets have at least
somewhat-accelerated implementations, but only `x86_64` targets are highly
accelerated. This limitation is expected to be lifted once the standard
library exposes a platform independent SIMD API.

# Crate features

* **std** - When enabled (the default), this will permit this crate to use
  features specific to the standard library. Currently, the only thing used
  from the standard library is runtime SIMD CPU feature detection. This means
  that this feature must be enabled to get AVX accelerated routines. When
  `std` is not enabled, this crate will still attempt to use SSE2 accelerated
  routines on `x86_64`.
* **libc** - When enabled (**not** the default), this library will use your
  platform's libc implementation of `memchr` (and `memrchr` on Linux). This
  can be useful on non-`x86_64` targets where the fallback implementation in
  this crate is not as good as the one found in your libc. All other routines
  (e.g., `memchr[23]` and substring search) unconditionally use the
  implementation in this crate.
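
As an illustration, these features might be toggled in a `Cargo.toml` along
these lines (the version requirement shown is a placeholder; use whichever
release you actually depend on):

```toml
[dependencies]
# Default build: `std` enabled, runtime CPU feature detection available.
memchr = "2"

# no_std build: disable default features. SSE2 accelerated routines are
# still used on `x86_64` where possible.
# memchr = { version = "2", default-features = false }

# Opt into libc's `memchr` on targets where it beats this crate's fallback.
# memchr = { version = "2", features = ["libc"] }
```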
*/

#![deny(missing_docs)]
#![cfg_attr(not(feature = "std"), no_std)]
// It's not worth trying to gate all code on just miri, so turn off relevant
// dead code warnings.
#![cfg_attr(miri, allow(dead_code, unused_macros))]

// Supporting 8-bit (or others) would be fine. If you need it, please submit a
// bug report at https://github.com/BurntSushi/memchr
#[cfg(not(any(
    target_pointer_width = "16",
    target_pointer_width = "32",
    target_pointer_width = "64"
)))]
compile_error!("memchr currently not supported on non-{16,32,64}");

pub use crate::memchr::{
    memchr, memchr2, memchr2_iter, memchr3, memchr3_iter, memchr_iter,
    memrchr, memrchr2, memrchr2_iter, memrchr3, memrchr3_iter, memrchr_iter,
    Memchr, Memchr2, Memchr3,
};

mod cow;
mod memchr;
pub mod memmem;
#[cfg(test)]
mod tests;