lib.rs source code [crates/memchr-2.7.1/src/lib.rs]

/*!
This library provides heavily optimized routines for string search primitives.

# Overview

This section gives a brief high level overview of what this crate offers.

* The top-level module provides routines for searching for 1, 2 or 3 bytes
  in the forward or reverse direction. When searching for more than one byte,
  positions are considered a match if the byte at that position matches any
  of the bytes.
* The [`memmem`] sub-module provides forward and reverse substring search
  routines.

In all such cases, routines operate on `&[u8]` without regard to encoding. This
is exactly what you want when searching either UTF-8 or arbitrary bytes.
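
As a rough illustration of the "matches any of the bytes" rule, the sketch
below spells out the intended semantics of a two-byte search in both
directions using only the standard library. The names `find_any2` and
`rfind_any2` are hypothetical and not part of this crate. The crate's own
`memchr2` and `memrchr2` should report the same positions, but they are
implemented very differently.

```rust
/// Hypothetical reference semantics for a two-byte forward search: returns
/// the index of the first byte equal to either `n1` or `n2`.
fn find_any2(n1: u8, n2: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == n1 || b == n2)
}

/// Hypothetical reference semantics for the reverse direction: returns the
/// index of the last byte equal to either `n1` or `n2`.
fn rfind_any2(n1: u8, n2: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().rposition(|&b| b == n1 || b == n2)
}
```

For the haystack `b"xyzabc"` and the needles `a` and `z`, `find_any2` yields
`Some(2)` and `rfind_any2` yields `Some(3)`. Because the comparison is done
byte by byte, the haystack does not need to be valid UTF-8.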

# Example: using `memchr`

This example shows how to use `memchr` to find the first occurrence of `z` in
a haystack:

```
use memchr::memchr;

let haystack = b"foo bar baz quuz";
assert_eq!(Some(10), memchr(b'z', haystack));
```

# Example: matching one of three possible bytes

This example shows how to use `memchr3_iter` in reverse to find occurrences of
`a`, `b` or `c`, starting at the end of the haystack.

```
use memchr::memchr3_iter;

let haystack = b"xyzaxyzbxyzc";

let mut it = memchr3_iter(b'a', b'b', b'c', haystack).rev();
assert_eq!(Some(11), it.next());
assert_eq!(Some(7), it.next());
assert_eq!(Some(3), it.next());
assert_eq!(None, it.next());
```

# Example: iterating over substring matches

This example shows how to use the [`memmem`] sub-module to find occurrences of
a substring in a haystack.

```
use memchr::memmem;

let haystack = b"foo bar foo baz foo";

let mut it = memmem::find_iter(haystack, "foo");
assert_eq!(Some(0), it.next());
assert_eq!(Some(8), it.next());
assert_eq!(Some(16), it.next());
assert_eq!(None, it.next());
```

# Example: repeating a search for the same needle

It may be possible for the overhead of constructing a substring searcher to be
measurable in some workloads. In cases where the same needle is used to search
many haystacks, it is possible to do construction once and thus to avoid it for
subsequent searches. This can be done with a [`memmem::Finder`]:

```
use memchr::memmem;

let finder = memmem::Finder::new("foo");

assert_eq!(Some(4), finder.find(b"baz foo quux"));
assert_eq!(None, finder.find(b"quux baz bar"));
```

# Why use this crate?

At first glance, the APIs provided by this crate might seem weird. Why provide
a dedicated routine like `memchr` for something that could be implemented
clearly and trivially in one line:

```
fn memchr(needle: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}
```

Or similarly, why does this crate provide substring search routines when Rust's
core library already provides them?

```
fn search(haystack: &str, needle: &str) -> Option<usize> {
    haystack.find(needle)
}
```

The primary reason for both of them to exist is performance. When it comes to
performance, at a high level at least, there are two primary ways to look at
it:

* Throughput: For this, think about it as, "given some very large haystack
  and a byte that never occurs in that haystack, how long does it take to
  search through it and determine that it, in fact, does not occur?"
* Latency: For this, think about it as, "given a tiny haystack---just a
  few bytes---how long does it take to determine if a byte is in it?"

The `memchr` routine in this crate has _slightly_ worse latency than the
solution presented above. However, its throughput can easily be over an
order of magnitude faster. This is a good general purpose trade off to make.
You rarely lose, but often gain big.
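
One way to see where this trade off comes from is a word-at-a-time search,
sketched below. The function `memchr_swar` is hypothetical and is not this
crate's implementation, which uses vector instructions where they are
available and is considerably more involved. The sketch only shows the general
shape: a small setup cost on every call, in exchange for examining several
bytes per step.

```rust
/// Hypothetical word-at-a-time byte search, shown only to illustrate the
/// throughput/latency trade off. This is a sketch, not this crate's code.
fn memchr_swar(needle: u8, haystack: &[u8]) -> Option<usize> {
    const LO: u64 = 0x0101_0101_0101_0101;
    const HI: u64 = 0x8080_8080_8080_8080;

    // Setup cost: broadcast the needle into every byte of a 64-bit word.
    let pattern = LO * u64::from(needle);

    let mut offset = 0;
    for chunk in haystack.chunks_exact(8) {
        let mut bytes = [0u8; 8];
        bytes.copy_from_slice(chunk);
        // Bytes equal to the needle become zero after the XOR.
        let x = u64::from_ne_bytes(bytes) ^ pattern;
        // This expression is non-zero exactly when some byte of `x` is zero.
        if (x.wrapping_sub(LO) & !x & HI) != 0 {
            // Pin down the exact position with a short byte-wise scan.
            return chunk.iter().position(|&b| b == needle).map(|i| offset + i);
        }
        offset += 8;
    }
    // Fewer than 8 bytes remain, so finish with the simple loop.
    haystack[offset..].iter().position(|&b| b == needle).map(|i| offset + i)
}
```

On a haystack of only a few bytes, the broadcast and chunk bookkeeping are
pure overhead compared to the one-line version. On a large haystack without a
match, each loop iteration rules out eight bytes at once.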

NOTE: The name `memchr` comes from the corresponding routine in `libc`. A
key advantage of using this library is that its performance is not tied to the
quality of the `memchr` implementation in the `libc` you happen to be using,
which can vary greatly from platform to platform.

But what about substring search? This one is a bit more complicated. The
primary reason for its existence is still indeed performance, but it's also
useful because Rust's core library doesn't actually expose any substring
search routine on arbitrary bytes. The only substring search routine that
exists works exclusively on valid UTF-8.
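
To make that gap concrete: with only the standard library, searching a `&[u8]`
for a sub-slice means writing something like the sketch below. `find_bytes` is
a hypothetical helper, not an API of this crate or of the standard library,
and it uses a naive quadratic scan rather than an optimized algorithm.

```rust
/// Hypothetical naive substring search over arbitrary bytes. It takes
/// O(haystack.len() * needle.len()) time in the worst case and is meant only
/// to show the shape of the operation.
fn find_bytes(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    // `windows` panics on a size of 0, so handle the empty needle first. This
    // sketch treats an empty needle as matching at offset 0.
    if needle.is_empty() {
        return Some(0);
    }
    haystack.windows(needle.len()).position(|window| window == needle)
}
```

Since it compares raw bytes, this also works on haystacks that are not valid
UTF-8, for example `find_bytes(&[0xFF, 0x00, 0xFE, 0xFF], &[0xFE, 0xFF])`,
which yields `Some(2)`.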

So if you have valid UTF-8, is there a reason to use this over the standard
library substring search routine? Yes. This routine is faster on almost every
metric, including latency. The natural question, then, is why isn't this
implementation in the standard library, even if only for searching on UTF-8?
The reason is that the implementation details for using SIMD in the standard
library haven't quite been worked out yet.

NOTE: Currently, only `x86_64`, `wasm32` and `aarch64` targets have vector
accelerated implementations of `memchr` (and friends) and `memmem`.

# Crate features

* std - When enabled (the default), this will permit features specific to
  the standard library. Currently, the only thing used from the standard library
  is runtime SIMD CPU feature detection. This means that this feature must be
  enabled to get AVX2 accelerated routines on `x86_64` targets without enabling
  the `avx2` feature at compile time, for example. When `std` is not enabled,
  this crate will still attempt to use SSE2 accelerated routines on `x86_64`. It
  will also use AVX2 accelerated routines when the `avx2` feature is enabled at
  compile time. In general, enable this feature if you can.
* alloc - When enabled (the default), APIs in this crate requiring some
  kind of allocation will become available. For example, the
  [`memmem::Finder::into_owned`](crate::memmem::Finder::into_owned) API and the
  [`arch::all::shiftor`](crate::arch::all::shiftor) substring search
  implementation. Otherwise, this crate is designed from the ground up to be
  usable in core-only contexts, so the `alloc` feature doesn't add much
  currently. Notably, disabling `std` but enabling `alloc` will **not** result
  in the use of AVX2 on `x86_64` targets unless the `avx2` feature is enabled
  at compile time. (With `std` enabled, AVX2 can be used even without the `avx2`
  feature enabled at compile time by way of runtime CPU feature detection.)
* logging - When enabled (disabled by default), the `log` crate is used
  to emit log messages about what kinds of `memchr` and `memmem` algorithms
  are used. Namely, both `memchr` and `memmem` have a number of different
  implementation choices depending on the target and CPU, and the log messages
  can help show what specific implementations are being used. Generally, this is
  useful for debugging performance issues.
* libc - DEPRECATED. Previously, this enabled the use of the target's
  `memchr` function from whatever `libc` was linked into the program. This
  feature is now a no-op because this crate's implementation of `memchr` should
  now be sufficiently fast on a number of platforms that `libc` should no longer
  be needed. (This feature is somewhat of a holdover from this crate's origins.
  Originally, this crate was literally just a safe wrapper function around the
  `memchr` function from `libc`.)
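
The runtime CPU feature detection that the `std` feature relies on can be
pictured with the sketch below. The function `widest_vector_extension` is
hypothetical and is not how this crate dispatches internally. It only shows
that the detection macro lives in `std`, which is why core-only builds must
fall back to compile time target features.

```rust
/// Hypothetical helper naming the widest x86_64 vector extension that a
/// caller could select at runtime. This mirrors the idea of runtime CPU
/// feature detection and is not this crate's dispatch code.
#[cfg(target_arch = "x86_64")]
fn widest_vector_extension() -> &'static str {
    // `is_x86_feature_detected!` is provided by `std`, not `core`, so this
    // kind of check is unavailable in core-only builds.
    if std::is_x86_feature_detected!("avx2") {
        "avx2"
    } else if std::is_x86_feature_detected!("sse2") {
        "sse2"
    } else {
        "scalar"
    }
}

/// On other targets, this sketch simply reports a scalar fallback.
#[cfg(not(target_arch = "x86_64"))]
fn widest_vector_extension() -> &'static str {
    "scalar"
}
```

The result depends on the machine the code runs on, which is the point:
the same binary can pick a faster routine when the CPU supports it.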
*/

#![deny(missing_docs)]
#![no_std]
// It's just not worth trying to squash all dead code warnings. Pretty
// unfortunate IMO. Not really sure how to fix this other than to either
// live with it or sprinkle a whole mess of `cfg` annotations everywhere.
#![cfg_attr(
    not(any(
        all(target_arch = "x86_64", target_feature = "sse2"),
        target_arch = "wasm32",
        target_arch = "aarch64",
    )),
    allow(dead_code)
)]
// Same deal for miri.
#![cfg_attr(miri, allow(dead_code, unused_macros))]

// Supporting 8-bit (or others) would be fine. If you need it, please submit a
// bug report at https://github.com/BurntSushi/memchr
#[cfg(not(any(
    target_pointer_width = "16",
    target_pointer_width = "32",
    target_pointer_width = "64"
)))]
compile_error!("memchr currently not supported on non-{16,32,64}");

#[cfg(any(test, feature = "std"))]
extern crate std;

#[cfg(any(test, feature = "alloc"))]
extern crate alloc;

pub use crate::memchr::{
    memchr, memchr2, memchr2_iter, memchr3, memchr3_iter, memchr_iter,
    memrchr, memrchr2, memrchr2_iter, memrchr3, memrchr3_iter, memrchr_iter,
    Memchr, Memchr2, Memchr3,
};

#[macro_use]
mod macros;

#[cfg(test)]
#[macro_use]
mod tests;

pub mod arch;
mod cow;
mod ext;
mod memchr;
pub mod memmem;
mod vector;