//! Utilities related to FFI bindings.
//!
//! This module provides utilities to handle data across non-Rust
//! interfaces, like other programming languages and the underlying
//! operating system. It is mainly of use for FFI (Foreign Function
//! Interface) bindings and code that needs to exchange C-like strings
//! with other languages.
//!
//! # Overview
//!
//! Rust represents owned strings with the [`String`] type, and
//! borrowed slices of strings with the [`str`] primitive. Both are
//! always in UTF-8 encoding, and may contain nul bytes in the middle,
//! i.e., if you look at the bytes that make up the string, there may
//! be a `\0` among them. Both `String` and `str` store their length
//! explicitly; there are no nul terminators at the end of strings
//! like in C.
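//!
//! As a minimal sketch of the paragraph above, the following shows that a
//! nul byte is an ordinary byte in a Rust string and that the length is a
//! stored value. The helper names `byte_len` and `contains_nul` are
//! illustrative only and are not part of this module.
//!
//! ```rust
//! /// Returns the stored length in bytes; no scan for a terminator is needed.
//! fn byte_len(s: &str) -> usize {
//!     s.len()
//! }
//!
//! /// Reports whether a `\0` byte appears anywhere in the string.
//! fn contains_nul(s: &str) -> bool {
//!     s.bytes().any(|b| b == 0)
//! }
//! ```
//!
//! With these helpers, `byte_len("ab\0cd")` is `5`, because the nul byte
//! counts like any other byte, and `contains_nul("ab\0cd")` is `true`.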
//!
//! C strings are different from Rust strings:
//!
//! * **Encodings** - Rust strings are UTF-8, but C strings may use
//! other encodings. If you are using a string from C, you should
//! check its encoding explicitly, rather than just assuming that it
//! is UTF-8 like you can do in Rust.
//!
//! * **Character size** - C strings may use `char` or `wchar_t`-sized
//! characters; please **note** that C's `char` is different from Rust's.
//! The C standard leaves the actual sizes of those types open to
//! interpretation, but defines different APIs for strings made up of
//! each character type. Rust strings are always UTF-8, so different
//! Unicode characters will be encoded in a variable number of bytes
//! each. The Rust type [`char`] represents a '[Unicode scalar
//! value]', which is similar to, but not the same as, a '[Unicode
//! code point]'.
//!
//! * **Nul terminators and implicit string lengths** - Often, C
//! strings are nul-terminated, i.e., they have a `\0` character at the
//! end. The length of a string buffer is not stored, but has to be
//! calculated; to compute the length of a string, C code must
//! manually call a function like `strlen()` for `char`-based strings,
//! or `wcslen()` for `wchar_t`-based ones. Those functions return
//! the number of characters in the string excluding the nul
//! terminator, so the buffer length is really `len+1` characters.
//! Rust strings don't have a nul terminator; their length is always
//! stored and does not need to be calculated. In Rust, accessing a
//! string's length is an *O*(1) operation because the length is
//! stored, while in C it is an *O*(*n*) operation because the
//! length needs to be computed by scanning the string for the nul
//! terminator.
//!
//! * **Internal nul characters** - When C strings have a nul
//! terminator character, this usually means that they cannot have nul
//! characters in the middle, because a nul character would essentially
//! truncate the string. Rust strings *can* have nul characters in
//! the middle, because nul does not have to mark the end of the
//! string in Rust.
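//!
//! The nul-related differences in the list above can be sketched with
//! `CString::new`, which rejects input containing an interior nul and
//! appends the terminator otherwise. The helper names `c_buffer_len` and
//! `first_nul` are hypothetical and exist only for this illustration.
//!
//! ```rust
//! use std::ffi::CString;
//!
//! /// Length of the buffer a C function would see, including the trailing
//! /// nul, or `None` if `s` contains an interior nul byte.
//! fn c_buffer_len(s: &str) -> Option<usize> {
//!     CString::new(s).ok().map(|c| c.as_bytes_with_nul().len())
//! }
//!
//! /// Byte position of the first nul that prevents the conversion, if any.
//! fn first_nul(s: &str) -> Option<usize> {
//!     CString::new(s).err().map(|e| e.nul_position())
//! }
//! ```
//!
//! Here `c_buffer_len("hello")` is `Some(6)`, the `len+1` mentioned above,
//! while `c_buffer_len("he\0llo")` is `None` and `first_nul("he\0llo")` is
//! `Some(2)`.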
//!
//! # Representations of non-Rust strings
//!
//! [`CString`] and [`CStr`] are useful when you need to transfer
//! UTF-8 strings to and from languages with a C ABI, like Python.
//!
//! * **From Rust to C:** [`CString`] represents an owned, C-friendly
//! string: it is nul-terminated, and has no internal nul characters.
//! Rust code can create a [`CString`] out of a normal string (provided
//! that the string doesn't have nul characters in the middle), and
//! then use a variety of methods to obtain a raw <code>\*mut [u8]</code> that can
//! then be passed as an argument to functions which use the C
//! conventions for strings.
//!
//! * **From C to Rust:** [`CStr`] represents a borrowed C string; it
//! is what you would use to wrap a raw <code>\*const [u8]</code> that you got from
//! a C function. A [`CStr`] is guaranteed to be a nul-terminated array
//! of bytes. Once you have a [`CStr`], you can convert it to a Rust
//! <code>&[str]</code> if it's valid UTF-8, or lossily convert it by adding
//! replacement characters.
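//!
//! Both directions described above can be sketched without calling real C
//! code. The example assumes the pointer obtained from `CString::as_ptr`
//! stands in for what a C function would receive or return; the helper
//! names `round_trip` and `lossy` are hypothetical.
//!
//! ```rust
//! use std::ffi::{c_char, CStr, CString};
//!
//! /// Converts `s` to a C string, borrows it back through a raw pointer,
//! /// and returns the text again. Returns `None` on an interior nul.
//! fn round_trip(s: &str) -> Option<String> {
//!     // Rust to C: an owned, nul-terminated copy of the string.
//!     let owned = CString::new(s).ok()?;
//!     // `owned` must stay alive for as long as `ptr` is in use.
//!     let ptr: *const c_char = owned.as_ptr();
//!     // C to Rust: wrap the raw pointer in a borrowed `CStr`.
//!     // SAFETY: `ptr` points into `owned`, which is alive and nul-terminated.
//!     let borrowed = unsafe { CStr::from_ptr(ptr) };
//!     borrowed.to_str().ok().map(str::to_owned)
//! }
//!
//! /// Lossily converts bytes that end in a nul, as they might arrive from C.
//! /// Returns `None` if the nul is missing or not at the end.
//! fn lossy(bytes_with_nul: &[u8]) -> Option<String> {
//!     let c_str = CStr::from_bytes_with_nul(bytes_with_nul).ok()?;
//!     Some(c_str.to_string_lossy().into_owned())
//! }
//! ```
//!
//! Here `round_trip("hello")` yields `Some("hello".to_string())`, and
//! `lossy(b"caf\xff\0")` yields `"caf\u{FFFD}"`, since the invalid byte is
//! replaced by U+FFFD. Real FFI code would pass `ptr` to a function declared
//! in an `extern "C"` block instead of using it directly.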
//!
//! [`String`]: crate::string::String
//! [`CStr`]: core::ffi::CStr

#![stable(feature = "alloc_ffi", since = "1.64.0")]

#[stable(feature = "alloc_c_string", since = "1.64.0")]
pub use self::c_str::FromVecWithNulError;
#[stable(feature = "alloc_c_string", since = "1.64.0")]
pub use self::c_str::{CString, IntoStringError, NulError};

mod c_str;