lib.rs source code [crates/similar/src/lib.rs]

//! This crate implements diffing utilities. It attempts to provide an abstraction
//! interface over different types of diffing algorithms. The design of the
//! library is inspired by pijul's diff library by Pierre-Étienne Meunier and
//! also inherits the patience diff algorithm from there.
//!
//! The API of the crate is split into high and low level functionality. Most
//! of what you probably want to use is available at the top level. Additionally
//! the following submodules exist:
//!
//! * [`algorithms`]: This implements the different types of diffing algorithms.
//!   It provides both low level access to the algorithms with the minimal
//!   trait bounds necessary, as well as a generic interface.
//! * [`udiff`]: Unified diff functionality.
//! * [`utils`]: utilities for common diff related operations. This module
//!   provides additional diffing functions for working with text diffs.
//!
//! # Sequence Diffing
//!
//! If you want to diff sequences or, more generally, indexable things you can
//! use the [`capture_diff`] and [`capture_diff_slices`] functions. They will
//! directly diff an indexable object or slice and return a vector of
//! [`DiffOp`] objects.
//!
//! ```rust
//! use similar::{Algorithm, capture_diff_slices};
//!
//! let a = vec![1, 2, 3, 4, 5];
//! let b = vec![1, 2, 3, 4, 7];
//! let ops = capture_diff_slices(Algorithm::Myers, &a, &b);
//! ```
//!
//! # Text Diffing
//!
//! Similar provides helpful utilities for text (and more specifically line) diff
//! operations. The main type you want to work with is [`TextDiff`] which
//! uses the underlying diff algorithms to expose a convenient API to work with
//! texts:
//!
//! ```rust
//! # #[cfg(feature = "text")] {
//! use similar::{ChangeTag, TextDiff};
//!
//! let diff = TextDiff::from_lines(
//!     "Hello World\nThis is the second line.\nThis is the third.",
//!     "Hallo Welt\nThis is the second line.\nThis is life.\nMoar and more",
//! );
//!
//! for change in diff.iter_all_changes() {
//!     let sign = match change.tag() {
//!         ChangeTag::Delete => "-",
//!         ChangeTag::Insert => "+",
//!         ChangeTag::Equal => " ",
//!     };
//!     print!("{}{}", sign, change);
//! }
//! # }
//! ```
//!
//! ## Trailing Newlines
//!
//! When working with line diffs (and unified diffs in general) there are two
//! "philosophies" for how to look at lines. One is to diff lines without their
//! newline character, the other is to diff with the newline character. Typically
//! the latter is done because text files do not _have_ to end in a newline
//! character. As a result there is a difference between `foo\n` and `foo` as far
//! as diffs are concerned.
//!
//! In similar this is handled on the [`Change`] or [`InlineChange`] level. If
//! a diff was created via [`TextDiff::from_lines`] the text diffing system is
//! instructed to check whether any missing newlines are encountered
//! ([`TextDiff::newline_terminated`] returns true).
//!
//! In any case the [`Change`] object has a convenience method called
//! [`Change::missing_newline`] which returns `true` if the change is missing
//! a trailing newline. Armed with that information the caller can handle
//! this either by rendering a virtual newline at that position or by indicating
//! it in some other way. For instance the unified diff code will render the
//! special `\ No newline at end of file` marker.
//!
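//! A minimal sketch of how a caller might act on that information. The
//! `render_with_marker` helper is hypothetical (it is not part of this crate),
//! it assumes the default `text` feature is enabled, and it simply reuses the
//! marker text that unified diffs use:
//!
//! ```rust
//! use similar::{ChangeTag, TextDiff};
//!
//! /// Renders a line diff and appends a marker after every change that
//! /// lacks a trailing newline. Hypothetical helper for illustration only.
//! fn render_with_marker(old: &str, new: &str) -> String {
//!     let diff = TextDiff::from_lines(old, new);
//!     let mut out = String::new();
//!     for change in diff.iter_all_changes() {
//!         let sign = match change.tag() {
//!             ChangeTag::Delete => '-',
//!             ChangeTag::Insert => '+',
//!             ChangeTag::Equal => ' ',
//!         };
//!         out.push(sign);
//!         out.push_str(change.value());
//!         if change.missing_newline() {
//!             // Terminate the line ourselves, then add the marker line.
//!             out.push_str("\n\\ No newline at end of file\n");
//!         }
//!     }
//!     out
//! }
//! ```
//!
//! With `"a\nb\n"` as the old text and `"a\nb"` as the new text, the inserted
//! `b` line is the one flagged as missing its newline.
//!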
//! ## Bytes vs Unicode
//!
//! Similar concerns itself with a looser definition of "text" than you would
//! normally see in Rust. While by default it can only operate on [`str`] types,
//! by enabling the `bytes` feature it gains support for byte slices with some
//! caveats.
//!
//! A lot of text diff functionality assumes that what is being diffed constitutes
//! text, but in the real world it can often be challenging to ensure that this is
//! all valid UTF-8. Because of this the crate is built so that most functionality
//! still works with bytes as long as they are roughly ASCII compatible.
//!
//! This means you will be successful in creating a unified diff from Latin-1
//! encoded bytes, but if you try to do the same with EBCDIC encoded bytes you
//! will only get garbage.
//!
//! # Ops vs Changes
//!
//! Because two compared sequences very commonly match for the most part, this
//! crate splits its functionality into two layers: ops and changes.
//!
//! Differences are encoded as [diff operations](crate::DiffOp). These are
//! ranges of the differences by index in the source sequence. Because this
//! can be cumbersome to work with, a separate method [`DiffOp::iter_changes`]
//! (and [`TextDiff::iter_changes`] when working with text diffs) is provided
//! which expands an operation into its changes, item by item.
//!
//! Because the [`TextDiff::grouped_ops`] method can isolate clusters of changes,
//! this works even for very long files when the two methods are paired.
//!
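//! A minimal sketch of the two layers on plain slices. The `count_changes`
//! helper is hypothetical (it is not part of this crate); it first captures the
//! ops and then expands each op into per-item changes:
//!
//! ```rust
//! use similar::{capture_diff_slices, Algorithm, ChangeTag};
//!
//! /// Returns the number of (equal, deleted, inserted) items between two
//! /// slices. Hypothetical helper for illustration only.
//! fn count_changes(old: &[u32], new: &[u32]) -> (usize, usize, usize) {
//!     // Layer one: compact operations that only store index ranges.
//!     let ops = capture_diff_slices(Algorithm::Myers, old, new);
//!     let (mut equal, mut deleted, mut inserted) = (0, 0, 0);
//!     for op in &ops {
//!         // Layer two: expand the op into one change per item.
//!         for change in op.iter_changes(old, new) {
//!             match change.tag() {
//!                 ChangeTag::Equal => equal += 1,
//!                 ChangeTag::Delete => deleted += 1,
//!                 ChangeTag::Insert => inserted += 1,
//!             }
//!         }
//!     }
//!     (equal, deleted, inserted)
//! }
//! ```
//!
//! For `[1, 2, 3, 4, 5]` against `[1, 2, 3, 4, 7]` this counts four equal
//! items, one deletion and one insertion.
//!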
//! # Deadlines and Performance
//!
//! For large and very distinct inputs the algorithms as implemented can take
//! a very, very long time to execute, too long to make sense in practice.
//! To work around this issue all diffing algorithms also provide a version
//! that accepts a deadline, which is the point in time as defined by an
//! [`Instant`](std::time::Instant) after which the algorithm should give up.
//! What giving up means depends on the algorithm. For instance, due to the
//! recursive, divide and conquer nature of Myers' diff you will still get a
//! pretty decent diff in many cases when a deadline is reached, whereas the
//! LCS diff is unlikely to give any decent results in such a situation.
//!
//! The [`TextDiff`] type also lets you configure a deadline and/or timeout
//! when performing a text diff.
//!
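//! A minimal sketch of a deadline on plain slices. The `diff_with_budget`
//! helper is hypothetical (it is not part of this crate); it assumes the
//! top-level `capture_diff_slices_deadline` function, which takes the deadline
//! as an `Option<Instant>`:
//!
//! ```rust
//! use std::time::{Duration, Instant};
//!
//! use similar::{capture_diff_slices_deadline, Algorithm, DiffOp};
//!
//! /// Diffs two slices but asks the algorithm to give up once `budget`
//! /// has elapsed. Hypothetical helper for illustration only.
//! fn diff_with_budget(old: &[u32], new: &[u32], budget: Duration) -> Vec<DiffOp> {
//!     // The deadline is an absolute point in time, not a duration.
//!     let deadline = Instant::now() + budget;
//!     capture_diff_slices_deadline(Algorithm::Myers, old, new, Some(deadline))
//! }
//! ```
//!
//! As described above, the quality of the result once the deadline has passed
//! depends on the chosen algorithm.
//!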
//! # Feature Flags
//!
//! By default the crate does not have any dependencies; however, for some use
//! cases it's useful to pull in extra functionality. Likewise you can turn
//! off some functionality.
//!
//! * `text`: this feature is enabled by default and enables the text based
//!   diffing types such as [`TextDiff`].
//!   If the crate is used without default features it's removed.
//! * `unicode`: when this feature is enabled the text diffing functionality
//!   gains the ability to diff on a grapheme instead of character level. This
//!   is particularly useful when working with text containing emojis. This
//!   pulls in some relatively complex dependencies for working with the unicode
//!   database.
//! * `bytes`: this feature adds support for working with byte slices in text
//!   APIs in addition to unicode strings. This pulls in the
//!   [`bstr`] dependency.
//! * `inline`: this feature gives access to additional functionality of the
//!   text diffing to provide inline information about which values changed
//!   in a line diff. This currently also enables the `unicode` feature.
//! * `serde`: this feature enables serialization for some types in this
//!   crate. For enums without payload deserialization is then also supported.
#![warn(missing_docs)]
pub mod algorithms;
pub mod iter;
#[cfg(feature = "text")]
pub mod udiff;
#[cfg(feature = "text")]
pub mod utils;

mod common;
#[cfg(feature = "text")]
mod text;
mod types;

pub use self::common::*;
#[cfg(feature = "text")]
pub use self::text::*;
pub use self::types::*;