//! This crate implements diffing utilities. It attempts to provide an
//! abstraction over different types of diffing algorithms. The design of the
//! library is inspired by pijul's diff library by Pierre-Étienne Meunier, from
//! which it also inherits the patience diff algorithm.
//!
//! The API of the crate is split into high and low level functionality. Most
//! of what you probably want to use is available at the top level.
//! Additionally the following submodules exist:
//!
//! * [`algorithms`]: This implements the different types of diffing algorithms.
//!   It provides both low level access to the algorithms with the minimal
//!   trait bounds necessary, as well as a generic interface.
//! * [`udiff`]: Unified diff functionality.
//! * [`utils`]: Utilities for common diff related operations. This module
//!   provides additional diffing functions for working with text diffs.
//!
//! # Sequence Diffing
//!
//! To diff sequences of generally indexable things you can use the
//! [`capture_diff`] and [`capture_diff_slices`] functions. They directly
//! diff an indexable object or slice and return a vector of [`DiffOp`] objects.
//!
//! ```rust
//! use similar::{Algorithm, capture_diff_slices};
//!
//! let a = vec![1, 2, 3, 4, 5];
//! let b = vec![1, 2, 3, 4, 7];
//! // `ops` describes how to turn `a` into `b` as a list of `DiffOp` values.
//! let ops = capture_diff_slices(Algorithm::Myers, &a, &b);
//! ```
//!
//! # Text Diffing
//!
//! Similar provides helpful utilities for text (and more specifically line) diff
//! operations. The main type you want to work with is [`TextDiff`], which
//! uses the underlying diff algorithms to expose a convenient API for working
//! with texts:
//!
//! ```rust
//! # #[cfg(feature = "text")] {
//! use similar::{ChangeTag, TextDiff};
//!
//! let diff = TextDiff::from_lines(
//!     "Hello World\nThis is the second line.\nThis is the third.",
//!     "Hallo Welt\nThis is the second line.\nThis is life.\nMoar and more",
//! );
//!
//! // Print every line prefixed with a sign that marks its change type.
//! for change in diff.iter_all_changes() {
//!     let sign = match change.tag() {
//!         ChangeTag::Delete => "-",
//!         ChangeTag::Insert => "+",
//!         ChangeTag::Equal => " ",
//!     };
//!     print!("{}{}", sign, change);
//! }
//! # }
//! ```
//!
//! ## Trailing Newlines
//!
//! When working with line diffs (and unified diffs in general) there are two
//! "philosophies" for looking at lines. One is to diff lines without their
//! newline character, the other is to diff with the newline character.
//! Typically the latter is done because text files do not _have_ to end in a
//! newline character. As a result there is a difference between `foo\n` and
//! `foo` as far as diffs are concerned.
//!
//! In similar this is handled on the [`Change`] or [`InlineChange`] level. If
//! a diff was created via [`TextDiff::from_lines`] the text diffing system is
//! instructed to check for missing trailing newlines
//! ([`TextDiff::newline_terminated`] returns `true`).
//!
//! In any case the [`Change`] object has a convenience method called
//! [`Change::missing_newline`] which returns `true` if the change is missing
//! a trailing newline. Armed with that information the caller can handle
//! this either by rendering a virtual newline at that position or by
//! indicating it in a different way. For instance the unified diff code will
//! render the special `\ No newline at end of file` marker.
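//!
//! As a minimal sketch of the first option, the following renders a virtual
//! newline plus a marker whenever [`Change::missing_newline`] reports `true`.
//! The sample inputs and the `rendered` buffer are illustrative assumptions,
//! and this is not meant to mirror how the unified diff code is implemented:
//!
//! ```rust
//! # #[cfg(feature = "text")] {
//! use similar::{ChangeTag, TextDiff};
//!
//! // The old text ends in a newline, the new text does not.
//! let diff = TextDiff::from_lines("foo\nbar\n", "foo\nbar");
//!
//! let mut rendered = String::new();
//! for change in diff.iter_all_changes() {
//!     let sign = match change.tag() {
//!         ChangeTag::Delete => "-",
//!         ChangeTag::Insert => "+",
//!         ChangeTag::Equal => " ",
//!     };
//!     rendered.push_str(sign);
//!     // Append the raw line so the newline handling below stays explicit.
//!     rendered.push_str(change.value());
//!     if change.missing_newline() {
//!         // Close the line ourselves and flag the missing newline.
//!         rendered.push_str("\n\\ No newline at end of file\n");
//!     }
//! }
//!
//! assert!(rendered.contains("\\ No newline at end of file"));
//! print!("{}", rendered);
//! # }
//! ```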
//!
//! ## Bytes vs Unicode
//!
//! Similar concerns itself with a looser definition of "text" than you would
//! normally see in Rust. While by default it can only operate on [`str`] types,
//! by enabling the `bytes` feature it gains support for byte slices with some
//! caveats.
//!
//! A lot of text diff functionality assumes that what is being diffed constitutes
//! text, but in the real world it can often be challenging to ensure that this is
//! all valid UTF-8. Because of this the crate is built so that most functionality
//! still works with bytes as long as they are roughly ASCII compatible.
//!
//! This means you will be successful in creating a unified diff from Latin-1
//! encoded bytes, but if you try to do the same with EBCDIC encoded bytes you
//! will only get garbage.
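//!
//! A hedged sketch of the Latin-1 case, assuming the `bytes` feature is
//! enabled. The sample bytes and the `out` buffer are made up for
//! illustration:
//!
//! ```rust
//! # #[cfg(all(feature = "text", feature = "bytes"))] {
//! use similar::TextDiff;
//!
//! // 0xE9 is "é" in Latin-1 and is not valid UTF-8 on its own.
//! let old: &[u8] = b"caf\xe9\nau lait\n";
//! let new: &[u8] = b"caf\xe9\nnoir\n";
//!
//! let diff = TextDiff::from_lines(old, new);
//!
//! // Write the unified diff into a byte buffer instead of a `String`.
//! let mut out = Vec::new();
//! diff.unified_diff().to_writer(&mut out).unwrap();
//! assert!(!out.is_empty());
//! # }
//! ```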
//!
//! # Ops vs Changes
//!
//! Because two compared sequences will very commonly match for the most part,
//! this crate splits its functionality into two layers, operations and changes:
//!
//! Changes are encoded as [diff operations](crate::DiffOp). These are
//! ranges of the differences by index in the source sequence. Because this
//! can be cumbersome to work with, a separate method [`DiffOp::iter_changes`]
//! (and [`TextDiff::iter_changes`] when working with text diffs) is provided
//! which expands an operation into its changes on an item by item level.
//!
//! Because the [`TextDiff::grouped_ops`] method can isolate clusters of
//! changes, expanding operations this way stays practical even for very long
//! files when the two methods are paired.
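//!
//! The two layers can be sketched with plain integer slices. The variable
//! names are illustrative only:
//!
//! ```rust
//! use similar::{capture_diff_slices, Algorithm, ChangeTag};
//!
//! let old = vec![1, 2, 3, 4, 5];
//! let new = vec![1, 2, 3, 4, 7];
//!
//! // Layer one: a compact list of operations over index ranges.
//! let ops = capture_diff_slices(Algorithm::Myers, &old, &new);
//!
//! // Layer two: expand each operation into item by item changes.
//! let mut inserted = Vec::new();
//! for op in &ops {
//!     println!("{:?}", op);
//!     for change in op.iter_changes(&old, &new) {
//!         if change.tag() == ChangeTag::Insert {
//!             inserted.push(change.value());
//!         }
//!     }
//! }
//!
//! // Only the last element differs, so `7` is the single inserted item.
//! assert_eq!(inserted, vec![7]);
//! ```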
//!
//! # Deadlines and Performance
//!
//! For large and very distinct inputs the algorithms as implemented can take
//! a very, very long time to execute. Too long to make sense in practice.
//! To work around this issue all diffing algorithms also provide a version
//! that accepts a deadline, which is the point in time as defined by an
//! [`Instant`](std::time::Instant) after which the algorithm should give up.
//! What giving up means depends on the algorithm. For instance, due to the
//! recursive, divide and conquer nature of Myers' diff you will still get a
//! pretty decent diff in many cases when a deadline is reached. The LCS diff,
//! on the other hand, is unlikely to give any decent results in such a
//! situation.
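//!
//! A minimal sketch of a deadline on the sequence level, assuming the
//! `capture_diff_slices_deadline` variant of [`capture_diff_slices`]. The
//! 100 millisecond budget is an arbitrary illustration:
//!
//! ```rust
//! use std::time::{Duration, Instant};
//! use similar::{capture_diff_slices_deadline, Algorithm};
//!
//! let old = vec![1, 2, 3, 4, 5];
//! let new = vec![1, 2, 3, 4, 7];
//!
//! // Ask the algorithm to give up roughly 100 milliseconds from now.
//! let deadline = Instant::now() + Duration::from_millis(100);
//! let ops = capture_diff_slices_deadline(Algorithm::Myers, &old, &new, Some(deadline));
//!
//! // Operations are returned whether or not the deadline was reached.
//! assert!(!ops.is_empty());
//! ```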
//!
//! The [`TextDiff`] type also lets you configure a deadline and/or timeout
//! when performing a text diff.
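//!
//! A sketch of the timeout variant, assuming the builder returned by
//! `TextDiff::configure` and its `timeout` method. The inputs and the budget
//! are illustrative:
//!
//! ```rust
//! # #[cfg(feature = "text")] {
//! use std::time::Duration;
//! use similar::TextDiff;
//!
//! // The timeout is relative to the moment the diff starts.
//! let diff = TextDiff::configure()
//!     .timeout(Duration::from_millis(100))
//!     .diff_lines("a\nb\n", "a\nc\n");
//!
//! assert!(!diff.ops().is_empty());
//! # }
//! ```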
//!
//! # Feature Flags
//!
//! The crate by default does not have any dependencies, however for some use
//! cases it's useful to pull in extra functionality. Likewise you can turn
//! off some functionality.
//!
//! * `text`: this feature is enabled by default and enables the text based
//!   diffing types such as [`TextDiff`].
//!   If the crate is used without default features it's removed.
//! * `unicode`: when this feature is enabled the text diffing functionality
//!   gains the ability to diff on a grapheme instead of character level. This
//!   is particularly useful when working with text containing emojis. This
//!   pulls in some relatively complex dependencies for working with the unicode
//!   database.
//! * `bytes`: this feature adds support for working with byte slices in text
//!   APIs in addition to unicode strings. This pulls in the
//!   [`bstr`] dependency.
//! * `inline`: this feature gives access to additional functionality of the
//!   text diffing to provide inline information about which values changed
//!   in a line diff. This currently also enables the `unicode` feature.
//! * `serde`: this feature enables serialization for some types in this
//!   crate. For enums without payload deserialization is then also supported.
#![warn(missing_docs)]
pub mod algorithms;
pub mod iter;
#[cfg(feature = "text")]
pub mod udiff;
#[cfg(feature = "text")]
pub mod utils;

mod common;
#[cfg(feature = "text")]
mod text;
mod types;

pub use self::common::*;
#[cfg(feature = "text")]
pub use self::text::*;
pub use self::types::*;