//! This crate implements diffing utilities. It provides an abstraction
//! over different diffing algorithms. The design of the library is inspired
//! by pijul's diff library by Pierre-Étienne Meunier, from which it also
//! inherits the patience diff algorithm.
//!
//! The API of the crate is split into high and low level functionality. Most
//! of what you probably want to use is available at the top level. Additionally
//! the following submodules exist:
//!
//! * [`algorithms`]: implements the different diffing algorithms.
//!   It provides both low level access to the algorithms with the minimal
//!   trait bounds necessary, as well as a generic interface.
//! * [`udiff`]: unified diff functionality.
//! * [`utils`]: utilities for common diff related operations. This module
//!   provides additional diffing functions for working with text diffs.
//!
//! # Sequence Diffing
//!
//! If you want to diff sequences or other generally indexable things you can
//! use the [`capture_diff`] and [`capture_diff_slices`] functions. They
//! directly diff an indexable object or slice and return a vector of
//! [`DiffOp`] objects.
//!
//! ```rust
//! use similar::{Algorithm, capture_diff_slices};
//!
//! let a = vec![1, 2, 3, 4, 5];
//! let b = vec![1, 2, 3, 4, 7];
//! // `ops` describes, as index ranges, how to turn `a` into `b`.
//! let ops = capture_diff_slices(Algorithm::Myers, &a, &b);
//! ```
//!
//! # Text Diffing
//!
//! Similar provides helpful utilities for text (and more specifically line) diff
//! operations. The main type you want to work with is [`TextDiff`] which
//! uses the underlying diff algorithms to expose a convenient API to work with
//! texts:
//!
//! ```rust
//! # #[cfg(feature = "text")] {
//! use similar::{ChangeTag, TextDiff};
//!
//! let diff = TextDiff::from_lines(
//!     "Hello World\nThis is the second line.\nThis is the third.",
//!     "Hallo Welt\nThis is the second line.\nThis is life.\nMoar and more",
//! );
//!
//! for change in diff.iter_all_changes() {
//!     let sign = match change.tag() {
//!         ChangeTag::Delete => "-",
//!         ChangeTag::Insert => "+",
//!         ChangeTag::Equal => " ",
//!     };
//!     print!("{}{}", sign, change);
//! }
//! # }
//! ```
//!
//! ## Trailing Newlines
//!
//! When working with line diffs (and unified diffs in general) there are two
//! "philosophies" for looking at lines. One is to diff lines without their
//! newline character, the other is to diff them with the newline character.
//! Typically the latter is done because text files do not _have_ to end in a
//! newline character. As a result there is a difference between `foo\n` and
//! `foo` as far as diffs are concerned.
//!
//! In similar this is handled on the [`Change`] or [`InlineChange`] level. If
//! a diff was created via [`TextDiff::from_lines`] the text diffing system is
//! instructed to check for missing newlines
//! ([`TextDiff::newline_terminated`] returns true).
//!
//! In any case the [`Change`] object has a convenience method called
//! [`Change::missing_newline`] which returns `true` if the change is missing
//! a trailing newline. Armed with that information the caller can handle
//! this by either rendering a virtual newline at that position or indicating
//! it in some other way. For instance the unified diff code will render the
//! special `\ No newline at end of file` marker.
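//!
//! As a rough sketch of how a caller might use this, the following diffs two
//! inputs that differ only in the final newline and prints the marker by
//! hand. The input strings and the rendering are made up for illustration
//! and are not how the unified diff code is implemented:
//!
//! ```rust
//! # #[cfg(feature = "text")] {
//! use similar::{ChangeTag, TextDiff};
//!
//! // The old text ends in a newline, the new one does not.
//! let diff = TextDiff::from_lines("foo\nbar\n", "foo\nbar");
//! assert!(diff.newline_terminated());
//!
//! for change in diff.iter_all_changes() {
//!     let sign = match change.tag() {
//!         ChangeTag::Delete => "-",
//!         ChangeTag::Insert => "+",
//!         ChangeTag::Equal => " ",
//!     };
//!     // `value()` is the raw line, without any virtual newline added.
//!     print!("{}{}", sign, change.value());
//!     if change.missing_newline() {
//!         println!();
//!         println!("\\ No newline at end of file");
//!     }
//! }
//!
//! // Only the final line of the new text lacks its trailing newline.
//! let missing: Vec<bool> = diff
//!     .iter_all_changes()
//!     .map(|change| change.missing_newline())
//!     .collect();
//! assert_eq!(missing, vec![false, false, true]);
//! # }
//! ```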
//!
//! ## Bytes vs Unicode
//!
//! Similar concerns itself with a looser definition of "text" than you would
//! normally see in Rust. While by default it can only operate on [`str`] types,
//! by enabling the `bytes` feature it gains support for byte slices with some
//! caveats.
//!
//! A lot of text diff functionality assumes that what is being diffed constitutes
//! text, but in the real world it can often be challenging to ensure that it is
//! all valid UTF-8. Because of this the crate is built so that most functionality
//! also still works with bytes as long as they are roughly ASCII compatible.
//!
//! This means you will be successful in creating a unified diff from Latin-1
//! encoded bytes, but if you try to do the same with EBCDIC encoded bytes you
//! will only get garbage.
//!
//! # Ops vs Changes
//!
//! Because two compared sequences very commonly match for the most part, this
//! crate splits its functionality into two layers:
//!
//! Changes are encoded as [diff operations](crate::DiffOp). These are
//! ranges of the differences by index in the source sequence. Because this
//! can be cumbersome to work with, a separate method [`DiffOp::iter_changes`]
//! (and [`TextDiff::iter_changes`] when working with text diffs) is provided
//! which expands an operation into its changes on an item by item level.
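//!
//! A minimal sketch of the two layers, assuming two small slices of string
//! literals as input: the ops only carry index ranges, and
//! [`DiffOp::iter_changes`] resolves them against the original sequences.
//!
//! ```rust
//! use similar::{capture_diff_slices, Algorithm, ChangeTag};
//!
//! let old = vec!["foo", "bar", "baz"];
//! let new = vec!["foo", "bar", "blah"];
//!
//! // First layer: compact operations that refer to items by index.
//! let ops = capture_diff_slices(Algorithm::Myers, &old, &new);
//!
//! // Second layer: expand every operation into per-item changes.
//! let changes: Vec<_> = ops
//!     .iter()
//!     .flat_map(|op| op.iter_changes(&old, &new))
//!     .map(|change| (change.tag(), change.value()))
//!     .collect();
//!
//! assert_eq!(changes, vec![
//!     (ChangeTag::Equal, "foo"),
//!     (ChangeTag::Equal, "bar"),
//!     (ChangeTag::Delete, "baz"),
//!     (ChangeTag::Insert, "blah"),
//! ]);
//! ```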
//!
//! Because [`TextDiff::grouped_ops`] can isolate clusters of changes, this
//! approach works even for very long files when the two methods are paired.
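//!
//! For illustration, the following sketch (with made-up input) changes two
//! lines that are far apart. With a context radius of one line,
//! [`TextDiff::grouped_ops`] should yield one group per cluster, and only the
//! ops in those groups are expanded into changes. The `...` separator is just
//! a choice made for this example:
//!
//! ```rust
//! # #[cfg(feature = "text")] {
//! use similar::{ChangeTag, TextDiff};
//!
//! let old = "a\nb\nc\nd\ne\nf\ng\nh\n";
//! let new = "a\nB\nc\nd\ne\nf\nG\nh\n";
//! let diff = TextDiff::from_lines(old, new);
//!
//! // Keep one line of context around each cluster of changes.
//! let groups = diff.grouped_ops(1);
//! assert_eq!(groups.len(), 2);
//!
//! for (idx, group) in groups.iter().enumerate() {
//!     if idx > 0 {
//!         println!("...");
//!     }
//!     for op in group {
//!         for change in diff.iter_changes(op) {
//!             let sign = match change.tag() {
//!                 ChangeTag::Delete => "-",
//!                 ChangeTag::Insert => "+",
//!                 ChangeTag::Equal => " ",
//!             };
//!             print!("{}{}", sign, change);
//!         }
//!     }
//! }
//! # }
//! ```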
//!
//! # Deadlines and Performance
//!
//! For large and very distinct inputs the algorithms as implemented can take
//! a very, very long time to execute, too long to make sense in practice.
//! To work around this issue all diffing algorithms also provide a version
//! that accepts a deadline, which is the point in time, as defined by an
//! [`Instant`](std::time::Instant), after which the algorithm should give up.
//! What giving up means depends on the algorithm. For instance, due to the
//! recursive, divide and conquer nature of Myers' diff you will still get a
//! pretty decent diff in many cases when a deadline is reached. The LCS diff,
//! on the other hand, is unlikely to give any decent results in such a
//! situation.
//!
//! The [`TextDiff`] type also lets you configure a deadline and/or timeout
//! when performing a text diff.
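//!
//! As a hedged sketch, the deadline variant of the slice capture function and
//! the timeout setting of the text diff configuration could be used as
//! follows. The inputs and the 100 millisecond budget are arbitrary values
//! picked for this example:
//!
//! ```rust
//! use std::time::{Duration, Instant};
//! use similar::{capture_diff_slices_deadline, Algorithm};
//!
//! let old: Vec<u32> = (0..2000).collect();
//! let new: Vec<u32> = (1000..3000).collect();
//!
//! // Give up refining the diff once the deadline has passed.
//! let deadline = Instant::now() + Duration::from_millis(100);
//! let ops = capture_diff_slices_deadline(Algorithm::Myers, &old, &new, Some(deadline));
//! assert!(!ops.is_empty());
//!
//! # #[cfg(feature = "text")] {
//! use similar::TextDiff;
//!
//! // For text diffs a relative timeout can be set on the configuration.
//! let diff = TextDiff::configure()
//!     .timeout(Duration::from_millis(100))
//!     .diff_lines("a\nb\nc\n", "a\nB\nc\n");
//! assert!(!diff.ops().is_empty());
//! # }
//! ```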
//!
//! # Feature Flags
//!
//! The crate by default does not have any dependencies, however for some use
//! cases it's useful to pull in extra functionality. Likewise you can turn
//! off some functionality.
//!
//! * `text`: this feature is enabled by default and enables the text based
//!   diffing types such as [`TextDiff`].
//!   If the crate is used without default features it's removed.
//! * `unicode`: when this feature is enabled the text diffing functionality
//!   gains the ability to diff on a grapheme instead of character level. This
//!   is particularly useful when working with text containing emojis. This
//!   pulls in some relatively complex dependencies for working with the unicode
//!   database.
//! * `bytes`: this feature adds support for working with byte slices in text
//!   APIs in addition to unicode strings. This pulls in the
//!   [`bstr`] dependency.
//! * `inline`: this feature gives access to additional functionality of the
//!   text diffing to provide inline information about which values changed
//!   in a line diff. This currently also enables the `unicode` feature.
//! * `serde`: this feature enables serialization of some types in this
//!   crate. For enums without payload deserialization is then also supported.
#![warn(missing_docs)]
pub mod algorithms;
pub mod iter;
#[cfg(feature = "text")]
pub mod udiff;
#[cfg(feature = "text")]
pub mod utils;

mod common;
#[cfg(feature = "text")]
mod text;
mod types;

pub use self::common::*;
#[cfg(feature = "text")]
pub use self::text::*;
pub use self::types::*;