// This module exists only to provide a separate page for the implementation
// documentation.

//! Notes on `sharded-slab`'s implementation and design.
//!
//! # Design
//!
//! The sharded slab's design is strongly inspired by the ideas presented by
//! Leijen, Zorn, and de Moura in [Mimalloc: Free List Sharding in
//! Action][mimalloc]. In this report, the authors present a novel design for a
//! memory allocator based on a concept of _free list sharding_.
//!
//! Memory allocators must keep track of what memory regions are not currently
//! allocated ("free") in order to provide them to future allocation requests.
//! The term [_free list_][freelist] refers to a technique for performing this
//! bookkeeping, where each free block stores a pointer to the next free block,
//! forming a linked list. The memory allocator keeps a pointer to the most
//! recently freed block, the _head_ of the free list. To allocate more memory,
//! the allocator pops from the free list by setting the head pointer to the
//! next free block of the current head block, and returning the previous head.
//! To deallocate a block, the block is pushed to the free list by setting its
//! first word to the current head pointer, and the head pointer is set to point
//! to the deallocated block. Most implementations of slab allocators backed by
//! arrays or vectors use a similar technique, where pointers are replaced by
//! indices into the backing array.
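//!
//! For illustration only (this is not `sharded-slab`'s actual code), a minimal
//! sketch of such an index-based free list over a `Vec`-backed slab might look
//! like the following, with the hypothetical `SimpleSlab` type standing in for
//! a real slab:
//!
//! ```rust,ignore
//! /// A slot is either occupied by a value or free; a free slot stores the
//! /// index of the next free slot, forming an index-based linked list.
//! enum Slot<T> {
//!     Occupied(T),
//!     Free(Option<usize>),
//! }
//!
//! struct SimpleSlab<T> {
//!     slots: Vec<Slot<T>>,
//!     /// Index of the most recently freed slot: the head of the free list.
//!     free_head: Option<usize>,
//! }
//!
//! impl<T> SimpleSlab<T> {
//!     fn insert(&mut self, value: T) -> usize {
//!         match self.free_head {
//!             Some(head) => {
//!                 // Pop the head of the free list: the popped slot's stored
//!                 // "next" index becomes the new head.
//!                 if let Slot::Free(next) = &self.slots[head] {
//!                     self.free_head = *next;
//!                 }
//!                 self.slots[head] = Slot::Occupied(value);
//!                 head
//!             }
//!             None => {
//!                 // The free list is empty: grow the backing vector instead.
//!                 self.slots.push(Slot::Occupied(value));
//!                 self.slots.len() - 1
//!             }
//!         }
//!     }
//!
//!     fn remove(&mut self, index: usize) {
//!         // Push the freed slot onto the free list: it records the old head
//!         // as its "next" index and becomes the new head.
//!         self.slots[index] = Slot::Free(self.free_head);
//!         self.free_head = Some(index);
//!     }
//! }
//! ```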
//!
//! When allocations and deallocations can occur concurrently across threads,
//! they must synchronize accesses to the free list; either by putting the
//! entire allocator state inside of a lock, or by using atomic operations to
//! treat the free list as a lock-free structure (such as a [Treiber stack]). In
//! both cases, there is a significant performance cost — even when the free
//! list is lock-free, it is likely that a noticeable amount of time will be
//! spent in compare-and-swap loops. Ideally, the global synchronization point
//! created by the single global free list could be avoided as much as possible.
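//!
//! For example, popping from a global, index-based free list treated as a
//! Treiber-style stack looks roughly like the sketch below (illustrative only,
//! and ignoring the ABA problem that a production version must also handle);
//! note the compare-and-swap loop that every allocating thread contends on:
//!
//! ```rust,ignore
//! use std::sync::atomic::{AtomicUsize, Ordering};
//!
//! /// Sentinel index meaning "end of the list" (an assumption of this sketch).
//! const NIL: usize = usize::MAX;
//!
//! /// Pop the head of a global free list shared by every thread. `head` holds
//! /// the index of the first free slot, and `next[i]` holds the index of the
//! /// free slot after slot `i`.
//! fn pop_free(head: &AtomicUsize, next: &[AtomicUsize]) -> Option<usize> {
//!     let mut curr = head.load(Ordering::Acquire);
//!     loop {
//!         if curr == NIL {
//!             return None; // the free list is empty
//!         }
//!         let succ = next[curr].load(Ordering::Acquire);
//!         // Every allocating thread races on this compare-and-swap; under
//!         // contention, a noticeable amount of time is spent retrying here.
//!         match head.compare_exchange_weak(
//!             curr,
//!             succ,
//!             Ordering::AcqRel,
//!             Ordering::Acquire,
//!         ) {
//!             Ok(_) => return Some(curr),
//!             Err(observed) => curr = observed,
//!         }
//!     }
//! }
//! ```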
//!
//! The approach presented by Leijen, Zorn, and de Moura is to introduce
//! sharding and thus increase the granularity of synchronization significantly.
//! In mimalloc, the heap is _sharded_ so that each thread has its own
//! thread-local heap. Objects are always allocated from the local heap of the
//! thread where the allocation is performed. Because allocations are always
//! done from a thread's local heap, they need not be synchronized.
//!
//! However, since objects can move between threads before being deallocated,
//! _deallocations_ may still occur concurrently. Therefore, Leijen et al.
//! introduce a concept of _local_ and _global_ free lists. When an object is
//! deallocated on the same thread it was originally allocated on, it is placed
//! on the local free list; if it is deallocated on another thread, it goes on
//! the global free list for the heap of the thread from which it originated. To
//! allocate, the local free list is used first; if it is empty, the entire
//! global free list is popped onto the local free list. Since the local free
//! list is only ever accessed by the thread it belongs to, it does not require
//! synchronization at all, and because the global free list is popped from
//! infrequently, the cost of synchronization has a reduced impact. A majority
//! of allocations can occur without any synchronization at all; and
//! deallocations only require synchronization when an object has left its
//! parent thread (a relatively uncommon case).
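//!
//! A hedged sketch of that allocation strategy, using index-based free lists
//! and a hypothetical `Shard` type (again, not this crate's actual code), might
//! look like:
//!
//! ```rust,ignore
//! use std::sync::atomic::{AtomicUsize, Ordering};
//!
//! /// Sentinel index meaning "empty list" (an assumption of this sketch).
//! const NIL: usize = usize::MAX;
//!
//! struct Shard {
//!     /// Head of the local free list; only the owning thread ever touches
//!     /// it, so it needs no synchronization.
//!     local_head: usize,
//!     /// Head of the global ("remote") free list; other threads push freed
//!     /// slots here, so it must be atomic.
//!     remote_head: AtomicUsize,
//! }
//!
//! impl Shard {
//!     /// Allocate a slot index, preferring the unsynchronized local list.
//!     fn alloc(&mut self, next: &[AtomicUsize]) -> Option<usize> {
//!         if self.local_head == NIL {
//!             // The local list is empty: take the *entire* global list in a
//!             // single atomic swap and make it the new local list.
//!             self.local_head = self.remote_head.swap(NIL, Ordering::Acquire);
//!         }
//!         let slot = self.local_head;
//!         if slot == NIL {
//!             return None; // both lists are empty; the caller must grow
//!         }
//!         self.local_head = next[slot].load(Ordering::Relaxed);
//!         Some(slot)
//!     }
//! }
//! ```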
//!
//! [mimalloc]: https://www.microsoft.com/en-us/research/uploads/prod/2019/06/mimalloc-tr-v1.pdf
//! [freelist]: https://en.wikipedia.org/wiki/Free_list
//! [Treiber stack]: https://en.wikipedia.org/wiki/Treiber_stack
//!
//! # Implementation
//!
//! A slab is represented as an array of [`MAX_THREADS`] _shards_. A shard
//! consists of a vector of one or more _pages_ plus associated metadata.
//! Finally, a page consists of an array of _slots_, plus head indices for the
//! local and remote free lists.
//!
//! ```text
//! ┌─────────────┐
//! │ shard 1     │
//! │             │    ┌─────────────┐        ┌────────┐
//! │ pages───────┼───▶│ page 1      │        │        │
//! ├─────────────┤    ├─────────────┤  ┌────▶│  next──┼─┐
//! │ shard 2     │    │ page 2      │  │     ├────────┤ │
//! ├─────────────┤    │             │  │     │XXXXXXXX│ │
//! │ shard 3     │    │ local_head──┼──┘     ├────────┤ │
//! └─────────────┘    │ remote_head─┼──┐     │        │◀┘
//!       ...          ├─────────────┤  │     │  next──┼─┐
//! ┌─────────────┐    │ page 3      │  │     ├────────┤ │
//! │ shard n     │    └─────────────┘  │     │XXXXXXXX│ │
//! └─────────────┘          ...        │     ├────────┤ │
//!                    ┌─────────────┐  │     │XXXXXXXX│ │
//!                    │ page n      │  │     ├────────┤ │
//!                    └─────────────┘  │     │        │◀┘
//!                                     └────▶│  next──┼───▶  ...
//!                                            ├────────┤
//!                                            │XXXXXXXX│
//!                                            └────────┘
//! ```
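//!
//! The hierarchy shown above can be summarized by the following simplified type
//! definitions. These are only a sketch of the shape of the data structure, not
//! the crate's actual types, which carry additional metadata and atomics; the
//! field names follow the diagram:
//!
//! ```rust,ignore
//! use std::sync::atomic::AtomicUsize;
//!
//! struct Slab<T> {
//!     /// One shard per thread, up to `MAX_THREADS`.
//!     shards: Box<[Shard<T>]>,
//! }
//!
//! struct Shard<T> {
//!     /// Pages are allocated lazily, in order of increasing size.
//!     pages: Vec<Page<T>>,
//! }
//!
//! struct Page<T> {
//!     /// Head index of the local free list (touched only by the owning thread).
//!     local_head: usize,
//!     /// Head index of the remote free list (pushed to by other threads).
//!     remote_head: AtomicUsize,
//!     slots: Box<[Slot<T>]>,
//! }
//!
//! struct Slot<T> {
//!     /// The stored value, if the slot is currently occupied.
//!     value: Option<T>,
//!     /// Index of the next slot in whichever free list this slot is on.
//!     next: usize,
//! }
//! ```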
//!
//! The size of the first page in a shard is always a power of two, and every
//! subsequent page added after the first is twice as large as the page that
//! precedes it.
//!
//! ```text
//!
//!  pg.
//! ┌───┐   ┌─┬─┐
//! │ 0 │───▶ │ │
//! ├───┤   ├─┼─┼─┬─┐
//! │ 1 │───▶ │ │ │ │
//! ├───┤   ├─┼─┼─┼─┼─┬─┬─┬─┐
//! │ 2 │───▶ │ │ │ │ │ │ │ │
//! ├───┤   ├─┼─┼─┼─┼─┼─┼─┼─┼─┬─┬─┬─┬─┬─┬─┬─┐
//! │ 3 │───▶ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │
//! └───┘   └─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘
//! ```
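//!
//! Concretely, page `i` holds `INITIAL_PAGE_SIZE << i` slots. For example, with
//! an initial page size of 32 (an example value standing in for the configured
//! `INITIAL_PAGE_SIZE`), the pages hold 32, 64, 128, 256, ... slots:
//!
//! ```rust,ignore
//! // Example value only; the real value comes from the slab's configuration.
//! const INITIAL_PAGE_SIZE: usize = 32;
//!
//! /// The number of slots in page `i`: each page is twice as large as the one
//! /// before it, and all sizes are powers of two.
//! fn page_size(i: usize) -> usize {
//!     INITIAL_PAGE_SIZE << i
//! }
//!
//! fn main() {
//!     assert_eq!(page_size(0), 32);
//!     assert_eq!(page_size(1), 64);
//!     assert_eq!(page_size(2), 128);
//!     assert_eq!(page_size(3), 256);
//! }
//! ```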
//!
//! When searching for a free slot, the smallest page is searched first; if it
//! is full, the search proceeds to the next page, and so on, until either a
//! free slot is found or every currently allocated page has been searched. If
//! every existing page is full and the maximum number of pages has not yet
//! been reached, a new page is then allocated.
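//!
//! A self-contained sketch of that search order follows. Here a `Page` is
//! reduced to a simple bump counter purely to show the order of the search;
//! the real pages hand out slots from their free lists as described above:
//!
//! ```rust,ignore
//! struct Page {
//!     used: usize,
//!     capacity: usize,
//! }
//!
//! impl Page {
//!     fn try_alloc(&mut self) -> Option<usize> {
//!         if self.used < self.capacity {
//!             let slot = self.used;
//!             self.used += 1;
//!             Some(slot)
//!         } else {
//!             None // this page is full
//!         }
//!     }
//! }
//!
//! /// Search pages from smallest to largest, allocating a new page (twice the
//! /// size of the last one) only if every existing page is full and the
//! /// configured maximum has not been reached.
//! fn alloc_slot(
//!     pages: &mut Vec<Page>,
//!     initial_size: usize,
//!     max_pages: usize,
//! ) -> Option<(usize, usize)> {
//!     for (i, page) in pages.iter_mut().enumerate() {
//!         if let Some(slot) = page.try_alloc() {
//!             return Some((i, slot));
//!         }
//!     }
//!     if pages.len() < max_pages {
//!         let i = pages.len();
//!         pages.push(Page { used: 0, capacity: initial_size << i });
//!         return pages[i].try_alloc().map(|slot| (i, slot));
//!     }
//!     None // every page is full and no more pages may be allocated
//! }
//! ```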
//!
//! Since every page is twice as large as the previous page, and all page sizes
//! are powers of two, we can determine the page index that contains a given
//! address by shifting the address down by the smallest page size and
//! looking at how many twos places are necessary to represent that number,
//! which tells us what power of two page size it fits inside of. We can
//! determine the number of twos places by counting the number of leading
//! zeros (unused twos places) in the number's binary representation, and
//! subtracting that count from the total number of bits in a word.
//!
//! The formula for determining the page number that contains an offset is thus:
//!
//! ```rust,ignore
//! WIDTH - ((offset + INITIAL_PAGE_SIZE) >> INDEX_SHIFT).leading_zeros()
//! ```
//!
//! where `WIDTH` is the number of bits in a `usize`, and `INDEX_SHIFT` is
//!
//! ```rust,ignore
//! INITIAL_PAGE_SIZE.trailing_zeros() + 1;
//! ```
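//!
//! As a worked example, assume a hypothetical `INITIAL_PAGE_SIZE` of 32, so
//! that `INDEX_SHIFT` is 6. Page 0 then holds offsets `0..32`, page 1 holds
//! `32..96`, page 2 holds `96..224`, and the formula recovers exactly those
//! page numbers:
//!
//! ```rust,ignore
//! const INITIAL_PAGE_SIZE: usize = 32; // example value only
//! const INDEX_SHIFT: usize = INITIAL_PAGE_SIZE.trailing_zeros() as usize + 1;
//! const WIDTH: usize = usize::BITS as usize;
//!
//! fn page_index(offset: usize) -> usize {
//!     WIDTH - ((offset + INITIAL_PAGE_SIZE) >> INDEX_SHIFT).leading_zeros() as usize
//! }
//!
//! fn main() {
//!     // Page 0 holds offsets 0..32, page 1 holds 32..96, page 2 holds 96..224.
//!     assert_eq!(page_index(0), 0);
//!     assert_eq!(page_index(31), 0);
//!     assert_eq!(page_index(32), 1);
//!     assert_eq!(page_index(95), 1);
//!     assert_eq!(page_index(96), 2);
//!     assert_eq!(page_index(224), 3);
//! }
//! ```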
//!
//! [`MAX_THREADS`]: https://docs.rs/sharded-slab/latest/sharded_slab/trait.Config.html#associatedconstant.MAX_THREADS