===================
BlueStore Internals
===================


Small write strategies
----------------------

* *U*: Uncompressed write of a complete, new blob.

  - write to new blob
  - kv commit

* *P*: Uncompressed partial write to an unused region of an existing
  blob.

  - write to unused chunk(s) of existing blob
  - kv commit

* *W*: WAL overwrite: commit the intent to overwrite, then overwrite
  asynchronously.  The overwrite must be aligned to
  chunk_size = MAX(block_size, csum_block_size).

  - kv commit
  - wal overwrite (chunk-aligned) of existing blob

* *N*: Uncompressed partial write to a new blob.  The blob is initially
  sparsely utilized.  Future writes will be either *P* or *W*.

  - write into a new (sparse) blob
  - kv commit

* *R+W*: Read the partial chunk, then do a WAL overwrite.

  - read (out to chunk boundaries)
  - kv commit
  - wal overwrite (chunk-aligned) of existing blob

* *C*: Compress data, write to new blob.

  - compress and write to new blob
  - kv commit

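The chunk alignment rule for *W* and the "read out to chunk boundaries"
step of *R+W* can be sketched as follows.  This is a minimal illustration
in Python, not BlueStore code: the function names are hypothetical, and the
sketch only distinguishes *W* from *R+W*, ignoring the cases where *P* or
*N* would be preferred.

```python
from typing import Tuple


def chunk_size(block_size: int, csum_block_size: int) -> int:
    """chunk_size = MAX(block_size, csum_block_size), as stated for *W*."""
    return max(block_size, csum_block_size)


def is_chunk_aligned(offset: int, length: int, chunk: int) -> bool:
    """True if [offset, offset + length) starts and ends on chunk boundaries."""
    return length > 0 and offset % chunk == 0 and length % chunk == 0


def expand_to_chunks(offset: int, length: int, chunk: int) -> Tuple[int, int]:
    """Round a byte range outward to chunk boundaries (the read step of *R+W*).

    Returns the (offset, length) of the enclosing chunk-aligned range.
    """
    start = (offset // chunk) * chunk
    end = -(-(offset + length) // chunk) * chunk  # ceiling division
    return start, end - start


def overwrite_strategy(offset: int, length: int,
                       block_size: int, csum_block_size: int) -> str:
    """Pick *W* for chunk-aligned overwrites, otherwise *R+W*."""
    chunk = chunk_size(block_size, csum_block_size)
    return "W" if is_chunk_aligned(offset, length, chunk) else "R+W"
```

Under these assumptions, a 4 KB overwrite at offset 0 with 4 KB blocks and
4 KB checksum blocks is already chunk-aligned and can go straight to *W*,
while the same write against 16 KB checksum blocks has to read the
enclosing 16 KB chunk first (*R+W*).
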
Possible future modes
---------------------

* *F*: Fragment lextent space by writing a small piece of data into a
  piecemeal blob (that collects random, noncontiguous bits of data we
  need to write).

  - write to a piecemeal blob (min_alloc_size or larger, but we use just one block of it)
  - kv commit

* *X*: WAL read/modify/write on a single block (like legacy
  bluestore).  No checksum.

  - kv commit
  - wal read/modify/write

Mapping
-------

The table below very roughly maps each type of write onto the strategy we
use when we encounter a given type of blob.  In practice it's a bit more
complicated, since there might be several blobs to consider (e.g., we might
be able to *W* into one or *P* into another), but it should communicate a
rough idea of the strategy.

+--------------------------+--------+--------------+-------------+--------------+---------------+
|                          | raw    | raw (cached) | csum (4 KB) | csum (16 KB) | comp (128 KB) |
+--------------------------+--------+--------------+-------------+--------------+---------------+
| 128+ KB (over)write      | U      | U            | U           | U            | C             |
+--------------------------+--------+--------------+-------------+--------------+---------------+
| 64 KB (over)write        | U      | U            | U           | U            | U or C        |
+--------------------------+--------+--------------+-------------+--------------+---------------+
| 4 KB overwrite           | W      | P | W        | P | W       | P | R+W      | P | N (F?)    |
+--------------------------+--------+--------------+-------------+--------------+---------------+
| 100 byte overwrite       | R+W    | P | W        | P | R+W     | P | R+W      | P | N (F?)    |
+--------------------------+--------+--------------+-------------+--------------+---------------+
| 100 byte append          | R+W    | P | W        | P | R+W     | P | R+W      | P | N (F?)    |
+--------------------------+--------+--------------+-------------+--------------+---------------+
+--------------------------+--------+--------------+-------------+--------------+---------------+
| 4 KB clone overwrite     | P | N  | P | N        | P | N       | P | N        | N (F?)        |
+--------------------------+--------+--------------+-------------+--------------+---------------+
| 100 byte clone overwrite | P | N  | P | N        | P | N       | P | N        | N (F?)        |
+--------------------------+--------+--------------+-------------+--------------+---------------+
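
For quick experimentation, the mapping can be transcribed into a small
lookup structure.  The sketch below is only a transcription of the table
into Python; the names ``BLOB_TYPES``, ``MAPPING`` and ``candidates`` are
hypothetical, and, as noted above, the real decision may involve several
blobs at once.

```python
# Columns of the mapping table (blob types), in table order.
BLOB_TYPES = ("raw", "raw (cached)", "csum (4 KB)", "csum (16 KB)",
              "comp (128 KB)")

# One entry per table row; each tuple holds the cell text per blob type.
MAPPING = {
    "128+ KB (over)write": ("U", "U", "U", "U", "C"),
    "64 KB (over)write": ("U", "U", "U", "U", "U or C"),
    "4 KB overwrite": ("W", "P | W", "P | W", "P | R+W", "P | N (F?)"),
    "100 byte overwrite": ("R+W", "P | W", "P | R+W", "P | R+W", "P | N (F?)"),
    "100 byte append": ("R+W", "P | W", "P | R+W", "P | R+W", "P | N (F?)"),
    "4 KB clone overwrite": ("P | N", "P | N", "P | N", "P | N", "N (F?)"),
    "100 byte clone overwrite": ("P | N", "P | N", "P | N", "P | N", "N (F?)"),
}


def candidates(write_type: str, blob_type: str) -> str:
    """Return the table cell for a given write type and blob type."""
    return MAPPING[write_type][BLOB_TYPES.index(blob_type)]
```

For example, ``candidates("4 KB overwrite", "csum (16 KB)")`` returns the
cell ``"P | R+W"``: prefer *P* if an unused region is available, otherwise
fall back to *R+W*.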