======================================
Locally repairable erasure code plugin
======================================

With the *jerasure* plugin, when an erasure coded object is stored on
multiple OSDs, recovering from the loss of one OSD requires reading
from all the others. For instance if *jerasure* is configured with
*k=8* and *m=4*, losing one OSD requires reading from the eleven
others to repair.

The *lrc* erasure code plugin creates local parity chunks to be able
to recover using fewer OSDs. For instance if *lrc* is configured with
*k=8*, *m=4* and *l=4*, it will create an additional parity chunk for
every four OSDs. When a single OSD is lost, it can be recovered with
only four OSDs instead of eleven.

Erasure code profile examples
=============================

Reduce recovery bandwidth between hosts
---------------------------------------

Although it is probably not an interesting use case when all hosts are
connected to the same switch, reduced bandwidth usage can actually be
observed::

    $ ceph osd erasure-code-profile set LRCprofile \
         plugin=lrc \
         k=4 m=2 l=3 \
         crush-failure-domain=host
    $ ceph osd pool create lrcpool 12 12 erasure LRCprofile

Reduce recovery bandwidth between racks
---------------------------------------

In Firefly the reduced bandwidth will only be observed if the primary
OSD is in the same rack as the lost chunk::

    $ ceph osd erasure-code-profile set LRCprofile \
         plugin=lrc \
         k=4 m=2 l=3 \
         crush-locality=rack \
         crush-failure-domain=host
    $ ceph osd pool create lrcpool 12 12 erasure LRCprofile

Create an lrc profile
=====================

To create a new lrc erasure code profile::

    ceph osd erasure-code-profile set {name} \
         plugin=lrc \
         k={data-chunks} \
         m={coding-chunks} \
         l={locality} \
         [crush-root={root}] \
         [crush-locality={bucket-type}] \
         [crush-failure-domain={bucket-type}] \
         [crush-device-class={device-class}] \
         [directory={directory}] \
         [--force]

Where:

``k={data chunks}``

:Description: Each object is split into **data-chunks** parts,
              each stored on a different OSD.

:Type: Integer
:Required: Yes.
:Example: 4

``m={coding-chunks}``

:Description: Compute **coding chunks** for each object and store them
              on different OSDs. The number of coding chunks is also
              the number of OSDs that can be down without losing data.

:Type: Integer
:Required: Yes.
:Example: 2

``l={locality}``

:Description: Group the coding and data chunks into sets of size
              **locality**. For instance, for **k=4** and **m=2**,
              when **locality=3** two groups of three are created.
              Each set can be recovered without reading chunks
              from another set.

:Type: Integer
:Required: Yes.
:Example: 3

``crush-root={root}``

:Description: The name of the crush bucket used for the first step of
              the ruleset. For instance **step take default**.

:Type: String
:Required: No.
:Default: default

``crush-locality={bucket-type}``

:Description: The type of the crush bucket in which each set of chunks
              defined by **l** will be stored. For instance, if it is
              set to **rack**, each group of **l** chunks will be
              placed in a different rack. It is used to create a
              ruleset step such as **step choose rack**. If it is not
              set, no such grouping is done.

:Type: String
:Required: No.

``crush-failure-domain={bucket-type}``

:Description: Ensure that no two chunks are in a bucket with the same
              failure domain. For instance, if the failure domain is
              **host** no two chunks will be stored on the same
              host. It is used to create a ruleset step such as **step
              chooseleaf host**.

:Type: String
:Required: No.
:Default: host

``crush-device-class={device-class}``

:Description: Restrict placement to devices of a specific class (e.g.,
              ``ssd`` or ``hdd``), using the crush device class names
              in the CRUSH map.

:Type: String
:Required: No.
:Default:

``directory={directory}``

:Description: Set the **directory** name from which the erasure code
              plugin is loaded.

:Type: String
:Required: No.
:Default: /usr/lib/ceph/erasure-code

``--force``

:Description: Override an existing profile by the same name.

:Type: String
:Required: No.
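For instance, the optional placement parameters can be combined in a
single profile. This is only a sketch: the ``default`` root, the
``rack`` and ``host`` bucket types and the ``hdd`` device class are
assumptions that must actually exist in the local CRUSH map for the
profile to be usable::

    $ ceph osd erasure-code-profile set LRCprofile \
         plugin=lrc \
         k=4 m=2 l=3 \
         crush-root=default \
         crush-locality=rack \
         crush-failure-domain=host \
         crush-device-class=hdd
    $ ceph osd erasure-code-profile get LRCprofile

The final **get** command displays the key/value pairs that were
stored and is a convenient way to verify a profile before creating a
pool with it.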
Low level plugin configuration
==============================

The sum of **k** and **m** must be a multiple of the **l** parameter.
The low level configuration parameters do not impose such a
restriction and it may be more convenient to use them for specific
purposes. It is for instance possible to define two groups, one with 4
chunks and another with 3 chunks. It is also possible to recursively
define locality sets, for instance datacenters and racks into
datacenters. The **k/m/l** parameters are implemented by generating a
low level configuration.

The *lrc* erasure code plugin recursively applies erasure code
techniques so that recovering from the loss of some chunks only
requires a subset of the available chunks, most of the time.

For instance, when three coding steps are described as::

   chunk nr    01234567
   step 1      _cDD_cDD
   step 2      cDDD____
   step 3      ____cDDD

where *c* are coding chunks calculated from the data chunks *D*, the
loss of chunk *7* can be recovered with the last four chunks. And the
loss of chunk *2* can be recovered with the first four chunks.

Erasure code profile examples using low level configuration
============================================================

Minimal testing
---------------

This configuration is strictly equivalent to using the default erasure
code profile. The *DD* implies *K=2*, the *c* implies *M=1* and the
*jerasure* plugin is used by default::

    $ ceph osd erasure-code-profile set LRCprofile \
         plugin=lrc \
         mapping=DD_ \
         layers='[ [ "DDc", "" ] ]'
    $ ceph osd pool create lrcpool 12 12 erasure LRCprofile
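As a quick sanity check of the new pool, an object can be written and
read back; the object name *NYAN* and its content are arbitrary
placeholders::

    $ echo ABCDEFGHI | rados --pool lrcpool put NYAN -
    $ rados --pool lrcpool get NYAN -

If the pool is functional, the second command should print
*ABCDEFGHI* back on standard output.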
Reduce recovery bandwidth between hosts
---------------------------------------

Although it is probably not an interesting use case when all hosts are
connected to the same switch, reduced bandwidth usage can actually be
observed. It is equivalent to **k=4**, **m=2** and **l=3** although
the layout of the chunks is different::

    $ ceph osd erasure-code-profile set LRCprofile \
         plugin=lrc \
         mapping=__DD__DD \
         layers='[
                   [ "_cDD_cDD", "" ],
                   [ "cDDD____", "" ],
                   [ "____cDDD", "" ],
                 ]'
    $ ceph osd pool create lrcpool 12 12 erasure LRCprofile

Reduce recovery bandwidth between racks
---------------------------------------

In Firefly the reduced bandwidth will only be observed if the primary
OSD is in the same rack as the lost chunk::

    $ ceph osd erasure-code-profile set LRCprofile \
         plugin=lrc \
         mapping=__DD__DD \
         layers='[
                   [ "_cDD_cDD", "" ],
                   [ "cDDD____", "" ],
                   [ "____cDDD", "" ],
                 ]' \
         crush-steps='[
                        [ "choose", "rack", 2 ],
                        [ "chooseleaf", "host", 4 ],
                      ]'
    $ ceph osd pool create lrcpool 12 12 erasure LRCprofile

Testing with different Erasure Code backends
--------------------------------------------

LRC now uses jerasure as the default EC backend. It is possible to
specify the EC backend/algorithm on a per layer basis using the low
level configuration. The second argument in ``layers='[ [ "DDc", "" ] ]'``
is actually an erasure code profile to be used for this level. The
example below specifies the ISA backend with the cauchy technique to
be used in the lrcpool::

    $ ceph osd erasure-code-profile set LRCprofile \
         plugin=lrc \
         mapping=DD_ \
         layers='[ [ "DDc", "plugin=isa technique=cauchy" ] ]'
    $ ceph osd pool create lrcpool 12 12 erasure LRCprofile

You could also use a different erasure code profile for each
layer::

    $ ceph osd erasure-code-profile set LRCprofile \
         plugin=lrc \
         mapping=__DD__DD \
         layers='[
                   [ "_cDD_cDD", "plugin=isa technique=cauchy" ],
                   [ "cDDD____", "plugin=isa" ],
                   [ "____cDDD", "plugin=jerasure" ],
                 ]'
    $ ceph osd pool create lrcpool 12 12 erasure LRCprofile
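To double check which profile a pool ended up with, and the
**mapping** and **layers** actually recorded in that profile, both can
be read back with standard commands::

    $ ceph osd pool get lrcpool erasure_code_profile
    $ ceph osd erasure-code-profile get LRCprofile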
Erasure coding and decoding algorithm
=====================================

The steps found in the layers description::

   chunk nr    01234567

   step 1      _cDD_cDD
   step 2      cDDD____
   step 3      ____cDDD

are applied in order. For instance, if a 4K object is encoded, it will
first go through *step 1* and be divided into four 1K chunks (the four
uppercase D). They are stored in the chunks 2, 3, 6 and 7, in
order. From these, two coding chunks are calculated (the two lowercase
c). The coding chunks are stored in the chunks 1 and 5, respectively.

The *step 2* re-uses the content created by *step 1* in a similar
fashion and stores a single coding chunk *c* at position 0. The last four
chunks, marked with an underscore (*_*) for readability, are ignored.

The *step 3* stores a single coding chunk *c* at position 4. The three
chunks created by *step 1* are used to compute this coding chunk,
i.e. the coding chunk from *step 1* becomes a data chunk in *step 3*.

If chunk *2* is lost::

   chunk nr    01234567

   step 1      _c D_cDD
   step 2      cD D____
   step 3      __ _cDDD

decoding will attempt to recover it by walking the steps in reverse
order: *step 3* then *step 2* and finally *step 1*.

The *step 3* knows nothing about chunk *2* (i.e. it is an underscore)
and is skipped.

The coding chunk from *step 2*, stored in chunk *0*, allows the
content of chunk *2* to be recovered. There are no more chunks to
recover and the process stops, without considering *step 1*.

Recovering chunk *2* requires reading chunks *0, 1, 3* and writing
back chunk *2*.

If chunks *2, 3, 6* are lost::

   chunk nr    01234567

   step 1      _c  _c D
   step 2      cD  __ _
   step 3      __  cD D

The *step 3* can recover the content of chunk *6*::

   chunk nr    01234567

   step 1      _c  _cDD
   step 2      cD  ____
   step 3      __  cDDD

The *step 2* fails to recover and is skipped because there are two
chunks missing (*2, 3*) and it can only recover from one missing
chunk.

The coding chunks from *step 1*, stored in chunks *1* and *5*, allow
the content of chunks *2* and *3* to be recovered::

   chunk nr    01234567

   step 1      _cDD_cDD
   step 2      cDDD____
   step 3      ____cDDD

Controlling crush placement
===========================

The default crush ruleset provides OSDs that are on different hosts. For instance::

   chunk nr    01234567

   step 1      _cDD_cDD
   step 2      cDDD____
   step 3      ____cDDD

needs exactly *8* OSDs, one for each chunk. If the hosts are in two
adjacent racks, the first four chunks can be placed in the first rack
and the last four in the second rack, so that recovering from the loss
of a single OSD does not require using bandwidth between the two
racks.

For instance::

   crush-steps='[ [ "choose", "rack", 2 ], [ "chooseleaf", "host", 4 ] ]'

will create a ruleset that will select two crush buckets of type
*rack* and for each of them choose four OSDs, each located in a
different bucket of type *host*.

The ruleset can also be manually crafted for finer control.
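For instance, a hand-crafted rule matching the **crush-steps** above
could look like the sketch below. It assumes a CRUSH map whose root
bucket is *default* and which contains *rack* and *host* buckets; the
rule name, the ruleset number and the **min_size**/**max_size** bounds
are arbitrary placeholders. It would be installed by decompiling the
CRUSH map with **crushtool -d**, editing it, recompiling it with
**crushtool -c** and injecting it with **ceph osd setcrushmap**::

    rule lrcruleset {
            ruleset 1
            type erasure
            min_size 3
            max_size 20
            step set_chooseleaf_tries 5
            # start from the crush-root bucket
            step take default
            # two buckets of type rack ...
            step choose indep 2 type rack
            # ... and four OSDs on distinct hosts within each rack
            step chooseleaf indep 4 type host
            step emit
    }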