X-Git-Url: https://gerrit.opnfv.org/gerrit/gitweb?a=blobdiff_plain;f=src%2Fceph%2Fdoc%2Frados%2Foperations%2Fpg-concepts.rst;fp=src%2Fceph%2Fdoc%2Frados%2Foperations%2Fpg-concepts.rst;h=636d6bf9a1e5b5b933769e1424488842a198c551;hb=812ff6ca9fcd3e629e49d4328905f33eee8ca3f5;hp=0000000000000000000000000000000000000000;hpb=15280273faafb77777eab341909a3f495cf248d9;p=stor4nfv.git diff --git a/src/ceph/doc/rados/operations/pg-concepts.rst b/src/ceph/doc/rados/operations/pg-concepts.rst new file mode 100644 index 0000000..636d6bf --- /dev/null +++ b/src/ceph/doc/rados/operations/pg-concepts.rst @@ -0,0 +1,102 @@ +========================== + Placement Group Concepts +========================== + +When you execute commands like ``ceph -w``, ``ceph osd dump``, and other +commands related to placement groups, Ceph may return values using some +of the following terms: + +*Peering* + The process of bringing all of the OSDs that store + a Placement Group (PG) into agreement about the state + of all of the objects (and their metadata) in that PG. + Note that agreeing on the state does not mean that + they all have the latest contents. + +*Acting Set* + The ordered list of OSDs who are (or were as of some epoch) + responsible for a particular placement group. + +*Up Set* + The ordered list of OSDs responsible for a particular placement + group for a particular epoch according to CRUSH. Normally this + is the same as the *Acting Set*, except when the *Acting Set* has + been explicitly overridden via ``pg_temp`` in the OSD Map. + +*Current Interval* or *Past Interval* + A sequence of OSD map epochs during which the *Acting Set* and *Up + Set* for particular placement group do not change. + +*Primary* + The member (and by convention first) of the *Acting Set*, + that is responsible for coordination peering, and is + the only OSD that will accept client-initiated + writes to objects in a placement group. + +*Replica* + A non-primary OSD in the *Acting Set* for a placement group + (and who has been recognized as such and *activated* by the primary). + +*Stray* + An OSD that is not a member of the current *Acting Set*, but + has not yet been told that it can delete its copies of a + particular placement group. + +*Recovery* + Ensuring that copies of all of the objects in a placement group + are on all of the OSDs in the *Acting Set*. Once *Peering* has + been performed, the *Primary* can start accepting write operations, + and *Recovery* can proceed in the background. + +*PG Info* + Basic metadata about the placement group's creation epoch, the version + for the most recent write to the placement group, *last epoch started*, + *last epoch clean*, and the beginning of the *current interval*. Any + inter-OSD communication about placement groups includes the *PG Info*, + such that any OSD that knows a placement group exists (or once existed) + also has a lower bound on *last epoch clean* or *last epoch started*. + +*PG Log* + A list of recent updates made to objects in a placement group. + Note that these logs can be truncated after all OSDs + in the *Acting Set* have acknowledged up to a certain + point. + +*Missing Set* + Each OSD notes update log entries and if they imply updates to + the contents of an object, adds that object to a list of needed + updates. This list is called the *Missing Set* for that ````. + +*Authoritative History* + A complete, and fully ordered set of operations that, if + performed, would bring an OSD's copy of a placement group + up to date. + +*Epoch* + A (monotonically increasing) OSD map version number + +*Last Epoch Start* + The last epoch at which all nodes in the *Acting Set* + for a particular placement group agreed on an + *Authoritative History*. At this point, *Peering* is + deemed to have been successful. + +*up_thru* + Before a *Primary* can successfully complete the *Peering* process, + it must inform a monitor that is alive through the current + OSD map *Epoch* by having the monitor set its *up_thru* in the osd + map. This helps *Peering* ignore previous *Acting Sets* for which + *Peering* never completed after certain sequences of failures, such as + the second interval below: + + - *acting set* = [A,B] + - *acting set* = [A] + - *acting set* = [] very shortly after (e.g., simultaneous failure, but staggered detection) + - *acting set* = [B] (B restarts, A does not) + +*Last Epoch Clean* + The last *Epoch* at which all nodes in the *Acting set* + for a particular placement group were completely + up to date (both placement group logs and object contents). + At this point, *recovery* is deemed to have been + completed.