X-Git-Url: https://gerrit.opnfv.org/gerrit/gitweb?a=blobdiff_plain;f=src%2Fceph%2Fdoc%2Fdev%2Fosd_internals%2Fpg_removal.rst;fp=src%2Fceph%2Fdoc%2Fdev%2Fosd_internals%2Fpg_removal.rst;h=d968eccc631d7be48a128895213383ab8047d2a9;hb=812ff6ca9fcd3e629e49d4328905f33eee8ca3f5;hp=0000000000000000000000000000000000000000;hpb=15280273faafb77777eab341909a3f495cf248d9;p=stor4nfv.git

diff --git a/src/ceph/doc/dev/osd_internals/pg_removal.rst b/src/ceph/doc/dev/osd_internals/pg_removal.rst
new file mode 100644
index 0000000..d968ecc
--- /dev/null
+++ b/src/ceph/doc/dev/osd_internals/pg_removal.rst
@@ -0,0 +1,56 @@
+==========
+PG Removal
+==========
+
+See OSD::_remove_pg, OSD::RemoveWQ
+
+There are two ways for a pg to be removed from an OSD:
+
+  1. MOSDPGRemove from the primary
+  2. OSD::advance_map finds that the pool has been removed
+
+In either case, our general strategy for removing the pg is to
+atomically set the metadata objects (pg->log_oid, pg->biginfo_oid) to
+backfill and asynchronously remove the pg collections. We do not do
+this inline because scanning the collections to remove the objects is
+an expensive operation.
+
+OSDService::deleting_pgs tracks all pgs in the process of being
+deleted. Each DeletingState object in deleting_pgs lives while at
+least one reference to it remains. Each item in RemoveWQ carries a
+reference to the DeletingState for the relevant pg such that
+deleting_pgs.lookup(pgid) will return a null ref only if there are no
+collections currently being deleted for that pg.
+
+The DeletingState for a pg also carries information about the status
+of the current deletion and allows the deletion to be canceled.
+The possible states are:
+
+  1. QUEUED: the PG is in the RemoveWQ
+  2. CLEARING_DIR: the PG's contents are being removed synchronously
+  3. DELETING_DIR: the PG's directories and metadata are being queued for removal
+  4. DELETED_DIR: the final removal transaction has been queued
+  5. CANCELED: the deletion has been canceled
+
+In states 1 and 2, the deletion can be canceled. Each state transition
+method (and check_canceled) returns false if the deletion has been
+canceled and true if the state transition was successful. Similarly,
+try_stop_deletion() returns true if it succeeds in canceling the
+deletion. Additionally, if try_stop_deletion() fails to stop the
+deletion, it does not return until the final removal transaction is
+queued. This ensures that any operations queued after that point
+will be ordered after the pg deletion.
+
+OSD::_create_lock_pg must handle two cases:
+
+  1. Either there is no DeletingStateRef for the pg, or it failed to cancel.
+  2. We succeeded in canceling the deletion.
+
+In case 1, we proceed as if there were no deletion occurring, except that
+we avoid writing to the PG until the deletion finishes. In case 2, we
+proceed as in case 1, except that we first mark the PG as backfilling.
+
+Similarly, OSD::osr_registry ensures that the OpSequencers for those
+pgs can be reused for a new pg if it is created before the old one is
+fully removed, ensuring that operations on the new pg are sequenced
+properly with respect to operations on the old one.
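
The DeletingState life cycle described in this patch can be sketched as a small state machine. This is a toy Python model and not the Ceph C++ class: the names `check_canceled` and `try_stop_deletion` and the five states come from the text, while `Status`, `start_clearing`, `start_deleting`, `finish_deleting`, and the internal `_advance` helper are hypothetical names chosen for illustration. The behaviour of `try_stop_deletion` on an already canceled deletion is not specified in the text; returning true there is an assumption of this sketch.

```python
import threading
from enum import Enum, auto


class Status(Enum):
    QUEUED = auto()        # the PG is in the removal work queue
    CLEARING_DIR = auto()  # the PG's contents are being removed
    DELETING_DIR = auto()  # directories and metadata are being queued for removal
    DELETED_DIR = auto()   # the final removal transaction has been queued
    CANCELED = auto()      # the deletion has been canceled


class DeletingState:
    """Toy model of the per-PG deletion state (illustrative only)."""

    def __init__(self):
        self._cond = threading.Condition()
        self.status = Status.QUEUED

    def _advance(self, expected, new):
        # Hypothetical helper: move expected -> new unless canceled.
        with self._cond:
            if self.status is Status.CANCELED:
                return False
            if self.status is not expected:
                raise RuntimeError(f"unexpected state {self.status}")
            self.status = new
            self._cond.notify_all()
            return True

    def check_canceled(self):
        # Same convention as the transitions: False means "canceled".
        with self._cond:
            return self.status is not Status.CANCELED

    def start_clearing(self):
        return self._advance(Status.QUEUED, Status.CLEARING_DIR)

    def start_deleting(self):
        return self._advance(Status.CLEARING_DIR, Status.DELETING_DIR)

    def finish_deleting(self):
        return self._advance(Status.DELETING_DIR, Status.DELETED_DIR)

    def try_stop_deletion(self):
        with self._cond:
            if self.status in (Status.QUEUED, Status.CLEARING_DIR):
                self.status = Status.CANCELED
                self._cond.notify_all()
                return True
            if self.status is Status.CANCELED:
                return True  # assumption: an earlier cancel still counts
            # Too late to cancel: block until the final removal
            # transaction has been queued, then report failure.
            self._cond.wait_for(lambda: self.status is Status.DELETED_DIR)
            return False


# Cancel while the contents are still being cleared.
pg = DeletingState()
pg.start_clearing()
canceled = pg.try_stop_deletion()  # succeeds: CLEARING_DIR is cancelable
proceeded = pg.start_deleting()    # the remover sees the cancel and stops
```

The condition variable is what lets a late `try_stop_deletion()` wait for DELETED_DIR, so that anything the caller queues afterwards is ordered after the deletion, as the text requires.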