===========================
Map and PG Message handling
===========================

Overview
--------
The OSD handles routing incoming messages to PGs, in some cases creating the
PG first if necessary.

PG messages generally come in two varieties:

  1. Peering Messages
  2. Ops/SubOps

There are several ways in which a message might be dropped or delayed. It is
important that delaying a message does not violate certain message ordering
requirements on the way to the relevant PG handling logic:

  1. Ops referring to the same object must not be reordered.
  2. Peering messages must not be reordered.
  3. Subops must not be reordered.

MOSDMap
-------
MOSDMap messages may come from either monitors or other OSDs. Upon receipt, the
OSD must perform several tasks:

  1. Persist the new maps to the filestore.
     Several PG operations rely on having access to maps dating back to the
     last time the PG was clean.
  2. Update and persist the superblock.
  3. Update OSD state related to the current map.
  4. Expose new maps to PG processes via *OSDService*.
  5. Remove PGs due to pool removal.
  6. Queue dummy events to trigger PG map catchup.

Each PG asynchronously catches up to the currently published map during
process_peering_events before processing the event. As a result, different
PGs may have different views as to the "current" map.

One consequence of this design is that messages containing submessages from
multiple PGs (MOSDPGInfo, MOSDPGQuery, MOSDPGNotify) must tag each submessage
with the PG's epoch as well as tagging the message as a whole with the OSD's
current published epoch.

MOSDPGOp/MOSDPGSubOp
--------------------
See OSD::dispatch_op, OSD::handle_op, OSD::handle_sub_op.

MOSDPGOps are used by clients to initiate rados operations. MOSDSubOps are
used between OSDs to coordinate most non-peering activities, including
replicating MOSDPGOp operations.

OSD::require_same_or_newer_map checks that the current OSDMap is at least
as new as the map epoch indicated on the message. If not, the message is
queued in OSD::waiting_for_osdmap via OSD::wait_for_new_map. Note that this
cannot violate the ordering requirements above: any two messages are queued
in order of receipt, and if a message is received with epoch e0, a later
message from the same source must have epoch at least e0. Two PGs on the
same OSD count for these purposes as different sources for single-PG
messages; that is, messages from different PGs may be reordered relative to
each other.
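The interaction between this epoch check and the FIFO queue can be
illustrated with a short sketch. This is not the actual Ceph code: MiniOSD,
Message, on_new_map, and the handler callback are simplified stand-ins
invented for the example, and only the ordering argument is modeled::

  #include <cstdint>
  #include <deque>

  using epoch_t = uint32_t;

  // Hypothetical, simplified stand-in for the real Ceph message types.
  struct Message {
    epoch_t map_epoch;  // epoch the sender believed current when sending
  };

  struct MiniOSD {
    epoch_t current_epoch = 0;
    std::deque<Message*> waiting_for_osdmap;  // FIFO: preserves arrival order

    // Analogous to OSD::require_same_or_newer_map: true if the message can
    // be handled now, false if it must wait for a newer map.
    bool require_same_or_newer_map(Message* m) {
      if (m->map_epoch <= current_epoch)
        return true;
      // Queued in arrival order, so two messages from the same source (the
      // same client session, or the same PG on a peer OSD) stay ordered.
      waiting_for_osdmap.push_back(m);
      return false;
    }

    // Once a new map has been handled, replay the queue in order,
    // re-checking each message's epoch.
    void on_new_map(epoch_t e, void (*handle)(Message*)) {
      current_epoch = e;
      std::deque<Message*> requeue;
      requeue.swap(waiting_for_osdmap);
      for (Message* m : requeue) {
        if (require_same_or_newer_map(m))
          handle(m);  // still-too-new messages re-queued themselves above
      }
    }
  };

Because waiting_for_osdmap is drained in arrival order, a message queued at
epoch e0 is always replayed before any later message from the same source,
which is why the delay cannot reorder ops, subops, or peering messages.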
MOSDPGOps are processed as follows:

  1. OSD::handle_op validates permissions and crush mapping, and discards
     the request if the client is no longer connected and therefore could
     not receive the reply (see OSD::op_is_discardable).
     See OSDService::handle_misdirected_op.
     See PG::op_has_sufficient_caps.
     See OSD::require_same_or_newer_map.
  2. OSD::enqueue_op

MOSDSubOps are processed as follows:

  1. OSD::handle_sub_op checks that the sender is an OSD.
  2. OSD::enqueue_op

OSD::enqueue_op calls PG::queue_op, which checks waiting_for_map before
calling OpWQ::queue, which adds the op to the queue of the PG responsible
for handling it.

OSD::dequeue_op is then eventually called, with a lock on the PG. At
this time, the op is passed to PG::do_request, which checks that:

  1. the PG map is new enough (PG::must_delay_op)
  2. the client requesting the op has enough permissions
     (PG::op_has_sufficient_caps)
  3. the op is not to be discarded
     (PG::can_discard_{request,op,subop,scan,backfill})
  4. the PG is active (PG::flushed boolean)
  5. the op is a CEPH_MSG_OSD_OP and the PG is in PG_STATE_ACTIVE and not
     in PG_STATE_REPLAY

If these conditions are not met, the op is either discarded or queued for
later processing. If all conditions are met, the op is processed according
to its type:

  1. CEPH_MSG_OSD_OP is handled by PG::do_op
  2. MSG_OSD_SUBOP is handled by PG::do_sub_op
  3. MSG_OSD_SUBOPREPLY is handled by PG::do_sub_op_reply
  4. MSG_OSD_PG_SCAN is handled by PG::do_scan
  5. MSG_OSD_PG_BACKFILL is handled by PG::do_backfill

CEPH_MSG_OSD_OP processing
--------------------------

PrimaryLogPG::do_op handles a CEPH_MSG_OSD_OP op and will queue it

  1. in wait_for_all_missing if it is a CEPH_OSD_OP_PGLS for a designated
     snapid and some object updates are still missing
  2. in waiting_for_active if the op may write but the scrubber is working
  3. in waiting_for_missing_object if the op requires an object, a snapdir,
     or a specific snap that is still missing
  4. in waiting_for_degraded_object if the op may write to an object or
     snapdir that is degraded, or if another object blocks it ("blocked_by")
  5. in waiting_for_backfill_pos if the op requires an object that will only
     be available after the backfill is complete
  6. in waiting_for_ack if an ack from another OSD is expected
  7. in waiting_for_ondisk if the op is waiting for a write to complete

Peering Messages
----------------
See OSD::handle_pg_(notify|info|log|query).

Peering messages are tagged with two epochs:

  1. epoch_sent: map epoch at which the message was sent
  2. query_epoch: map epoch at which the message that triggered this message
     was sent

These are the same in cases where there was no triggering message. We discard
a peering message if the PG in question has entered a new epoch since the
message's query_epoch (see PG::old_peering_evt, PG::queue_peering_event).
Notifies, infos, queries, and logs are all handled as PG::RecoveryMachine
events and are wrapped by the PG::queue_* methods in PG::CephPeeringEvt
objects, which include the created state machine event along with epoch_sent
and query_epoch in order to generically check PG::old_peering_msg upon both
insertion into and removal from the queue, as sketched below.

Note that notifies, logs, and infos can trigger the creation of a PG. See
OSD::get_or_create_pg.
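As a rough sketch of the wrapping and staleness check described above,
assuming staleness is judged against the epoch of the PG's most recent
peering reset; MiniPG, PeeringEvt, and process_one_event are simplified
stand-ins, not the real types::

  #include <cstdint>
  #include <deque>
  #include <memory>

  using epoch_t = uint32_t;

  struct StateMachineEvent {};  // the wrapped state machine event, elided

  // Simplified analogue of PG::CephPeeringEvt: the wrapped event plus the
  // two epochs needed to judge staleness generically.
  struct PeeringEvt {
    epoch_t epoch_sent;   // epoch at which this message was sent
    epoch_t query_epoch;  // epoch of the message that triggered it
    std::shared_ptr<StateMachineEvent> evt;
  };

  struct MiniPG {
    // Epoch of the last event that started a new peering interval.
    epoch_t last_peering_reset = 0;
    std::deque<PeeringEvt> peering_queue;

    // The event is stale if the PG has reset peering since the triggering
    // query or since the message itself was sent.
    bool old_peering_evt(const PeeringEvt& e) const {
      return e.query_epoch < last_peering_reset ||
             e.epoch_sent < last_peering_reset;
    }

    // Checked on insertion (cf. PG::queue_peering_event)...
    void queue_peering_event(PeeringEvt e) {
      if (old_peering_evt(e))
        return;  // discard: belongs to a prior peering interval
      peering_queue.push_back(std::move(e));
    }

    // ...and again on removal, since the PG may have reset in the meantime.
    void process_one_event() {
      if (peering_queue.empty())
        return;
      PeeringEvt e = std::move(peering_queue.front());
      peering_queue.pop_front();
      if (!old_peering_evt(e)) {
        // deliver e.evt to the RecoveryMachine (elided)
      }
    }
  };

Checking at both ends of the queue matters because the PG can enter a new
interval while an event is waiting; a message valid when queued may be stale
by the time it is dequeued.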