X-Git-Url: https://gerrit.opnfv.org/gerrit/gitweb?a=blobdiff_plain;f=src%2Fceph%2Fdoc%2Fdev%2Frados-client-protocol.rst;fp=src%2Fceph%2Fdoc%2Fdev%2Frados-client-protocol.rst;h=0000000000000000000000000000000000000000;hb=7da45d65be36d36b880cc55c5036e96c24b53f00;hp=b05e7534404b268b9dd1649c007b2e07f73286ef;hpb=691462d09d0987b47e112d6ee8740375df3c51b2;p=stor4nfv.git diff --git a/src/ceph/doc/dev/rados-client-protocol.rst b/src/ceph/doc/dev/rados-client-protocol.rst deleted file mode 100644 index b05e753..0000000 --- a/src/ceph/doc/dev/rados-client-protocol.rst +++ /dev/null @@ -1,117 +0,0 @@ -RADOS client protocol -===================== - -This is very incomplete, but one must start somewhere. - -Basics ------- - -Requests are MOSDOp messages. Replies are MOSDOpReply messages. - -An object request is targetted at an hobject_t, which includes a pool, -hash value, object name, placement key (usually empty), and snapid. - -The hash value is a 32-bit hash value, normally generated by hashing -the object name. The hobject_t can be arbitrarily constructed, -though, with any hash value and name. Note that in the MOSDOp these -components are spread across several fields and not logically -assembled in an actual hobject_t member (mainly historical reasons). - -A request can also target a PG. In this case, the *ps* value matches -a specific PG, the object name is empty, and (hopefully) the ops in -the request are PG ops. - -Either way, the request ultimately targets a PG, either by using the -explicit pgid or by folding the hash value onto the current number of -pgs in the pool. The client sends the request to the primary for the -assocated PG. - -Each request is assigned a unique tid. - -Resends -------- - -If there is a connection drop, the client will resend any outstanding -requets. - -Any time there is a PG mapping change such that the primary changes, -the client is responsible for resending the request. Note that -although there may be an interval change from the OSD's perspective -(triggering PG peering), if the primary doesn't change then the client -need not resend. - -There are a few exceptions to this rule: - - * There is a last_force_op_resend field in the pg_pool_t in the - OSDMap. If this changes, then the clients are forced to resend any - outstanding requests. (This happens when tiering is adjusted, for - example.) - * Some requests are such that they are resent on *any* PG interval - change, as defined by pg_interval_t's is_new_interval() (the same - criteria used by peering in the OSD). - * If the PAUSE OSDMap flag is set and unset. - -Each time a request is sent to the OSD the *attempt* field is incremented. The -first time it is 0, the next 1, etc. - -Backoff -------- - -Ordinarily the OSD will simply queue any requests it can't immeidately -process in memory until such time as it can. This can become -problematic because the OSD limits the total amount of RAM consumed by -incoming messages: if either of the thresholds for the number of -messages or the number of bytes is reached, new messages will not be -read off the network socket, causing backpressure through the network. - -In some cases, though, the OSD knows or expects that a PG or object -will be unavailable for some time and does not want to consume memory -by queuing requests. In these cases it can send a MOSDBackoff message -to the client. - -A backoff request has four properties: - -#. the op code (block, unblock, or ack-block) -#. *id*, a unique id assigned within this session -#. hobject_t begin -#. hobject_t end - -There are two types of backoff: a *PG* backoff will plug all requests -targetting an entire PG at the client, as described by a range of the -hash/hobject_t space [begin,end), while an *object* backoff will plug -all requests targetting a single object (begin == end). - -When the client receives a *block* backoff message, it is now -responsible for *not* sending any requests for hobject_ts described by -the backoff. The backoff remains in effect until the backoff is -cleared (via an 'unblock' message) or the OSD session is closed. A -*ack_block* message is sent back to the OSD immediately to acknowledge -receipt of the backoff. - -When an unblock is -received, it will reference a specific id that the client previous had -blocked. However, the range described by the unblock may be smaller -than the original range, as the PG may have split on the OSD. The unblock -should *only* unblock the range specified in the unblock message. Any requests -that fall within the unblock request range are reexamined and, if no other -installed backoff applies, resent. - -On the OSD, Backoffs are also tracked across ranges of the hash space, and -exist in three states: - -#. new -#. acked -#. deleting - -A newly installed backoff is set to *new* and a message is sent to the -client. When the *ack-block* message is received it is changed to the -*acked* state. The OSD may process other messages from the client that -are covered by the backoff in the *new* state, but once the backoff is -*acked* it should never see a blocked request unless there is a bug. - -If the OSD wants to a remove a backoff in the *acked* state it can -simply remove it and notify the client. If the backoff is in the -*new* state it must move it to the *deleting* state and continue to -use it to discard client requests until the *ack-block* message is -received, at which point it can finally be removed. This is necessary to -preserve the order of operations processed by the OSD.