This is a revision of the legacy Ceph on-wire protocol that was
implemented by the SimpleMessenger. It addresses performance and
* *client* (C): the party initiating a (TCP) connection
* *server* (S): the party accepting a (TCP) connection
* *connection*: an instance of a (TCP) connection between two processes.
* *entity*: a Ceph entity instantiation, e.g. 'osd.0'. Each entity
  has one or more unique entity_addr_t's by virtue of the 'nonce'
  field, which is typically a pid or random value.
* *stream*: an exchange, passed over a connection, between two unique
  entities. In the future multiple entities may coexist within the
* *session*: a stateful session between two entities in which message
  exchange is ordered and lossless. A session might span multiple
  connections (and streams) if there is an interruption (TCP connection
* *frame*: a discrete message sent between the peers. Each frame
  consists of a tag (type code), stream id, payload, and (if signing
  or encryption is enabled) some other fields. See below for the
* *stream id*: a 32-bit value that uniquely identifies a stream within
  a given connection. The stream id is implicitly instantiated when the
  sender sends a frame using that id.
* *tag*: a single-byte type code associated with a frame. The tag
  determines the structure of the payload.

A connection has two distinct phases:

#. banner
#. frame exchange for one or more streams

A stream has three distinct phases:

#. authentication
#. message flow handshake
#. message exchange

Both the client and server, upon connecting, send a banner::

  "ceph %x %x\n", protocol_features_supported, protocol_features_required

The protocol features are a new, distinct namespace. Initially no
features are defined or required, so this will be "ceph 0 0\n".
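
The banner exchange can be sketched as follows. This is a minimal illustration, not the messenger's actual code: the function names `build_banner`, `parse_banner`, and `missing_features` are hypothetical, and the only thing taken from the text is the `"ceph %x %x\n"` format (two hexadecimal feature bitmasks).

```python
from typing import Tuple


def build_banner(supported: int, required: int) -> bytes:
    """Format the banner as "ceph %x %x\\n" (hex digits, no 0x prefix)."""
    return f"ceph {supported:x} {required:x}\n".encode("ascii")


def parse_banner(line: bytes) -> Tuple[int, int]:
    """Return (supported, required) parsed from a peer's banner line."""
    text = line.decode("ascii")
    if not text.endswith("\n"):
        raise ValueError("banner must end with a newline")
    parts = text[:-1].split(" ")
    if len(parts) != 3 or parts[0] != "ceph":
        raise ValueError(f"malformed banner: {text!r}")
    return int(parts[1], 16), int(parts[2], 16)


def missing_features(our_supported: int, peer_required: int) -> int:
    """Bits the peer requires that we do not support (0 means compatible)."""
    return peer_required & ~our_supported
```

With no features defined, `build_banner(0, 0)` yields `b"ceph 0 0\n"`. A non-zero result from `missing_features` is the condition under which a party would give up on the connection.
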

If the remote party advertises required features we don't support, we

All further data sent or received is contained by a frame. Each frame has

  [payload padding -- only present after stream auth phase]
  [signature -- only present after stream auth phase]

* frame_len includes everything after the frame_len le32 up to the end of the
  frame (all payloads, signatures, and padding).

* The payload format and length are determined by the tag.

* The signature portion is only present in a given stream if the
  authentication phase has completed (TAG_AUTH_DONE has been sent) and
  signatures are enabled.
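
The length rule above can be illustrated with a small encoder and decoder. This is a sketch under stated assumptions: the field order (stream_id le32, frame_len le32, one-byte tag, then payload, padding, and signature) is inferred from the definitions in this document, the helper names are hypothetical, and the tag value used in the usage note is made up for illustration.

```python
import struct
from typing import Optional, Tuple

PREFIX = struct.Struct("<II")  # stream_id (le32), frame_len (le32)


def encode_frame(stream_id: int, tag: int, payload: bytes,
                 padding: bytes = b"", signature: bytes = b"") -> bytes:
    """Serialize one frame.

    frame_len counts everything after the frame_len field itself:
    the tag byte, the payload, any padding, and any signature.
    """
    body = bytes([tag]) + payload + padding + signature
    return PREFIX.pack(stream_id, len(body)) + body


def decode_frame(buf: bytes) -> Optional[Tuple[int, int, bytes, int]]:
    """Try to read one frame from the start of buf.

    Returns (stream_id, tag, rest_of_body, bytes_consumed), or None if
    buf does not yet hold a complete frame. Splitting rest_of_body into
    payload, padding, and signature depends on the tag and auth state,
    so it is left to the caller.
    """
    if len(buf) < PREFIX.size:
        return None
    stream_id, frame_len = PREFIX.unpack_from(buf, 0)
    if frame_len < 1:
        raise ValueError("frame_len must cover at least the tag byte")
    end = PREFIX.size + frame_len
    if len(buf) < end:
        return None
    return stream_id, buf[PREFIX.size], buf[PREFIX.size + 1:end], end
```

For example, `encode_frame(7, 0x10, b"hello")` produces 14 bytes: 8 bytes of prefix with `frame_len` equal to 6 (one tag byte plus five payload bytes), followed by the body.
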

* TAG_AUTH_METHODS (server only): list authentication methods (none, cephx, ...)::

    __le32 methods[num_methods];   // CEPH_AUTH_{NONE, CEPHX}

* TAG_AUTH_SET_METHOD (client only): set auth method for this connection::

  - The selected auth method determines the sig_size and block_size in any
    subsequent messages (TAG_AUTH_DONE and non-auth messages).

* TAG_AUTH_BAD_METHOD (server only): reject client-selected auth method::

* TAG_AUTH: client->server or server->client auth message::

    method specific payload

    confounder (block_size bytes of random garbage)

  - The client first says AUTH_DONE, and the server replies to

The frame format is fixed (see above), but can take three different
forms, depending on the AUTH_DONE flags:

* If neither FLAG_SIGNED nor FLAG_ENCRYPTED is specified, things are simple::

    payload_padding (out to auth block_size)

* If FLAG_SIGNED has been specified::

    payload_padding (out to auth block_size)
    signature (sig_size bytes)

  Here the padding just makes life easier for the signature. It can be
  random data to act as an additional confounder. Note also that the
  signature input must include some state from the session key and the

* If FLAG_ENCRYPTED has been specified::

    payload_padding (out to auth block_size)

  Note that the padding ensures that the total frame is a multiple of
  the auth method's block_size so that the message can be sent out over
  the wire without waiting for the next frame in the stream.
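
The padding and signature rules for the signed form can be sketched as below. Several things here are assumptions rather than part of the protocol: the signature algorithm (HMAC-SHA256 truncated to sig_size) is only a stand-in, since the real algorithm is defined by the selected auth method; mixing a sequence number into the signature input is a hypothetical way of including per-session state; zero bytes are used as padding to keep the example deterministic, although random data is equally valid; and all function names are invented for illustration.

```python
import hashlib
import hmac


def pad_len(length: int, block_size: int) -> int:
    """Number of bytes needed to round length up to a multiple of block_size."""
    return (-length) % block_size


def pad_payload(payload: bytes, block_size: int) -> bytes:
    """Pad the payload out to the auth method's block_size."""
    return payload + b"\x00" * pad_len(len(payload), block_size)


def sign(session_key: bytes, seq: int, data: bytes, sig_size: int) -> bytes:
    """Stand-in signature: truncated HMAC-SHA256 over seq and data (assumption)."""
    mac = hmac.new(session_key, seq.to_bytes(8, "little") + data, hashlib.sha256)
    return mac.digest()[:sig_size]


def build_signed_body(session_key: bytes, seq: int, payload: bytes,
                      block_size: int, sig_size: int) -> bytes:
    """Return padded payload followed by its sig_size-byte signature."""
    padded = pad_payload(payload, block_size)
    return padded + sign(session_key, seq, padded, sig_size)


def verify_signed_body(session_key: bytes, seq: int, body: bytes,
                       sig_size: int) -> bytes:
    """Check the trailing signature and return the padded payload."""
    if not 0 < sig_size <= len(body):
        raise ValueError("body shorter than signature")
    padded, sig = body[:-sig_size], body[-sig_size:]
    if not hmac.compare_digest(sig, sign(session_key, seq, padded, sig_size)):
        raise ValueError("signature mismatch")
    return padded
```

Because the sequence number is part of the signed input in this sketch, replaying a body under a different sequence number fails verification.
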

Message flow handshake
----------------------

In this phase the peers identify each other and (if desired) reconnect to
an established session.

* TAG_IDENT: identify ourselves::

    entity_addrvec_t addr(s)
    __u8   my type (CEPH_ENTITY_TYPE_*)
    __le32 protocol version
    __le64 features supported (CEPH_FEATURE_* bitmask)
    __le64 features required (CEPH_FEATURE_* bitmask)
    __le64 flags (CEPH_MSG_CONNECT_* bitmask)
    __le64 cookie (a client identifier, assigned by the sender. unique on the sender.)

  - The client sends first; the server replies with the same.

* TAG_IDENT_MISSING_FEATURES (server only): complain about a TAG_IDENT with too few features::

    __le64 features we require that peer didn't advertise

* TAG_IDENT_BAD_PROTOCOL (server only): complain about an old protocol version::

    __le32 protocol_version (our protocol version)

* TAG_RECONNECT (client only): reconnect to an established session::

    __le64 msg_seq (the last msg seq received)

* TAG_RECONNECT_OK (server only): acknowledge a reconnect attempt::

    __le64 msg_seq (last msg seq received)

* TAG_RECONNECT_RETRY_SESSION (server only): fail reconnect due to stale connect_seq

* TAG_RECONNECT_RETRY_GLOBAL (server only): fail reconnect due to stale global_seq

* TAG_RECONNECT_WAIT (server only): fail reconnect due to connect race.

  - Indicates that the server is already connecting to the client, and
    that direction should win the race. The client should wait for that
    connection to complete.
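
The server's reaction to an incoming TAG_IDENT can be sketched as a small decision function. This is an illustration only: the function name `check_ident` is hypothetical, the order of the two checks is an assumption, and the tags are represented as plain strings because their numeric values are not given here.

```python
from typing import Optional, Tuple


def check_ident(our_required: int, our_protocol_version: int,
                peer_supported: int, peer_protocol_version: int
                ) -> Tuple[str, Optional[int]]:
    """Decide which tag the server answers a client's TAG_IDENT with.

    Returns (tag_name, payload_value):
      - TAG_IDENT_BAD_PROTOCOL carries our protocol version.
      - TAG_IDENT_MISSING_FEATURES carries the features we require
        that the peer did not advertise.
      - TAG_IDENT means the server replies with its own identity.
    """
    if peer_protocol_version < our_protocol_version:
        return "TAG_IDENT_BAD_PROTOCOL", our_protocol_version
    missing = our_required & ~peer_supported
    if missing:
        return "TAG_IDENT_MISSING_FEATURES", missing
    return "TAG_IDENT", None
```

For instance, if the server requires feature bits `0b110` and the client advertises only `0b010`, the reply is TAG_IDENT_MISSING_FEATURES with the value `0b100`.
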

Once a session is established, we can exchange messages.

* TAG_MSG: a message::

  - The ceph_msg_header is modified in ceph_msg_header2 to include an
    ack_seq. This avoids the need for a TAG_ACK message most of the time.

* TAG_ACK: acknowledge receipt of message(s)::

  - This is only used for stateful sessions.

* TAG_KEEPALIVE2: check for connection liveness::

  - The time stamp is local to the sender.

* TAG_KEEPALIVE2_ACK: reply to a keepalive2::

  - The time stamp is from the TAG_KEEPALIVE2 we are responding to.

* TAG_CLOSE: terminate a stream

  Indicates that a stream should be terminated. This is equivalent to
  a hangup or reset (i.e., should trigger ms_handle_reset). It isn't
  strictly necessary or useful if there is only a single stream as we
  could just disconnect the TCP connection, although one could
  certainly use it creatively (e.g., reset the stream state and retry
  an authentication handshake).
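
How acknowledgements and reconnects might keep a session ordered and lossless can be sketched with a sender-side queue. This is a hedged illustration, not the messenger's implementation: the class `SendQueue` and its methods are hypothetical, sequence numbers are assumed to start at 1 and increase by one per message, and the same trimming logic is assumed to apply whether the acknowledged sequence arrives in a TAG_ACK, in the ack_seq of a message header, or as the msg_seq of a reconnect exchange.

```python
from collections import deque
from typing import Deque, List, Tuple


class SendQueue:
    """Keeps sent messages until the peer acknowledges them."""

    def __init__(self) -> None:
        self.next_seq = 1
        self.unacked: Deque[Tuple[int, bytes]] = deque()  # (seq, payload)

    def send(self, payload: bytes) -> int:
        """Assign the next sequence number and remember the message."""
        seq = self.next_seq
        self.next_seq += 1
        self.unacked.append((seq, payload))
        return seq

    def handle_ack(self, ack_seq: int) -> None:
        """Drop every message the peer has confirmed (seq <= ack_seq)."""
        while self.unacked and self.unacked[0][0] <= ack_seq:
            self.unacked.popleft()

    def replay_after_reconnect(self, peer_msg_seq: int) -> List[Tuple[int, bytes]]:
        """Return, in order, the messages the peer has not yet received.

        peer_msg_seq is the last msg seq the peer reports having received.
        """
        self.handle_ack(peer_msg_seq)
        return list(self.unacked)
```

After sending three messages and receiving an acknowledgement for the first, a reconnect in which the peer reports `msg_seq` 2 leaves only the third message to be resent.
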