=====================================
Configuring Monitor/OSD Interaction
=====================================

After you have completed your initial Ceph configuration, you may deploy and run
Ceph. When you execute a command such as ``ceph health`` or ``ceph -s``, the
:term:`Ceph Monitor` reports on the current state of the :term:`Ceph Storage
Cluster`. The Ceph Monitor knows about the Ceph Storage Cluster by requiring
reports from each :term:`Ceph OSD Daemon`, and by receiving reports from Ceph
OSD Daemons about the status of their neighboring Ceph OSD Daemons. If the Ceph
Monitor doesn't receive reports, or if it receives reports of changes in the
Ceph Storage Cluster, the Ceph Monitor updates the status of the Ceph Cluster
Map.

Ceph provides reasonable default settings for Ceph Monitor/Ceph OSD Daemon
interaction. However, you may override the defaults. The following sections
describe how Ceph Monitors and Ceph OSD Daemons interact for the purposes of
monitoring the Ceph Storage Cluster.


.. index:: heartbeat interval

OSDs Check Heartbeats
=====================

Each Ceph OSD Daemon checks the heartbeat of other Ceph OSD Daemons every 6
seconds. You can change the heartbeat interval by adding an ``osd heartbeat
interval`` setting under the ``[osd]`` section of your Ceph configuration file,
or by setting the value at runtime. If a neighboring Ceph OSD Daemon doesn't
show a heartbeat within a 20 second grace period, the Ceph OSD Daemon may
consider the neighboring Ceph OSD Daemon ``down`` and report it back to a Ceph
Monitor, which will update the Ceph Cluster Map. You can change this grace
period by adding an ``osd heartbeat grace`` setting under both the ``[mon]``
and ``[osd]`` sections (or under the ``[global]`` section) of your Ceph
configuration file, or by setting the value at runtime.

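
A minimal ``ceph.conf`` sketch of these two settings (the values shown are the
defaults described above):

.. code-block:: ini

    [global]
    # The grace period must be visible to both the monitors and the OSDs,
    # so set it in [global] (or in both [mon] and [osd]).
    osd heartbeat grace = 20

    [osd]
    # How often this OSD pings its peers, in seconds.
    osd heartbeat interval = 6

On recent releases the same options can typically be changed at runtime with
``ceph config set`` (for example, ``ceph config set osd osd_heartbeat_grace
20``); older releases use ``injectargs``.
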

.. Diagram: OSD 1 and OSD 2 exchange heartbeat checks once per heartbeat
   interval.


.. index:: OSD down report

OSDs Report Down OSDs
=====================

By default, two Ceph OSD Daemons from different hosts must report to the Ceph
Monitors that another Ceph OSD Daemon is ``down`` before the Ceph Monitors
acknowledge that the reported Ceph OSD Daemon is ``down``. But there is a
chance that all the OSDs reporting the failure are hosted in a rack with a bad
switch which has trouble connecting to another OSD. To avoid this sort of false
alarm, we consider the peers reporting a failure a proxy for a potential
"subcluster" within the overall cluster that is similarly laggy. This is
clearly not true in all cases, but will sometimes help us localize the grace
correction to a subset of the system that is unhappy. ``mon osd reporter
subtree level`` is used to group the peers into the "subcluster" by their
common ancestor type in the CRUSH map. By default, only two reports from
different subtrees are required to report another Ceph OSD Daemon ``down``.
You can change the number of reporters from unique subtrees and the common
ancestor type required to report a Ceph OSD Daemon ``down`` to a Ceph Monitor
by adding ``mon osd min down reporters`` and ``mon osd reporter subtree
level`` settings under the ``[mon]`` section of your Ceph configuration file,
or by setting the value at runtime.

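
For example, to require failure reports from three OSDs in three different
racks before an OSD is marked ``down`` (a sketch; the ``rack`` level assumes
your CRUSH map actually contains rack buckets):

.. code-block:: ini

    [mon]
    # Require reports from this many OSDs in distinct subtrees (default: 2).
    mon osd min down reporters = 3
    # Count reporters by this CRUSH ancestor type (default: host).
    mon osd reporter subtree level = rack
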

.. Diagram: OSD 1 and OSD 2 each report a failed peer OSD to the Monitor,
   which marks the OSD ``down`` and updates the cluster map.


.. index:: peering failure

OSDs Report Peering Failure
===========================

If a Ceph OSD Daemon cannot peer with any of the Ceph OSD Daemons defined in its
Ceph configuration file (or the cluster map), it will ping a Ceph Monitor for
the most recent copy of the cluster map every 30 seconds. You can change the
Ceph Monitor heartbeat interval by adding an ``osd mon heartbeat interval``
setting under the ``[osd]`` section of your Ceph configuration file, or by
setting the value at runtime.

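
A sketch of this monitor-ping setting in ``ceph.conf`` (30 seconds is the
default described above):

.. code-block:: ini

    [osd]
    # How often an OSD with no live peers pings a monitor for a fresh
    # cluster map, in seconds.
    osd mon heartbeat interval = 30
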

.. Diagram: OSD 1 heartbeats OSD 2 but fails to peer with OSD 3; once the
   interval is exceeded, it reports "Failed to Peer with OSD 3" to the
   Monitor and receives a new cluster map.


.. index:: OSD status

OSDs Report Their Status
========================

If a Ceph OSD Daemon doesn't report to a Ceph Monitor, the Ceph Monitor will
consider the Ceph OSD Daemon ``down`` after the ``mon osd report timeout``
elapses. A Ceph OSD Daemon sends a report to a Ceph Monitor within 5 seconds
of a reportable event such as a failure, a change in placement group stats, a
change in ``up_thru``, or a boot. You can change the Ceph OSD Daemon minimum
report interval by adding an ``osd mon report interval min`` setting under the
``[osd]`` section of your Ceph configuration file, or by setting the value at
runtime. A Ceph OSD Daemon sends a report to a Ceph Monitor every 120 seconds
irrespective of whether any notable changes occur. You can change the Ceph
Monitor report interval by adding an ``osd mon report interval max`` setting
under the ``[osd]`` section of your Ceph configuration file, or by setting the
value at runtime.

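
Both report intervals live in the ``[osd]`` section; a sketch using the
defaults described above:

.. code-block:: ini

    [osd]
    # Minimum delay after a reportable event before sending a report.
    osd mon report interval min = 5
    # Send a report at least this often, even if nothing notable happened.
    osd mon report interval max = 120
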

.. Diagram: OSD 1 reports its status to the Monitor both shortly after a
   notable event and at the regular maximum report interval.

Configuration Settings
======================

When modifying heartbeat settings, you should include them in the ``[global]``
section of your configuration file.


.. index:: monitor heartbeat

Monitor Settings
----------------

``mon osd min up ratio``

:Description: The minimum ratio of ``up`` Ceph OSD Daemons before Ceph will
              mark Ceph OSD Daemons ``down``.


``mon osd min in ratio``

:Description: The minimum ratio of ``in`` Ceph OSD Daemons before Ceph will
              mark Ceph OSD Daemons ``out``.

``mon osd laggy halflife``

:Description: The number of seconds over which laggy estimates decay
              (a half-life).


``mon osd laggy weight``

:Description: The weight for new samples in laggy estimation decay.


``mon osd laggy max interval``

:Description: Maximum value of ``laggy_interval`` in laggy estimations (in
              seconds). The monitor uses an adaptive approach to evaluate the
              ``laggy_interval`` of a certain OSD. This value is then used to
              calculate the grace period for that OSD.

``mon osd adjust heartbeat grace``

:Description: If set to ``true``, Ceph will scale the heartbeat grace period
              based on laggy estimations.


``mon osd adjust down out interval``

:Description: If set to ``true``, Ceph will scale the down-out interval
              based on laggy estimations.

``mon osd auto mark in``

:Description: Ceph will mark any booting Ceph OSD Daemons as ``in``
              the Ceph Storage Cluster.


``mon osd auto mark auto out in``

:Description: Ceph will mark booting Ceph OSD Daemons that were automatically
              marked ``out`` of the Ceph Storage Cluster as ``in`` the
              cluster.


``mon osd auto mark new in``

:Description: Ceph will mark booting new Ceph OSD Daemons as ``in`` the
              Ceph Storage Cluster.


``mon osd down out interval``

:Description: The number of seconds Ceph waits before marking a Ceph OSD
              Daemon ``down`` and ``out`` if it doesn't respond.

:Type: 32-bit Integer

``mon osd down out subtree limit``

:Description: The smallest :term:`CRUSH` unit type that Ceph will **not**
              automatically mark out. For instance, if set to ``host`` and if
              all OSDs of a host are down, Ceph will not automatically mark
              out those OSDs.

``mon osd report timeout``

:Description: The grace period in seconds before declaring
              unresponsive Ceph OSD Daemons ``down``.

:Type: 32-bit Integer


``mon osd min down reporters``

:Description: The minimum number of Ceph OSD Daemons required to report a
              ``down`` Ceph OSD Daemon.

:Type: 32-bit Integer

``mon osd reporter subtree level``

:Description: The type of CRUSH ancestor bucket in which reporters are
              counted. OSDs send failure reports to a monitor when they find
              that a peer is unresponsive; after a grace period, the monitor
              marks the reported OSD ``down`` and, later, ``out``.


.. index:: OSD heartbeat

OSD Settings
------------

``osd heartbeat address``

:Description: A Ceph OSD Daemon's network address for heartbeats.

:Default: The host address.


``osd heartbeat interval``

:Description: How often a Ceph OSD Daemon pings its peers (in seconds).
:Type: 32-bit Integer

``osd heartbeat grace``

:Description: The amount of time that can elapse without a heartbeat from a
              Ceph OSD Daemon before the Ceph Storage Cluster considers it
              ``down``. This setting must appear in both the ``[mon]`` and
              ``[osd]`` sections (or in ``[global]``) so that it is read by
              both the MON and OSD daemons.
:Type: 32-bit Integer

``osd mon heartbeat interval``

:Description: How often the Ceph OSD Daemon pings a Ceph Monitor if it has no
              Ceph OSD Daemon peers.

:Type: 32-bit Integer


``osd mon report interval max``

:Description: The maximum time in seconds that a Ceph OSD Daemon can wait
              before it must report to a Ceph Monitor.

:Type: 32-bit Integer

``osd mon report interval min``

:Description: The minimum number of seconds a Ceph OSD Daemon may wait
              from startup or another reportable event before reporting
              to a Ceph Monitor.

:Type: 32-bit Integer
:Valid Range: Should be less than ``osd mon report interval max``

``osd mon ack timeout``

:Description: The number of seconds to wait for a Ceph Monitor to acknowledge
              a request for statistics.

:Type: 32-bit Integer