Once you have a running cluster, you may use the ``ceph`` tool to monitor your
cluster. Monitoring a cluster typically involves checking OSD status, monitor
status, placement group status and metadata server status.

Using the command line
======================

To run the ``ceph`` tool in interactive mode, type ``ceph`` at the command line
with no arguments. For example::

    ceph
    ceph> health
    ceph> status
    ceph> quorum_status
    ceph> mon_status

If you specified non-default locations for your configuration or keyring,
you may specify their locations::

    ceph -c /path/to/conf -k /path/to/keyring health

Checking a Cluster's Status
===========================

After you start your cluster, and before you start reading or
writing data, check your cluster's status first.

To check a cluster's status, execute the following::

    ceph status

Or::

    ceph -s

In interactive mode, type ``status`` and press **Enter**. ::

    ceph> status

Ceph will print the cluster status. For example, a tiny Ceph demonstration
cluster with one of each service may print the following::

  cluster:
    id: 477e46f1-ae41-4e43-9c8f-72c918ab0a20

  services:
    mon: 1 daemons, quorum a
    mds: 1/1/1 up {0=a=up:active}
    osd: 1 osds: 1 up, 1 in

  data:
    pools: 2 pools, 16 pgs
    objects: 21 objects, 2246 bytes
    usage: 546 GB used, 384 GB / 931 GB avail

.. topic:: How Ceph Calculates Data Usage

   The ``usage`` value reflects the *actual* amount of raw storage used. The
   ``xxx GB / xxx GB`` value means the amount available (the lesser number)
   out of the overall storage capacity of the cluster. The notional number
   reflects the size of the stored data before it is replicated, cloned or
   snapshotted. Therefore, the amount of data actually stored typically
   exceeds the notional amount stored, because Ceph creates replicas of the
   data and may also use storage capacity for cloning and snapshotting.
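The arithmetic behind that relationship can be sketched as follows. This is a simplification that assumes plain replication and ignores clones, snapshots and per-OSD overhead; the 3x pool size below is a hypothetical example, not taken from the sample output:

```python
def raw_usage_bytes(notional_bytes: int, replication_factor: int) -> int:
    """Approximate raw storage consumed by replicated data.

    Ceph stores `replication_factor` copies of each object, so raw usage
    is at least the notional (pre-replication) size times that factor.
    Clones and snapshots can add more on top of this.
    """
    return notional_bytes * replication_factor

# 2246 bytes of notional data (as in the sample status output) stored
# in a hypothetical 3x replicated pool:
print(raw_usage_bytes(2246, 3))  # 6738
```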

Watching a Cluster
==================

In addition to local logging by each daemon, Ceph clusters maintain
a *cluster log* that records high-level events about the whole system.
This is logged to disk on monitor servers (as ``/var/log/ceph/ceph.log`` by
default), but can also be monitored via the command line.

To follow the cluster log, use the following command::

    ceph -w

Ceph will print the status of the system, followed by each log message as it
is emitted. For example::

  cluster:
    id: 477e46f1-ae41-4e43-9c8f-72c918ab0a20

  services:
    mon: 1 daemons, quorum a
    mds: 1/1/1 up {0=a=up:active}
    osd: 1 osds: 1 up, 1 in

  data:
    pools: 2 pools, 16 pgs
    objects: 21 objects, 2246 bytes
    usage: 546 GB used, 384 GB / 931 GB avail


  2017-07-24 08:15:11.329298 mon.a mon.0 172.21.9.34:6789/0 23 : cluster [INF] osd.0 172.21.9.34:6806/20527 boot
  2017-07-24 08:15:14.258143 mon.a mon.0 172.21.9.34:6789/0 39 : cluster [INF] Activating manager daemon x
  2017-07-24 08:15:15.446025 mon.a mon.0 172.21.9.34:6789/0 47 : cluster [INF] Manager daemon x is now available

In addition to using ``ceph -w`` to print log lines as they are emitted,
use ``ceph log last [n]`` to see the most recent ``n`` lines from the cluster
log.

Monitoring Health Checks
========================

Ceph continuously runs various *health checks* against its own status. When
a health check fails, this is reflected in the output of ``ceph status`` (or
``ceph health``). In addition, messages are sent to the cluster log to
indicate when a check fails, and when the cluster recovers.

For example, when an OSD goes down, the ``health`` section of the status
output may be updated as follows::

    health: HEALTH_WARN
            1 osds down
            Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded
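The degraded percentage shown above is simply degraded object copies over total copies. A minimal sketch of that arithmetic, assuming 3x replication (which is what makes 21 objects come to 63 copies):

```python
def degraded_percent(degraded_copies: int, total_copies: int) -> float:
    """Percentage of object copies that are currently degraded."""
    return 100.0 * degraded_copies / total_copies

# 21 objects x 3 replicas = 63 copies; while one OSD is down, one copy
# of each object is unavailable, so 21 copies are degraded.
print(round(degraded_percent(21, 21 * 3), 3))  # 33.333
```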

At this time, cluster log messages are also emitted to record the failure of the
health checks::

    2017-07-25 10:08:58.265945 mon.a mon.0 172.21.9.34:6789/0 91 : cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)
    2017-07-25 10:09:01.302624 mon.a mon.0 172.21.9.34:6789/0 94 : cluster [WRN] Health check failed: Degraded data redundancy: 21/63 objects degraded (33.333%), 16 pgs unclean, 16 pgs degraded (PG_DEGRADED)

When the OSD comes back online, the cluster log records the cluster's return
to a healthy state::

    2017-07-25 10:11:11.526841 mon.a mon.0 172.21.9.34:6789/0 109 : cluster [WRN] Health check update: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized (PG_DEGRADED)
    2017-07-25 10:11:13.535493 mon.a mon.0 172.21.9.34:6789/0 110 : cluster [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 2 pgs unclean, 2 pgs degraded, 2 pgs undersized)
    2017-07-25 10:11:13.535577 mon.a mon.0 172.21.9.34:6789/0 111 : cluster [INF] Cluster is now healthy
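Cluster log lines like the ones above share a fixed shape (timestamp, entity name, rank, address, sequence number, channel, level, message), which makes them straightforward to parse. A sketch, assuming that layout holds for the lines you feed it:

```python
import re

# One cluster-log line looks like:
# "<date> <time> <name> <rank> <addr> <seq> : <channel> [<LVL>] <message>"
LOG_RE = re.compile(
    r"^(?P<ts>\S+ \S+) (?P<name>\S+) (?P<rank>\S+) (?P<addr>\S+) "
    r"(?P<seq>\d+) : (?P<channel>\S+) \[(?P<level>\w+)\] (?P<msg>.*)$"
)

def parse_cluster_log_line(line: str) -> dict:
    """Split one cluster log line into its named fields."""
    m = LOG_RE.match(line)
    if m is None:
        raise ValueError("unrecognized cluster log line")
    return m.groupdict()

entry = parse_cluster_log_line(
    "2017-07-25 10:08:58.265945 mon.a mon.0 172.21.9.34:6789/0 91 : "
    "cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)"
)
print(entry["level"], entry["msg"])  # WRN Health check failed: 1 osds down (OSD_DOWN)
```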

Detecting configuration issues
==============================

In addition to the health checks that Ceph continuously runs on its
own status, there are some configuration issues that may only be detected
by an external tool.

Use the `ceph-medic`_ tool to run these additional checks on your Ceph
cluster's configuration.

Checking a Cluster's Usage Stats
================================

To check a cluster's data usage and data distribution among pools, you can
use the ``df`` option. It is similar to Linux ``df``. Execute
the following::

    ceph df

The **GLOBAL** section of the output provides an overview of the amount of
storage your cluster uses for your data.

- **SIZE:** The overall storage capacity of the cluster.
- **AVAIL:** The amount of free space available in the cluster.
- **RAW USED:** The amount of raw storage used.
- **% RAW USED:** The percentage of raw storage used. Use this number in
  conjunction with the ``full ratio`` and ``near full ratio`` to ensure that
  you are not reaching your cluster's capacity. See `Storage Capacity`_ for
  additional details.
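A check like the one that bullet describes can be sketched as follows. The 0.85 and 0.95 defaults below mirror the commonly used near-full and full ratio settings, but they are assumptions here; consult your own cluster's configured values:

```python
def capacity_state(raw_used_fraction: float,
                   nearfull_ratio: float = 0.85,
                   full_ratio: float = 0.95) -> str:
    """Classify raw usage against near-full and full thresholds.

    The default thresholds are illustrative; real clusters should use
    their configured ratios.
    """
    if raw_used_fraction >= full_ratio:
        return "full"
    if raw_used_fraction >= nearfull_ratio:
        return "nearfull"
    return "ok"

# 546 GB used out of 931 GB, as in the sample status output:
print(capacity_state(546 / 931))  # ok
```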

The **POOLS** section of the output provides a list of pools and the notional
usage of each pool. The output from this section **DOES NOT** reflect replicas,
clones or snapshots. For example, if you store an object with 1MB of data, the
notional usage will be 1MB, but the actual usage may be 2MB or more depending
on the number of replicas, clones and snapshots.

- **NAME:** The name of the pool.
- **ID:** The pool ID.
- **USED:** The notional amount of data stored in kilobytes, unless the number
  appends **M** for megabytes or **G** for gigabytes.
- **%USED:** The notional percentage of storage used per pool.
- **MAX AVAIL:** An estimate of the notional amount of data that can be written
  to this pool.
- **Objects:** The notional number of objects stored per pool.
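The unit convention described for **USED** can be illustrated with a small formatter. This is a hypothetical sketch of the convention, not Ceph's own formatting code:

```python
def format_used(kilobytes: float) -> str:
    """Render a notional USED value in KB, appending M or G as needed."""
    if kilobytes >= 1024 * 1024:
        return f"{kilobytes / (1024 * 1024):.1f}G"
    if kilobytes >= 1024:
        return f"{kilobytes / 1024:.1f}M"
    return f"{kilobytes:.0f}k"

print(format_used(2048))  # 2.0M
```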

.. note:: The numbers in the **POOLS** section are notional. They are not
   inclusive of the number of replicas, snapshots or clones. As a result,
   the sum of the **USED** and **%USED** amounts will not add up to the
   **RAW USED** and **%RAW USED** amounts in the **GLOBAL** section of the
   output.

.. note:: The **MAX AVAIL** value is a complicated function of the
   replication or erasure code used, the CRUSH rule that maps storage
   to devices, the utilization of those devices, and the configured
   ``mon_osd_full_ratio``.

Checking OSD Status
===================

You can check OSDs to ensure they are ``up`` and ``in`` by executing::

    ceph osd stat

Or::

    ceph osd dump

You can also view OSDs according to their position in the CRUSH map::

    ceph osd tree

Ceph will print out a CRUSH tree with a host, its OSDs, whether they are up
and their weight::

    # id    weight  type name       up/down reweight
    -1      3       pool default
    -3      3               rack mainrack
    -2      3                       host osd-host
    0       1                               osd.0   up      1
    1       1                               osd.1   up      1
    2       1                               osd.2   up      1

For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.

Checking Monitor Status
=======================

If your cluster has multiple monitors (likely), you should check the monitor
quorum status after you start the cluster and before reading or writing data. A
quorum must be present when multiple monitors are running. You should also check
monitor status periodically to ensure that they are running.

To display the monitor map, execute the following::

    ceph mon stat

Or::

    ceph mon dump

To check the quorum status for the monitor cluster, execute the following::

    ceph quorum_status
Ceph will return the quorum status. For example, a Ceph cluster consisting of
three monitors may return the following:

.. code-block:: javascript

    { "election_epoch": 10,
      "quorum": [
            0,
            1,
            2],
      "monmap": { "epoch": 1,
          "fsid": "444b489c-4f16-4b75-83f0-cb8097468898",
          "modified": "2011-12-12 13:28:27.505520",
          "created": "2011-12-12 13:28:27.505520",
          "mons": [
                { "rank": 0,
                  "name": "a",
                  "addr": "127.0.0.1:6789\/0"},
                { "rank": 1,
                  "name": "b",
                  "addr": "127.0.0.1:6790\/0"},
                { "rank": 2,
                  "name": "c",
                  "addr": "127.0.0.1:6791\/0"}
               ]
        }
    }
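A quick sketch of pulling the quorum membership out of JSON like the sample above (using a trimmed copy of that structure):

```python
import json

# Trimmed version of the quorum_status sample output
sample = """
{ "election_epoch": 10,
  "quorum": [0, 1, 2],
  "monmap": { "epoch": 1,
      "mons": [
          { "rank": 0, "name": "a", "addr": "127.0.0.1:6789/0"},
          { "rank": 1, "name": "b", "addr": "127.0.0.1:6790/0"},
          { "rank": 2, "name": "c", "addr": "127.0.0.1:6791/0"}
      ]
  }
}
"""

status = json.loads(sample)
# Map each monitor rank to its name, then resolve the quorum list
rank_to_name = {m["rank"]: m["name"] for m in status["monmap"]["mons"]}
in_quorum = [rank_to_name[r] for r in status["quorum"]]
print(in_quorum)  # ['a', 'b', 'c']
```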

Checking MDS Status
===================

Metadata servers provide metadata services for Ceph FS. Metadata servers have
two sets of states: ``up | down`` and ``active | inactive``. To ensure your
metadata servers are ``up`` and ``active``, execute the following::

    ceph mds stat

To display details of the metadata cluster, execute the following::

    ceph fs dump

Checking Placement Group States
===============================

Placement groups map objects to OSDs. When you monitor your
placement groups, you will want them to be ``active`` and ``clean``.
For a detailed discussion, refer to `Monitoring OSDs and Placement Groups`_.

.. _Monitoring OSDs and Placement Groups: ../monitoring-osd-pg

Using the Admin Socket
======================

The Ceph admin socket allows you to query a daemon via a socket interface.
By default, Ceph sockets reside under ``/var/run/ceph``. To access a daemon
via the admin socket, log in to the host running the daemon and use the
following command::

    ceph daemon {daemon-name}
    ceph daemon {path-to-socket-file}

For example, the following are equivalent::

    ceph daemon osd.0 foo
    ceph daemon /var/run/ceph/ceph-osd.0.asok foo
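That equivalence comes from how the default socket path is derived from the daemon name. A sketch of the mapping, assuming the default cluster name (``ceph``) and socket directory (``/var/run/ceph``), both of which are configurable:

```python
def admin_socket_path(daemon_name: str,
                      cluster: str = "ceph",
                      base_dir: str = "/var/run/ceph") -> str:
    """Default admin socket path for a daemon such as osd.0 or mon.a."""
    return f"{base_dir}/{cluster}-{daemon_name}.asok"

print(admin_socket_path("osd.0"))  # /var/run/ceph/ceph-osd.0.asok
```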

To view the available admin socket commands, execute the following command::

    ceph daemon {daemon-name} help

The admin socket command enables you to show and set your configuration at
runtime. See `Viewing a Configuration at Runtime`_ for details.

Additionally, you can set configuration values at runtime directly (i.e., the
admin socket bypasses the monitor, unlike ``ceph tell {daemon-type}.{id}
injectargs``, which relies on the monitor but doesn't require you to log in
directly to the host in question).

.. _Viewing a Configuration at Runtime: ../../configuration/ceph-conf#ceph-runtime-config
.. _Storage Capacity: ../../configuration/mon-config-ref#storage-capacity
.. _ceph-medic: http://docs.ceph.com/ceph-medic/master/