Handling a full Ceph filesystem
===============================

When a RADOS cluster reaches its ``mon_osd_full_ratio`` (default
95%) capacity, it is marked with the OSD full flag.  This flag causes
most normal RADOS clients to pause all operations until it is resolved
(for example by adding more capacity to the cluster).
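
The threshold comparison described above can be sketched as simple
arithmetic.  This is only an illustration: the function name and its
parameters are hypothetical, and it does not model how Ceph itself
measures utilization or sets the flag.

.. code-block:: python

    def over_full_ratio(used_bytes, total_bytes, full_ratio=0.95):
        """Return True when utilization has reached the full ratio.

        full_ratio defaults to 0.95, mirroring the documented default
        of ``mon_osd_full_ratio``.  The helper itself is hypothetical.
        """
        if total_bytes <= 0:
            raise ValueError("total_bytes must be positive")
        return used_bytes / total_bytes >= full_ratio

For instance, ``over_full_ratio(94, 100)`` is false, while
``over_full_ratio(95, 100)`` is true.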

The filesystem has some special handling of the full flag, explained below.

Hammer and later
----------------

Since the hammer release, a full filesystem will lead to ENOSPC
results from:

 * Data writes on the client
 * Metadata operations other than deletes and truncates
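
Because deletes and truncates are still accepted, an application can
respond to ENOSPC by removing data and trying again.  The following is
a minimal sketch of that pattern, assuming the caller supplies both
callables; ``run_with_cleanup``, ``operation`` and ``cleanup`` are
hypothetical names, and nothing here guarantees that space is reclaimed
in time for the retry to succeed.

.. code-block:: python

    import errno


    def run_with_cleanup(operation, cleanup):
        """Run operation(); on ENOSPC, call cleanup() and retry once."""
        try:
            return operation()
        except OSError as exc:
            # Only the out-of-space case is handled here; any other
            # error is passed straight through to the caller.
            if exc.errno != errno.ENOSPC:
                raise
        cleanup()           # e.g. unlink scratch files the application owns
        return operation()  # a second ENOSPC propagates to the caller

A caller might pass a function that creates a file as ``operation`` and
a function that unlinks temporary files as ``cleanup``.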

Because the full condition may not be encountered until
data is flushed to disk (sometime after a ``write`` call has already
returned successfully), the ENOSPC error may not be seen until the
application calls ``fsync`` or ``fclose`` (or equivalent) on the file handle.

Calling ``fsync`` reliably indicates whether the data made it to disk,
and returns an error if it did not.  ``fclose`` will only return an error
if buffered data happened to be flushed since the last write.  A successful
``fclose`` does not guarantee that the data made it to disk, and in a
full-space situation, buffered data may be discarded after an ``fclose``
if no space is available to persist it.

.. warning::
    If an application appears to be misbehaving on a full filesystem,
    check that it is performing ``fsync()`` calls as necessary to ensure
    data is on disk before proceeding.
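
As an illustration of the advice above, the following sketch writes a
buffer and calls ``fsync`` before treating the data as persisted.  It
uses Python's standard ``os`` module; the function name
``write_durably`` and its return convention are assumptions made for
this example, not part of any Ceph interface.

.. code-block:: python

    import errno
    import os


    def write_durably(path, data):
        """Write data to path and fsync it.

        Returns True once fsync has succeeded, or False if the write or
        the fsync reported ENOSPC.  Other errors are re-raised.
        """
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            view = memoryview(data)
            while view:
                # os.write may write fewer bytes than requested.
                written = os.write(fd, view)
                view = view[written:]
            # A successful write is not enough: the full condition may
            # only surface when the data is actually flushed.
            os.fsync(fd)
            return True
        except OSError as exc:
            if exc.errno == errno.ENOSPC:
                return False
            raise
        finally:
            os.close(fd)

An application would proceed to its next step only when the function
returns true, and otherwise treat the data as not persisted.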

Data writes may be cancelled by the client if they are in flight at the
time the OSD full flag is sent.  Clients update the ``osd_epoch_barrier``
when releasing capabilities on files affected by cancelled operations, in
order to ensure that these cancelled operations do not interfere with
subsequent access to the data objects by the MDS or other clients.  For
more on the epoch barrier mechanism, see :ref:`background_blacklisting_and_osd_epoch_barrier`.
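
The idea behind the barrier can be illustrated with a toy model.  It
assumes that a participant may touch the affected data objects only
once it has seen an OSD map epoch at least as new as the barrier, and
that the barrier only ever moves forward.  The class and method names
are hypothetical and do not correspond to Ceph's implementation.

.. code-block:: python

    class EpochBarrier:
        """Toy model of an epoch barrier (illustrative only)."""

        def __init__(self):
            self.barrier = 0

        def raise_to(self, epoch):
            # The barrier never moves backwards.
            self.barrier = max(self.barrier, epoch)

        def may_access(self, osd_map_epoch):
            # Assumption: access is safe only when the participant's
            # OSD map is at least as new as the barrier.
            return osd_map_epoch >= self.barrier

In this model, raising the barrier to epoch 42 means a participant
still holding OSD map epoch 41 has to wait for a newer map before
accessing the data objects.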

Legacy (pre-hammer) behavior
----------------------------

In versions of Ceph earlier than hammer, the MDS would ignore
the full status of the RADOS cluster, and any data writes from
clients would stall until the cluster ceased to be full.

There are two dangerous conditions to watch for with this behavior:

* If a client had pending writes to a file, then it was not possible
  for the client to release the file to the MDS for deletion.  This could
  make it difficult to clear space on a full filesystem.
* If clients continued to create a large number of empty files, the
  resulting metadata writes from the MDS could lead to total exhaustion
  of space on the OSDs such that no further deletions could be performed.