src/ceph/doc/cephfs/standby.rst

Terminology
-----------

A Ceph cluster may have zero or more CephFS *filesystems*.  CephFS
filesystems have a human-readable name (set in ``fs new``)
and an integer ID.  The ID is called the filesystem cluster ID,
or *FSCID*.

Each CephFS filesystem has a number of *ranks*, one by default,
which start at zero.  A rank may be thought of as a metadata shard.
Controlling the number of ranks in a filesystem is described
in :doc:`/cephfs/multimds`.

Each CephFS ceph-mds process (a *daemon*) initially starts up
without a rank.  It may be assigned one by the monitor cluster.
A daemon may only hold one rank at a time.  Daemons only give up
a rank when the ceph-mds process stops.

If a rank is not associated with a daemon, the rank is
considered *failed*.  Once a rank is assigned to a daemon,
the rank is considered *up*.

A daemon has a *name* that is set statically by the administrator
when the daemon is first configured.  Typical configurations
use the hostname where the daemon runs as the daemon name.

Each time a daemon starts up, it is also assigned a *GID*, which
is unique to this particular process lifetime of the daemon.  The
GID is an integer.

Referring to MDS daemons
------------------------

Most of the administrative commands that refer to an MDS daemon
accept a flexible argument format that may contain a rank, a GID
or a name.

Where a rank is used, this may optionally be qualified with
a leading filesystem name or ID.  If a daemon is a standby (i.e.
it is not currently assigned a rank), then it may only be
referred to by GID or name.

For example, if we had an MDS daemon which was called 'myhost',
had GID 5446, and was assigned rank 0 in the filesystem 'myfs'
which had FSCID 3, then any of the following would be suitable
forms of the 'fail' command:

::

    ceph mds fail 5446     # GID
    ceph mds fail myhost   # Daemon name
    ceph mds fail 0        # Unqualified rank
    ceph mds fail 3:0      # FSCID and rank
    ceph mds fail myfs:0   # Filesystem name and rank

Managing failover
-----------------

If an MDS daemon stops communicating with the monitor, the monitor will
wait ``mds_beacon_grace`` seconds (default 15 seconds) before marking
the daemon as *laggy*.

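As an illustration, the grace period could be lengthened on a cluster where
beacons are occasionally delayed.  The snippet below is only a sketch: the
value of 30 is arbitrary, and placing the option in the ``[global]`` section
is an assumption made so that monitors and MDS daemons would see the same
value.

::

    [global]
    # Illustrative value only; the default is 15 seconds.
    mds beacon grace = 30
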
Each file system may specify the number of standby daemons it needs in order
to be considered healthy. This number includes daemons in standby-replay
waiting for a rank to fail (remember that a standby-replay daemon will not be
assigned to take over a failure for another rank or a failure in another
CephFS file system). The pool of standby daemons not in replay counts towards
any file system's count.
Each file system may set the number of standby daemons wanted using:

::

    ceph fs set <fs name> standby_count_wanted <count>

Setting ``count`` to 0 will disable the health check.


Configuring standby daemons
---------------------------

There are four configuration settings that control how a daemon
will behave while in standby:

::

    mds_standby_for_name
    mds_standby_for_rank
    mds_standby_for_fscid
    mds_standby_replay

These may be set in the ceph.conf on the host where the MDS daemon
runs (as opposed to on the monitor).  The daemon loads these settings
when it starts, and sends them to the monitor.

By default, if none of these settings are used, all MDS daemons
which do not hold a rank will be used as standbys for any rank.

The settings which associate a standby daemon with a particular
name or rank do not guarantee that the daemon will *only* be used
for that rank.  They mean that when several standbys are available,
the associated standby daemon will be used.  If a rank is failed,
and a standby is available, it will be used even if it is associated
with a different rank or named daemon.

mds_standby_replay
~~~~~~~~~~~~~~~~~~

If this is set to true, then the standby daemon will continuously read
the metadata journal of an up rank.  This will give it
a warm metadata cache, and speed up the process of failing over
if the daemon serving the rank fails.

An up rank may only have one standby replay daemon assigned to it.
If two daemons are both set to be standby replay for the same rank, then
one of them will arbitrarily win, and the other will become a normal
non-replay standby.

Once a daemon has entered the standby replay state, it will only be
used as a standby for the rank that it is following.  If another rank
fails, this standby replay daemon will not be used as a replacement,
even if no other standbys are available.

*Historical note:* In Ceph prior to v10.2.1, this setting was always treated
as true when ``mds_standby_for_*`` was also set, even if it was set to
``false``.

mds_standby_for_name
~~~~~~~~~~~~~~~~~~~~

Set this to make the standby daemon only take over a failed rank
if the last daemon to hold it matches this name.

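A minimal ceph.conf sketch of this setting, assuming two hypothetical daemons
named 'a' and 'b', where 'b' is meant to stand by for 'a':

::

    [mds.b]
    # Take over a failed rank if it was last held by the daemon named 'a'
    mds standby for name = a
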
mds_standby_for_rank
~~~~~~~~~~~~~~~~~~~~

Set this to make the standby daemon only take over the specified
rank.  If another rank fails, this daemon will not be used to
replace it.

Use in conjunction with ``mds_standby_for_fscid`` to be specific
about which filesystem's rank you are targeting, if you have
multiple filesystems.

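For instance, a daemon intended to stand by for rank 0 of the filesystem with
FSCID 3 might be configured as follows.  The daemon name 'c' and the FSCID
value are placeholders chosen for illustration:

::

    [mds.c]
    # Target rank 0 ...
    mds standby for rank = 0
    # ... in the filesystem whose FSCID is 3
    mds standby for fscid = 3
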
mds_standby_for_fscid
~~~~~~~~~~~~~~~~~~~~~

If ``mds_standby_for_rank`` is set, this is simply a qualifier to
say which filesystem's rank is referred to.

If ``mds_standby_for_rank`` is not set, then setting FSCID will
cause this daemon to target any rank in the specified FSCID.  Use
this if you have a daemon that you want to use for any rank, but
only within a particular filesystem.

mon_force_standby_active
~~~~~~~~~~~~~~~~~~~~~~~~

This setting is used on monitor hosts.  It defaults to true.

If it is false, then daemons configured with ``mds_standby_replay = true``
will **only** become active if the rank or named daemon that they have
been configured to follow fails.  On the other hand, if this
setting is true, then a daemon configured with ``mds_standby_replay = true``
may be assigned some other rank.

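To keep standby replay daemons tied to what they follow, the setting can be
turned off in the monitors' configuration.  A sketch, assuming the option is
placed in the ``[mon]`` section of ceph.conf on the monitor hosts:

::

    [mon]
    # Default is true; false restricts standby replay daemons
    # to the rank or name they were configured to follow.
    mon force standby active = false
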
Examples
--------

These are example ceph.conf snippets.  In practice you can either
copy a ceph.conf with all daemons' configuration to all your servers,
or you can have a different file on each server that contains just
that server's daemons' configuration.

Simple pair
~~~~~~~~~~~

Two MDS daemons 'a' and 'b' acting as a pair, where whichever one is not
currently assigned a rank will be the standby replay follower
of the other.

::

    [mds.a]
    mds standby replay = true
    mds standby for rank = 0

    [mds.b]
    mds standby replay = true
    mds standby for rank = 0

Floating standby
~~~~~~~~~~~~~~~~

Three MDS daemons 'a', 'b' and 'c', in a filesystem that has
``max_mds`` set to 2.

::

    # No explicit configuration required: whichever daemon is
    # not assigned a rank will go into 'standby' and take over
    # for whichever other daemon fails.

Two MDS clusters
~~~~~~~~~~~~~~~~

Two filesystems served by four MDS daemons, where two daemons act
as a pair for one filesystem and the other two act as a pair
for the other filesystem.

::

    [mds.a]
    mds standby for fscid = 1

    [mds.b]
    mds standby for fscid = 1

    [mds.c]
    mds standby for fscid = 2

    [mds.d]
    mds standby for fscid = 2