A Ceph cluster may have zero or more CephFS *filesystems*. CephFS
filesystems have a human readable name (set in ``fs new``)
and an integer ID. The ID is called the filesystem cluster ID,
or *FSCID*.

Each CephFS filesystem has a number of *ranks* (one by default),
numbered from zero. A rank may be thought of as a metadata shard.
Controlling the number of ranks in a filesystem is described
in :doc:`/cephfs/multimds`.

Each CephFS ceph-mds process (a *daemon*) initially starts up
without a rank. It may be assigned one by the monitor cluster.
A daemon may only hold one rank at a time. Daemons only give up
a rank when the ceph-mds process stops.

If a rank is not associated with a daemon, the rank is
considered *failed*. Once a rank is assigned to a daemon,
the rank is considered *up*.

A daemon has a *name* that is set statically by the administrator
when the daemon is first configured. Typical configurations
use the hostname where the daemon runs as the daemon name.

Each time a daemon starts up, it is also assigned a *GID*, which
is unique to this particular process lifetime of the daemon. The
GID is an integer.


Referring to MDS daemons
------------------------

Most of the administrative commands that refer to an MDS daemon
accept a flexible argument format that may contain a rank, a GID
or a name.

Where a rank is used, this may optionally be qualified with
a leading filesystem name or ID. If a daemon is a standby (i.e.
it is not currently assigned a rank), then it may only be
referred to by GID or name.

For example, if we had an MDS daemon which was called 'myhost',
had GID 5446, and was assigned rank 0 in the filesystem 'myfs'
which had FSCID 3, then any of the following would be suitable
forms of the 'fail' command:

::

    ceph mds fail 5446     # GID
    ceph mds fail myhost   # Daemon name
    ceph mds fail 0        # Unqualified rank
    ceph mds fail 3:0      # FSCID and rank
    ceph mds fail myfs:0   # Filesystem name and rank

If an MDS daemon stops communicating with the monitor, the monitor will
wait ``mds_beacon_grace`` seconds (default 15 seconds) before marking
the daemon as *laggy*.

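
As a rough illustration, a longer grace period could be set in ceph.conf
along the following lines. The value of 30 seconds and the choice of the
``[global]`` section (so that monitors and MDS daemons read the same value)
are assumptions made for this sketch, not recommendations::

    [global]
        # Hypothetical value: wait 30 seconds instead of the default 15
        # before a silent MDS daemon is marked laggy.
        mds beacon grace = 30
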
Each file system may specify the number of standby daemons it needs in
order to be considered healthy. This number includes daemons in
standby-replay waiting for a rank to fail (remember that a standby-replay
daemon will not be assigned to take over a failure for another rank or a
failure in another CephFS file system). Standby daemons that are not in
replay count towards every file system's total. Set the number of standby
daemons wanted for a file system using:

::

    ceph fs set <fs name> standby_count_wanted <count>

Setting ``count`` to 0 will disable the health check.


Configuring standby daemons
---------------------------

There are four configuration settings that control how a daemon
will behave while in standby:

- ``mds_standby_for_name``
- ``mds_standby_for_rank``
- ``mds_standby_for_fscid``
- ``mds_standby_replay``

These may be set in the ceph.conf on the host where the MDS daemon
runs (as opposed to on the monitor). The daemon loads these settings
when it starts, and sends them to the monitor.

By default, if none of these settings are used, all MDS daemons
which do not hold a rank will be used as standbys for any rank.

The settings which associate a standby daemon with a particular
name or rank do not guarantee that the daemon will *only* be used
for that rank. They mean that when several standbys are available,
the associated standby daemon will be preferred. If a rank has failed
and a standby is available, that standby will be used even if it is
associated with a different rank or named daemon.


mds_standby_replay
~~~~~~~~~~~~~~~~~~

If this is set to true, then the standby daemon will continuously read
the metadata journal of an up rank. This will give it
a warm metadata cache, and speed up the process of failing over
if the daemon serving the rank fails.

An up rank may only have one standby replay daemon assigned to it.
If two daemons are both set to be standby replay, then one of them
will arbitrarily win, and the other will become a normal non-replay
standby.

Once a daemon has entered the standby replay state, it will only be
used as a standby for the rank that it is following. If another rank
fails, this standby replay daemon will not be used as a replacement,
even if no other standbys are available.

*Historical note:* In Ceph releases prior to v10.2.1, this setting was
treated as true, even when set to ``false``, whenever an
``mds_standby_for_*`` setting was also set.


mds_standby_for_name
~~~~~~~~~~~~~~~~~~~~

Set this to make the standby daemon only take over a failed rank
if the last daemon to hold it matches this name.

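
A minimal sketch of this setting, assuming two hypothetical daemons named
'a' and 'b' where 'b' should stand by for 'a'::

    [mds.b]
        # Hypothetical names: 'b' takes over a failed rank only if
        # the last daemon to hold that rank was 'a'.
        mds standby for name = a
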

mds_standby_for_rank
~~~~~~~~~~~~~~~~~~~~

Set this to make the standby daemon only take over the specified
rank. If another rank fails, this daemon will not be used to
replace it.

Use in conjunction with ``mds_standby_for_fscid`` to be specific
about which filesystem's rank you are targeting, if you have
multiple filesystems.

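
For instance, a standby intended for rank 0 of one particular filesystem
could be configured roughly as follows. The daemon name 'c' and the FSCID
value 1 are placeholders chosen for illustration::

    [mds.c]
        # Follow rank 0 ...
        mds standby for rank = 0
        # ... of the filesystem whose FSCID is 1 (placeholder value).
        mds standby for fscid = 1
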

mds_standby_for_fscid
~~~~~~~~~~~~~~~~~~~~~

If ``mds_standby_for_rank`` is set, this is simply a qualifier to
say which filesystem's rank is referred to.

If ``mds_standby_for_rank`` is not set, then setting FSCID will
cause this daemon to target any rank in the specified FSCID. Use
this if you have a daemon that you want to use for any rank, but
only within a particular filesystem.


mon_force_standby_active
~~~~~~~~~~~~~~~~~~~~~~~~

This setting is used on monitor hosts. It defaults to true.

If it is false, then daemons configured with ``mds_standby_replay = true``
will **only** become active if the rank/name that they have
been configured to follow fails. On the other hand, if this
setting is true, then a daemon configured with
``mds_standby_replay = true`` may be assigned some other rank.

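
As a sketch, restricting standby replay daemons to the rank or name they
follow might look like this in the ceph.conf used by the monitors. Placing
the option under ``[mon]`` is an assumption based on the note that the
setting is used on monitor hosts::

    [mon]
        # Standby replay daemons become active only for the
        # rank/name they were configured to follow.
        mon force standby active = false
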

Examples
--------

These are example ceph.conf snippets. In practice you can either
copy a ceph.conf with all daemons' configuration to all your servers,
or you can have a different file on each server that contains just
that server's daemons' configuration.

Two MDS daemons 'a' and 'b' acting as a pair, where whichever one is not
currently assigned a rank will be the standby replay follower
of the other.

::

    [mds.a]
    mds standby replay = true
    mds standby for rank = 0

    [mds.b]
    mds standby replay = true
    mds standby for rank = 0

Three MDS daemons 'a', 'b' and 'c', in a filesystem that has
``max_mds`` set to 2.

::

    # No explicit configuration required: whichever daemon is
    # not assigned a rank will go into 'standby' and take over
    # for whichever other daemon fails.

Two filesystems and four MDS daemons, where two daemons act as a pair
for one filesystem and the other two act as a pair for the other
filesystem.

::

    # The daemon names 'a' through 'd' are placeholders.
    [mds.a]
    mds standby for fscid = 1

    [mds.b]
    mds standby for fscid = 1

    [mds.c]
    mds standby for fscid = 2

    [mds.d]
    mds standby for fscid = 2
