X-Git-Url: https://gerrit.opnfv.org/gerrit/gitweb?a=blobdiff_plain;f=src%2Fceph%2Fdoc%2Fcephfs%2Fstandby.rst;fp=src%2Fceph%2Fdoc%2Fcephfs%2Fstandby.rst;h=6cba2b75a938b47fa717c88f1b3aa3e8115e62e1;hb=812ff6ca9fcd3e629e49d4328905f33eee8ca3f5;hp=0000000000000000000000000000000000000000;hpb=15280273faafb77777eab341909a3f495cf248d9;p=stor4nfv.git

diff --git a/src/ceph/doc/cephfs/standby.rst b/src/ceph/doc/cephfs/standby.rst
new file mode 100644
index 0000000..6cba2b7
--- /dev/null
+++ b/src/ceph/doc/cephfs/standby.rst
@@ -0,0 +1,222 @@
+
+Terminology
+-----------
+
+A Ceph cluster may have zero or more CephFS *filesystems*.  CephFS
+filesystems have a human-readable name (set with ``fs new``)
+and an integer ID.  The ID is called the filesystem cluster ID,
+or *FSCID*.
+
+Each CephFS filesystem has a number of *ranks*, numbered from zero;
+there is one rank by default.  A rank may be thought of as a metadata
+shard.  Controlling the number of ranks in a filesystem is described
+in :doc:`/cephfs/multimds`.
+
+Each CephFS ceph-mds process (a *daemon*) initially starts up
+without a rank.  It may be assigned one by the monitor cluster.
+A daemon may hold only one rank at a time, and it gives up its
+rank only when its ceph-mds process stops.
+
+If a rank is not associated with a daemon, the rank is
+considered *failed*.  Once a rank is assigned to a daemon,
+the rank is considered *up*.
+
+A daemon has a *name* that is set statically by the administrator
+when the daemon is first configured.  Typical configurations
+use the hostname where the daemon runs as the daemon name.
+
+Each time a daemon starts up, it is also assigned a *GID*, which
+is unique to that particular process lifetime of the daemon.  The
+GID is an integer; a restarted daemon keeps its name but receives
+a new GID.
+
+Referring to MDS daemons
+------------------------
+
+Most of the administrative commands that refer to an MDS daemon
+accept a flexible argument format that may contain a rank, a GID
+or a name.
+
+Where a rank is used, it may optionally be qualified with
+a leading filesystem name or FSCID.  If a daemon is a standby (i.e.,
+it is not currently assigned a rank), then it may only be
+referred to by GID or name.
+
+For example, suppose an MDS daemon named 'myhost' has GID 5446
+and is assigned rank 0 in the filesystem 'myfs', which has FSCID 3.
+Any of the following would then be a suitable form of the 'fail'
+command for that daemon:
+
+::
+
+    ceph mds fail 5446     # GID
+    ceph mds fail myhost   # Daemon name
+    ceph mds fail 0        # Unqualified rank
+    ceph mds fail 3:0      # FSCID and rank
+    ceph mds fail myfs:0   # Filesystem name and rank
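+
+The lookup order can be sketched in Python.  This is a hypothetical
+model rather than the monitor's code: it assumes the argument is tried
+first as a role (an optionally qualified rank), then as a GID, and
+finally as a daemon name, and that an unqualified rank refers to a
+default filesystem.  The ``Daemon`` class and ``resolve`` function are
+invented for illustration.
+
+::
+
+    # Hypothetical sketch of MDS argument resolution; not Ceph's code.
+    from dataclasses import dataclass
+    from typing import Optional
+
+    @dataclass
+    class Daemon:
+        name: str             # static name, e.g. 'myhost'
+        gid: int              # new value on every restart
+        fscid: Optional[int]  # None while the daemon is a standby
+        rank: Optional[int]   # None while the daemon is a standby
+
+    def resolve(arg, daemons, fs_names, default_fscid):
+        """Return the Daemon that ``arg`` refers to, or None."""
+        # 1. Try a role: "rank", "fscid:rank" or "fsname:rank".
+        #    Standbys have no rank, so they never match here.
+        fs_part, sep, rank_part = arg.rpartition(":")
+        if rank_part.isdigit():
+            if not sep:
+                fscid = default_fscid
+            elif fs_part.isdigit():
+                fscid = int(fs_part)
+            else:
+                fscid = fs_names.get(fs_part)
+            for d in daemons:
+                if d.fscid == fscid and d.rank == int(rank_part):
+                    return d
+        # 2. Try a GID.
+        if arg.isdigit():
+            for d in daemons:
+                if d.gid == int(arg):
+                    return d
+        # 3. Fall back to the daemon name.
+        for d in daemons:
+            if d.name == arg:
+                return d
+        return None
+
+    myhost = Daemon(name="myhost", gid=5446, fscid=3, rank=0)
+    for arg in ["5446", "myhost", "0", "3:0", "myfs:0"]:
+        assert resolve(arg, [myhost], {"myfs": 3}, default_fscid=3) is myhost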
+
+Managing failover
+-----------------
+
+If an MDS daemon stops communicating with the monitor, the monitor will
+wait ``mds_beacon_grace`` seconds (default 15 seconds) before marking
+the daemon as *laggy*.  If a standby is available, the monitor will
+immediately replace the laggy daemon.
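+
+As a rough illustration, the grace check amounts to comparing the time
+since the daemon's last beacon against ``mds_beacon_grace``.  The
+function below is a simplified, hypothetical sketch; the monitor's real
+bookkeeping is more involved.
+
+::
+
+    # Hypothetical sketch of the beacon grace check.
+    MDS_BEACON_GRACE = 15.0  # seconds; the default mentioned above
+
+    def is_laggy(last_beacon, now, grace=MDS_BEACON_GRACE):
+        """True once more than ``grace`` seconds have passed since the last beacon."""
+        return now - last_beacon > grace
+
+    assert not is_laggy(last_beacon=100.0, now=110.0)  # 10 s of silence
+    assert is_laggy(last_beacon=100.0, now=116.0)      # 16 s of silence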
+
+Each filesystem may specify the number of standby daemons it needs in
+order to be considered healthy.  This number includes daemons in
+standby-replay waiting for a rank to fail (remember that a
+standby-replay daemon will not be assigned to take over a failure of
+another rank or a failure in another CephFS filesystem).  The pool of
+standby daemons not in replay counts toward every filesystem's count.
+Each filesystem may set the number of standby daemons wanted using:
+
+::
+
+    ceph fs set <fs name> standby_count_wanted <count>
+
+If fewer standby daemons than ``count`` are available, the cluster
+reports a health warning.  Setting ``count`` to 0 disables this
+health check.
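+
+The check can be pictured as follows.  This is a simplified sketch,
+assuming that every standby daemon not in replay counts for every
+filesystem and that a standby-replay daemon counts only for the
+filesystem it is following; ``insufficient_standbys`` is a hypothetical
+name.
+
+::
+
+    # Hypothetical sketch of the standby_count_wanted health check.
+    def insufficient_standbys(fscid, wanted, standbys, replay_followers):
+        """Return True if filesystem ``fscid`` has fewer standbys than wanted.
+
+        standbys: number of standby daemons not in replay
+        replay_followers: mapping of FSCID to its standby-replay count
+        """
+        if wanted == 0:  # 0 disables the health check
+            return False
+        available = standbys + replay_followers.get(fscid, 0)
+        return available < wanted
+
+    assert not insufficient_standbys(1, 0, standbys=0, replay_followers={})
+    assert insufficient_standbys(1, 2, standbys=1, replay_followers={})
+    assert not insufficient_standbys(1, 2, standbys=1, replay_followers={1: 1})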
+
+
+Configuring standby daemons
+---------------------------
+
+There are four configuration settings that control how a daemon
+will behave while in standby:
+
+::
+
+    mds_standby_for_name
+    mds_standby_for_rank
+    mds_standby_for_fscid
+    mds_standby_replay
+
+These may be set in the ceph.conf file on the host where the MDS
+daemon runs (as opposed to on the monitor).  The daemon loads these
+settings when it starts and sends them to the monitor.
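+
+For example, a daemon's view of these settings can be approximated by
+reading its own section of a ceph.conf file.  The sketch below uses
+Python's standard ``configparser`` module and normalises option names
+written with spaces (as in the examples later in this document) to the
+underscore form; ``standby_settings`` is a hypothetical helper, not part
+of Ceph.
+
+::
+
+    # Hypothetical helper; Ceph does not use this code.
+    import configparser
+
+    CEPH_CONF = """
+    [mds.a]
+    mds standby replay = true
+    mds standby for rank = 0
+    """
+
+    def standby_settings(conf_text, daemon):
+        parser = configparser.ConfigParser()
+        parser.read_string(conf_text)
+        # Normalise "mds standby for rank" to "mds_standby_for_rank".
+        return {key.replace(" ", "_"): value
+                for key, value in parser[f"mds.{daemon}"].items()}
+
+    settings = standby_settings(CEPH_CONF, "a")
+    assert settings == {"mds_standby_replay": "true",
+                        "mds_standby_for_rank": "0"}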
+
+By default, if none of these settings are used, all MDS daemons
+which do not hold a rank will be used as standbys for any rank.
+
+The settings which associate a standby daemon with a particular
+name or rank do not guarantee that the daemon will *only* be used
+for that rank.  They mean that when several standbys are available,
+the associated standby daemon will be preferred.  If a rank has failed
+and a standby is available, that standby will be used even if it is
+associated with a different rank or named daemon.
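+
+The preference described in the paragraph above can be sketched as a
+two-pass search.  This sketch is hypothetical and simplified: it
+ignores FSCIDs, standby replay and ``mon_force_standby_active``, and
+represents each standby as a dictionary with optional
+``standby_for_rank`` and ``standby_for_name`` keys.
+
+::
+
+    # Hypothetical sketch of standby preference; not Ceph's code.
+    def pick_standby(standbys, failed_rank, failed_name):
+        """Choose a standby for a failed rank, preferring associated ones."""
+        # First pass: a standby associated with this rank or daemon name.
+        for s in standbys:
+            wants_rank = s.get("standby_for_rank")
+            wants_name = s.get("standby_for_name")
+            if (wants_rank is not None and wants_rank == failed_rank) or (
+                    wants_name is not None and wants_name == failed_name):
+                return s["name"]
+        # Second pass: otherwise any available standby is used.
+        return standbys[0]["name"] if standbys else None
+
+    standbys = [{"name": "b", "standby_for_rank": 1},
+                {"name": "c", "standby_for_rank": 0}]
+    assert pick_standby(standbys, failed_rank=0, failed_name="a") == "c"
+    assert pick_standby(standbys, failed_rank=2, failed_name="x") == "b"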
+
+mds_standby_replay
+~~~~~~~~~~~~~~~~~~
+
+If this is set to true, then the standby daemon will continuously read
+the metadata journal of an up rank.  This gives it a warm metadata
+cache and speeds up failover if the daemon serving the rank fails.
+
+An up rank may have only one standby-replay daemon assigned to it.
+If two daemons are both set to be standby replay, one of them
+will arbitrarily win and the other will become a normal non-replay
+standby.
+
+Once a daemon has entered the standby replay state, it will only be
+used as a standby for the rank that it is following.  If another rank
+fails, this standby replay daemon will not be used as a replacement,
+even if no other standbys are available.
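+
+The two rules above (at most one standby-replay daemon per up rank, and
+a replaying daemon only ever replaces the rank it follows) can be
+sketched as follows.  This is a hypothetical model: "arbitrarily win" is
+represented by taking the first candidate in list order.
+
+::
+
+    # Hypothetical sketch of the standby-replay rules; not Ceph's code.
+    def assign_replay(candidates):
+        """Pick one standby-replay follower for an up rank."""
+        if not candidates:
+            return None, []
+        follower, *others = candidates  # one candidate arbitrarily wins
+        return follower, others         # the rest become normal standbys
+
+    def can_replace(follower_of, failed_rank):
+        """follower_of is the rank being replayed, or None for a normal standby."""
+        return follower_of is None or follower_of == failed_rank
+
+    follower, others = assign_replay(["a", "b"])
+    assert (follower, others) == ("a", ["b"])
+    assert can_replace(follower_of=0, failed_rank=0)
+    assert not can_replace(follower_of=0, failed_rank=1)
+    assert can_replace(follower_of=None, failed_rank=1)  # ordinary standby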
+
+*Historical note:* In Ceph releases prior to v10.2.1, setting any of
+the ``mds_standby_for_*`` options made the daemon behave as if this
+setting were ``true``, even when it was set to ``false``.
+
+mds_standby_for_name
+~~~~~~~~~~~~~~~~~~~~
+
+Set this to make the standby daemon take over a failed rank only
+if the last daemon to hold it matches this name.
+
+mds_standby_for_rank
+~~~~~~~~~~~~~~~~~~~~
+
+Set this to make the standby daemon take over only the specified
+rank.  If another rank fails, this daemon will not be used to
+replace it.
+
+Use in conjunction with ``mds_standby_for_fscid`` to be specific
+about which filesystem's rank you are targeting, if you have
+multiple filesystems.
+
+mds_standby_for_fscid
+~~~~~~~~~~~~~~~~~~~~~
+
+If ``mds_standby_for_rank`` is set, this is simply a qualifier to
+say which filesystem's rank is referred to.
+
+If ``mds_standby_for_rank`` is not set, then setting FSCID will
+cause this daemon to target any rank in the specified FSCID.  Use
+this if you have a daemon that you want to use for any rank, but
+only within a particular filesystem.
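+
+Taken together, ``mds_standby_for_name``, ``mds_standby_for_rank`` and
+``mds_standby_for_fscid`` describe which failed ranks a standby targets.
+The predicate below is a hypothetical sketch of the descriptions in this
+section; ``None`` stands for a setting that was not given, and a rank
+given without an FSCID is assumed to match that rank in any filesystem.
+
+::
+
+    # Hypothetical sketch of the mds_standby_for_* targeting rules.
+    def targets(failed_fscid, failed_rank, last_holder,
+                for_name=None, for_rank=None, for_fscid=None):
+        """Return True if a standby with these settings targets the failed rank."""
+        if for_name is not None and for_name != last_holder:
+            return False
+        if for_fscid is not None and for_fscid != failed_fscid:
+            return False
+        if for_rank is not None and for_rank != failed_rank:
+            return False
+        return True
+
+    # FSCID alone: any rank, but only within filesystem 1.
+    assert targets(1, 3, "x", for_fscid=1)
+    assert not targets(2, 3, "x", for_fscid=1)
+    # Rank qualified by FSCID: rank 0 of filesystem 2 only.
+    assert targets(2, 0, "x", for_rank=0, for_fscid=2)
+    assert not targets(1, 0, "x", for_rank=0, for_fscid=2)
+    # Name: only if the last holder of the rank was 'a'.
+    assert targets(1, 0, "a", for_name="a")
+    assert not targets(1, 0, "b", for_name="a")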
+
+mon_force_standby_active
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+This setting is used on monitor hosts.  It defaults to true.
+
+If it is false, then daemons configured with ``mds_standby_replay = true``
+will **only** become active if the rank or name that they have
+been configured to follow fails.  On the other hand, if this
+setting is true, then a daemon configured with
+``mds_standby_replay = true`` may be assigned some other rank.
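+
+A minimal sketch of the rule as stated above; the function and its
+parameters are hypothetical, and the monitor's actual decision involves
+more state than these three flags.
+
+::
+
+    # Hypothetical sketch of mon_force_standby_active; not Ceph's code.
+    def may_take_rank(force_standby_active, replay_configured, is_target):
+        """Could this standby become active for a failed rank?"""
+        if not replay_configured or is_target:
+            return True
+        # A replay-configured daemon asked to cover some other rank.
+        return force_standby_active
+
+    assert may_take_rank(True, replay_configured=True, is_target=False)
+    assert not may_take_rank(False, replay_configured=True, is_target=False)
+    assert may_take_rank(False, replay_configured=True, is_target=True)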
+
+Examples
+--------
+
+These are example ceph.conf snippets.  In practice you can either
+copy a ceph.conf with all daemons' configuration to all your servers,
+or you can have a different file on each server that contains just
+that server's daemons' configuration.
+
+Simple pair
+~~~~~~~~~~~
+
+Two MDS daemons 'a' and 'b' acting as a pair, where whichever one is not
+currently assigned a rank will be the standby replay follower
+of the other.
+
+::
+
+    [mds.a]
+    mds standby replay = true
+    mds standby for rank = 0
+
+    [mds.b]
+    mds standby replay = true
+    mds standby for rank = 0
+
+Floating standby
+~~~~~~~~~~~~~~~~
+
+Three MDS daemons 'a', 'b' and 'c', in a filesystem that has
+``max_mds`` set to 2.
+
+::
+
+    # No explicit configuration required: whichever daemon is
+    # not assigned a rank will go into 'standby' and take over
+    # for whichever other daemon fails.
+
+Two MDS clusters
+~~~~~~~~~~~~~~~~
+
+Two filesystems, with FSCIDs 1 and 2, served by four MDS daemons:
+'a' and 'b' act as a pair for the first filesystem, and 'c' and 'd'
+act as a pair for the second.
+
+::
+
+    [mds.a]
+    mds standby for fscid = 1
+
+    [mds.b]
+    mds standby for fscid = 1
+
+    [mds.c]
+    mds standby for fscid = 2
+
+    [mds.d]
+    mds standby for fscid = 2
+