=================================
 Troubleshooting Monitors
=================================

.. index:: monitor, high availability

When a cluster encounters monitor-related troubles there's a tendency to
panic, and sometimes with good reason. Keep in mind that losing a monitor,
or a bunch of them, doesn't necessarily mean that your cluster is down, as
long as a majority is up, running and able to form a quorum. Regardless of
how bad the situation is, the first thing you should do is calm down, take
a breath, and work through our initial troubleshooting script below.


Initial Troubleshooting
========================


**Are the monitors running?**

  First of all, we need to make sure the monitors are running. You would be
  amazed by how often people forget to run the monitors, or to restart them
  after an upgrade. There's no shame in that, but let's try not to lose a
  couple of hours chasing an issue that is not there.

**Are you able to connect to the monitors' servers?**

  It doesn't happen often, but sometimes there are ``iptables`` rules that
  block access to monitor servers or monitor ports. These are usually
  leftovers from monitor stress-testing that were forgotten at some point.
  Try ssh'ing into the server and, if that succeeds, try connecting to the
  monitor's port using your tool of choice (telnet, nc, ...).

**Does ceph -s run and obtain a reply from the cluster?**

  If the answer is yes then your cluster is up and running. One thing you
  can take for granted is that the monitors will only answer a ``status``
  request if there is a formed quorum.

  If ``ceph -s`` blocks, however, without obtaining a reply from the cluster
  or showing a lot of ``fault`` messages, then it is likely that your monitors
  are either down completely or only a portion is up -- a portion that is not
  enough to form a quorum (keep in mind that a quorum is formed by a majority
  of monitors).

**What if ceph -s doesn't finish?**

  If you haven't gone through all the steps so far, please go back and do so.

  If you are running Emperor 0.72-rc1 or later, you can contact each monitor
  individually and ask for its status, regardless of a quorum being formed.
  This can be achieved using ``ceph ping mon.ID``, ID being the monitor's
  identifier. You should perform this for each monitor in the cluster. In
  section `Understanding mon_status`_ we will explain how to interpret the
  output of this command.
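  For example, assuming a cluster with three monitors ``a``, ``b`` and ``c``
  (adjust the IDs to match your deployment), a small sketch like the
  following checks all of them in one go::

    for id in a b c; do
        echo "== mon.$id =="
        ceph ping mon.$id
    done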
  If you are on an earlier version, you will need to ssh into the server
  and use the monitor's admin socket. Please jump to
  `Using the monitor's admin socket`_.

For other specific issues, keep on reading.


Using the monitor's admin socket
=================================

The admin socket allows you to interact with a given daemon directly using a
Unix socket file. This file can be found in your monitor's ``run`` directory.
By default, the admin socket will be kept in ``/var/run/ceph/ceph-mon.ID.asok``
but this can vary if you defined it otherwise. If you don't find it there,
please check your ``ceph.conf`` for an alternative path or run::

    ceph-conf --name mon.ID --show-config-value admin_socket

Please bear in mind that the admin socket will only be available while the
monitor is running. When the monitor is properly shut down, the admin socket
will be removed. If, however, the monitor is not running and the admin socket
still persists, it is likely that the monitor was improperly shut down.
Regardless, if the monitor is not running, you will not be able to use the
admin socket, with ``ceph`` likely returning ``Error 111: Connection Refused``.

Accessing the admin socket is as simple as telling the ``ceph`` tool to use
the ``asok`` file. In pre-Dumpling Ceph, this can be achieved by::

    ceph --admin-daemon /var/run/ceph/ceph-mon.ID.asok <command>

while in Dumpling and beyond you can use the alternate (and recommended)
format::

    ceph daemon mon.ID <command>

Using ``help`` as the command to the ``ceph`` tool will show you the
supported commands available through the admin socket. Please take a look
at ``config get``, ``config show``, ``mon_status`` and ``quorum_status``,
as those can be enlightening when troubleshooting a monitor.


Understanding mon_status
=========================

``mon_status`` can be obtained through the ``ceph`` tool when you have
a formed quorum, or via the admin socket if you don't. This command will
output a multitude of information about the monitor, including the same
output you would get with ``quorum_status``.

Take the following example of ``mon_status``::

  { "name": "c",
    "rank": 2,
    "state": "peon",
    "election_epoch": 38,
    "quorum": [
          1,
          2],
    "outside_quorum": [],
    "extra_probe_peers": [],
    "sync_provider": [],
    "monmap": { "epoch": 3,
        "fsid": "5c4e9d53-e2e1-478a-8061-f543f8be4cf8",
        "modified": "2013-10-30 04:12:01.945629",
        "created": "2013-10-29 14:14:41.914786",
        "mons": [
              { "rank": 0,
                "name": "a",
                "addr": "127.0.0.1:6789\/0"},
              { "rank": 1,
                "name": "b",
                "addr": "127.0.0.1:6790\/0"},
              { "rank": 2,
                "name": "c",
                "addr": "127.0.0.1:6795\/0"}]}}

A couple of things are obvious: we have three monitors in the monmap (*a*, *b*
and *c*), the quorum is formed by only two monitors, and *c* is in the quorum
as a *peon*.

Which monitor is out of the quorum?

  The answer would be **a**.

Why?

  Take a look at the ``quorum`` set. We have two monitors in this set: *1*
  and *2*. These are not monitor names. These are monitor ranks, as established
  in the current monmap. We are missing the monitor with rank 0, and according
  to the monmap that would be ``mon.a``.

By the way, how are ranks established?

  Ranks are (re)calculated whenever you add or remove monitors and follow a
  simple rule: the **lower** the ``IP:PORT`` combination, the **lower** the
  rank. In this case, considering that ``127.0.0.1:6789`` is lower than all
  the remaining ``IP:PORT`` combinations, ``mon.a`` has rank 0.
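If you only care about a few of these fields, you can filter the (fairly
verbose) output; a minimal sketch, assuming monitor ``c`` and that ``jq``
is installed on the node::

  ceph daemon mon.c mon_status | jq '{state, quorum}'
  ceph daemon mon.c mon_status | jq '.monmap.mons[] | {rank, name, addr}'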

Most Common Monitor Issues
===========================

Have Quorum but at least one Monitor is down
---------------------------------------------

When this happens, depending on the version of Ceph you are running,
you should be seeing something similar to::

      $ ceph health detail
      [snip]
      mon.a (rank 0) addr 127.0.0.1:6789/0 is down (out of quorum)

How to troubleshoot this?

  First, make sure ``mon.a`` is running.

  Second, make sure you are able to connect to ``mon.a``'s server from the
  other monitors' servers. Check the ports as well. Check ``iptables`` on
  all your monitor nodes and make sure you are not dropping/rejecting
  connections.

  If this initial troubleshooting doesn't solve your problems, then it's
  time to go deeper.

  First, check the problematic monitor's ``mon_status`` via the admin
  socket as explained in `Using the monitor's admin socket`_ and
  `Understanding mon_status`_.

  Considering the monitor is out of the quorum, its state should be one of
  ``probing``, ``electing`` or ``synchronizing``. If it happens to be either
  ``leader`` or ``peon``, then the monitor believes itself to be in the
  quorum while the rest of the cluster is sure it is not; or maybe it got
  into the quorum while you were troubleshooting the monitor, so check your
  ``ceph -s`` again just to make sure. Proceed if the monitor is not yet in
  the quorum.

What if the state is ``probing``?

  This means the monitor is still looking for the other monitors. Every time
  you start a monitor, the monitor will stay in this state for some time
  while trying to find the rest of the monitors specified in the ``monmap``.
  The time a monitor will spend in this state can vary. For instance, on
  a single-monitor cluster, the monitor will pass through the probing state
  almost instantaneously, since there are no other monitors around. On a
  multi-monitor cluster, the monitors will stay in this state until they
  find enough monitors to form a quorum -- this means that if you have 2 out
  of 3 monitors down, the one remaining monitor will stay in this state
  indefinitely until you bring one of the other monitors up.

  If you have a quorum, however, the monitor should be able to find the
  remaining monitors pretty fast, as long as they can be reached. If your
  monitor is stuck probing and you have gone through all the communication
  troubleshooting, then there is a fair chance that the monitor is trying
  to reach the other monitors on a wrong address. ``mon_status`` outputs the
  ``monmap`` known to the monitor: check if the other monitors' locations
  match reality. If they don't, jump to
  `Recovering a Monitor's Broken monmap`_; if they do, then it may be related
  to severe clock skews amongst the monitor nodes and you should refer to
  `Clock Skews`_ first, but if that doesn't solve your problem then it is
  time to prepare some logs and reach out to the community (please refer
  to `Preparing your logs`_ on how to best prepare your logs).


What if the state is ``electing``?

  This means the monitor is in the middle of an election. Elections should
  be fast to complete, but at times the monitors can get stuck electing.
  This is usually a sign of a clock skew among the monitor nodes; jump to
  `Clock Skews`_ for more information on that. If all your clocks are
  properly synchronized, it is best if you prepare some logs and reach out
  to the community. This is not a state that is likely to persist and, aside
  from (*really*) old bugs, there is no obvious reason besides clock skews
  why this would happen.
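  To tell a monitor that is stuck in a single election apart from one that
  keeps calling new elections, you can watch how its state and election
  epoch evolve over time; a minimal sketch, assuming a monitor ``ID`` and
  ``jq`` installed::

    # an election_epoch that keeps climbing while the state stays
    # "electing" means elections are being called over and over again
    watch -n 5 "ceph daemon mon.ID mon_status | jq '{state, election_epoch}'"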

What if the state is ``synchronizing``?

  This means the monitor is synchronizing with the rest of the cluster in
  order to join the quorum. The smaller your monitor store is, the faster
  the synchronization completes, so if you have a big store it may take a
  while. Don't worry, it should be finished soon enough.

  However, if you notice that the monitor jumps from ``synchronizing`` to
  ``electing`` and then back to ``synchronizing``, then you do have a
  problem: the cluster state is advancing (i.e., generating new maps) too
  fast for the synchronization process to keep up. This used to be a
  thing in early Cuttlefish, but since then the synchronization process has
  been substantially refactored and enhanced to avoid exactly this sort of
  behavior. If this happens in later versions, let us know. And bring some
  logs (see `Preparing your logs`_).

What if the state is ``leader`` or ``peon``?

  This should not happen. There is a chance it might happen, however, and
  it has a lot to do with clock skews -- see `Clock Skews`_. If you are not
  suffering from clock skews, then please prepare your logs (see
  `Preparing your logs`_) and reach out to us.


Recovering a Monitor's Broken monmap
-------------------------------------

This is what a ``monmap`` usually looks like, depending on the number of
monitors::

  epoch 3
  fsid 5c4e9d53-e2e1-478a-8061-f543f8be4cf8
  last_changed 2013-10-30 04:12:01.945629
  created 2013-10-29 14:14:41.914786
  0: 127.0.0.1:6789/0 mon.a
  1: 127.0.0.1:6790/0 mon.b
  2: 127.0.0.1:6795/0 mon.c

This may not be what you have, however. For instance, some versions of
early Cuttlefish had a bug that could cause your ``monmap`` to be
nullified -- completely filled with zeros. This means that not even
``monmaptool`` would be able to read it, because it cannot make sense of
only-zeros. At other times, you may end up with a monitor with a severely
outdated monmap, which is thus unable to find the remaining monitors
(e.g., say ``mon.c`` is down; you add a new monitor ``mon.d``,
then remove ``mon.a``, then add a new monitor ``mon.e`` and remove
``mon.b``; you will end up with a totally different monmap from the one
``mon.c`` knows).
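To check which monmap a monitor actually has, you can extract it and print
it with ``monmaptool``; a minimal sketch, assuming the monitor ``ID`` in
question has been stopped first::

  ceph-mon -i ID --extract-monmap /tmp/monmap
  monmaptool --print /tmp/monmap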
In this sort of situation, you have two possible solutions:

Scrap the monitor and create a new one

  You should only take this route if you are positive that you won't
  lose the information kept by that monitor; that you have other monitors
  and that they are running just fine so that your new monitor is able
  to synchronize from the remaining monitors. Keep in mind that destroying
  a monitor, if there are no other copies of its contents, may lead to
  loss of data.

Inject a monmap into the monitor

  Usually the safest path. You should grab the monmap from the remaining
  monitors and inject it into the monitor with the corrupted/lost monmap.

  These are the basic steps:

  1. Is there a formed quorum? If so, grab the monmap from the quorum::

       $ ceph mon getmap -o /tmp/monmap

  2. No quorum? Grab the monmap directly from another monitor (this
     assumes the monitor you are grabbing the monmap from has id ID-FOO
     and has been stopped)::

       $ ceph-mon -i ID-FOO --extract-monmap /tmp/monmap

  3. Stop the monitor you are going to inject the monmap into.

  4. Inject the monmap::

       $ ceph-mon -i ID --inject-monmap /tmp/monmap

  5. Start the monitor.

  Please keep in mind that the ability to inject monmaps is a powerful
  feature that can cause havoc with your monitors if misused, as it will
  overwrite the latest, existing monmap kept by the monitor.


Clock Skews
------------

Monitors can be severely affected by significant clock skews across the
monitor nodes. This usually translates into weird behavior with no obvious
cause. To avoid such issues, you should run a clock synchronization tool
on your monitor nodes.


What's the maximum tolerated clock skew?

  By default the monitors will allow clocks to drift up to ``0.05 seconds``.


Can I increase the maximum tolerated clock skew?

  This value is configurable via the ``mon-clock-drift-allowed`` option, and
  although you *CAN*, it doesn't mean you *SHOULD*. The clock skew mechanism
  is in place because clock-skewed monitors may not behave properly. We, as
  developers and QA aficionados, are comfortable with the current default
  value, as it will alert the user before the monitors get out of hand.
  Changing this value without testing it first may cause unforeseen effects
  on the stability of the monitors and overall cluster health, although
  there is no risk of data loss.


How do I know there's a clock skew?

  The monitors will warn you in the form of a ``HEALTH_WARN``. ``ceph health
  detail`` should show something in the form of::

    mon.c addr 10.10.0.1:6789/0 clock skew 0.08235s > max 0.05s (latency 0.0045s)

  That means that ``mon.c`` has been flagged as suffering from a clock skew.


What should I do if there's a clock skew?

  Synchronize your clocks. Running an NTP client may help. If you are
  already using one and you hit this sort of issue, check whether you are
  using an NTP server remote to your network and consider hosting your own
  NTP server on your network. This last option tends to reduce the number
  of issues with monitor clock skews.


Client Can't Connect or Mount
------------------------------

Check your IP tables. Some OS install utilities add a ``REJECT`` rule to
``iptables``. The rule rejects all clients trying to connect to the host except
for ``ssh``. If your monitor host's IP tables have such a ``REJECT`` rule in
place, clients connecting from a separate node will fail to mount with a timeout
error. You need to address ``iptables`` rules that reject clients trying to
connect to Ceph daemons. For example, you would need to address rules that look
like this appropriately::

    REJECT all -- anywhere anywhere reject-with icmp-host-prohibited

You may also need to add rules to IP tables on your Ceph hosts to ensure
that clients can access the ports associated with your Ceph monitors (i.e., port
6789 by default) and Ceph OSDs (i.e., 6800 through 7300 by default). For
example::

    iptables -A INPUT -m multiport -p tcp -s {ip-address}/{netmask} --dports 6789,6800:7300 -j ACCEPT
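If you are unsure whether such a ``REJECT`` rule is in effect on a monitor
host, you can list the current rules and look for it; a minimal sketch::

    # numbered output makes it easier to delete an offending rule afterwards
    iptables -L INPUT -n --line-numbers | grep -i reject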

Monitor Store Failures
======================

Symptoms of store corruption
----------------------------

Ceph monitors store the `cluster map`_ in a key/value store such as LevelDB. If
a monitor fails due to key/value store corruption, the following error messages
might be found in the monitor log::

  Corruption: error in middle of record

or::

  Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/mon.0/store.db/1234567.ldb

Recovery using healthy monitor(s)
---------------------------------

If there are any survivors, we can always `replace`_ the corrupted one with a
new one. After booting up, the new joiner will sync up with a healthy
peer, and once it is fully synced, it will be able to serve the clients.

Recovery using OSDs
-------------------

But what if all monitors fail at the same time? Since users are encouraged to
deploy at least three monitors in a Ceph cluster, the chance of simultaneous
failure is rare. But unplanned power-downs in a data center with improperly
configured disk/fs settings could fail the underlying filesystem, and hence
kill all the monitors. In this case, we can recover the monitor store with the
information stored in OSDs::

  ms=/tmp/mon-store
  mkdir $ms
  # collect the cluster map from OSDs
  # (assuming $hosts holds the names of all the OSD hosts)
  for host in $hosts; do
    rsync -avz $ms/ user@host:$ms/
    rm -rf $ms
    # \$osd is escaped so it expands on the remote host; $ms expands locally
    ssh user@host <<EOF
    for osd in /var/lib/osd/osd-*; do
      ceph-objectstore-tool --data-path \$osd --op update-mon-db --mon-store-path $ms
    done
  EOF
    rsync -avz user@host:$ms/ $ms/
  done
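Note that the map collected this way is not yet a usable monitor store: it
still needs to be rebuilt. In recent Ceph releases this is the job of
``ceph-monstore-tool``; a minimal sketch, assuming the store collected above
in ``/tmp/mon-store`` and an admin keyring at a hypothetical path::

  ceph-monstore-tool /tmp/mon-store rebuild -- --keyring /path/to/admin.keyring

The rebuilt store can then be copied over the broken monitor's ``store.db``
directory (keep a backup of the old store before overwriting it).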