Change rabbitmq queues HA mode from ha-all to ha-exactly
author Michele Baldessari <michele@acksyn.org>
Thu, 29 Sep 2016 16:35:25 +0000 (18:35 +0200)
committer Michele Baldessari <michele@acksyn.org>
Wed, 5 Oct 2016 05:50:04 +0000 (07:50 +0200)
commit 59a5f37c652c5cdad59723b5f96d808ff1558c90
tree 86beb60f8199c3dde010983a6bdfb2d43fbc8c5f
parent 264d49a255024eba9ea3bf2086c63e9d49b6da11

It turns out that reducing the number of rabbitmq queue copies in the
cluster significantly improves cluster performance, especially failover
recovery time. Right now the cluster uses ha-all mode for rabbitmq
queues.

It is best to change this to "ha-exactly" mode and reduce the number
of queue copies to ceil(N/2), where N is the number of controllers in
the cluster. In the typical scenario of 3 controllers, it would be 2 by
default.
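
As a rough illustration of the default described above, the copy count can
be sketched in Python as follows. This is not the Puppet code from the
manifest; the function name default_ha_params is a hypothetical helper
introduced only for this example.

```python
def default_ha_params(controller_count: int) -> int:
    """Return ceil(N/2) queue copies for N controllers.

    Hypothetical helper for illustration; not part of the manifest.
    """
    if controller_count < 1:
        raise ValueError("controller_count must be at least 1")
    # Integer form of ceil(N/2), avoiding floating point.
    return (controller_count + 1) // 2


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"{n} controllers -> {default_ha_params(n)} queue copies")
```

For the typical 3-controller deployment this yields 2 copies, matching the
default stated above.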

It does not make much sense to keep copies of the queues across the
whole cluster, since if the quorum of nodes is lost the remaining
cluster nodes will be stopped anyway. We let the user override this
with a parameter.

For example, for a 3-node control plane cluster we will go from this:
pcs resource show rabbitmq
 Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster)
  Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"all"}"

To this:
pcs resource show rabbitmq
 Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster)
  Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"exactly","ha-params":2}"
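
To make the shape of the new set_policy value concrete, here is a minimal
Python sketch that assembles the same string and checks which queue names
the pattern covers. The helper name build_set_policy and the sample queue
names are assumptions made for this example; the manifest itself builds
the value in Puppet.

```python
import json
import re

# Pattern taken from the set_policy attribute: every queue whose name
# does not start with "amq." is covered by the policy.
QUEUE_PATTERN = r"^(?!amq\.).*"


def build_set_policy(copies: int) -> str:
    """Assemble '<name> <pattern> <definition>' (hypothetical helper)."""
    definition = json.dumps(
        {"ha-mode": "exactly", "ha-params": copies},
        separators=(",", ":"),  # compact JSON, as in the pcs output
    )
    return f"ha-all {QUEUE_PATTERN} {definition}"


def is_mirrored(queue_name: str) -> bool:
    """True if the policy pattern applies to the given queue name."""
    return re.match(QUEUE_PATTERN, queue_name) is not None


if __name__ == "__main__":
    print(build_set_policy(2))
    # Sample queue names below are made up for illustration.
    for name in ("notifications.info", "amq.gen-example"):
        print(name, is_mirrored(name))
```

Note that the policy keeps the name "ha-all" even though the mode is now
"exactly"; only the definition changes.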

According to Marian Krcmarik's testing, recovery time from failure was
reduced significantly.

Co-Authored-By: Marian Krcmarik <mkrcmari@redhat.com>
Change-Id: Ib62001c03e1e08f58cf0c6e0ba07a8879a584084
Partial-Bug: #1628998
manifests/profile/pacemaker/rabbitmq.pp