gerrit.opnfv Code Review - apex-tripleo-heat-templates.git/commit

author	Michele Baldessari <michele@acksyn.org>
	Thu, 29 Sep 2016 16:30:23 +0000 (18:30 +0200)
committer	Michele Baldessari <michele@acksyn.org>
	Fri, 30 Sep 2016 22:28:34 +0000 (00:28 +0200)
commit	1c5d16854417665f970ab6899759c25f865bf515
tree	020ea09e9425cb00c9787343456a1734b493c847
parent	7ac30a97c4f814cced30598eab11ebc0cf31ca63

Change rabbitmq queues HA mode from ha-all to ha-exactly

It turns out that reducing the number of rabbitmq queue copies in the
cluster significantly improves cluster performance, especially failover
recovery time. Right now the cluster uses ha-all mode for rabbitmq
queues.

It is best to change this to "ha-exactly" mode and reduce the number
of queue copies to ceil(N/2), where N is the number of controllers in
the cluster. In the typical scenario of 3 controllers this is 2 by
default.
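
As a quick check of the arithmetic, here is a minimal Python sketch of the
ceil(N/2) rule. The function name ha_exactly_copies is hypothetical and is
not taken from the templates; it only illustrates the formula stated above.

```python
def ha_exactly_copies(controller_count: int) -> int:
    """Return ceil(N/2) for N controllers (hypothetical helper)."""
    if controller_count < 1:
        raise ValueError("controller_count must be at least 1")
    # (N + 1) // 2 equals ceil(N / 2) for positive integers,
    # and avoids floating-point division.
    return (controller_count + 1) // 2


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, "controllers ->", ha_exactly_copies(n), "queue copies")
```

For the 3-controller case this gives 2, matching the default described
above.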

It does not make much sense to keep copies of the queues across the
whole cluster: if the quorum of nodes is lost, the remaining cluster
nodes will be stopped anyway. We let the user override this with a
parameter.

For example, for a 3-node controlplane cluster we will go from this:
pcs resource show rabbitmq
Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster)
Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"all"}"

To this:
pcs resource show rabbitmq
Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster)
Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"exactly","ha-params":2}"
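
The set_policy value above has three space-separated parts: a policy name,
a queue-name pattern, and a JSON definition. The following Python sketch
assembles that string for a given number of copies. The function name
rabbitmq_set_policy and its defaults are assumptions made for illustration;
this is not how the templates themselves generate the value.

```python
import json


def rabbitmq_set_policy(copies: int,
                        name: str = "ha-all",
                        pattern: str = r"^(?!amq\.).*") -> str:
    """Build a set_policy string like the one shown by pcs (hypothetical helper)."""
    # Compact separators reproduce the spacing-free JSON seen in the pcs output.
    definition = json.dumps(
        {"ha-mode": "exactly", "ha-params": copies},
        separators=(",", ":"),
    )
    return f"{name} {pattern} {definition}"


if __name__ == "__main__":
    print(rabbitmq_set_policy(2))
```

Note that the policy name stays "ha-all" in the example output even though
the mode is now "exactly"; the sketch keeps that name as its default to
mirror the output shown.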

According to Marin Krcmarik's testing, recovery time from failure was
reduced significantly.

Partial-Bug: #1628998
Change-Id: Iace6daf27a76cb8ef1050ada0de7ff1f530916c6