This may prevent members from being dropped from the corosync cluster in
high-load environments. Symptoms of this problem can sometimes be found
in the corosync log:
dub 05 17:23:45 overcloud-controller-0 corosync[14152]: [MAIN ] Corosync
main process was not scheduled for 3691.8391 ms (threshold is 1320.0000
ms). Consider token timeout increase.
The default in the Puppet manifest is 1 second (1000 ms), which matches the
corosync default, and we override it via hiera to 10 seconds (10000 ms).
Change-Id: I5ea850ada657e5eecafa3e8b28613a0ac48e78f3
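
For reference, with corosync_token_timeout set to 10000 (see the hiera
excerpt below), pcs cluster setup is invoked with --token 10000 and the
generated /etc/corosync/corosync.conf should end up with a totem section
roughly like the following sketch; the exact contents depend on the pcs
version and the rest of the setup options:

# sketch of the generated totem section with the override applied;
# other keys (cluster_name, transport, ...) depend on the actual setup
totem {
    version: 2
    token: 10000
}

On a running node, the effective value can be checked with
corosync-cmapctl -g runtime.config.totem.token, which should report the
timeout actually in use.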
pacemaker::corosync::manage_fw: false
pacemaker::resource_defaults::defaults:
resource-stickiness: { value: INFINITY }
+corosync_token_timeout: 10000
# horizon
horizon::cache_backend: django.core.cache.backends.memcached.MemcachedCache
$pacemaker_cluster_members = downcase(regsubst(hiera('controller_node_names'), ',', ' ', 'G'))
$corosync_ipv6 = str2bool(hiera('corosync_ipv6', false))
if $corosync_ipv6 {
- $cluster_setup_extras = { '--ipv6' => '' }
+ $cluster_setup_extras = { '--token' => hiera('corosync_token_timeout', 1000), '--ipv6' => '' }
} else {
- $cluster_setup_extras = {}
+ $cluster_setup_extras = { '--token' => hiera('corosync_token_timeout', 1000) }
}
class { '::pacemaker':
hacluster_pwd => hiera('hacluster_pwd'),
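
Further down (not shown in this excerpt), the extras hash is presumably
handed to pacemaker::corosync, which appends each key/value pair to the
pcs cluster setup command line. A minimal sketch of that wiring, assuming
the cluster_setup_extras parameter name from puppet-pacemaker; variable
names other than those defined above are illustrative:

# hypothetical wiring: cluster_setup_extras is the puppet-pacemaker
# parameter that turns the hash into extra pcs cluster setup options
class { '::pacemaker::corosync':
  cluster_members      => $pacemaker_cluster_members,
  cluster_setup_extras => $cluster_setup_extras,
}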