Jenkins [Fri, 7 Oct 2016 00:26:06 +0000 (00:26 +0000)]
Merge "Specify the Ceph packages to be installed"
Jenkins [Thu, 6 Oct 2016 23:24:42 +0000 (23:24 +0000)]
Merge "Add Select per-network hostnames for service_node_names to role.role.j2.yaml"
John Fulton [Wed, 5 Oct 2016 03:29:25 +0000 (23:29 -0400)]
Specify the Ceph packages to be installed
The puppet-ceph module defaults to 'ceph' but that is a metapackage
which isn't provided in all repos.
Depends-On: I13462219522386f8740b0d70916a44f3474115e4
Change-Id: Ie55d22301dd22102d471e6002dfcaad4bfadd5f6
Related-Bug: 1629933
Emilien Macchi [Thu, 6 Oct 2016 15:18:14 +0000 (11:18 -0400)]
Enable firewalling by default on compute nodes
- Move VXLAN and VRRP rules from Neutron Server to the right services.
- Enable Firewall by default on Compute nodes.
Change-Id: I99d172dcedaf6be297aad184cc51fe9f292a57e1
Dan Prince [Tue, 4 Oct 2016 13:59:56 +0000 (09:59 -0400)]
Re-enable ManageFirewall by default.
This default setting got lost in the composable roles/services patches.
Re-enable the ManageFirewall setting by default per what we did in
git commit 73c76b867ddc8a23a30b9a3cac4031189d4178c6.
We also fix a typo in neutron-api.yaml so that the firewall rules
match the service_name (otherwise they won't get loaded).
Also, drops the environments/manage-firewall.yaml which is
no longer needed if we enable firewall management by default.
Change-Id: Ie198e4efd190131d0722085b10ef77da9005bc1b
Closes-bug: 1629934
Carlos Camacho [Wed, 5 Oct 2016 09:29:59 +0000 (11:29 +0200)]
Add Select per-network hostnames for service_node_names to role.role.j2.yaml
This will wire up the per-network hostnames in the generic role.
Needs to land after https://review.openstack.org/#/c/378764
Partial-Bug: #1626976
Change-Id: I595f35cce03d9f416a1768aa5c349a1bb20b0e19
Jenkins [Thu, 6 Oct 2016 12:34:31 +0000 (12:34 +0000)]
Merge "restore missing fluentd client functionality"
Jenkins [Thu, 6 Oct 2016 12:34:24 +0000 (12:34 +0000)]
Merge "Add generic template for custom roles."
Jenkins [Thu, 6 Oct 2016 12:31:49 +0000 (12:31 +0000)]
Merge "Set proper ceph config path for manila"
Jenkins [Thu, 6 Oct 2016 11:56:02 +0000 (11:56 +0000)]
Merge "Select per-network hostnames for service_node_names"
Jenkins [Thu, 6 Oct 2016 09:26:29 +0000 (09:26 +0000)]
Merge "Fix OpendaylightApiNetwork key naming"
Carlos Camacho [Tue, 4 Oct 2016 09:50:33 +0000 (11:50 +0200)]
Add generic template for custom roles.
This submission creates a generic template
file to deploy custom roles.
Also adds a file to specify a list of excluded
roles, so that the template is not generated
for those roles.
Partial-Bug: #1626976
Depends-On: I6d7247bbb8702eb0ab9bdf133b5ab1c6e8349d98
Change-Id: I3e11c089023b793a5063d9e1714527a3fe2b7458
Tom Barron [Wed, 5 Oct 2016 21:55:09 +0000 (17:55 -0400)]
Set proper ceph config path for manila
When deploying manila with cephfs backend,
/etc/manila/manila.conf should define
cephfs_conf_path = /etc/ceph/ceph.conf
in the cephfs native backend since this is
the conventional path that ceph operators expect
and since we document that path upstream.
Change-Id: I4abf5c33b675b1102413a84d64f4ce23b07b4485
Closes-Bug: 1630777
Jenkins [Wed, 5 Oct 2016 21:47:47 +0000 (21:47 +0000)]
Merge "Open tripleo-heat-templates for Ocata"
Jenkins [Wed, 5 Oct 2016 18:01:20 +0000 (18:01 +0000)]
Merge "Adds Environment File for Removing Sahara during M/N upgrade"
Lars Kellogg-Stedman [Wed, 5 Oct 2016 13:28:59 +0000 (09:28 -0400)]
restore missing fluentd client functionality
In the great rebase following the JINJA ALL THE THINGS changes we lost
critical functionality in the fluentd client service. This review
restores the missing features.
Change-Id: I7c23f16f81e75f3da6a24587b2eb8385b3e920a4
Closes-bug: 1630692
Steven Hardy [Wed, 5 Oct 2016 14:53:16 +0000 (15:53 +0100)]
Fix OpendaylightApiNetwork key naming
This capitalization won't work with the CamelCase to snake_case
conversion we do, as the required name is opendaylight_api_network.
Adds some clarification to the ServiceNetMap description to hopefully
avoid future confusion.
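To see why the capitalization matters, here is a minimal sketch of a CamelCase to snake_case conversion. The helper name camel_to_snake and its sed-based rule are assumptions for illustration only; the conversion actually used by the templates may be implemented differently.

```shell
# Hypothetical helper: convert a CamelCase key to snake_case.
camel_to_snake() {
    # Insert "_" at every lower-case/digit -> upper-case boundary,
    # then lower-case the whole string.
    printf '%s\n' "$1" \
        | sed -E 's/([a-z0-9])([A-Z])/\1_\2/g' \
        | tr '[:upper:]' '[:lower:]'
}

camel_to_snake OpendaylightApiNetwork   # opendaylight_api_network
camel_to_snake OpenDaylightApiNetwork   # open_daylight_api_network
```

With a rule of this shape, a capital "D" in the key produces an extra underscore, so the result no longer matches opendaylight_api_network.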
Change-Id: Ife04ee2185e81009ebef55ad521aef799251e002
Closes-Bug: #1629408
Jenkins [Wed, 5 Oct 2016 14:30:53 +0000 (14:30 +0000)]
Merge "Fixing resources path in OpenDaylight"
John Trowbridge [Wed, 5 Oct 2016 14:05:58 +0000 (10:05 -0400)]
Open tripleo-heat-templates for Ocata
To avoid pushing an artificial alpha tag, the following PBR semver keyword
bumps the major version. See http://docs.openstack.org/developer/pbr/#version
Change-Id: Ic47869c96217269806daac9c3c888603e4e5d00a
Sem-Ver: api-break
marios [Fri, 23 Sep 2016 14:19:07 +0000 (17:19 +0300)]
Adds Environment File for Removing Sahara during M/N upgrade
The default path if the operator does nothing is to keep the
sahara services on mitaka to newton upgrades.
If the operator wishes to remove sahara services then they
need to specify the provided major-upgrade-remove-sahara.yaml
environment file in the stack upgrade commands.
The existing migration to ha arch already removes the constraints
and pcs resource for sahara api/engine so we just need to stop
it from starting again if we want to remove it.
This adds a KeepSaharaServiceOnUpgrade parameter to determine if
Sahara is disabled from starting up after the controllers are
upgraded (defaults true).
Finally it is worth noting that we default the sahara services
as 'on' during converge here in the resource_registry of the
converge environment file; any subsequent stack updates where
the deployment contains sahara services will need to
include the -e /environments/services/sahara.yaml environment
file.
Related-Bug: 1630247
Change-Id: I59536cae3260e3df52589289b4f63e9ea0129407
Steven Hardy [Wed, 28 Sep 2016 15:19:56 +0000 (16:19 +0100)]
Select per-network hostnames for service_node_names
Co-Authored-By: Juan Antonio Osorio Robles <jaosorior@redhat.com>
Depends-On: Ic6fec1057439ed9122d44ef294be890d3ff8a8ee
Change-Id: I754c4a41d8a294a4c7c18bd282ae014efd4b9b16
Closes-Bug: #1628521
Steven Hardy [Tue, 4 Oct 2016 14:52:19 +0000 (15:52 +0100)]
j2 template per-role ServiceNetMapDefaults
The *HostnameResolveNetwork should default to a sane value
for all roles, including those specified by the user.
We choose internal_api by default (maintaining the existing
special-case for the CephStorage role which uses the storage
network), but users can of course override the default with
a network of their choice.
Change-Id: Ib240f56c1db5842b953fa510316e75fd53f24735
Closes-Bug: #1629827
Jenkins [Wed, 5 Oct 2016 03:06:17 +0000 (03:06 +0000)]
Merge "Move the main template files for defalut services to new syntax generation"
Jenkins [Tue, 4 Oct 2016 21:40:13 +0000 (21:40 +0000)]
Merge "j2 template role config templates"
Carlos Camacho [Tue, 4 Oct 2016 16:28:39 +0000 (18:28 +0200)]
Move the main template files for defalut services to new syntax generation
When generating these templates, we should
create them with the "-role" appended as they will
be generated from a role.role.j2.yaml file.
i.e. role.role.j2.yaml will generate <service>-role.yaml
config.role.j2.yaml will generate <service>-config.yaml
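The naming rule can be sketched as a small shell helper. This is only an illustration: the function name role_template_output is hypothetical, and using the lower-cased role name as the file prefix is an assumption that this message does not state.

```shell
# Hypothetical helper: derive the generated file name for one role.
# usage: role_template_output <template file> <role name>
role_template_output() {
    rto_base="${1##*/}"                   # strip any directory part
    rto_base="${rto_base%.role.j2.yaml}"  # "role.role.j2.yaml" -> "role"
    # Assumption: the prefix is the role name in lower case.
    rto_role=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    printf '%s-%s.yaml\n' "$rto_role" "$rto_base"
}

role_template_output role.role.j2.yaml Compute     # compute-role.yaml
role_template_output config.role.j2.yaml Compute   # compute-config.yaml
```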
Partial-Bug: #1626976
Change-Id: I614dc462fd7fc088b67634d489d8e7b68e7d4ab1
Dan Prince [Tue, 4 Oct 2016 14:04:44 +0000 (10:04 -0400)]
Include redis/mongo hiera when using pacemaker
This patch updates the pacemaker composable service templates for
mongo and redis to extend the proper base (redis.yaml and mongo.yaml)
templates instead of the -base.yaml versions. This was causing
some missing hiera settings for these services which caused symptoms
like missing firewall rules for these services.
Change-Id: I3f94acbf4d1baadbb151b1c4d34b4a0ab28ad5e5
Partial-bug: #1629934
Jenkins [Tue, 4 Oct 2016 11:00:47 +0000 (11:00 +0000)]
Merge "Use netapp_host_type instead of netapp_eseries_host_type"
Jenkins [Tue, 4 Oct 2016 05:18:43 +0000 (05:18 +0000)]
Merge "Make keystone api network hiera composable"
Jenkins [Tue, 4 Oct 2016 03:01:11 +0000 (03:01 +0000)]
Merge "Set ceph osd max object name and namespace len on upgrade when on ext4"
Jenkins [Mon, 3 Oct 2016 22:19:31 +0000 (22:19 +0000)]
Merge "reload HAProxy config in HA setups when certificate is updated"
Jenkins [Mon, 3 Oct 2016 18:17:47 +0000 (18:17 +0000)]
Merge "Update $service to $resource this variable does not exist in the context"
Jenkins [Mon, 3 Oct 2016 16:40:41 +0000 (16:40 +0000)]
Merge "Cinder volume service is not managed by Pacemaker on BlockStorage"
Jenkins [Mon, 3 Oct 2016 16:40:06 +0000 (16:40 +0000)]
Merge "Change the rabbitmq ha policies during an M/N Upgrade"
Mathieu Bultel [Mon, 3 Oct 2016 14:02:34 +0000 (16:02 +0200)]
Update $service to $resource this variable does not exist in the context
heat failed due to a:
service: unbound variable
In the context $service is never set.
Change-Id: If82ee4562612f2617b676732956396278ee40a88
Closes-Bug: #1629903
Juan Antonio Osorio Robles [Mon, 3 Oct 2016 13:56:21 +0000 (16:56 +0300)]
reload HAProxy config in HA setups when certificate is updated
When updating a certificate for HAProxy, we only do a reload of the
configuration on non-HA setups. This means that if we try the same in
an HA setup, the cloud will still serve the old certificate and that
leads to several issues, such as serving a revoked or even a
compromised certificate for some time, or just SSL issues that the
certificate doesn't match. This enables a reload for HA cases too.
Change-Id: Ib8ca2fe91be345ef4324fc8265c45df8108add7a
Closes-Bug: #1629886
Jenkins [Mon, 3 Oct 2016 09:50:29 +0000 (09:50 +0000)]
Merge "Fixed NoneType issue when monitoring-environment.yaml"
Jenkins [Mon, 3 Oct 2016 09:50:23 +0000 (09:50 +0000)]
Merge "Balance Rabbitmq Queue Master Location on queue declaration with min-masters strategy"
Jenkins [Mon, 3 Oct 2016 09:50:16 +0000 (09:50 +0000)]
Merge "Change rabbitmq queues HA mode from ha-all to ha-exactly"
Michele Baldessari [Sat, 1 Oct 2016 15:42:54 +0000 (17:42 +0200)]
Change the rabbitmq ha policies during an M/N Upgrade
This takes care of the M->N upgrade path when changing
the ha rabbitmq policy.
Partial-Bug: #1628998
Change-Id: I2468a096b5d7042bc801a742a7a85fb1521c1c02
Jenkins [Mon, 3 Oct 2016 06:35:06 +0000 (06:35 +0000)]
Merge "Fixed NoneType issue when logging-environment.yaml is used"
Michele Baldessari [Thu, 29 Sep 2016 16:30:23 +0000 (18:30 +0200)]
Change rabbitmq queues HA mode from ha-all to ha-exactly
It turns out that reducing the number of rabbitmq queue copies in the
cluster significantly improves cluster performance, especially failover
recovery time. Right now the cluster uses ha-all mode for rabbitmq
queues.
It is best to change this to "ha-exactly" mode and reduce the number
of queue copies to ceil(N/2), where N is the number of controllers in
the cluster, so in the typical scenario of 3 controllers it would be 2
by default.
It does not make much sense to keep copies of queues across the whole
cluster, since if the quorum of nodes is lost then the rest of the
cluster nodes will be stopped anyway. We let the user override this
with a parameter.
I.e. for a 3 node controlplane cluster we will go from this:
pcs resource show rabbitmq
Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster)
Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"all"}"
To this:
pcs resource show rabbitmq
Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster)
Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"exactly","ha-params":2}"
According to Marin Krcmarik's testing recovery time from failure was
reduced significantly.
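The ceil(N/2) default described above can be computed with integer arithmetic as (N + 1) / 2. The following is a minimal sketch; the function names ha_queue_copies and ha_policy_json are hypothetical and are not part of this change.

```shell
# Hypothetical helper: number of queue copies for N controllers, ceil(N/2).
ha_queue_copies() {
    # Integer division rounds down, so adding 1 first rounds N/2 up.
    echo $(( ($1 + 1) / 2 ))
}

# Hypothetical helper: policy definition in the form shown in set_policy above.
ha_policy_json() {
    printf '{"ha-mode":"exactly","ha-params":%d}\n' "$(ha_queue_copies "$1")"
}

ha_policy_json 3   # {"ha-mode":"exactly","ha-params":2}
```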
Partial-Bug: #1628998
Change-Id: Iace6daf27a76cb8ef1050ada0de7ff1f530916c6
Jenkins [Fri, 30 Sep 2016 18:54:57 +0000 (18:54 +0000)]
Merge "telemetry: remove coordination_url hiera settings"
Jenkins [Fri, 30 Sep 2016 18:52:04 +0000 (18:52 +0000)]
Merge "Telemetry: add redis_password hiera parameter"
Jenkins [Fri, 30 Sep 2016 17:44:22 +0000 (17:44 +0000)]
Merge "Replace per role manifests with a common role manifest"
Steven Hardy [Fri, 30 Sep 2016 14:23:26 +0000 (15:23 +0100)]
Make keystone api network hiera composable
These hard-coded references to the Controller role mean that
things won't work if the keystone service is moved to any other
role, so we need to generate the lists dynamically based on the
enabled services for each role.
Change-Id: I5f1250a8a1a38cb3909feeb7d4c1000fd0fabd14
Closes-Bug: #1629096
Steven Hardy [Wed, 28 Sep 2016 16:03:42 +0000 (17:03 +0100)]
j2 template role config templates
This means the user won't have to manually specify e.g. the
OS::TripleO::ACustomRoleConfig resource.
Partial-Bug: 1626976
Change-Id: I063571d4c5cbc2f295a7a044d81c27d703bd0e10
Depends-On: I9f920e191344040a564214f3f9a1147b265e9ff3
Steven Hardy [Fri, 23 Sep 2016 14:39:33 +0000 (15:39 +0100)]
Replace per role manifests with a common role manifest
This removes the (nearly empty) per role manifests, and
replaces them with a generic manifest, where we use str_replace
to substitute the role name at runtime (or in some cases a
subset of the name for backwards compatibility)
Change-Id: I79da0f523189959b783bbcbb3b0f37be778e02fe
Partial-Bug: #1626976
Emilien Macchi [Fri, 30 Sep 2016 13:48:56 +0000 (09:48 -0400)]
telemetry: remove coordination_url hiera settings
They are now normalized and set in puppet-tripleo.
Change-Id: I197481c577b85894178e7899a55869da47847755
Closes-Bug: #1629279
Depends-On: Ic6de09acf0d36ca90cc2041c0add1bc2b4a369a5
Emilien Macchi [Fri, 30 Sep 2016 13:28:06 +0000 (09:28 -0400)]
Telemetry: add redis_password hiera parameter
Add redis_password parameter in Hiera so we can re-use it from
puppet-tripleo later for Aodh, Ceilometer and Gnocchi.
Change-Id: I038e2bac22e3bfa5047d2e76e23cff664546464d
Partial-Bug: #1629279
Juan Badia Payno [Fri, 30 Sep 2016 08:25:44 +0000 (10:25 +0200)]
Fixed NoneType issue when monitoring-environment.yaml
When you try to use environments/monitoring-environment.yaml
as part of the overcloud deployment, you hit the
following error, which stops the deployment of the overcloud.
***
Deploying templates in the directory /home/stack/tripleo-heat-templates
'NoneType' object does not support item assignment
***
Closes-Bug: #1629323
Change-Id: I8cf2e7d8f3a4e79cc71a1566ec17d0a977c38d60
Signed-off-by: Juan Badia Payno <jbadiapa@redhat.com>
Juan Badia Payno [Fri, 30 Sep 2016 08:13:29 +0000 (10:13 +0200)]
Fixed NoneType issue when logging-environment.yaml is used
When you try to use environments/logging-environment.yaml
as part of the overcloud deployment, you hit the
following error, which stops the deployment of the overcloud.
***
Deploying templates in the directory /home/stack/tripleo-heat-templates
'NoneType' object does not support item assignment
***
Closes-Bug: #1629315
Change-Id: I55e5c7f20ddf30f3e48247b734f6fa47f5de3750
Signed-off-by: Juan Badia Payno <jbadiapa@redhat.com>
Jenkins [Fri, 30 Sep 2016 12:48:04 +0000 (12:48 +0000)]
Merge "Add option to specify Certmonger CA"
Jenkins [Fri, 30 Sep 2016 07:48:29 +0000 (07:48 +0000)]
Merge "Move the rest of static roles resource registry entries to j2"
Jenkins [Thu, 29 Sep 2016 23:58:01 +0000 (23:58 +0000)]
Merge "Use -L with chown and set crush map tunables when upgrading Ceph"
Jenkins [Thu, 29 Sep 2016 23:57:43 +0000 (23:57 +0000)]
Merge "Fix typo in fixing gnocchi upgrade."
Jenkins [Thu, 29 Sep 2016 23:52:33 +0000 (23:52 +0000)]
Merge "Add gateway_ip in OS::Neutron::Subnet"
Juan Antonio Osorio Robles [Mon, 12 Sep 2016 08:42:02 +0000 (11:42 +0300)]
Add option to specify Certmonger CA
This will be used for internal (or even public) TLS, for when
certmonger is generating the certificates. This same setting is used
for the undercloud with the generate_service_certificate option.
Change-Id: Ic54fe512b9ed5c71417a66491b7954e653f660b6
Michele Baldessari [Thu, 29 Sep 2016 16:50:39 +0000 (18:50 +0200)]
Balance Rabbitmq Queue Master Location on queue declaration with min-masters strategy
One of the controllers may become unavailable, in which case Queue
Masters will be located on the remaining available controllers during
queue declarations. Once the lost controller becomes available again,
masters of newly declared queues are not placed with priority on that
controller, even though it obviously hosts fewer queue masters. The
distribution may therefore be unbalanced, and after multiple fail-overs
one of the controllers may end up under significantly higher load.
With rabbit 3.6.0 rabbitmq introduced a new HA feature of Queue masters
distribution - one of the strategies is min-masters, which picks the
node hosting the minimum number of masters.
One way to turn on the min-masters strategy is to add the
following to the configuration file, rabbitmq.config:
{rabbit,[ ..
{queue_master_locator, <<"min-masters">>},
.. ]},
Change-Id: I61bcab0e93027282b62f2a97bec87cbb0a6e6551
Closes-Bug: #1629010
Giulio Fidente [Thu, 29 Sep 2016 11:52:32 +0000 (13:52 +0200)]
Set ceph osd max object name and namespace len on upgrade when on ext4
As per [1] we need to lower osd max object name and namespace len when
upgrading from Hammer and the OSD is backed by ext4.
These could also be given via ExtraConfig but on upgrade we only run
puppet apply after this script is executed, so the values won't be
effective unless the daemon is restarted. Yet we do not want puppet
to restart the daemon because we can't bring all OSDs down
unconditionally or guests will die.
1. http://tracker.ceph.com/issues/16187
Co-Authored-By: Michele Baldessari <michele@acksyn.org>
Co-Authored-By: Dimitri Savineau <dsavinea@redhat.com>
Change-Id: I7fec4e2426bdacd5f364adbebd42ab23dcfa523a
Closes-Bug: 1628874
Giulio Fidente [Thu, 29 Sep 2016 12:05:46 +0000 (14:05 +0200)]
Cinder volume service is not managed by Pacemaker on BlockStorage
We do not want cinder-volume to be managed by Pacemaker on
BlockStorage nodes, where Pacemaker is not running at all.
This change adds a new BlockStorageCinderVolume service name
which can be (and by default is) mapped to the non-Pacemaker
implementation of the service.
The error was:
Could not find dependency Exec[wait-for-settle] for
Pacemaker::Resource::Systemd[openstack-cinder-volume]
Also moves cinder::host setting into the Pacemaker specific service
definition because we only want to set a shared host= string when
the service is managed by Pacemaker.
Closes-Bug: #1628912
Change-Id: I2f7e82db4fdfd5f161e44d65d17893c3e19a89c9
Carlos Camacho [Thu, 29 Sep 2016 12:57:36 +0000 (14:57 +0200)]
Move the rest of static roles resource registry entries to j2
Move the rest of the static resource registry
entries to j2; this allows extending the content of the
template to the roles_list.
Also moved the templates to correspond with the role name.
Partial-Bug: #1626976
Change-Id: I1cbe101eb4ce5a89cba5f2cc45cace43d3380f22
Jenkins [Thu, 29 Sep 2016 14:56:56 +0000 (14:56 +0000)]
Merge "j2 template per-role things in default registry"
Jenkins [Thu, 29 Sep 2016 14:56:49 +0000 (14:56 +0000)]
Merge "Relax pre-upgrade check for failed actions"
Jenkins [Thu, 29 Sep 2016 14:56:41 +0000 (14:56 +0000)]
Merge "Fix races in major-upgrade-pacemaker Step2"
Sofer Athlan-Guyot [Thu, 29 Sep 2016 13:22:16 +0000 (15:22 +0200)]
Fix typo in fixing gnocchi upgrade.
Change-Id: I44451a280dd928cd694dd6845d5d83040ad1f482
Related-Bug: #1626592
Jenkins [Thu, 29 Sep 2016 13:08:35 +0000 (13:08 +0000)]
Merge "Full HA->HA NG migration might fail setting maintenance-mode"
Jenkins [Thu, 29 Sep 2016 13:03:21 +0000 (13:03 +0000)]
Merge "Update gnocchi database during M/N upgrade."
Giulio Fidente [Wed, 28 Sep 2016 13:05:14 +0000 (15:05 +0200)]
Use -L with chown and set crush map tunables when upgrading Ceph
Previously the chown command wasn't traversing symlinks, causing
the new ownership to not be set for some needed files.
This change also ensures the crush map tunables are set to the 'default'
profile after the upgrade.
Finally redirects the output of a pidof to /dev/null to avoid spurious
logging.
Change-Id: Id4865ffff207edfc727d729f9cc04e6e81ad19d8
Jenkins [Thu, 29 Sep 2016 10:12:44 +0000 (10:12 +0000)]
Merge "Move db::mysql into service_config_settings"
Steven Hardy [Fri, 23 Sep 2016 12:51:45 +0000 (13:51 +0100)]
j2 template per-role things in default registry
The default resource-registry file contains a bunch of per-role
things which mean you need to cut/paste into a custom environment
file for custom roles, even if you only want the defaults like the
built-in roles. Using j2 we can template these just like in the
overcloud.j2.yaml and other files.
Change-Id: I52a9bffd043ca8fb0f05077c8a401a68def82926
Partial-Bug: #1626976
Giulio Fidente [Wed, 31 Aug 2016 21:32:40 +0000 (23:32 +0200)]
Use netapp_host_type instead of netapp_eseries_host_type
This patch deprecates netapp_eseries_host_type in favor of netapp_host_type.
Change-Id: I113c770ca2e4dc54526d4262bacae48e223c54f4
Closes-Bug: 1579161
Michele Baldessari [Wed, 28 Sep 2016 20:55:25 +0000 (22:55 +0200)]
Relax pre-upgrade check for failed actions
Before this change we checked the cluster for any failed actions and
we stopped the upgrade process if there were any.
This is likely excessive as a failed action could have happened in the
past and the cluster is now fully functional.
Better to check if any of the resources are in Stopped state and break
the upgrade process if any of them are.
We also need to restrict this check to the bootstrap node because
otherwise the following might happen:
1) Bootstrap node does the check, it is successful and it starts
the full HA -> HA NG migration which *will* create failed actions
and will start stopping resources
2) If the check now starts on a non-bootstrap node while 1) is ongoing,
it will find either failed actions or stopped resources so it will
fail.
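The relaxed check can be sketched as a filter over cluster status text. This is a minimal sketch under stated assumptions: the function name no_stopped_resources is hypothetical, it simply looks for the word "Stopped" on any input line, and the input lines below are made-up placeholders rather than real pcs output.

```shell
# Hypothetical helper: read status text on stdin and succeed only if
# no line mentions a resource in Stopped state.
no_stopped_resources() {
    ! grep -q 'Stopped'
}

# Placeholder input, not real cluster output:
if printf 'res-a Started\nres-b Stopped\n' | no_stopped_resources; then
    echo "all resources running"
else
    echo "ERROR: found stopped resources, aborting upgrade"
fi
```

In the upgrade script such a check would presumably be fed from "pcs status" and wrapped in the same is_bootstrap_node guard the scripts already use, so that only the bootstrap node runs it.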
Change-Id: Ib091f6dd8884025d2e23bf2fa700169e2dec778f
Closes-Bug: #1628653
Michele Baldessari [Tue, 27 Sep 2016 16:18:33 +0000 (18:18 +0200)]
Fix races in major-upgrade-pacemaker Step2
tripleo-heat-templates/extraconfig/tasks/major_upgrade_controller_pacemaker_2.sh
has the following code:
...
check_resource mongod started 600
if [[ -n $(is_bootstrap_node) ]]; then
...
tstart=$(date +%s)
while ! clustercheck; do
sleep 5
tnow=$(date +%s)
if (( tnow-tstart > galera_sync_timeout )) ; then
echo_error "ERROR galera sync timed out"
exit 1
fi
done
# Run all the db syncs
cinder-manage db sync
...
fi
start_or_enable_service rabbitmq
check_resource rabbitmq started 600
start_or_enable_service redis
check_resource redis started 600
start_or_enable_service openstack-cinder-volume
check_resource openstack-cinder-volume started 600
systemctl_swift start
for service in $(services_to_migrate); do
manage_systemd_service start "${service%%-clone}"
check_resource_systemd "${service%%-clone}" started 600
done
"""
The problem with the above code is that it is open to the following race
condition:
1) Bootstrap node is busy checking the galera status via clustercheck
2) Non-bootstrap node has already reached: start_or_enable_service
rabbitmq and later lines. These lines will be skipped because
start_or_enable_service is a noop on non-bootstrap nodes and
check_resource rabbitmq only checks that pcs status |grep rabbitmq
returns true.
3) Non-bootstrap node can then reach the manage_systemd_service start
and it will fail with stuff like:
"Job for openstack-nova-scheduler.service failed because the control
process exited with error code. See \"systemctl status
openstack-nova-scheduler.service\" and \"journalctl -xe\" for
details.\n" (because the db tables are not migrated yet)
This happens because 3) was started on non-bootstrap nodes before the
db-sync statements are complete on the bootstrap node. I did not feel
like changing the semantics of check_resource and removing the noop on
non-bootstrap nodes as other parts of the tree might rely on this
behaviour.
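This message does not spell out the fix itself. One generic way to avoid this kind of race is to make a node block on an explicit condition with a timeout, in the same style as the galera loop quoted above. The sketch below assumes a hypothetical helper named wait_until; it is not taken from the upgrade scripts.

```shell
# Hypothetical helper: poll a command until it succeeds or a timeout expires.
# usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
wait_until() {
    wu_timeout="$1"
    wu_interval="$2"
    shift 2
    wu_start=$(date +%s)
    until "$@"; do
        # Give up once the elapsed time reaches the timeout.
        if [ $(( $(date +%s) - wu_start )) -ge "$wu_timeout" ]; then
            echo "ERROR: timed out waiting for: $*" >&2
            return 1
        fi
        sleep "$wu_interval"
    done
}

# Example: wait up to 600 seconds, polling every 5 seconds, for a
# hypothetical db_sync_done check before starting services:
#   wait_until 600 5 db_sync_done
```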
Depends-On: Ia016264b51f485b97fa150ebd357b109581342ed
Change-Id: I663313e183bb05b35d0c5af016c2d1705c772bd9
Closes-Bug: #1627965
Sofer Athlan-Guyot [Thu, 22 Sep 2016 14:41:16 +0000 (16:41 +0200)]
Update gnocchi database during M/N upgrade.
We call gnocchi-upgrade to make sure we update all the needed schemas
during the major-upgrade-pacemaker step.
We also make sure that redis is started before we call gnocchi-upgrade
otherwise the command will be stuck in a loop trying to contact redis.
Closes-Bug: #1626592
Change-Id: Ia016264b51f485b97fa150ebd357b109581342ed
Jenkins [Wed, 28 Sep 2016 15:25:44 +0000 (15:25 +0000)]
Merge "Fix predictable placement indexing"
Dan Prince [Mon, 26 Sep 2016 17:52:46 +0000 (13:52 -0400)]
Move db::mysql into service_config_settings
This patch moves the various db::mysql hiera settings into a
'mysql' specific service_config_settings section for each
service so that these will only get applied on the MySQL service
node. This follows a similar puppet-tripleo change where we
create the actual databases for all services locally on
the MySQL service node to avoid permission issues.
Change-Id: Ic0692b1f7aa8409699630ef3924c4be98ca6ffb2
Closes-bug: #1620595
Depends-On: I05cc0afa9373429a3197c194c3e8f784ae96de5f
Depends-On: I5e1ef2dc6de6f67d7c509e299855baec371f614d
Michele Baldessari [Wed, 28 Sep 2016 07:41:30 +0000 (09:41 +0200)]
Full HA->HA NG migration might fail setting maintenance-mode
Currently we do the following in the migration path:
pcs property set maintenance-mode=true
if ! timeout -k 10 300 crm_resource --wait; then
echo_error "ERROR: cluster remained unstable after setting maintenance-mode for more than 300 seconds, exiting."
exit 1
fi
crm_resource --wait can actually take forever under certain conditions.
The property will be set atomically across the cluster nodes so we should be good
without this.
Change-Id: I8f531d63479b81d65b572c4431c9db6f610f7e04
Closes-Bug: #1628393
Michele Baldessari [Wed, 28 Sep 2016 10:19:10 +0000 (12:19 +0200)]
Fix "Not all flavors have been migrated to the API database"
After a successful upgrade to Newton, I ran the tripleo.sh
--overcloud-pingtest and it failed with the following:
resources.test_flavor: Not all flavors have been migrated to the API database (HTTP 409)
The issue is the fact that some tables have migrated to the
nova_api db and we need to migrate the data as well.
Currently we do:
nova-manage db sync
nova-manage api_db sync
We want to add:
nova-manage db online_data_migrations
After launching this command the overcloud-pingtest works correctly:
tripleo.sh -- Overcloud pingtest SUCCEEDED
Change-Id: Id2d5b28b5d4ade7dff6c5e760be0f509b4fe5096
Closes-Bug: #1628450
Jenkins [Wed, 28 Sep 2016 07:26:39 +0000 (07:26 +0000)]
Merge "Deprecate the NeutronL3HA parameter"
Marius Cornea [Tue, 27 Sep 2016 14:08:27 +0000 (16:08 +0200)]
Fix NTP servers hieradata
This patch enables correctly setting the NTP server passed via
--ntp-server in the overcloud nodes' /etc/ntp.conf.
Change-Id: Iff644b9da51fb8cd1946ad9d297ba0e94d3d782b
Jenkins [Tue, 27 Sep 2016 08:50:49 +0000 (08:50 +0000)]
Merge "Remove deprecated scheduler_driver settings"
Jenkins [Tue, 27 Sep 2016 08:24:24 +0000 (08:24 +0000)]
Merge "Add metricd workers support in gnocchi"
Jenkins [Tue, 27 Sep 2016 08:23:31 +0000 (08:23 +0000)]
Merge "Use parameter name to configure gmcast_listen_addr"
Jenkins [Tue, 27 Sep 2016 06:50:47 +0000 (06:50 +0000)]
Merge "Set manila::keystone::auth::tenant"
Jenkins [Tue, 27 Sep 2016 06:50:12 +0000 (06:50 +0000)]
Merge "Disable openstack-cinder-volume in step1 and reenable it in step2"
Jenkins [Tue, 27 Sep 2016 05:57:14 +0000 (05:57 +0000)]
Merge "Activate StorageMgmtPort on computes in HCI environment"
Tom Barron [Tue, 27 Sep 2016 03:02:23 +0000 (23:02 -0400)]
Set manila::keystone::auth::tenant
Without setting this parameter, overcloud deploy fails and
'openstack stack failures list overcloud' reveals the
following error:
Error: Puppet::Type::Keystone_user_role::ProviderOpenstack: Could
not find project with name [services] and domain [Default]
Error:
/Stage[main]/Manila::Keystone::Auth/Keystone::Resource::Service_identity[manilav2]/Keystone_user_role[manilav2@services]:
Could not evaluate: undefined method `[]' for nil:NilClass
When we set manila::keystone::auth::tenant to 'service', analogous
to cinder, nova, etc., the overcloud deploy completes successfully.
Change-Id: I996ac2ff602c632a9f9ea9c293472a6f2f92fd72
Jenkins [Tue, 27 Sep 2016 02:24:00 +0000 (02:24 +0000)]
Merge "Add FixedIPs parameter to from_service.yaml"
Jenkins [Tue, 27 Sep 2016 02:04:27 +0000 (02:04 +0000)]
Merge "Fix ignore warning on ceph major upgrade."
Jenkins [Tue, 27 Sep 2016 01:11:53 +0000 (01:11 +0000)]
Merge "Add integration with Manila CephFS Native driver"
Jenkins [Tue, 27 Sep 2016 01:11:46 +0000 (01:11 +0000)]
Merge "A few major-upgrade issues"
Jenkins [Tue, 27 Sep 2016 01:11:39 +0000 (01:11 +0000)]
Merge "Start mongod before calling ceilometer-dbsync"
Jenkins [Tue, 27 Sep 2016 01:11:32 +0000 (01:11 +0000)]
Merge "Reinstantiate parts of code that were accidentally removed"
Jenkins [Tue, 27 Sep 2016 00:13:37 +0000 (00:13 +0000)]
Merge "Neutron metadata agent worker count fix"
Jenkins [Tue, 27 Sep 2016 00:11:07 +0000 (00:11 +0000)]
Merge "Remove double definition of config_settings key in keystone"
Ben Nemec [Mon, 26 Sep 2016 21:40:20 +0000 (16:40 -0500)]
Fix predictable placement indexing
As noted in the bug, predictable placement is broken right now
because the %index% in the scheduler hint isn't being interpolated.
This is because the parameter was moved from overcloud.yaml to the
service-specific files, which doesn't provide the index value.
Because the Compute role's parameter is named NovaCompute... we also
have to include some backwards compatibility logic to handle the
mismatch.
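The effect of interpolating %index% can be illustrated with a plain string substitution. In a real deployment Heat performs this substitution; the helper name interpolate_index and the hint strings below are hypothetical placeholders.

```shell
# Hypothetical helper: replace every %index% in a hint template
# with the given node index.
# usage: interpolate_index <template> <index>
interpolate_index() {
    printf '%s\n' "$1" | sed "s/%index%/$2/g"
}

interpolate_index 'controller-%index%' 0    # controller-0
interpolate_index 'novacompute-%index%' 3   # novacompute-3
```

If the index is never supplied, the literal %index% stays in the hint and no node can match it, which is the failure described above.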
Change-Id: Ibee2949fe4c6c707203d7250e2ce169c769b1dcd
Closes-Bug: 1627858
Sofer Athlan-Guyot [Mon, 26 Sep 2016 13:36:29 +0000 (15:36 +0200)]
Fix ignore warning on ceph major upgrade.
The parameter IgnoreCephUpgradeWarnings is type cast into a boolean
which is rendered as 'True' or 'False' as a string, not 'true' or
'false'. This fixes the check.
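One way to make such a check robust to how the boolean is rendered is to compare case-insensitively. This is a minimal sketch with a hypothetical helper name, is_true; it is not necessarily how this change fixes the check.

```shell
# Hypothetical helper: succeed for "True", "true" or "TRUE";
# fail for anything else, including an empty string.
is_true() {
    [ "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" = "true" ]
}

if is_true "True"; then
    echo "warnings will be ignored"
fi
```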
Change-Id: I8840c384d07f9d185a72bde5f91a3872a321f623
Closes-Bug: 1627736
Jenkins [Mon, 26 Sep 2016 13:53:42 +0000 (13:53 +0000)]
Merge "Bind MySQL address to hostname appropriate to its network"
Juan Antonio Osorio Robles [Mon, 26 Sep 2016 12:59:58 +0000 (15:59 +0300)]
Use parameter name to configure gmcast_listen_addr
This used to use mysql_bind_ip, but this parameter is quite misleading
since what it actually configures is not the bind-ip itself, but the
gmcast.listen_addr parameter. This fixes that confusion.
Depends-On: Iea4bd67074824e5dc6732fd7e408743e693d80b3
Change-Id: I2b114600e622491ccff08a07946926734b50ac70
Juan Antonio Osorio Robles [Mon, 26 Sep 2016 11:10:39 +0000 (14:10 +0300)]
Remove double definition of config_settings key in keystone
Change-Id: I291bfb1e5736864ea504cd82eea1d4001fcdd931