apex-tripleo-heat-templates.git
Steven Hardy [Tue, 4 Oct 2016 14:52:19 +0000 (15:52 +0100)]
j2 template per-role ServiceNetMapDefaults

The *HostnameResolveNetwork should default to a sane value
for all roles, including those specified by the user.

We choose internal_api by default (maintaining the existing
special-case for the CephStorage role which uses the storage
network), but users can of course override the default with
a network of their choice.
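As a rough illustration of the resulting defaults, the hypothetical Python helper below builds a `<Role>HostnameResolveNetwork` entry for every role in a roles list. It uses internal_api everywhere except CephStorage (storage) and lets user overrides win. The helper name and the override dictionary are assumptions for illustration; the actual change lives in the j2 templates.

```python
def hostname_resolve_defaults(roles, overrides=None):
    """Hypothetical helper: per-role *HostnameResolveNetwork defaults."""
    overrides = overrides or {}
    defaults = {}
    for role in roles:
        key = f"{role}HostnameResolveNetwork"
        # CephStorage keeps its existing special case (the storage network);
        # every other role, including user-defined ones, gets internal_api.
        default_network = "storage" if role == "CephStorage" else "internal_api"
        # A user-supplied value always takes precedence over the default.
        defaults[key] = overrides.get(key, default_network)
    return defaults
```

For example, `hostname_resolve_defaults(["Controller", "CephStorage", "MyCustomRole"])` maps the custom role to internal_api without any extra configuration.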

Change-Id: Ib240f56c1db5842b953fa510316e75fd53f24735
Closes-Bug: #1629827

Jenkins [Wed, 5 Oct 2016 03:06:17 +0000 (03:06 +0000)]
Merge "Move the main template files for default services to new syntax generation"

Jenkins [Tue, 4 Oct 2016 21:40:13 +0000 (21:40 +0000)]
Merge "j2 template role config templates"

Carlos Camacho [Tue, 4 Oct 2016 16:28:39 +0000 (18:28 +0200)]
Move the main template files for default services to new syntax generation

When generating these templates, we should
create them with the "-role" appended as they will
be generated from a role.role.j2.yaml file.

i.e. role.role.j2.yaml will generate <service>-role.yaml
     config.role.j2.yaml will generate <service>-config.yaml
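The naming rule above can be sketched as a small Python function. It is an illustration only, and it assumes that the `<service>` placeholder is filled with the lower-cased name of the role being rendered; the function name is hypothetical.

```python
import posixpath


def rendered_filename(template_path, role_name):
    """Map a '<name>.role.j2.yaml' template to its per-role output file."""
    suffix = ".role.j2.yaml"
    directory, base = posixpath.split(template_path)
    if not base.endswith(suffix):
        raise ValueError(f"not a per-role j2 template: {template_path}")
    stem = base[: -len(suffix)]  # e.g. "role" or "config"
    # Assumption: the lower-cased role name becomes the file prefix.
    return posixpath.join(directory, f"{role_name.lower()}-{stem}.yaml")
```

With this rule, rendering `puppet/config.role.j2.yaml` for the Compute role would produce `puppet/compute-config.yaml`.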

Partial-Bug: #1626976
Change-Id: I614dc462fd7fc088b67634d489d8e7b68e7d4ab1

Dan Prince [Tue, 4 Oct 2016 14:04:44 +0000 (10:04 -0400)]
Include redis/mongo hiera when using pacemaker

This patch updates the pacemaker composable service templates for
mongo and redis to extend the proper base (redis.yaml and mongo.yaml)
templates instead of the -base.yaml versions. This was causing
some missing hiera settings for these services which caused symptoms
like missing firewall rules for these services.

Change-Id: I3f94acbf4d1baadbb151b1c4d34b4a0ab28ad5e5
Partial-bug: #1629934

Jenkins [Tue, 4 Oct 2016 11:00:47 +0000 (11:00 +0000)]
Merge "Use netapp_host_type instead of netapp_eseries_host_type"

Jenkins [Tue, 4 Oct 2016 05:18:43 +0000 (05:18 +0000)]
Merge "Make keystone api network hiera composable"

Jenkins [Tue, 4 Oct 2016 03:01:11 +0000 (03:01 +0000)]
Merge "Set ceph osd max object name and namespace len on upgrade when on ext4"

Jenkins [Mon, 3 Oct 2016 22:19:31 +0000 (22:19 +0000)]
Merge "reload HAProxy config in HA setups when certificate is updated"

Jenkins [Mon, 3 Oct 2016 18:17:47 +0000 (18:17 +0000)]
Merge "Update $service to $resource this variable does not exist in the context"

Jenkins [Mon, 3 Oct 2016 16:40:41 +0000 (16:40 +0000)]
Merge "Cinder volume service is not managed by Pacemaker on BlockStorage"

Jenkins [Mon, 3 Oct 2016 16:40:06 +0000 (16:40 +0000)]
Merge "Change the rabbitmq ha policies during an M/N Upgrade"

Mathieu Bultel [Mon, 3 Oct 2016 14:02:34 +0000 (16:02 +0200)]
Update $service to $resource this variable does not exist in the context

Heat failed with:
service: unbound variable
because $service is never set in this context.

Change-Id: If82ee4562612f2617b676732956396278ee40a88
Closes-Bug: #1629903

Juan Antonio Osorio Robles [Mon, 3 Oct 2016 13:56:21 +0000 (16:56 +0300)]
reload HAProxy config in HA setups when certificate is updated

When updating a certificate for HAProxy, we only do a reload of the
configuration on non-HA setups. This means that if we try the same in
an HA setup, the cloud will still serve the old certificate, which
leads to several issues, such as serving a revoked or even a
compromised certificate for some time, or SSL errors because the
certificate doesn't match. This enables a reload for HA cases too.

Change-Id: Ib8ca2fe91be345ef4324fc8265c45df8108add7a
Closes-Bug: #1629886

Jenkins [Mon, 3 Oct 2016 09:50:29 +0000 (09:50 +0000)]
Merge "Fixed NoneType issue when monitoring-environment.yaml"

Jenkins [Mon, 3 Oct 2016 09:50:23 +0000 (09:50 +0000)]
Merge "Balance Rabbitmq Queue Master Location on queue declaration with min-masters strategy"

Jenkins [Mon, 3 Oct 2016 09:50:16 +0000 (09:50 +0000)]
Merge "Change rabbitmq queues HA mode from ha-all to ha-exactly"

Michele Baldessari [Sat, 1 Oct 2016 15:42:54 +0000 (17:42 +0200)]
Change the rabbitmq ha policies during an M/N Upgrade

This takes care of the M->N upgrade path when changing
the rabbitmq HA policy.

Partial-Bug: #1628998

Change-Id: I2468a096b5d7042bc801a742a7a85fb1521c1c02

Jenkins [Mon, 3 Oct 2016 06:35:06 +0000 (06:35 +0000)]
Merge "Fixed NoneType issue when logging-environment.yaml is used"

Michele Baldessari [Thu, 29 Sep 2016 16:30:23 +0000 (18:30 +0200)]
Change rabbitmq queues HA mode from ha-all to ha-exactly

It turns out that reducing the number of rabbitmq queue copies in the
cluster significantly improves cluster performance, especially
failover recovery time. Right now the cluster uses ha-all mode for
rabbitmq queues.

It is best to change this to "ha-exactly" mode and reduce the number
of queue copies to ceil(N/2), where N is the number of controllers in
the cluster, so in the typical scenario of 3 controllers it would be 2
by default.

It does not make much sense to keep copies of the queues across the
whole cluster, since if the quorum of nodes is lost, the rest of the
cluster nodes will be stopped anyway. We let the user override this
with a parameter.

For example, for a 3-node control plane cluster we will go from this:
pcs resource show rabbitmq
 Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster)
  Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"all"}"

To this:
pcs resource show rabbitmq
 Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster)
  Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"exactly","ha-params":2}"

According to Marin Krcmarik's testing, recovery time from failure was
reduced significantly.
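The ceil(N/2) rule and the resulting policy definition can be sketched in Python as follows. The function names and the `copies` override argument are assumptions standing in for the user-facing parameter mentioned above; the compact JSON formatting is chosen only to match the pcs output shown.

```python
import json
import math


def ha_exactly_policy(num_controllers, copies=None):
    """Build an ha-exactly policy definition.

    copies defaults to ceil(N/2); passing a value mimics the user override.
    """
    if copies is None:
        copies = math.ceil(num_controllers / 2)
    return {"ha-mode": "exactly", "ha-params": copies}


def policy_json(policy):
    # Compact separators match the formatting shown in the pcs output.
    return json.dumps(policy, separators=(",", ":"))
```

For 3 controllers this yields `{"ha-mode":"exactly","ha-params":2}`, and for 5 controllers it keeps 3 copies.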

Partial-Bug: #1628998
Change-Id: Iace6daf27a76cb8ef1050ada0de7ff1f530916c6

Jenkins [Fri, 30 Sep 2016 18:54:57 +0000 (18:54 +0000)]
Merge "telemetry: remove coordination_url hiera settings"

Jenkins [Fri, 30 Sep 2016 18:52:04 +0000 (18:52 +0000)]
Merge "Telemetry: add redis_password hiera parameter"

Jenkins [Fri, 30 Sep 2016 17:44:22 +0000 (17:44 +0000)]
Merge "Replace per role manifests with a common role manifest"

Steven Hardy [Fri, 30 Sep 2016 14:23:26 +0000 (15:23 +0100)]
Make keystone api network hiera composable

These hard-coded references to the Controller role mean that
things won't work if the keystone service is moved to any other
role, so we need to generate the lists dynamically based on the
enabled services for each role.

Change-Id: I5f1250a8a1a38cb3909feeb7d4c1000fd0fabd14
Closes-Bug: #1629096

Steven Hardy [Wed, 28 Sep 2016 16:03:42 +0000 (17:03 +0100)]
j2 template role config templates

This means the user won't have to manually specify e.g. the
OS::TripleO::ACustomRoleConfig resource.

Partial-Bug: 1626976
Change-Id: I063571d4c5cbc2f295a7a044d81c27d703bd0e10
Depends-On: I9f920e191344040a564214f3f9a1147b265e9ff3

Steven Hardy [Fri, 23 Sep 2016 14:39:33 +0000 (15:39 +0100)]
Replace per role manifests with a common role manifest

This removes the (nearly empty) per role manifests, and
replaces them with a generic manifest, where we use str_replace
to substitute the role name at runtime (or in some cases a
subset of the name for backwards compatibility).

Change-Id: I79da0f523189959b783bbcbb3b0f37be778e02fe
Partial-Bug: #1626976

Emilien Macchi [Fri, 30 Sep 2016 13:48:56 +0000 (09:48 -0400)]
telemetry: remove coordination_url hiera settings

They are now normalized and set in puppet-tripleo.

Change-Id: I197481c577b85894178e7899a55869da47847755
Closes-Bug: #1629279
Depends-On: Ic6de09acf0d36ca90cc2041c0add1bc2b4a369a5

Emilien Macchi [Fri, 30 Sep 2016 13:28:06 +0000 (09:28 -0400)]
Telemetry: add redis_password hiera parameter

Add redis_password parameter in Hiera so we can re-use it from
puppet-tripleo later for Aodh, Ceilometer and Gnocchi.

Change-Id: I038e2bac22e3bfa5047d2e76e23cff664546464d
Partial-Bug: #1629279

Juan Badia Payno [Fri, 30 Sep 2016 08:25:44 +0000 (10:25 +0200)]
Fixed NoneType issue when monitoring-environment.yaml

When you try to use environments/monitoring-environment.yaml
as part of the overcloud deployment, you hit the
following error, which stops the overcloud deployment.

***
Deploying templates in the directory /home/stack/tripleo-heat-templates
'NoneType' object does not support item assignment
***

Closes-Bug: #1629323
Change-Id: I8cf2e7d8f3a4e79cc71a1566ec17d0a977c38d60
Signed-off-by: Juan Badia Payno <jbadiapa@redhat.com>

Juan Badia Payno [Fri, 30 Sep 2016 08:13:29 +0000 (10:13 +0200)]
Fixed NoneType issue when logging-environment.yaml is used

When you try to use environments/logging-environment.yaml
as part of the overcloud deployment, you hit the
following error, which stops the overcloud deployment.

***
Deploying templates in the directory /home/stack/tripleo-heat-templates
'NoneType' object does not support item assignment
***

Closes-Bug: #1629315
Change-Id: I55e5c7f20ddf30f3e48247b734f6fa47f5de3750
Signed-off-by: Juan Badia Payno <jbadiapa@redhat.com>

Jenkins [Fri, 30 Sep 2016 12:48:04 +0000 (12:48 +0000)]
Merge "Add option to specify Certmonger CA"

Jenkins [Fri, 30 Sep 2016 07:48:29 +0000 (07:48 +0000)]
Merge "Move the rest of static roles resource registry entries to j2"

Jenkins [Thu, 29 Sep 2016 23:58:01 +0000 (23:58 +0000)]
Merge "Use -L with chown and set crush map tunables when upgrading Ceph"

Jenkins [Thu, 29 Sep 2016 23:57:43 +0000 (23:57 +0000)]
Merge "Fix typo in fixing gnocchi upgrade."

Jenkins [Thu, 29 Sep 2016 23:52:33 +0000 (23:52 +0000)]
Merge "Add gateway_ip in OS::Neutron::Subnet"

Juan Antonio Osorio Robles [Mon, 12 Sep 2016 08:42:02 +0000 (11:42 +0300)]
Add option to specify Certmonger CA

This will be used for internal (or even public) TLS when
certmonger generates the certificates. The same setting is used
for the undercloud with the generate_service_certificate option.

Change-Id: Ic54fe512b9ed5c71417a66491b7954e653f660b6

Michele Baldessari [Thu, 29 Sep 2016 16:50:39 +0000 (18:50 +0200)]
Balance Rabbitmq Queue Master Location on queue declaration with min-masters strategy

If one of the controllers becomes unavailable, queue masters are
located on the remaining controllers as queues are declared. Once the
lost controller becomes available again, masters of newly declared
queues are not preferentially placed on it, even though it obviously
hosts fewer queue masters, so the distribution may become unbalanced
and, after multiple failovers, one of the controllers may end up under
significantly higher load.

RabbitMQ 3.6.0 introduced a new HA feature for queue master
distribution; one of the strategies is min-masters, which picks the
node hosting the minimum number of masters.

One way to turn on the min-masters strategy is to add the
following to the rabbitmq.config configuration file:
{rabbit,[ ..
          {queue_master_locator, <<"min-masters">>},
          .. ]},
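To see why this helps, the toy Python simulation below models the scenario described above. After a controller rejoins with no masters, a min-masters locator keeps placing newly declared queue masters on it until the counts even out. This is a simplified model for illustration, not RabbitMQ's implementation. In particular, the alphabetical tie-break is an assumption added only to make the output deterministic.

```python
def min_masters_node(master_counts):
    """Pick the node hosting the fewest queue masters.

    Ties are broken alphabetically purely for determinism (an assumption
    of this sketch, not RabbitMQ's documented behaviour).
    """
    return min(sorted(master_counts), key=lambda node: master_counts[node])


def declare_queues(master_counts, n):
    """Simulate declaring n queues, updating master_counts in place."""
    placements = []
    for _ in range(n):
        node = min_masters_node(master_counts)
        master_counts[node] += 1
        placements.append(node)
    return placements
```

Starting from `{"controller-0": 5, "controller-1": 5, "controller-2": 0}`, the next five declared queues all land on controller-2.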

Change-Id: I61bcab0e93027282b62f2a97bec87cbb0a6e6551
Closes-Bug: #1629010

Giulio Fidente [Thu, 29 Sep 2016 11:52:32 +0000 (13:52 +0200)]
Set ceph osd max object name and namespace len on upgrade when on ext4

As per [1] we need to lower osd max object name and namespace len when
upgrading from Hammer and the OSD is backed by ext4.

These could also be given via ExtraConfig but on upgrade we only run
puppet apply after this script is executed, so the values won't be
effective unless the daemon is restarted. Yet we do not want puppet
to restart the daemon because we can't bring all OSDs down
unconditionally or guests will die.

1. http://tracker.ceph.com/issues/16187

Co-Authored-By: Michele Baldessari <michele@acksyn.org>
Co-Authored-By: Dimitri Savineau <dsavinea@redhat.com>
Change-Id: I7fec4e2426bdacd5f364adbebd42ab23dcfa523a
Closes-Bug: 1628874

Giulio Fidente [Thu, 29 Sep 2016 12:05:46 +0000 (14:05 +0200)]
Cinder volume service is not managed by Pacemaker on BlockStorage

We do not want cinder-volume to be managed by Pacemaker on
BlockStorage nodes, where Pacemaker is not running at all.

This change adds a new BlockStorageCinderVolume service name
which can be (and, by default, is) mapped to the non-Pacemaker
implementation of the service.

The error was:
Could not find dependency Exec[wait-for-settle] for
Pacemaker::Resource::Systemd[openstack-cinder-volume]

Also moves cinder::host setting into the Pacemaker specific service
definition because we only want to set a shared host= string when
the service is managed by Pacemaker.

Closes-Bug: #1628912
Change-Id: I2f7e82db4fdfd5f161e44d65d17893c3e19a89c9

Carlos Camacho [Thu, 29 Sep 2016 12:57:36 +0000 (14:57 +0200)]
Move the rest of static roles resource registry entries to j2

This moves the rest of the static resource registry
entries to j2, which allows the content of the
template to be extended to every role in roles_list.

The templates were also moved to correspond with the role name.

Partial-Bug: #1626976

Change-Id: I1cbe101eb4ce5a89cba5f2cc45cace43d3380f22

Jenkins [Thu, 29 Sep 2016 14:56:56 +0000 (14:56 +0000)]
Merge "j2 template per-role things in default registry"

Jenkins [Thu, 29 Sep 2016 14:56:49 +0000 (14:56 +0000)]
Merge "Relax pre-upgrade check for failed actions"

Jenkins [Thu, 29 Sep 2016 14:56:41 +0000 (14:56 +0000)]
Merge "Fix races in major-upgrade-pacemaker Step2"

Sofer Athlan-Guyot [Thu, 29 Sep 2016 13:22:16 +0000 (15:22 +0200)]
Fix typo in fixing gnocchi upgrade.

Change-Id: I44451a280dd928cd694dd6845d5d83040ad1f482
Related-Bug: #1626592

Jenkins [Thu, 29 Sep 2016 13:08:35 +0000 (13:08 +0000)]
Merge "Full HA->HA NG migration might fail setting maintenance-mode"

Jenkins [Thu, 29 Sep 2016 13:03:21 +0000 (13:03 +0000)]
Merge "Update gnocchi database during M/N upgrade."

Giulio Fidente [Wed, 28 Sep 2016 13:05:14 +0000 (15:05 +0200)]
Use -L with chown and set crush map tunables when upgrading Ceph

Previously the chown command wasn't traversing symlinks, causing
the new ownership to not be set for some needed files.

This change also ensures the crush map tunables are set to the 'default'
profile after the upgrade.

Finally, this redirects the output of pidof to /dev/null to avoid
spurious logging.

Change-Id: Id4865ffff207edfc727d729f9cc04e6e81ad19d8

Jenkins [Thu, 29 Sep 2016 10:12:44 +0000 (10:12 +0000)]
Merge "Move db::mysql into service_config_settings"

Steven Hardy [Fri, 23 Sep 2016 12:51:45 +0000 (13:51 +0100)]
j2 template per-role things in default registry

The default resource-registry file contains a bunch of per-role
things which mean you need to cut/paste into a custom environment
file for custom roles, even if you only want the defaults like the
built-in roles.  Using j2 we can template these just like in the
overcloud.j2.yaml and other files.
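The idea of generating per-role registry entries from the roles list, rather than cut/pasting them, can be sketched in Python. The `OS::TripleO::<Role>Config` key pattern follows the `OS::TripleO::ACustomRoleConfig` name mentioned in the role config commit. The `puppet/<role>-config.yaml` target path and the function name are hypothetical, chosen only for illustration. The real change does this with j2 loops in the registry file.

```python
def per_role_registry(roles):
    """Hypothetical sketch: one registry entry per role in roles_list."""
    registry = {}
    for role in roles:
        # Key pattern follows OS::TripleO::<Role>Config; the target path
        # below is an assumed layout, not taken from the real templates.
        registry[f"OS::TripleO::{role}Config"] = f"puppet/{role.lower()}-config.yaml"
    return registry
```

With this approach, adding a custom role to the list is enough to get its default entry, with no manual environment file edits.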

Change-Id: I52a9bffd043ca8fb0f05077c8a401a68def82926
Partial-Bug: #1626976

Giulio Fidente [Wed, 31 Aug 2016 21:32:40 +0000 (23:32 +0200)]
Use netapp_host_type instead of netapp_eseries_host_type

This patch deprecates netapp_eseries_host_type in favor of netapp_host_type.

Change-Id: I113c770ca2e4dc54526d4262bacae48e223c54f4
Closes-Bug: 1579161

Michele Baldessari [Wed, 28 Sep 2016 20:55:25 +0000 (22:55 +0200)]
Relax pre-upgrade check for failed actions

Before this change we checked the cluster for any failed actions and
stopped the upgrade process if there were any.
This is likely excessive, as a failed action could have happened in the
past while the cluster is now fully functional.

It is better to check whether any of the resources are in the Stopped
state and break the upgrade process if any of them are.

We also need to restrict this check to the bootstrap node because
otherwise the following might happen:
1) Bootstrap node does the check, it is successful and it starts
   the full HA -> HA NG migration which *will* create failed actions
   and will start stopping resources
2) If the check now starts on a non-bootstrap node while 1) is ongoing,
   it will find either failed actions or stopped resources so it will
   fail.
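The relaxed check can be modelled in a few lines of Python. This is a simplified sketch. It assumes the input is plain `pcs status` text and that a Stopped resource shows up as a line containing "Stopped". The function name is hypothetical, and the real check is a shell script.

```python
def pre_upgrade_check_passes(pcs_status, is_bootstrap_node):
    """Simplified model of the relaxed pre-upgrade check.

    Only the bootstrap node runs the check, and only Stopped resources
    block the upgrade; historic "Failed Actions" entries are ignored.
    """
    if not is_bootstrap_node:
        # Non-bootstrap nodes skip the check, avoiding the race in 2).
        return True
    return not any("Stopped" in line for line in pcs_status.splitlines())
```

Restricting the check to the bootstrap node is what avoids scenario 2): a non-bootstrap node never inspects a cluster that the bootstrap node is already migrating.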

Change-Id: Ib091f6dd8884025d2e23bf2fa700169e2dec778f
Closes-Bug: #1628653

Michele Baldessari [Tue, 27 Sep 2016 16:18:33 +0000 (18:18 +0200)]
Fix races in major-upgrade-pacemaker Step2

tripleo-heat-templates/extraconfig/tasks/major_upgrade_controller_pacemaker_2.sh
has the following code:
...
check_resource mongod started 600

if [[ -n $(is_bootstrap_node) ]]; then
...
    tstart=$(date +%s)
    while ! clustercheck; do
        sleep 5
        tnow=$(date +%s)
        if (( tnow-tstart > galera_sync_timeout )) ; then
            echo_error "ERROR galera sync timed out"
            exit 1
        fi
    done

    # Run all the db syncs
    cinder-manage db sync
...
fi

start_or_enable_service rabbitmq
check_resource rabbitmq started 600
start_or_enable_service redis
check_resource redis started 600
start_or_enable_service openstack-cinder-volume
check_resource openstack-cinder-volume started 600

systemctl_swift start

for service in $(services_to_migrate); do
    manage_systemd_service start "${service%%-clone}"
    check_resource_systemd "${service%%-clone}" started 600
done

The problem with the above code is that it is open to the following race
condition:
1) Bootstrap node is busy checking the galera status via cluster check
2) Non-bootstrap node has already reached: start_or_enable_service
   rabbitmq and later lines. These lines will be skipped because
   start_or_enable_service is a noop on non-bootstrap nodes and
   check_resource rabbitmq only checks that pcs status |grep rabbitmq
   returns true.
3) Non-bootstrap node can then reach the manage_systemd_service start
   and it will fail with errors like:
  "Job for openstack-nova-scheduler.service failed because the control
  process exited with error code. See \"systemctl status
  openstack-nova-scheduler.service\" and \"journalctl -xe\" for
  details.\n" (because the db tables are not migrated yet)

This happens because 3) was started on non-bootstrap nodes before the
db-sync statements were complete on the bootstrap node. I chose not to
change the semantics of check_resource and remove the noop on
non-bootstrap nodes, as other parts of the tree might rely on this
behaviour.
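The tstart/tnow polling loop used above for the galera sync can be sketched in Python. It is shown here only to make the timeout logic explicit, and it is not the fix itself, which this message does not show. The function name and the injectable `clock`/`sleep` parameters are assumptions added so the loop can be tested without real waiting.

```python
import time


def wait_until(check, timeout, interval=5, clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True or timeout seconds have elapsed.

    Mirrors the shell loop: run the check, sleep, then give up once the
    elapsed time exceeds the timeout.
    """
    start = clock()
    while not check():
        sleep(interval)
        if clock() - start > timeout:
            return False
    return True
```

Like the shell version, the timeout is only evaluated after a sleep, so the total wait can overshoot the timeout by up to one interval.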

Depends-On: Ia016264b51f485b97fa150ebd357b109581342ed
Change-Id: I663313e183bb05b35d0c5af016c2d1705c772bd9
Closes-Bug: #1627965

Sofer Athlan-Guyot [Thu, 22 Sep 2016 14:41:16 +0000 (16:41 +0200)]
Update gnocchi database during M/N upgrade.

We call gnocchi-upgrade to make sure we update all the needed schemas
during the major-upgrade-pacemaker step.

We also make sure that redis is started before we call gnocchi-upgrade
otherwise the command will be stuck in a loop trying to contact redis.

Closes-Bug: #1626592
Change-Id: Ia016264b51f485b97fa150ebd357b109581342ed

Jenkins [Wed, 28 Sep 2016 15:25:44 +0000 (15:25 +0000)]
Merge "Fix predictable placement indexing"

Dan Prince [Mon, 26 Sep 2016 17:52:46 +0000 (13:52 -0400)]
Move db::mysql into service_config_settings

This patch moves the various db::mysql hiera settings into a
'mysql' specific service_config_settings section for each
service so that these will only get applied on the MySQL service
node. This follows a similar puppet-tripleo change where we
create the actual databases for all services locally on
the MySQL service node to avoid permission issues.

Change-Id: Ic0692b1f7aa8409699630ef3924c4be98ca6ffb2
Closes-bug: #1620595
Depends-On: I05cc0afa9373429a3197c194c3e8f784ae96de5f
Depends-On: I5e1ef2dc6de6f67d7c509e299855baec371f614d

Michele Baldessari [Wed, 28 Sep 2016 07:41:30 +0000 (09:41 +0200)]
Full HA->HA NG migration might fail setting maintenance-mode

Currently we do the following in the migration path:
pcs property set maintenance-mode=true
if ! timeout -k 10 300 crm_resource --wait; then
     echo_error "ERROR: cluster remained unstable after setting maintenance-mode for more than 300 seconds, exiting."
     exit 1
fi

crm_resource --wait can actually take forever under certain conditions.
Since the property is set atomically across the cluster nodes, we should
be fine without this wait.

Change-Id: I8f531d63479b81d65b572c4431c9db6f610f7e04
Closes-Bug: #1628393

Michele Baldessari [Wed, 28 Sep 2016 10:19:10 +0000 (12:19 +0200)]
Fix "Not all flavors have been migrated to the API database"

After a successful upgrade to Newton, I ran the tripleo.sh
--overcloud-pingtest and it failed with the following:

resources.test_flavor: Not all flavors have been migrated to the API database (HTTP 409)

The issue is that some tables have been migrated to the
nova_api db, and we need to migrate the data as well.

Currently we do:
    nova-manage db sync
    nova-manage api_db sync

We want to add:
    nova-manage db online_data_migrations

After launching this command the overcloud-pingtest works correctly:
tripleo.sh -- Overcloud pingtest SUCCEEDED

Change-Id: Id2d5b28b5d4ade7dff6c5e760be0f509b4fe5096
Closes-Bug: #1628450

Jenkins [Wed, 28 Sep 2016 07:26:39 +0000 (07:26 +0000)]
Merge "Deprecate the NeutronL3HA parameter"

Marius Cornea [Tue, 27 Sep 2016 14:08:27 +0000 (16:08 +0200)]
Fix NTP servers hieradata

This patch enables correctly setting the NTP server passed via
--ntp-server in the overcloud nodes' /etc/ntp.conf.

Change-Id: Iff644b9da51fb8cd1946ad9d297ba0e94d3d782b

Jenkins [Tue, 27 Sep 2016 08:50:49 +0000 (08:50 +0000)]
Merge "Remove deprecated scheduler_driver settings"

Jenkins [Tue, 27 Sep 2016 08:24:24 +0000 (08:24 +0000)]
Merge "Add metricd workers support in gnocchi"

Jenkins [Tue, 27 Sep 2016 08:23:31 +0000 (08:23 +0000)]
Merge "Use parameter name to configure gmcast_listen_addr"

Jenkins [Tue, 27 Sep 2016 06:50:47 +0000 (06:50 +0000)]
Merge "Set manila::keystone::auth::tenant"

Jenkins [Tue, 27 Sep 2016 06:50:12 +0000 (06:50 +0000)]
Merge "Disable openstack-cinder-volume in step1 and reenable it in step2"

Jenkins [Tue, 27 Sep 2016 05:57:14 +0000 (05:57 +0000)]
Merge "Activate StorageMgmtPort on computes in HCI environment"

Tom Barron [Tue, 27 Sep 2016 03:02:23 +0000 (23:02 -0400)]
Set manila::keystone::auth::tenant

Without setting this parameter, overcloud deploy fails and
'openstack stack failures list overcloud' reveals the
following error:

    Error: Puppet::Type::Keystone_user_role::ProviderOpenstack: Could
not find project with name [services] and domain [Default]
    Error:
/Stage[main]/Manila::Keystone::Auth/Keystone::Resource::Service_identity[manilav2]/Keystone_user_role[manilav2@services]:
Could not evaluate: undefined method `[]' for nil:NilClass

When we set manila::keystone::auth::tenant to 'service', analogous
to cinder, nova, etc., the overcloud deploy completes successfully.

Change-Id: I996ac2ff602c632a9f9ea9c293472a6f2f92fd72

Jenkins [Tue, 27 Sep 2016 02:24:00 +0000 (02:24 +0000)]
Merge "Add FixedIPs parameter to from_service.yaml"

Jenkins [Tue, 27 Sep 2016 02:04:27 +0000 (02:04 +0000)]
Merge "Fix ignore warning on ceph major upgrade."

Jenkins [Tue, 27 Sep 2016 01:11:53 +0000 (01:11 +0000)]
Merge "Add integration with Manila CephFS Native driver"

Jenkins [Tue, 27 Sep 2016 01:11:46 +0000 (01:11 +0000)]
Merge "A few major-upgrade issues"

Jenkins [Tue, 27 Sep 2016 01:11:39 +0000 (01:11 +0000)]
Merge "Start mongod before calling ceilometer-dbsync"

Jenkins [Tue, 27 Sep 2016 01:11:32 +0000 (01:11 +0000)]
Merge "Reinstantiate parts of code that were accidentally removed"

Jenkins [Tue, 27 Sep 2016 00:13:37 +0000 (00:13 +0000)]
Merge "Neutron metadata agent worker count fix"

Jenkins [Tue, 27 Sep 2016 00:11:07 +0000 (00:11 +0000)]
Merge "Remove double definition of config_settings key in keystone"

Ben Nemec [Mon, 26 Sep 2016 21:40:20 +0000 (16:40 -0500)]
Fix predictable placement indexing

As noted in the bug, predictable placement is broken right now
because the %index% in the scheduler hint isn't being interpolated.
This is because the parameter was moved from overcloud.yaml to the
service-specific files, which doesn't provide the index value.

Because the Compute role's parameter is named NovaCompute... we also
have to include some backwards compatibility logic to handle the
mismatch.
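In Heat, `OS::Heat::ResourceGroup` substitutes its `index_var` (default `%index%`) with each member's index inside the group's resource definition. A value that never passes through that definition therefore keeps the literal `%index%`. The simplified Python model below illustrates the substitution. It rewrites only string values, not keys, and the function name is hypothetical. The `capabilities:node` hint in the usage note is the form commonly used for predictable placement.

```python
def expand_resource_group(resource_def, count, index_var="%index%"):
    """Return one copy of resource_def per member, with index_var replaced.

    Simplified model of ResourceGroup index substitution: only string
    values (nested in dicts/lists) are rewritten, not keys.
    """
    def substitute(value, index):
        if isinstance(value, str):
            return value.replace(index_var, str(index))
        if isinstance(value, dict):
            return {k: substitute(v, index) for k, v in value.items()}
        if isinstance(value, list):
            return [substitute(v, index) for v in value]
        return value

    return [substitute(resource_def, i) for i in range(count)]
```

For example, a member definition containing `{"capabilities:node": "compute-%index%"}` expands to `compute-0`, `compute-1`, and so on.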

Change-Id: Ibee2949fe4c6c707203d7250e2ce169c769b1dcd
Closes-Bug: 1627858

Sofer Athlan-Guyot [Mon, 26 Sep 2016 13:36:29 +0000 (15:36 +0200)]
Fix ignore warning on ceph major upgrade.

The parameter IgnoreCephUpgradeWarnings is cast to a boolean,
which is rendered as the string 'True' or 'False', not 'true' or
'false'. This fixes the check.
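The pitfall is easy to reproduce. A check against the lowercase literal never matches a value rendered as 'True'. The actual fix is in the upgrade shell script. The Python helper below, whose name is hypothetical, is only a sketch of a case-insensitive comparison that accepts either spelling.

```python
def heat_bool_is_true(rendered):
    """Case-insensitive truth test for a rendered boolean parameter.

    The value may arrive as the string 'True' or 'False', so comparing
    against the lowercase literal 'true' alone would never match.
    """
    return rendered.strip().lower() == "true"
```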

Change-Id: I8840c384d07f9d185a72bde5f91a3872a321f623
Closes-Bug: 1627736

Jenkins [Mon, 26 Sep 2016 13:53:42 +0000 (13:53 +0000)]
Merge "Bind MySQL address to hostname appropriate to its network"

Juan Antonio Osorio Robles [Mon, 26 Sep 2016 12:59:58 +0000 (15:59 +0300)]
Use parameter name to configure gmcast_listen_addr

This used to use mysql_bind_ip, but this parameter name is quite misleading,
since what it actually configures is not the bind IP itself, but the
gmcast.listen_addr parameter. This fixes that confusion.

Depends-On: Iea4bd67074824e5dc6732fd7e408743e693d80b3
Change-Id: I2b114600e622491ccff08a07946926734b50ac70

Juan Antonio Osorio Robles [Mon, 26 Sep 2016 11:10:39 +0000 (14:10 +0300)]
Remove double definition of config_settings key in keystone

Change-Id: I291bfb1e5736864ea504cd82eea1d4001fcdd931

Juan Antonio Osorio Robles [Fri, 23 Sep 2016 14:28:06 +0000 (17:28 +0300)]
Bind MySQL address to hostname appropriate to its network

This now uses the mysql_bind_host key to set an
appropriate FQDN for mysql to bind to.

Closes-Bug: #1627060
Change-Id: I50f4082ea968d93b240b6b5541d84f27afd6e2a3
Depends-On: I316acfd514aac63b84890e20283c4ca611ccde8b

Carlos Camacho [Thu, 22 Sep 2016 11:08:58 +0000 (13:08 +0200)]
Add metricd workers support in gnocchi

Depending on the environment, gnocchi metricd workers
can use significant controller resources (RAM/CPU);
this option makes the worker count configurable.

It is also set to 1 in environments/low-memory-usage.yaml,
which reduces the service footprint in e.g. CI.

Change-Id: Ia008b32151f4d8fec586cf89994ac836751b7cce
Closes-bug: #1626473

Michele Baldessari [Fri, 23 Sep 2016 15:31:19 +0000 (17:31 +0200)]
get_param calls with multiple arguments need brackets around them

This issue was spotted during major upgrade where we had calls like
this:

   servers: {get_param: servers, Controller}

These get_param calls are hanging indefinitely and make the whole
upgrade end in a timeout. We need to put brackets around the get_param
function when there are multiple arguments:
http://docs.openstack.org/developer/heat/template_guide/hot_spec.html#get-param

This is already done in most of the tree, and the few places where this
was not happening were parts not under CI. After this change the
following grep returns only one false positive:

   grep -ir get_param: |grep -v -- '\[' |grep ','
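
A minimal sketch of the same check as a reusable shell function. The name
find_unbracketed_get_param is hypothetical and not part of the tree; it uses
the same heuristic as the grep above, so it can report the same kind of
false positives.

```shell
# Hypothetical helper (not part of tripleo-heat-templates): list get_param
# calls that pass several arguments without list brackets.
#   wrong: servers: {get_param: servers, Controller}
#   right: servers: {get_param: [servers, Controller]}
# Same heuristic as the grep above: any line containing "get_param:" and a
# comma but no "[" is reported, so false positives are possible.
# Usage: find_unbracketed_get_param <file-or-directory>...
find_unbracketed_get_param() {
  grep -rn 'get_param:' "$@" | grep -v -- '\[' | grep ','
}
```

Running find_unbracketed_get_param . from the root of the tree prints each
suspicious line prefixed with its file name and line number.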

Change-Id: I65b23bb44f37b93e017dd15a5212939ffac76614
Closes-Bug: #1626628

8 years agoA few major-upgrade issues
Michele Baldessari [Sun, 25 Sep 2016 12:10:31 +0000 (14:10 +0200)]
A few major-upgrade issues

This commit does the following:
1. We now explicitly disable/stop and then remove the resources that are
   moving to systemd. We do this because we want to make sure they are all
   stopped before doing a yum upgrade, which would otherwise take ages due
   to rabbitmq and galera being down. It is best to do this via pcs
   during the HA Full -> HA NG migration, because at that stage it is
   simpler to make sure all the services are stopped. For extra safety we
   can still do a check by hand. Doing it via pacemaker guarantees that
   all the migrated services are already down when we stop the cluster
   (which happens to be a synchronization point between all controller
   nodes). That way we can be certain that they are all down on all
   nodes before starting the yum upgrade process.

2. We actually need to start the systemd services in
   major_upgrade_controller_pacemaker_2.sh and not stop them.

3. We need to use the proper bash variable name

4. Use is_bootstrap_node everywhere to make the code more consistent

Change-Id: Ic565c781b80357bed9483df45a4a94ec0423487c
Closes-Bug: #1627490

8 years agoDisable openstack-cinder-volume in step1 and reenable it in step2
Michele Baldessari [Sun, 25 Sep 2016 09:52:04 +0000 (11:52 +0200)]
Disable openstack-cinder-volume in step1 and reenable it in step2

Currently we do not disable openstack-cinder-volume during our
major-upgrade-pacemaker step. This leads to the following scenario. In
major_upgrade_controller_pacemaker_2.sh we do:

  start_or_enable_service galera
  check_resource galera started 600
  ....
  if [[ -n $(is_bootstrap_node) ]]; then
  ...
      cinder-manage db sync
  ...

Because openstack-cinder-volume was never disabled, pacemaker will
already have started it before we call cinder-manage, which gives us
the following error during startup:

  06:05:21.861 19482 ERROR cinder.cmd.volume DBError:
    (pymysql.err.InternalError) (1054, u"Unknown column 'services.cluster_name' in 'field list'")

Change-Id: I01b2daf956c30b9a4985ea62cbf4c941ec66dcdf
Closes-Bug: #1627470

8 years agoStart mongod before calling ceilometer-dbsync
Michele Baldessari [Sun, 25 Sep 2016 08:49:15 +0000 (10:49 +0200)]
Start mongod before calling ceilometer-dbsync

Currently, in major_upgrade_controller_pacemaker_2.sh we call
ceilometer-dbsync before mongod is actually started (only galera is
started at this point). This makes the dbsync hang until the heat
stack times out.

This patch starts mongod first. That should be okay, but note that
when we start mongod via systemctl we are not guaranteed that it will
be up on all nodes before we call ceilometer-dbsync. This *should* be
okay because ceilometer-dbsync keeps retrying and eventually one of the
nodes will be available. A completely clean fix would be to add another
step in heat that guarantees all mongo servers are up and running
before the dbsync call.
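
The "keep retrying" behavior described above can be sketched as a small
shell helper. retry_until is a hypothetical name, not a function from the
upgrade scripts. Note that a local probe such as systemctl is-active only
shows that mongod is running on the current node, so it does not give the
cross-node guarantee that an extra heat step would.

```shell
# Hypothetical helper: run a probe command until it succeeds or the
# attempt budget is exhausted. Returns 0 on success, 1 on timeout.
# Usage: retry_until <attempts> <delay-seconds> <command> [args...]
retry_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Only sleep if another attempt will follow.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Illustrative use on a controller (not executed here):
#   systemctl start mongod
#   retry_until 60 5 systemctl is-active --quiet mongod || exit 1
#   ceilometer-dbsync
```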

Change-Id: I10c960b1e0efdeb1e55d77c25aebf1e3e67f17ca
Closes-Bug: #1627453

8 years agoRemove deprecated scheduler_driver settings
Michele Baldessari [Sun, 25 Sep 2016 08:30:55 +0000 (10:30 +0200)]
Remove deprecated scheduler_driver settings

In bug https://bugs.launchpad.net/tripleo/+bug/1615035 we fixed the
scheduler_host_manager setting, which was deprecated in newton. It seems
the scheduler_driver setting also needs tweaking:

systemctl status openstack-nova-scheduler.service:
2016-09-24 20:24:54.337 15278 WARNING stevedore.named [-] Could not load nova.scheduler.filter_scheduler.FilterScheduler
2016-09-24 20:24:54.338 15278 CRITICAL nova [-] RuntimeError: (u'Cannot load scheduler driver from configuration %(conf)s.',
                              {'conf': 'nova.scheduler.filter_scheduler.FilterScheduler'})

Let's set this to its default value during the upgrade step. From newton's nova.conf:

  The class of the driver used by the scheduler. This should be chosen
  from one of the entrypoints under the namespace 'nova.scheduler.driver'
  of file 'setup.cfg'. If nothing is specified in this option, the
  'filter_scheduler' is used.

  This option also supports deprecated full Python path to the class to
  be used.  For example, "nova.scheduler.filter_scheduler.FilterScheduler".
  But note: this support will be dropped in the N Release.
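
A sketch of how an upgrade script could map the deprecated value onto the
entry point name quoted above. normalize_scheduler_driver is a hypothetical
helper; it only knows about the FilterScheduler path seen in the log and
passes any other value through unchanged.

```shell
# Hypothetical helper: translate a scheduler_driver value into the
# entry point form expected by newton. Only the deprecated FilterScheduler
# path from the log above is handled; other values pass through unchanged.
# An empty value maps to filter_scheduler, nova's documented default.
normalize_scheduler_driver() {
  case "$1" in
    ''|nova.scheduler.filter_scheduler.FilterScheduler)
      echo filter_scheduler ;;
    *)
      printf '%s\n' "$1" ;;
  esac
}

# Illustrative use on a controller (not executed here):
#   current=$(crudini --get /etc/nova/nova.conf DEFAULT scheduler_driver)
#   crudini --set /etc/nova/nova.conf DEFAULT scheduler_driver \
#     "$(normalize_scheduler_driver "$current")"
```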

Change-Id: Ic384292ad05a57757158995ec4c1a269fe4b00f1
Depends-On: I89124ead8928ff33e6b6907a7c2178169e91f4e6
Closes-Bug: #1627450

8 years agoReinstantiate parts of code that were accidentally removed
Michele Baldessari [Sun, 25 Sep 2016 08:15:41 +0000 (10:15 +0200)]
Reinstantiate parts of code that were accidentally removed

With commit fb25385d34e604d2f670cebe3e03fd57c14fa6be
"Rework the pacemaker_common_functions for M..N upgrades" we
accidentally removed some lines that fixed M/N upgrade issues.
Namely:
extraconfig/tasks/major_upgrade_controller_pacemaker_1.sh

  -# https://bugzilla.redhat.com/show_bug.cgi?id=1284047
  -# Change-Id: Ib3f6c12ff5471e1f017f28b16b1e6496a4a4b435
  -crudini --set /etc/ceilometer/ceilometer.conf DEFAULT rpc_backend rabbit
  -# https://bugzilla.redhat.com/show_bug.cgi?id=1284058
  -# Ifd1861e3df46fad0e44ff9b5cbd58711bbc87c97 Swift Ceilometer middleware no longer exists
  -crudini --set /etc/swift/proxy-server.conf pipeline:main pipeline "catch_errors healthcheck cache ratelimit tempurl formpost authtoken keystone staticweb proxy-logging proxy-server"
  -# LP: 1615035, required only for M/N upgrade.
  -crudini --set /etc/nova/nova.conf DEFAULT scheduler_host_manager host_manager

extraconfig/tasks/major_upgrade_controller_pacemaker_2.sh

   nova-manage db sync
  -nova-manage api_db sync

This patch simply puts that code back instead of reverting the whole
commit that broke things, because the rest of that commit is still needed.

Closes-Bug: #1627448

Change-Id: I89124ead8928ff33e6b6907a7c2178169e91f4e6

8 years agoAdd FixedIPs parameter to from_service.yaml
Ben Nemec [Fri, 23 Sep 2016 20:50:53 +0000 (15:50 -0500)]
Add FixedIPs parameter to from_service.yaml

Without this, deployments using the from_service.yaml port for
service VIPs will fail with:

"Property error: : resources.RedisVirtualIP.properties: : Unknown
Property FixedIPs"

Change-Id: Ie0d3b940a87741c56fe022c9e50da0d3ae9b583b
Closes-Bug: 1627189

8 years agoMerge "Remove hard-coded roles in EnabledServices output"
Jenkins [Fri, 23 Sep 2016 17:45:24 +0000 (17:45 +0000)]
Merge "Remove hard-coded roles in EnabledServices output"

8 years agoAdd integration with Manila CephFS Native driver
Erno Kuvaja [Mon, 22 Aug 2016 09:52:02 +0000 (10:52 +0100)]
Add integration with Manila CephFS Native driver

Enables configuring a CephFS Native backend for Manila.

This change follows the environment-file approach used in
review https://review.openstack.org/#/c/354019 for the NetApp
driver.

Co-Authored-By: Marios Andreou <marios@redhat.com>
Change-Id: If013d796bcdfe48b2c995bcab462c89c360b7367
Depends-On: I918f6f23ae0bd3542bcfe1bf0c797d4e6aa8f4d9
Depends-On: I2b537f735b8d1be8f39e8c274be3872b193c1014

8 years agoMove keystone::auth into service_config_settings
Dan Prince [Thu, 15 Sep 2016 07:19:15 +0000 (09:19 +0200)]
Move keystone::auth into service_config_settings

This patch moves the keystone::auth settings for all
services into the new service_config_settings section. This
is important because we execute the keystone commands via
puppet only on the role containing the keystone service,
and without these settings on that role it will fail.

Note that yaql merging/filtering is used here to ensure that
service_config_settings is optional in service templates,
and also that we'll only deploy hieradata for a given
service on a node running the service (the key in
the service_config_settings map must match the service_name
in the service template for this to work).

For example, the following results in keystone: 123 being deployed
to hiera only on the role running the "keystone" service,
regardless of which service template defines it.

  service_config_settings:
    keystone:
      keystone: 123
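
The filtering rule can be illustrated outside of heat with a small POSIX
shell function. This is only an analogy for the yaql expression used in the
templates: filter_service_config_settings and its service=setting input
format are hypothetical, and the input is assumed to be already merged from
all service templates.

```shell
# Hypothetical analogy for the yaql filtering (not the real implementation).
# stdin:  one "service_name=hiera setting" line per entry, already merged
#         from all service templates; the key must match a service_name.
# args:   the service names enabled on this role.
# stdout: the settings whose service runs on this role, in input order.
filter_service_config_settings() {
  local line svc enabled
  while IFS= read -r line; do
    svc=${line%%=*}
    for enabled in "$@"; do
      if [ "$svc" = "$enabled" ]; then
        printf '%s\n' "${line#*=}"
        break
      fi
    done
  done
}
```

For the example above, printf 'keystone=keystone: 123\n' piped into
filter_service_config_settings keystone prints keystone: 123, while the same
input with only nova enabled prints nothing.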

Co-Authored-By: Steven Hardy <shardy@redhat.com>
Change-Id: I0c2fce037a1a38772f998d582a816b4b703f8265
Closes-bug: 1620829

8 years agoMerge "Tolerate missing keys from role_data in service templates"
Jenkins [Fri, 23 Sep 2016 11:35:10 +0000 (11:35 +0000)]
Merge "Tolerate missing keys from role_data in service templates"

8 years agoActivate StorageMgmtPort on computes in HCI environment
Giulio Fidente [Fri, 23 Sep 2016 11:26:28 +0000 (13:26 +0200)]
Activate StorageMgmtPort on computes in HCI environment

Change-Id: If4d3b186d1d943ca6fad46427fb3b35699cdfc90

8 years agoMerge "explicitly set fluentd service_provider"
Jenkins [Fri, 23 Sep 2016 10:23:15 +0000 (10:23 +0000)]
Merge "explicitly set fluentd service_provider"

8 years agoMerge "No-op Puppet for upgrades/migrations according to composable roles"
Jenkins [Fri, 23 Sep 2016 09:58:20 +0000 (09:58 +0000)]
Merge "No-op Puppet for upgrades/migrations according to composable roles"

8 years agoRemove hard-coded roles in EnabledServices output
Steven Hardy [Wed, 21 Sep 2016 10:16:03 +0000 (11:16 +0100)]
Remove hard-coded roles in EnabledServices output

This was missed during the custom-roles work, and it means deployments
break if any of the existing roles are removed from roles_data.yaml.

Change-Id: Ia737b48a0dd272f8d706b7458764201fa47cb0bb
Closes-Bug: #1625755

8 years agoMerge "Make apache-based services use network-dependent servername"
Jenkins [Fri, 23 Sep 2016 08:39:09 +0000 (08:39 +0000)]
Merge "Make apache-based services use network-dependent servername"

8 years agoNeutron metadata agent worker count fix
Brent Eagles [Thu, 22 Sep 2016 15:16:37 +0000 (12:46 -0230)]
Neutron metadata agent worker count fix

This patch changes the default value and type of the NeutronWorkers
parameter, allowing it to be left unset so that a system-dependent value
is used instead (e.g. processorcount or a value derived from it).
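
The intended fallback can be sketched in shell as follows.
resolve_worker_count is hypothetical, and nproc stands in for the
processorcount fact here; in the deployment the fallback is expected to be
chosen on the puppet side, not by a script like this.

```shell
# Hypothetical sketch: use an explicit worker count when one is given,
# otherwise fall back to the number of available processors.
# Usage: resolve_worker_count [count]
resolve_worker_count() {
  local requested="${1:-}"
  if [ -n "$requested" ]; then
    printf '%s\n' "$requested"
  else
    nproc
  fi
}
```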

Change-Id: Ia385b3503fe405c4b981c451f131ac91e1af5602
Closes-Bug: #1626126

8 years agoexplicitly set fluentd service_provider
Lars Kellogg-Stedman [Thu, 22 Sep 2016 14:20:17 +0000 (10:20 -0400)]
explicitly set fluentd service_provider

The konstantin-fluentd module assumes SysV init scripts, while the
fluentd package in RHEL/CentOS/Fedora uses systemd. This can cause
errors when starting the service.

This review explicitly sets the service_provider to "systemd".

This requires https://github.com/soylent/konstantin-fluentd/pull/15,
which exposes the service_provider parameter in konstantin-fluentd.

Change-Id: I24332203de33f56a0e49fcc15f7fb7bb576e8752

8 years agoDeprecate the NeutronL3HA parameter
Brent Eagles [Thu, 22 Sep 2016 13:48:08 +0000 (11:18 -0230)]
Deprecate the NeutronL3HA parameter

NeutronL3HA used to be enabled by tripleoclient when the controller
count was greater than 1. This logic has been moved into the relevant
heat template, making the parameter less valuable for general use. If
necessary, deployers can override the automatic behavior through extra
config.
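
The automatic behavior, including a deployer override, boils down to the
rule below. This is a shell sketch only: the real logic now lives in the
heat template, and neutron_l3_ha_enabled is a hypothetical name.

```shell
# Hypothetical sketch of the rule described above: an explicit override
# wins; otherwise L3 HA is enabled only with more than one controller.
# Usage: neutron_l3_ha_enabled <controller_count> [true|false]
neutron_l3_ha_enabled() {
  local controller_count="$1" override="${2:-}"
  if [ -n "$override" ]; then
    printf '%s\n' "$override"
  elif [ "$controller_count" -gt 1 ]; then
    echo true
  else
    echo false
  fi
}
```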

Change-Id: Id5bb5070b9627fd545357acc9ef51bdc69d10551
Related-Bug: #1623155