gerrit.opnfv Code Review - apex-tripleo-heat-templates.git/log

j2 template per-role things in default registry

The default resource-registry file contains a bunch of per-role
things which mean you need to cut/paste into a custom environment
file for custom roles, even if you only want the defaults like the
built-in roles. Using j2 we can template these just like in the
overcloud.j2.yaml and other files.

Change-Id: I52a9bffd043ca8fb0f05077c8a401a68def82926
Partial-Bug: #1626976

Michele Baldessari [Wed, 28 Sep 2016 20:55:25 +0000 (22:55 +0200)]

Relax pre-upgrade check for failed actions

Before this change we checked the cluster for any failed actions and
we stopped the upgrade process if there were any.
This is likely excessive, as a failed action could have happened in the
past and the cluster is now fully functional.

Better to check if any of the resources are in Stopped state and break
the upgrade process if any of them are.

We also need to restrict this check to the bootstrap node because
otherwise the following might happen:
1) Bootstrap node does the check, it is successful and it starts
   the full HA -> HA NG migration which *will* create failed actions
   and will start stopping resources
2) If the check now starts on a non-bootstrap node while 1) is ongoing,
   it will find either failed actions or stopped resources so it will
   fail.
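
The check described above can be sketched roughly as follows. This is an
illustration only, not the code from this change: has_stopped_resources and
pre_upgrade_check are hypothetical names, and the sketch assumes that stopped
resources appear with the word "Stopped" in the cluster status text.

```shell
# Hypothetical helper: succeed (exit 0) if the status text on stdin
# mentions any resource in the Stopped state.
has_stopped_resources() {
    grep -q 'Stopped'
}

# Hypothetical pre-upgrade gate. $1 is "yes" on the bootstrap node and
# anything else on other nodes; the status text is read from stdin.
pre_upgrade_check() {
    local is_bootstrap="$1"
    if [ "$is_bootstrap" != "yes" ]; then
        # Non-bootstrap nodes skip the check so they cannot race the migration.
        cat >/dev/null
        return 0
    fi
    if has_stopped_resources; then
        echo "ERROR: cluster has stopped resources, aborting upgrade" >&2
        return 1
    fi
    return 0
}
```

On a real controller the input would presumably come from something like
"pcs status | pre_upgrade_check yes"; failed actions are deliberately not
inspected here.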

Change-Id: Ib091f6dd8884025d2e23bf2fa700169e2dec778f
Closes-Bug: #1628653

Michele Baldessari [Tue, 27 Sep 2016 16:18:33 +0000 (18:18 +0200)]

Fix races in major-upgrade-pacemaker Step2

tripleo-heat-templates/extraconfig/tasks/major_upgrade_controller_pacemaker_2.sh
has the following code:
...
check_resource mongod started 600

if [[ -n $(is_bootstrap_node) ]]; then
...
    tstart=$(date +%s)
    while ! clustercheck; do
        sleep 5
        tnow=$(date +%s)
        if (( tnow-tstart > galera_sync_timeout )) ; then
            echo_error "ERROR galera sync timed out"
            exit 1
        fi
    done

    # Run all the db syncs
    cinder-manage db sync
...
fi

start_or_enable_service rabbitmq
check_resource rabbitmq started 600
start_or_enable_service redis
check_resource redis started 600
start_or_enable_service openstack-cinder-volume
check_resource openstack-cinder-volume started 600

systemctl_swift start

for service in $(services_to_migrate); do
    manage_systemd_service start "${service%%-clone}"
    check_resource_systemd "${service%%-clone}" started 600
done
...

The problem with the above code is that it is open to the following race
condition:
1) Bootstrap node is busy checking the galera status via clustercheck
2) Non-bootstrap node has already reached: start_or_enable_service
   rabbitmq and later lines. These lines will be skipped because
   start_or_enable_service is a noop on non-bootstrap nodes and
   check_resource rabbitmq only checks that pcs status |grep rabbitmq
   returns true.
3) Non-bootstrap node can then reach the manage_systemd_service start
   and it will fail with stuff like:
  "Job for openstack-nova-scheduler.service failed because the control
  process exited with error code. See \"systemctl status
  openstack-nova-scheduler.service\" and \"journalctl -xe\" for
  details.\n" (because the db tables are not migrated yet)

This happens because 3) was started on non-bootstrap nodes before the
db-sync statements are complete on the bootstrap node. I did not feel
like changing the semantics of check_resource and removing the noop on
non-bootstrap nodes as other parts of the tree might rely on this
behaviour.
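
The galera wait in the quoted script is a poll-until-ready loop with a
timeout. A reusable form of that pattern is sketched below. wait_for is a
hypothetical helper, not part of tripleo-heat-templates, and it is not
claimed to be the fix applied by this change; it only illustrates the kind of
explicit wait that keeps a later step from starting too early.

```shell
# Hypothetical helper: poll a command until it succeeds or a timeout expires.
# Usage: wait_for <timeout_seconds> <interval_seconds> <command> [args...]
wait_for() {
    local timeout="$1" interval="$2"
    shift 2
    local tstart tnow
    tstart=$(date +%s)
    while ! "$@"; do
        tnow=$(date +%s)
        if [ $(( tnow - tstart )) -ge "$timeout" ]; then
            echo "ERROR: timed out after ${timeout}s waiting for: $*" >&2
            return 1
        fi
        sleep "$interval"
    done
    return 0
}
```

With this helper the galera loop above would read "wait_for 600 5
clustercheck", where 600 is only an example value.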

Depends-On: Ia016264b51f485b97fa150ebd357b109581342ed
Change-Id: I663313e183bb05b35d0c5af016c2d1705c772bd9
Closes-Bug: #1627965

Sofer Athlan-Guyot [Thu, 22 Sep 2016 14:41:16 +0000 (16:41 +0200)]

Update gnocchi database during M/N upgrade.

We call gnocchi-upgrade to make sure we update all the needed schemas
during the major-upgrade-pacemaker step.

We also make sure that redis is started before we call gnocchi-upgrade
otherwise the command will be stuck in a loop trying to contact redis.

Closes-Bug: #1626592
Change-Id: Ia016264b51f485b97fa150ebd357b109581342ed

Jenkins [Wed, 28 Sep 2016 15:25:44 +0000 (15:25 +0000)]

Merge "Fix predictable placement indexing"

Dan Prince [Mon, 26 Sep 2016 17:52:46 +0000 (13:52 -0400)]

Move db::mysql into service_config_settings

This patch moves the various db::mysql hiera settings into a
'mysql' specific service_config_settings section for each
service so that these will only get applied on the MySQL service
node. This follows a similar puppet-tripleo change where we
create the actual databases for all services locally on
the MySQL service node to avoid permission issues.

Change-Id: Ic0692b1f7aa8409699630ef3924c4be98ca6ffb2
Closes-bug: #1620595
Depends-On: I05cc0afa9373429a3197c194c3e8f784ae96de5f
Depends-On: I5e1ef2dc6de6f67d7c509e299855baec371f614d

Michele Baldessari [Wed, 28 Sep 2016 07:41:30 +0000 (09:41 +0200)]

Full HA->HA NG migration might fail setting maintenance-mode

Currently we do the following in the migration path:
    pcs property set maintenance-mode=true
    if ! timeout -k 10 300 crm_resource --wait; then
        echo_error "ERROR: cluster remained unstable after setting maintenance-mode for more than 300 seconds, exiting."
        exit 1
    fi

crm_resource --wait can actually take forever under certain conditions.
The property will be set atomically across the cluster nodes so we should be good
without this.

Change-Id: I8f531d63479b81d65b572c4431c9db6f610f7e04
Closes-Bug: #1628393

Michele Baldessari [Wed, 28 Sep 2016 10:19:10 +0000 (12:19 +0200)]

Fix "Not all flavors have been migrated to the API database"

After a successful upgrade to Newton, I ran the tripleo.sh
--overcloud-pingtest and it failed with the following:

resources.test_flavor: Not all flavors have been migrated to the API database (HTTP 409)

The issue is the fact that some tables have migrated to the
nova_api db and we need to migrate the data as well.

Currently we do:
    nova-manage db sync
    nova-manage api_db sync

We want to add:
    nova-manage db online_data_migrations

After launching this command the overcloud-pingtest works correctly:
tripleo.sh -- Overcloud pingtest SUCCEEDED

Change-Id: Id2d5b28b5d4ade7dff6c5e760be0f509b4fe5096
Closes-Bug: #1628450

Jenkins [Wed, 28 Sep 2016 07:26:39 +0000 (07:26 +0000)]

Merge "Deprecate the NeutronL3HA parameter"

Marius Cornea [Tue, 27 Sep 2016 14:08:27 +0000 (16:08 +0200)]

Fix NTP servers hieradata

This patch enables correctly setting the NTP server passed via
--ntp-server in the overcloud nodes' /etc/ntp.conf.

Change-Id: Iff644b9da51fb8cd1946ad9d297ba0e94d3d782b

Jenkins [Tue, 27 Sep 2016 08:50:49 +0000 (08:50 +0000)]

Merge "Remove deprecated scheduler_driver settings"

Jenkins [Tue, 27 Sep 2016 08:24:24 +0000 (08:24 +0000)]

Merge "Add metricd workers support in gnocchi"

Jenkins [Tue, 27 Sep 2016 08:23:31 +0000 (08:23 +0000)]

Merge "Use parameter name to configure gmcast_listen_addr"

Jenkins [Tue, 27 Sep 2016 06:50:47 +0000 (06:50 +0000)]

Merge "Set manila::keystone::auth::tenant"

Jenkins [Tue, 27 Sep 2016 06:50:12 +0000 (06:50 +0000)]

Merge "Disable openstack-cinder-volume in step1 and reenable it in step2"

Jenkins [Tue, 27 Sep 2016 05:57:14 +0000 (05:57 +0000)]

Merge "Activate StorageMgmtPort on computes in HCI environment"

Tom Barron [Tue, 27 Sep 2016 03:02:23 +0000 (23:02 -0400)]

Set manila::keystone::auth::tenant

Without setting this parameter, overcloud deploy fails and
'openstack stack failures list overcloud' reveals the
following error:

Error: Puppet::Type::Keystone_user_role::ProviderOpenstack: Could
not find project with name [services] and domain [Default]
Error:
/Stage[main]/Manila::Keystone::Auth/Keystone::Resource::Service_identity[manilav2]/Keystone_user_role[manilav2@services]:
Could not evaluate: undefined method `[]' for nil:NilClass

When we set manila::keystone::auth::tenant to 'service', analogous
to cinder, nova, etc., the overcloud deploy completes successfully.

Change-Id: I996ac2ff602c632a9f9ea9c293472a6f2f92fd72

Jenkins [Tue, 27 Sep 2016 02:24:00 +0000 (02:24 +0000)]

Merge "Add FixedIPs parameter to from_service.yaml"

Jenkins [Tue, 27 Sep 2016 02:04:27 +0000 (02:04 +0000)]

Merge "Fix ignore warning on ceph major upgrade."

Jenkins [Tue, 27 Sep 2016 01:11:53 +0000 (01:11 +0000)]

Merge "Add integration with Manila CephFS Native driver"

Jenkins [Tue, 27 Sep 2016 01:11:46 +0000 (01:11 +0000)]

Merge "A few major-upgrade issues"

Jenkins [Tue, 27 Sep 2016 01:11:39 +0000 (01:11 +0000)]

Merge "Start mongod before calling ceilometer-dbsync"

Jenkins [Tue, 27 Sep 2016 01:11:32 +0000 (01:11 +0000)]

Merge "Reinstantiate parts of code that were accidentally removed"

Jenkins [Tue, 27 Sep 2016 00:13:37 +0000 (00:13 +0000)]

Merge "Neutron metadata agent worker count fix"

Jenkins [Tue, 27 Sep 2016 00:11:07 +0000 (00:11 +0000)]

Merge "Remove double definition of config_settings key in keystone"

Ben Nemec [Mon, 26 Sep 2016 21:40:20 +0000 (16:40 -0500)]

Fix predictable placement indexing

As noted in the bug, predictable placement is broken right now
because the %index% in the scheduler hint isn't being interpolated.
This is because the parameter was moved from overcloud.yaml to the
service-specific files, which doesn't provide the index value.

Because the Compute role's parameter is named NovaCompute... we also
have to include some backwards compatibility logic to handle the
mismatch.

Change-Id: Ibee2949fe4c6c707203d7250e2ce169c769b1dcd
Closes-Bug: 1627858

Sofer Athlan-Guyot [Mon, 26 Sep 2016 13:36:29 +0000 (15:36 +0200)]

Fix ignore warning on ceph major upgrade.

The parameter IgnoreCephUpgradeWarnings is type cast into a boolean,
which is rendered as the string 'True' or 'False', not 'true' or
'false'. This fixes the check.
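
A case-insensitive comparison avoids this class of bug. The helper below is
a minimal sketch; is_true is a hypothetical name and this is not necessarily
how the change itself fixes the check.

```shell
# Hypothetical helper: succeed for 'True', 'true', 'TRUE' and so on.
is_true() {
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        true) return 0 ;;
        *)    return 1 ;;
    esac
}

# The rendered Heat boolean 'True' is now accepted.
if is_true "True"; then
    echo "ceph upgrade warnings will be ignored"
fi
```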

Change-Id: I8840c384d07f9d185a72bde5f91a3872a321f623
Closes-Bug: 1627736

Jenkins [Mon, 26 Sep 2016 13:53:42 +0000 (13:53 +0000)]

Merge "Bind MySQL address to hostname appropriate to its network"

Juan Antonio Osorio Robles [Mon, 26 Sep 2016 12:59:58 +0000 (15:59 +0300)]

Use parameter name to configure gmcast_listen_addr

This used to use mysql_bind_ip, but this parameter is quite misleading
since what it actually configures is not the bind-ip itself, but the
gmcast.listen_addr parameter. This fixes that confusion.

Depends-On: Iea4bd67074824e5dc6732fd7e408743e693d80b3
Change-Id: I2b114600e622491ccff08a07946926734b50ac70

Juan Antonio Osorio Robles [Mon, 26 Sep 2016 11:10:39 +0000 (14:10 +0300)]

Remove double definition of config_settings key in keystone

Change-Id: I291bfb1e5736864ea504cd82eea1d4001fcdd931

Juan Antonio Osorio Robles [Fri, 23 Sep 2016 14:28:06 +0000 (17:28 +0300)]

Bind MySQL address to hostname appropriate to its network

This now uses the mysql_bind_host key to set an
appropriate FQDN for MySQL to bind to.

Closes-Bug: #1627060
Change-Id: I50f4082ea968d93b240b6b5541d84f27afd6e2a3
Depends-On: I316acfd514aac63b84890e20283c4ca611ccde8b

Carlos Camacho [Thu, 22 Sep 2016 11:08:58 +0000 (13:08 +0200)]

Add metricd workers support in gnocchi

Depending on the environment, gnocchi workers
use significant controller resources (RAM/CPU);
this option makes the worker count configurable.

It is also set to 1 in environments/low-memory-usage.yaml,
which will reduce the service footprint in, e.g., CI.

Change-Id: Ia008b32151f4d8fec586cf89994ac836751b7cce
Closes-bug: #1626473

Michele Baldessari [Fri, 23 Sep 2016 15:31:19 +0000 (17:31 +0200)]

get_param calls with multiple arguments need brackets around them

This issue was spotted during major upgrade where we had calls like
this:

servers: {get_param: servers, Controller}

These get_param calls are hanging indefinitely and make the whole
upgrade end in a timeout. We need to put brackets around the get_param
function when there are multiple arguments:
http://docs.openstack.org/developer/heat/template_guide/hot_spec.html#get-param

This is already done in most of the tree, and the few places where this
was not happening were parts not under CI. After this change the
following grep returns only one false positive:

grep -ir get_param: |grep -v -- '\[' |grep ','
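
To see what this filter chain catches, it can be run against two sample
lines, one without brackets and one with them. The function name
find_unbracketed_get_param and the sample input are illustrative only; the
three grep stages are the ones quoted above, reading from stdin instead of
recursing through the tree.

```shell
# Keep lines that call get_param, contain no '[', and contain a comma,
# i.e. multi-argument get_param calls that are missing their brackets.
find_unbracketed_get_param() {
    grep -i 'get_param:' | grep -v -- '\[' | grep ','
}

# Only the first (unbracketed) sample line is printed.
printf '%s\n' \
    'servers: {get_param: servers, Controller}' \
    'servers: {get_param: [servers, Controller]}' |
    find_unbracketed_get_param
```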

Change-Id: I65b23bb44f37b93e017dd15a5212939ffac76614
Closes-Bug: #1626628

Michele Baldessari [Sun, 25 Sep 2016 12:10:31 +0000 (14:10 +0200)]

A few major-upgrade issues

This commit does the following:
1. We now explicitly disable/stop and then remove the resources that are
   moving to systemd. We do this because we want to make sure they are all
   stopped before doing a yum upgrade, which otherwise would take ages due
   to rabbitmq and galera being down. It is best if we do this via pcs
   while we do the HA Full -> HA NG migration because it is simpler to make
   sure all the services are stopped at that stage. For extra safety we can
   still do a check by hand. By doing it via pacemaker we have the
   guarantee that all the migrated services are down already when we stop
   the cluster (which happens to be a synchronization point between all
   controller nodes). That way we can be certain that they are all down on
   all nodes before starting the yum upgrade process.

2. We actually need to start the systemd services in
   major_upgrade_controller_pacemaker_2.sh and not stop them.

3. We need to use the proper bash variable name

4. Use is_bootstrap_node everywhere to make the code more consistent

Change-Id: Ic565c781b80357bed9483df45a4a94ec0423487c
Closes-Bug: #1627490

Michele Baldessari [Sun, 25 Sep 2016 09:52:04 +0000 (11:52 +0200)]

Disable openstack-cinder-volume in step1 and reenable it in step2

Currently we do not disable openstack-cinder-volume during our
major-upgrade-pacemaker step. This leads to the following scenario. In
major_upgrade_controller_pacemaker_2.sh we do:

  start_or_enable_service galera
  check_resource galera started 600
  ....
  if [[ -n $(is_bootstrap_node) ]]; then
  ...
      cinder-manage db sync
  ...

What happens here is that since openstack-cinder-volume was never
disabled it will already be started by pacemaker before we call
cinder-manage and this will give us the following errors during the
start:
06:05:21.861 19482 ERROR cinder.cmd.volume DBError:
                   (pymysql.err.InternalError) (1054, u"Unknown column 'services.cluster_name' in 'field list'")

Change-Id: I01b2daf956c30b9a4985ea62cbf4c941ec66dcdf
Closes-Bug: #1627470

Michele Baldessari [Sun, 25 Sep 2016 08:49:15 +0000 (10:49 +0200)]

Start mongod before calling ceilometer-dbsync

Currently in major_upgrade_controller_pacemaker_2.sh we are calling
ceilometer-dbsync before mongod is actually started (only galera is
started at this point). This will make the dbsync hang indefinitely
until the heat stack times out.

Now this approach should be okay, but do note that when we start mongod
via systemctl we are not guaranteed that it will be up on all nodes
before we call ceilometer-dbsync. This *should* be okay because
ceilometer-dbsync keeps retrying and eventually one of the nodes will
be available. A completely clean fix here would be to add another
step in heat to have the guarantee that all mongo servers are up and
running before the dbsync call.

Change-Id: I10c960b1e0efdeb1e55d77c25aebf1e3e67f17ca
Closes-Bug: #1627453

Michele Baldessari [Sun, 25 Sep 2016 08:30:55 +0000 (10:30 +0200)]

Remove deprecated scheduler_driver settings

In bug https://bugs.launchpad.net/tripleo/+bug/1615035 we fixed the
scheduler_host setting which got deprecated in newton. It seems the
scheduler_driver setting also needs tweaking:

systemctl status openstack-nova-scheduler.service:
2016-09-24 20:24:54.337 15278 WARNING stevedore.named [-] Could not load nova.scheduler.filter_scheduler.FilterScheduler
2016-09-24 20:24:54.338 15278 CRITICAL nova [-] RuntimeError: (u'Cannot load scheduler driver from configuration %(conf)s.',
                              {'conf': 'nova.scheduler.filter_scheduler.FilterScheduler'})

Let's set this to default during the upgrade step. From newton's nova.conf:

  The class of the driver used by the scheduler. This should be chosen
  from one of the entrypoints under the namespace 'nova.scheduler.driver'
  of file 'setup.cfg'. If nothing is specified in this option, the
  'filter_scheduler' is used.

  This option also supports deprecated full Python path to the class to
  be used.  For example, "nova.scheduler.filter_scheduler.FilterScheduler".
  But note: this support will be dropped in the N Release.

Change-Id: Ic384292ad05a57757158995ec4c1a269fe4b00f1
Depends-On: I89124ead8928ff33e6b6907a7c2178169e91f4e6
Closes-Bug: #1627450

Michele Baldessari [Sun, 25 Sep 2016 08:15:41 +0000 (10:15 +0200)]

Reinstantiate parts of code that were accidentally removed

With commit fb25385d34e604d2f670cebe3e03fd57c14fa6be
"Rework the pacemaker_common_functions for M..N upgrades" we
accidentally removed some lines that fixed M/N upgrade issues.
Namely:
extraconfig/tasks/major_upgrade_controller_pacemaker_1.sh

  -# https://bugzilla.redhat.com/show_bug.cgi?id=1284047
  -# Change-Id: Ib3f6c12ff5471e1f017f28b16b1e6496a4a4b435
  -crudini --set /etc/ceilometer/ceilometer.conf DEFAULT rpc_backend rabbit
  -# https://bugzilla.redhat.com/show_bug.cgi?id=1284058
  -# Ifd1861e3df46fad0e44ff9b5cbd58711bbc87c97 Swift Ceilometer middleware no longer exists
  -crudini --set /etc/swift/proxy-server.conf pipeline:main pipeline "catch_errors healthcheck cache ratelimit tempurl formpost authtoken keystone staticweb proxy-logging proxy-server"
  -# LP: 1615035, required only for M/N upgrade.
  -crudini --set /etc/nova/nova.conf DEFAULT scheduler_host_manager host_manager

extraconfig/tasks/major_upgrade_controller_pacemaker_2.sh
  nova-manage db sync
- nova-manage api_db sync

This patch simply puts that code back without reverting the
whole commit that broke things, because that is needed.

Closes-Bug: #1627448

Change-Id: I89124ead8928ff33e6b6907a7c2178169e91f4e6

Ben Nemec [Fri, 23 Sep 2016 20:50:53 +0000 (15:50 -0500)]

Add FixedIPs parameter to from_service.yaml

Without this, deployments using the from_service.yaml port for
service VIPs will fail with:

"Property error: : resources.RedisVirtualIP.properties: : Unknown
Property FixedIPs"

Change-Id: Ie0d3b940a87741c56fe022c9e50da0d3ae9b583b
Closes-Bug: 1627189

Jenkins [Fri, 23 Sep 2016 17:45:24 +0000 (17:45 +0000)]

Merge "Remove hard-coded roles in EnabledServices output"

Erno Kuvaja [Mon, 22 Aug 2016 09:52:02 +0000 (10:52 +0100)]

Add integration with Manila CephFS Native driver

Enables configuring CephFS Native backend for Manila.

This change is based on the usage of environments like in
review https://review.openstack.org/#/c/354019 for Netapp
driver.

Co-Authored-By: Marios Andreou <marios@redhat.com>
Change-Id: If013d796bcdfe48b2c995bcab462c89c360b7367
Depends-On: I918f6f23ae0bd3542bcfe1bf0c797d4e6aa8f4d9
Depends-On: I2b537f735b8d1be8f39e8c274be3872b193c1014

Dan Prince [Thu, 15 Sep 2016 07:19:15 +0000 (09:19 +0200)]

Move keystone::auth into service_config_settings

This patch moves the keystone::auth settings for all
services into the new service_config_settings section. This
is important because we execute the keystone commands via
puppet only on the role containing the keystone service
and without these settings it will fail.

Note that yaql merging/filtering is used here to ensure that
service_config_settings is optional in service templates,
and also that we'll only deploy hieradata for a given
service on a node running the service (the key in
the service_config_settings map must match the service_name
in the service template for this to work).

E.g. the following will result in only deploying keystone: 123
in hiera on the role running the "keystone" service,
regardless of which service template defines it.

  service_config_settings:
    keystone:
      keystone: 123

Co-Authored-By: Steven Hardy <shardy@redhat.com>
Change-Id: I0c2fce037a1a38772f998d582a816b4b703f8265
Closes-bug: 1620829

Jenkins [Fri, 23 Sep 2016 11:35:10 +0000 (11:35 +0000)]

Merge "Tolerate missing keys from role_data in service templates"

Giulio Fidente [Fri, 23 Sep 2016 11:26:28 +0000 (13:26 +0200)]

Activate StorageMgmtPort on computes in HCI environment

Change-Id: If4d3b186d1d943ca6fad46427fb3b35699cdfc90

Jenkins [Fri, 23 Sep 2016 10:23:15 +0000 (10:23 +0000)]

Merge "explicitly set fluentd service_provider"

Jenkins [Fri, 23 Sep 2016 09:58:20 +0000 (09:58 +0000)]

Merge "No-op Puppet for upgrades/migrations according to composable roles"

Steven Hardy [Wed, 21 Sep 2016 10:16:03 +0000 (11:16 +0100)]

Remove hard-coded roles in EnabledServices output

This was missed during custom-roles work, and will mean deployments
break if any of the existing roles are removed from roles_data.yaml

Change-Id: Ia737b48a0dd272f8d706b7458764201fa47cb0bb
Closes-Bug: #1625755

Jenkins [Fri, 23 Sep 2016 08:39:09 +0000 (08:39 +0000)]

Merge "Make apache-based services use network-dependent servername"

Brent Eagles [Thu, 22 Sep 2016 15:16:37 +0000 (12:46 -0230)]

Neutron metadata agent worker count fix

This patch changes the default value and type of the NeutronWorkers
parameter, allowing it to be unset so that a system-dependent value can
be used (e.g. processorcount or some derivative value).

Change-Id: Ia385b3503fe405c4b981c451f131ac91e1af5602
Closes-Bug: #1626126

Lars Kellogg-Stedman [Thu, 22 Sep 2016 14:20:17 +0000 (10:20 -0400)]

explicitly set fluentd service_provider

The konstantin-fluentd package assumes sysv init scripts, while the
fluentd package in RHEL (and CentOS/Fedora) uses systemd. This can cause
errors starting the service.

This review explicitly sets the service_provider to "systemd".

This requires https://github.com/soylent/konstantin-fluentd/pull/15, which exposes the service_provider parameter in konstantin-fluentd.

Change-Id: I24332203de33f56a0e49fcc15f7fb7bb576e8752

Brent Eagles [Thu, 22 Sep 2016 13:48:08 +0000 (11:18 -0230)]

Deprecate the NeutronL3HA parameter

NeutronL3HA used to be enabled by the tripleoclient if the controller
count > 1. This functionality has been moved into the relevant heat
template, making the parameter less valuable for general use. If
necessary, deployers can override the automatic behavior through extra
config.

Change-Id: Id5bb5070b9627fd545357acc9ef51bdc69d10551
Related-Bug: #1623155

Steven Hardy [Wed, 21 Sep 2016 13:42:52 +0000 (14:42 +0100)]

Tolerate missing keys from role_data in service templates

Currently we have a few keys which may be considered optional,
such as monitoring_subscription, logging* and global_config_settings.

Currently we dereference these directly via get_attr, but this will
break when heat output validation is fixed (ref bug #1599114;
patches are up for this, so it may be soon).

Change-Id: If4eed1ca39c10ace9b1cb5ce2dc4b9c70a3dd2f4
Partial-Bug: #1620829

Jiri Stransky [Thu, 22 Sep 2016 12:56:50 +0000 (14:56 +0200)]

No-op Puppet for upgrades/migrations according to composable roles

Our previous no-ops stopped working because the Puppet run resources
moved under a different entry in resource registry. This is now fixed
to follow the latest way.

Change-Id: Ia5598385ddca185bfbf10e2d3babb53f6f77d1ac
Closes-Bug: #1626452

Jenkins [Thu, 22 Sep 2016 09:52:37 +0000 (09:52 +0000)]

Merge "Make sure major upgrade script fails."

Jenkins [Wed, 21 Sep 2016 23:10:12 +0000 (23:10 +0000)]

Merge "Provide for RAM-constrained environments"

Jenkins [Wed, 21 Sep 2016 21:21:11 +0000 (21:21 +0000)]

Merge "Glance worker count fix"

Jenkins [Wed, 21 Sep 2016 21:20:48 +0000 (21:20 +0000)]

Merge "Define step input as a Number type"

Jenkins [Wed, 21 Sep 2016 21:00:21 +0000 (21:00 +0000)]

Merge "Update capabilities-map.yaml"

Jenkins [Wed, 21 Sep 2016 16:00:39 +0000 (16:00 +0000)]

Merge "Set Neutron's metadata_ip to the nova metadata VIP"

Steven Hardy [Wed, 21 Sep 2016 13:53:27 +0000 (14:53 +0100)]

Define step input as a Number type

Currently we pass numbers in (hard-coded in post.j2.yaml) but the
SoftwareConfig schema defaults to String. If puppet requires an
integer number, setting this type may help preserve the type for
the hook.

Change-Id: Ie9227d7adb58ea3c791aa459a1ab5b17ad935919

Joe Talerico [Tue, 2 Aug 2016 18:28:55 +0000 (14:28 -0400)]

Glance worker count fix

This patch changes the default value and type of the Glance worker
configuration to allow it to be unset and allow a system dependent
default to be used (e.g. processorcount or some derivative value). The
previous default of 0 would result in a single self contained process,
which while suitable for debugging and testing is not appropriate for
production deployments.

Partial-Bug: #1626126
Change-Id: I58a6a72a581e7083e1dc4e5ca568fdd3fdd6cdf1

Jiri Stransky [Wed, 21 Sep 2016 11:53:19 +0000 (13:53 +0200)]

Provide for RAM-constrained environments

We hit problems in environments which don't have a lot of RAM (e.g. dev
envs, possibly also CI): Apache ate too much memory due to
too many worker processes being spawned.

This commit allows customizing the Apache MaxRequestWorkers and
ServerLimit directives via Heat parameters. The default stays 256 as
that's the default in the Puppet module, to be suited for production
environments with powerful machines. A low-memory-usage.yaml
environment file is also added, which can be used to make dev/test/CI
overclouds less memory hungry; there the limits are set to 32.

Change-Id: Ibcf1d9c3326df8bb5b380066166c4ae3c4bf8d96
Co-Authored-By: Carlos Camacho <ccamacho@redhat.com>
Closes-Bug: #1619205

Steven Hardy [Wed, 21 Sep 2016 10:10:47 +0000 (11:10 +0100)]

Make defaults from roles_data.yaml more robust

The previous logic left out the default Count completely when it was
zero, which breaks nested validation and it's likely similar problems
would exist with the other optional defaults, so rework it so the
defaulting happens in the jinja2 logic, and document the interfaces
better in roles_data.yaml

Change-Id: I7f2eb4a3a0b43c5d2cd0d001ed3c73f783c95c74
Closes-Bug: #1625760

Jenkins [Wed, 21 Sep 2016 10:00:42 +0000 (10:00 +0000)]

Merge "Enable L3 HA when multiple controllers and no DVR"

Juan Antonio Osorio Robles [Mon, 5 Sep 2016 11:39:04 +0000 (14:39 +0300)]

Make apache-based services use network-dependent servername

Currently the servername is incorrectly set for the services running
over apache. It currently takes the default value which is just the
regular FQDN, when the services actually might be running on
different IPs that require alternative FQDNs.

This fixes that by filling that value from a fact in hiera that's
dependent on the service's network.

Closes-Bug: #1625677
Change-Id: Ib7ea5fd2d18a376eaa2f5a3fa5687cb9b719a8e2

Sofer Athlan-Guyot [Wed, 7 Sep 2016 09:25:41 +0000 (11:25 +0200)]

Make sure major upgrade script fails.

Running upgrade-non-controller.sh against compute and object storage did
not fail if the /root/tripleo_upgrade_node.sh failed.

This makes it harder to detect errors, for instance in a CI system.
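
The general idea of surfacing a failed per-node step can be sketched as
below. run_upgrade_step and the node name are hypothetical; the commit
message does not describe how upgrade-non-controller.sh lost the exit status
or how the change restores it, so this only shows one way to capture a
command's status and return it to the caller.

```shell
# Hypothetical wrapper: run a per-node upgrade command and pass its
# failure status back to the caller instead of swallowing it.
run_upgrade_step() {
    local node="$1" rc
    shift
    if "$@"; then
        echo "upgrade step on ${node} succeeded"
    else
        rc=$?
        echo "ERROR: upgrade step on ${node} failed with status ${rc}" >&2
        return "$rc"
    fi
}
```

A caller would then stop on failure, for example with
"run_upgrade_step compute-0 some_command || exit 1".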

Change-Id: I12b7d640547d3b8ec1f70104d159d6052b7638ff
Closes-Bug: 1620973

Jenkins [Tue, 20 Sep 2016 22:40:58 +0000 (22:40 +0000)]

Merge "RabbitMQ threads should be configured dynamically"

Brent Eagles [Tue, 20 Sep 2016 20:51:40 +0000 (18:21 -0230)]

Set Neutron's metadata_ip to the nova metadata VIP

The neutron metadata agent's metadata_ip field is meant to refer to the
nova metadata service, not the local address on the NeutronApiNetwork.

Change-Id: Ibb25a80ea3e66ab3f5cf63c197460d495939778d
Closes-Bug: #1625504

Juan Antonio Osorio Robles [Tue, 20 Sep 2016 10:25:53 +0000 (13:25 +0300)]

Add nova-metadata template

This is needed because currently we're not generating
nova_metadata_vip or nova_metadata_nodes_ip, and a service profile is
required for that. Unfortunately, currently puppet-nova only deploys
osapi and metadata through the same manifest, so this profile doesn't
really inject any puppet code. We can make this more elegant later.

Change-Id: Id7112111f16d0c749a6203b90e29e6d9f1e4d57e
Closes-Bug: #1625543

Michele Baldessari [Tue, 20 Sep 2016 08:11:54 +0000 (10:11 +0200)]

RabbitMQ threads should be configured dynamically

Currently in puppet/services/rabbitmq.yaml we hardcode the thread pool
size to 30 (via the +A30 snippet):
  rabbitmq_environment:
    RABBITMQ_SERVER_ERL_ARGS: '"+K true +A30 +P 1048576 -kernel inet_default_connect_options [{nodelay,true},{raw,6,18,<<5000:64/native>>}] -kernel inet_default_listen_options [{raw,6,18,<<5000:64/native>>}]"'

Upstream rabbit has gained the ability to dynamically configure the
number of threads since 3.6.2 via the following commit:
https://github.com/rabbitmq/rabbitmq-server/commit/41ce5ad808863944cd6d62ce7f7e2271f1010582

Given that the default was hardcoded in rabbit from at least 3.4.0 up
until 3.6.2 (see LP bug associated to this commit), we can actually
remove this hardcoded value as it overrides a sane default.

Before the change:
/usr/lib64/erlang/erts-7.3.1/bin/beam.smp -W w -A 64 -K true -A30 -P 1048576 ...

After the change:
/usr/lib64/erlang/erts-7.3.1/bin/beam.smp -W w -A 64 -K true -P 1048576 ...

So effectively with this change we will have the following:
- With older rabbitmq versions we keep the +A30 default
- With rabbitmq versions >= 3.6.2 the thread number is dynamically
computed to nr_cpus * 16
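
As a quick worked example of that formula, taken at face value from this
message (any floor or cap that rabbitmq applies is not modeled, and
rabbit_async_threads is a hypothetical name):

```shell
# Thread pool size as described in the commit message: nr_cpus * 16.
rabbit_async_threads() {
    local nr_cpus="$1"
    echo $(( nr_cpus * 16 ))
}

rabbit_async_threads 4   # prints 64
```

A result of 64 would be consistent with the "-A 64" seen in the beam.smp
command lines above if that host had 4 CPUs, although the CPU count is not
stated here.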

Change-Id: I8d30c7d141c29fcc439d40fc767498520be7966e
Closes-Bug: #1625486

Brent Eagles [Fri, 16 Sep 2016 20:31:00 +0000 (18:01 -0230)]

Enable L3 HA when multiple controllers and no DVR

This patch conditionally enables Neutron L3 HA if there are multiple
controllers but DVR has not been enabled. If the conditions are false,
the value of NeutronL3HA is used.

Change-Id: If1ebeaf417c0da99d833450e394b71cabff2c800
Closes-Bug: #1623155

Jenkins [Mon, 19 Sep 2016 17:23:08 +0000 (17:23 +0000)]

Merge "Add a function to upgrade from full HA to NG HA"

Jenkins [Mon, 19 Sep 2016 15:57:19 +0000 (15:57 +0000)]

Merge "Set VNC URL parameters for nova-compute"

Michele Baldessari [Fri, 26 Aug 2016 14:46:44 +0000 (16:46 +0200)]

Add a function to upgrade from full HA to NG HA

This is the initial work to have a function that migrates a full HA
architecture as deployed in Mitaka to the HA architecture as deployed in
Newton where only a few resources are managed by pacemaker.

The sequence is the following:
1) We remove the desired services from pacemaker's control. The services
   at this point are still running normally via the systemd service as
   invoked by pacemaker
2) We do a "systemctl stop <service>" on all controllers for all the
   services that were removed from pacemaker's control. This ensures that
   during the yum upgrade, the %post sections that call
   "systemctl try-restart" do not take a long time, since rabbit is down
   at this point in the upgrade. The only exceptions are "openstack-core"
   and "delay", which are dummy pacemaker resources that do not exist on
   the system
3) We do a "systemctl start <service>" on all nodes for all the services
   mentioned above.
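
The sequence above can be sketched as a dry run that only prints the
planned actions. The function names are hypothetical, step 1 is shown
as placeholder text rather than a real command, and skipping the dummy
resources in step 3 as well as in step 2 is an assumption:

```shell
#!/bin/bash
# Dry-run sketch: prints the actions instead of executing them.

# Succeeds for the dummy pacemaker resources that have no systemd unit.
is_dummy_resource() {
    [[ $1 == openstack-core || $1 == delay ]]
}

# Args: the services being moved out of pacemaker's control.
plan_ha_migration() {
    local svc
    # 1) Take each service out of pacemaker's control (placeholder text,
    #    not a real command).
    for svc in "$@"; do
        echo "remove from pacemaker: $svc"
    done
    # 2) Stop the real services so %post try-restart calls return quickly.
    for svc in "$@"; do
        is_dummy_resource "$svc" || echo "systemctl stop $svc"
    done
    # 3) Start them again, now outside pacemaker's control.
    for svc in "$@"; do
        is_dummy_resource "$svc" || echo "systemctl start $svc"
    done
}
```

Calling "plan_ha_migration openstack-core neutron-server" lists both
names under step 1 but emits systemctl lines for neutron-server only.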

We should probably merge this patch only once Newton has branched, as it
is very specific to the M/N upgrade.

Closes-Bug: 1617520
Change-Id: I4c409ce58c1a57b6e0decc3cf168b62698b32e39

Giulio Fidente [Wed, 14 Sep 2016 16:15:55 +0000 (18:15 +0200)]

Use osd_pool_default_* puppet parameters when creating the pools

While it is possible to override the pg_num, pgp_num and size for
each pool, the defaults are hardcoded. This patch instead takes the
defaults from the ceph::profile::params::osd_pool_default_*
parameters, if any are set.
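
The resulting precedence can be sketched as follows. The function name
is hypothetical, and falling back to the previously hardcoded value
when no osd_pool_default_* parameter is set is an assumption based on
the "if any" wording above:

```shell
#!/bin/bash
# Hypothetical helper: resolve one pool setting (pg_num, pgp_num or size).
# Args: per-pool override, osd_pool_default_* value, hardcoded default.
# An empty string means "not set".
pool_setting() {
    local per_pool=$1 osd_pool_default=$2 hardcoded=$3
    # The first non-empty value wins, in order of precedence.
    echo "${per_pool:-${osd_pool_default:-$hardcoded}}"
}
```

A per-pool override always wins; an empty override falls through to the
osd_pool_default_* value, and only then to the hardcoded default.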

Closes-Bug: 1623590
Change-Id: Iecde772e7f72fd9abedb54cff4b8f2605df8fedd

Jenkins [Sat, 17 Sep 2016 19:57:02 +0000 (19:57 +0000)]

Merge "M/N upgrade sahara-api fails to restart."

Jenkins [Sat, 17 Sep 2016 18:50:51 +0000 (18:50 +0000)]

Merge "Add fluentd client service"

Jenkins [Sat, 17 Sep 2016 17:38:36 +0000 (17:38 +0000)]

Merge "Move rabbit's clustering port away from the ephemeral port range"

Sofer Athlan-Guyot [Fri, 19 Aug 2016 17:16:33 +0000 (19:16 +0200)]

M/N upgrade sahara-api fails to restart.

Change-Id: I7a041dab8b1b1edc9c80248e1eef3ce7ab272292
Closes-Bug: 1615056

Jenkins [Sat, 17 Sep 2016 17:28:53 +0000 (17:28 +0000)]

Merge "Rework the pacemaker_common_functions for M..N upgrades"

Juan Antonio Osorio Robles [Sat, 17 Sep 2016 07:34:48 +0000 (10:34 +0300)]

Set VNC URL parameters for nova-compute

These are needed so the computes can advertise the VNC URL correctly.

Change-Id: Ic3eba9fe929ce396b584249eb84415de09ab1b62
Closes-Bug: #1623607

Jenkins [Sat, 17 Sep 2016 09:33:38 +0000 (09:33 +0000)]

Merge "Add mongo config settings in collector service templates"

marios [Wed, 25 May 2016 08:56:02 +0000 (11:56 +0300)]

Rework the pacemaker_common_functions for M..N upgrades

For N we cannot assume services are managed by pacemaker.
This adds functions to check whether a service is managed by
systemd or by pcmk and to start or stop it accordingly. For
example, a pcmk-managed service is stopped/disabled only on the
bootstrap node, whereas a systemd-managed service is stopped/started
on all controllers.

There is an equivalent change to check_resource, which has been
reworked to handle both pcmk and systemd.
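
The stop-side dispatch described above can be sketched as a dry run.
The function name and its arguments are hypothetical and do not mirror
the helpers added by this change; the sketch only prints the command
that would be run:

```shell
#!/bin/bash
# Dry-run sketch: prints the command that would run, if any.
# Args: service, manager (pcmk|systemd), bootstrap node (true|false).
stop_service_plan() {
    local svc=$1 manager=$2 is_bootstrap=$3
    case $manager in
        pcmk)
            # Pacemaker acts cluster-wide, so only the bootstrap node
            # issues the command.
            if [[ $is_bootstrap == true ]]; then
                echo "pcs resource disable $svc"
            fi
            ;;
        systemd)
            # systemd is per-node, so every controller stops the unit.
            echo "systemctl stop $svc"
            ;;
        *)
            echo "unknown manager: $manager" >&2
            return 1
            ;;
    esac
}
```

On a non-bootstrap node a pcmk-managed service produces no output,
while a systemd-managed one yields a "systemctl stop" line on every
controller.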

Implements: blueprint overcloud-upgrades-workflow-mitaka-to-newton
Change-Id: Ic8252736781dc906b3aef8fc756eb8b2f3bb1f02

Jenkins [Sat, 17 Sep 2016 02:53:34 +0000 (02:53 +0000)]

Merge "Add NetApp Manila driver integration and tidy up generic"

Jenkins [Sat, 17 Sep 2016 02:53:28 +0000 (02:53 +0000)]

Merge "Convert AllNodesExtraConfig to support composable roles"

Lars Kellogg-Stedman [Tue, 9 Aug 2016 20:20:18 +0000 (16:20 -0400)]

Add fluentd client service

This implements support for installing fluentd agents as a composable
service on the overcloud.

Depends-On: I2e1abe4d8c8359e56ff626255ee50c9cacca1940

Implements: tripleo-opstools-centralized-logging
Change-Id: I23b0e23881b742158fcfb6b8c145a3211d45086e

Jenkins [Fri, 16 Sep 2016 21:09:15 +0000 (21:09 +0000)]

Merge "Expose parameter to enable combination alarms"

Jenkins [Fri, 16 Sep 2016 20:11:47 +0000 (20:11 +0000)]

Merge "Refactor upgrade checks."

Jenkins [Fri, 16 Sep 2016 19:48:23 +0000 (19:48 +0000)]

Merge "Add CephRgw to roles_data.yaml"

Jenkins [Fri, 16 Sep 2016 19:31:08 +0000 (19:31 +0000)]

Merge "Convert UpdateWorkflow to support composable roles"