.. This work is licensed under a Creative Commons Attribution 4.0 International License.
.. http://creativecommons.org/licenses/by/4.0

Doctor Configuration
====================

OPNFV installers deploy most components of the Doctor framework, including
OpenStack Nova, Neutron and Cinder (Doctor Controller) and OpenStack
Ceilometer and Aodh (Doctor Notifier), except the Doctor Monitor.

After the major components of OPNFV are deployed, you can set up the Doctor
functions by following the instructions in this section. You can also find
detailed steps for all supported installers under
`doctor/doctor_tests/installer`_.

.. _doctor/doctor_tests/installer: https://git.opnfv.org/doctor/tree/doctor_tests/installer

Doctor Inspector
----------------

You need to configure one of the Doctor Inspectors below. You can also find
detailed steps for all supported Inspectors under
`doctor/doctor_tests/inspector`_.

.. _doctor/doctor_tests/inspector: https://git.opnfv.org/doctor/tree/doctor_tests/inspector

Sample Inspector
''''''''''''''''

Sample Inspector is intended to show the minimum functions of a Doctor
Inspector.

Sample Inspector is suggested to be placed on one of the controller nodes,
but it can be put on any host from which it can reach and access the
OpenStack controller services (e.g. Nova, Neutron).

Make sure the OpenStack environment variables are set properly, so that
Sample Inspector can issue admin actions such as forcing down a compute host
and updating the state of a VM.
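
For example, with an Apex deployment the admin credentials can be taken from
the ``overcloudrc`` file; the variables below are illustrative values only:

.. code-block:: bash

    # Hypothetical values - replace with your deployment's admin credentials
    export OS_AUTH_URL=http://192.0.2.10:5000/v3
    export OS_USERNAME=admin
    export OS_PASSWORD=secret
    export OS_PROJECT_NAME=admin
    export OS_USER_DOMAIN_NAME=Default
    export OS_PROJECT_DOMAIN_NAME=Default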

Then, you can configure Sample Inspector as follows:

.. code-block:: bash

    git clone https://gerrit.opnfv.org/gerrit/doctor
    cd doctor/doctor_tests/inspector
    INSPECTOR_PORT=12345
    python sample.py $INSPECTOR_PORT > inspector.log 2>&1 &
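
To verify that the Inspector is up and listening, a quick sanity check
(assuming the port chosen above and standard tooling):

.. code-block:: bash

    tail inspector.log
    ss -ltn | grep 12345    # the port passed as $INSPECTOR_PORT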

Congress
''''''''

OpenStack `Congress`_ is a Governance as a Service (previously Policy as a
Service) project. Congress can implement the Doctor Inspector, as it can
inspect a fault situation and propagate errors to other entities.

.. _Congress: https://governance.openstack.org/tc/reference/projects/congress.html

Congress is deployed by the OPNFV Apex installer. You need to enable the
doctor datasource driver and set policy rules. With the example configuration
below, Congress will force down the nova-compute service when it receives a
fault event for that compute host. It will also set the state of all VMs
running on that host from ACTIVE to ERROR.

.. code-block:: bash

    openstack congress datasource create doctor "doctor"

    openstack congress datasource create --config api_version=$NOVA_MICRO_VERSION \
        --config username=$OS_USERNAME --config tenant_name=$OS_TENANT_NAME \
        --config password=$OS_PASSWORD --config auth_url=$OS_AUTH_URL \
        nova "nova21"

    openstack congress policy rule create \
        --name host_down classification \
        'host_down(host) :-
            doctor:events(hostname=host, type="compute.host.down", status="down")'

    openstack congress policy rule create \
        --name active_instance_in_host classification \
        'active_instance_in_host(vmid, host) :-
            nova:servers(id=vmid, host_name=host, status="ACTIVE")'

    openstack congress policy rule create \
        --name host_force_down classification \
        'execute[nova:services.force_down(host, "nova-compute", "True")] :-
            host_down(host)'

    openstack congress policy rule create \
        --name error_vm_states classification \
        'execute[nova:servers.reset_state(vmid, "error")] :-
            host_down(host),
            active_instance_in_host(vmid, host)'
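
Once the rules are in place, you can list them and, after a fault event has
arrived, inspect the derived table rows (a quick check using the standard
Congress CLI; the table is empty until an event is received):

.. code-block:: bash

    openstack congress policy rule list classification
    openstack congress policy row list classification host_down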

Vitrage
'''''''

OpenStack `Vitrage`_ is an RCA (Root Cause Analysis) service for organizing,
analyzing and expanding OpenStack alarms and events. Vitrage can implement
the Doctor Inspector, as it receives a notification that a host is down and
calls the Nova force-down API. In addition, it raises alarms on the instances
running on that host.

.. _Vitrage: https://wiki.openstack.org/wiki/Vitrage

Vitrage is not yet deployed by the OPNFV installers. It can be installed
either on top of a devstack environment or on top of a real OpenStack
environment. See `Vitrage Installation`_.

.. _`Vitrage Installation`: https://docs.openstack.org/developer/vitrage/installation-and-configuration.html
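
For the devstack option, a minimal sketch of the change to devstack's
``local.conf`` (assuming the Vitrage devstack plugin from the upstream
repository):

.. code-block:: bash

    [[local|localrc]]
    enable_plugin vitrage https://opendev.org/openstack/vitrage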

The Doctor SB API and a Doctor datasource were implemented in Vitrage in the
Ocata release. The Doctor datasource is enabled by default.

After Vitrage is installed and configured, a few additional steps are needed
to support the Doctor use case:

1. Make sure that 'aodh' and 'doctor' are included in the list of datasource
   types in /etc/vitrage/vitrage.conf:

   .. code-block:: ini

      [datasources]
      types = aodh,doctor,nova.host,nova.instance,nova.zone,static,cinder.volume,neutron.network,neutron.port,heat.stack

2. Enable the Vitrage Nova notifier. Set the following in
   /etc/vitrage/vitrage.conf:

   .. code-block:: ini

      [DEFAULT]
      notifiers = nova

3. Add a template that is responsible for calling Nova force-down if Vitrage
   receives a 'compute.host.down' alarm. Copy `template`_ and place it under
   /etc/vitrage/templates.

   .. _template: https://github.com/openstack/vitrage/blob/master/etc/vitrage/templates.sample/host_down_scenarios.yaml

4. Restart the vitrage-graph and vitrage-notifier services, for example as
   sketched below.
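
How to restart them depends on the deployment; on a systemd-based host the
following sketch would do (assuming the services are registered under these
unit names; on devstack, restart the corresponding devstack@ units instead):

.. code-block:: bash

    sudo systemctl restart vitrage-graph vitrage-notifier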

Doctor Monitors
---------------

Doctor Monitors are suggested to be placed on one of the controller nodes,
but they can be put on any host that can reach both the target compute host
and the Doctor Inspector. You need to configure a Monitor for each compute
host, one by one. You can also find detailed steps for all supported Monitors
under `doctor/doctor_tests/monitor`_.

.. _doctor/doctor_tests/monitor: https://git.opnfv.org/doctor/tree/doctor_tests/monitor

Sample Monitor
''''''''''''''

You can configure the Sample Monitor as follows (example for an Apex
deployment):

.. code-block:: bash

    git clone https://gerrit.opnfv.org/gerrit/doctor
    cd doctor/doctor_tests/monitor
    INSPECTOR_PORT=12345
    COMPUTE_HOST='overcloud-novacompute-1.localdomain.com'
    COMPUTE_IP=192.30.9.5
    sudo python sample.py "$COMPUTE_HOST" "$COMPUTE_IP" \
        "http://127.0.0.1:$INSPECTOR_PORT/events" > monitor.log 2>&1 &

OpenStack components
--------------------

In OPNFV, the installers let you configure all OpenStack components as needed
for Doctor testing. Below is a sample of the required configuration
modifications.

Ceilometer
''''''''''

The maintenance use case needs new event definitions to be added to
/etc/ceilometer/event_definitions.yaml:

.. code-block:: yaml

   - event_type: maintenance.scheduled
     traits:
       actions_at:
         fields: payload.maintenance_at
         type: datetime
       allowed_actions:
         fields: payload.allowed_actions
       host_id:
         fields: payload.host_id
       instances:
         fields: payload.instances
       metadata:
         fields: payload.metadata
       project_id:
         fields: payload.project_id
       reply_url:
         fields: payload.reply_url
       session_id:
         fields: payload.session_id
       state:
         fields: payload.state
   - event_type: maintenance.host
     traits:
       host:
         fields: payload.host
       project_id:
         fields: payload.project_id
       session_id:
         fields: payload.session_id
       state:
         fields: payload.state

The maintenance and fault management use cases both need the following
publishers to be added to /etc/ceilometer/event_pipeline.yaml:

.. code-block:: yaml

   - notifier://
   - notifier://?topic=alarm.all

Nova
''''

For the maintenance use case, CPU overcommit should be disabled in
/etc/nova/nova.conf:

.. code-block:: ini

   cpu_allocation_ratio=1.0
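
With these definitions and publishers in place, a consumer such as the Doctor
maintenance test can subscribe to maintenance events through Aodh event
alarms. A hypothetical example (the alarm name and callback URL are
placeholders):

.. code-block:: bash

    aodh alarm create --name maintenance_scheduled --type event \
        --event-type maintenance.scheduled \
        --alarm-action http://127.0.0.1:12348/maintenance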