.. This work is licensed under a Creative Commons Attribution 4.0 International License.
.. http://creativecommons.org/licenses/by/4.0

The Doctor platform provides the following features in the
`Colorado Release <https://wiki.opnfv.org/display/SWREL/Colorado>`_:

* Immediate Notification
* Consistent resource state awareness for compute host down
* Valid compute host status given to VM owner

These features enable high availability of Network Services on top of
the virtualized infrastructure. Immediate notification allows VNF managers
(VNFM) to process recovery actions promptly once a failure has occurred.

Consistency of resource state is necessary to execute recovery actions

The ability to query host status gives the VM owner a way to get
consistent state information through an API in case of a compute host

The Doctor platform consists of the following components:

* OpenStack Compute (Nova)
* OpenStack Telemetry (Ceilometer)
* OpenStack Alarming (Aodh)

Doctor Inspector and Monitor are sample implementations for reference.

You can see an overview of the Doctor platform and how components interact in

.. figure:: /platformoverview/images/figure-p1.png

   Doctor platform and typical sequence (Colorado)

Detailed information on the Doctor architecture can be found in the Doctor
requirements documentation:
http://artifacts.opnfv.org/doctor/docs/requirements/05-implementation.html

* A consumer of the NFVI wants to receive immediate notifications about faults
  in the NFVI that affect the proper functioning of the virtual resources.
  Such faults therefore have to be detected as quickly as possible. When a
  critical error is observed, the affected consumer is informed immediately
  about the fault and can switch over to the standby (STBY) configuration.

The faults to be monitored (and at which detection rate) will be configured by
the consumer. Once a fault is detected, the Inspector in the Doctor
architecture will check the resource map maintained by the Controller to find
out which virtual resources are affected, and then update the state of those
resources. The Notifier will receive the failure event requests sent from the
Controller, and notify the consumer(s) of the affected resources according to
the alarm

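The lookup that the Inspector performs against the resource map can be illustrated with a minimal sketch. This is not the Doctor Inspector implementation. The map layout (host name to list of virtual resource IDs), the function names, and the ``"down"`` state string are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical resource map: physical host name -> IDs of the virtual
# resources running on that host. The real map is kept by the Controller.
ResourceMap = Dict[str, List[str]]


def affected_resources(resource_map: ResourceMap, failed_host: str) -> List[str]:
    """Return the virtual resources hosted on the failed physical host."""
    # An unknown host simply has no affected resources.
    return list(resource_map.get(failed_host, []))


def mark_affected(resource_map: ResourceMap,
                  states: Dict[str, str],
                  failed_host: str,
                  new_state: str = "down") -> List[str]:
    """Update the state of every affected resource and return their IDs.

    ``new_state`` defaults to "down" purely as an illustrative value.
    """
    affected = affected_resources(resource_map, failed_host)
    for resource_id in affected:
        states[resource_id] = new_state
    return affected
```

In this sketch the returned list is what a notifier would then use to decide which consumers to inform.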
Detailed workflow information is as follows:

* Consumer (VNFM): (step 0) creates resources (network, server/instance) and an
  event alarm on the state down notification of that server/instance

* Monitor: (step 1) periodically checks nodes, for example by pinging between
  each data-plane NIC and the gateway of the node, and (step 2) once a check
  fails, sends out an event with "raw" fault event information to the Inspector

* Inspector: when it receives an event, it will (step 3) mark the host down
  ("mark-host-down"), (step 4) map the PM to VM, and change the VM status to

* Controller: (step 5) sends out an instance update event to Ceilometer

* Notifier: (step 6) Ceilometer transforms and passes the event to Aodh,
  (step 7) Aodh evaluates the event against the registered alarm definitions,
  then (step 8) it fires the alarm to the "consumer" who owns the

* Consumer (VNFM): (step 9) receives the event and (step 10) recreates a new

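Steps 7 and 8 of the workflow above, evaluating an incoming event against registered alarm definitions and firing the matching alarms, can be sketched as follows. This is an in-memory illustration only and does not use the Aodh API. The ``EventAlarm`` class, the event dictionary layout, and the trait names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EventAlarm:
    """Hypothetical alarm definition registered by a consumer (step 0)."""
    name: str
    event_type: str                  # event type the alarm listens for
    query: Dict[str, str]            # trait name -> required value
    action: Callable[[dict], None]   # called with the event when fired

    def matches(self, event: dict) -> bool:
        """True if the event has the right type and all queried traits."""
        if event.get("event_type") != self.event_type:
            return False
        traits = event.get("traits", {})
        return all(traits.get(key) == value for key, value in self.query.items())


def evaluate(event: dict, alarms: List[EventAlarm]) -> List[str]:
    """Fire every alarm whose definition matches; return the fired names."""
    fired = []
    for alarm in alarms:
        if alarm.matches(event):
            alarm.action(event)      # step 8: notify the owning consumer
            fired.append(alarm.name)
    return fired
```

A consumer would register one such alarm per server/instance, with ``action`` standing in for whatever notification channel is used to reach the consumer.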
Functest will call the "run.sh" script in Doctor to run the test job.

The "run.sh" script will execute the following steps.

First, it verifies connectivity to the target compute host, depending on the
installer in use, and prepares an image for booting the VM. Currently, only
the 'Apex' and 'local' installers are supported.

Second, the Doctor components are started, and, based on the above
preparation, a test user (default as demo) will be created for the Doctor

Third, the VM is booted, and an alarm event is created in Ceilometer.
After sleeping for 1 minute to wait for the VM launch to complete,
a failure is injected into the system, i.e. the network of the compute host
is disabled for 3 minutes. To ensure the host is down, the status of the host

Finally, the notification time, i.e. the time between the execution of step 2
(Monitor detects failure) and step 9 (Consumer receives failure notification)

According to the Doctor requirements, the Doctor test is successful if the
notification time is below 1 second.
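
The pass criterion can be expressed as a small check. The sketch below assumes that the detection time (step 2) and the notification time (step 9) are available as timestamps in seconds. The function names and the way the timestamps are obtained are assumptions and are not taken from "run.sh".

```python
NOTIFICATION_LIMIT_SECONDS = 1.0  # limit stated in the Doctor requirements


def notification_time(detected_at: float, notified_at: float) -> float:
    """Seconds between failure detection (step 2) and notification (step 9)."""
    if notified_at < detected_at:
        raise ValueError("notification timestamp precedes detection timestamp")
    return notified_at - detected_at


def is_successful(detected_at: float, notified_at: float,
                  limit: float = NOTIFICATION_LIMIT_SECONDS) -> bool:
    """True if the notification time is strictly below the limit."""
    return notification_time(detected_at, notified_at) < limit
```

For example, a failure detected at ``t = 100.0`` and notified at ``t = 100.4`` passes, while a notification at ``t = 101.0`` does not, because the requirement is "below 1 second".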