From: Gerald Kunzmann
Date: Fri, 1 Jul 2016 10:13:16 +0000 (+0000)
Subject: Merge "Pointer to "Port status update" RFE in upstream"
X-Git-Tag: colorado.1.0~55
X-Git-Url: https://gerrit.opnfv.org/gerrit/gitweb?a=commitdiff_plain;h=d97f8f2c49ef6a81fb5bd504a3f78af1f0f00fc6;hp=642239c778639a0504cd4709237b5eb5d0b6cba5;p=doctor.git

Merge "Pointer to "Port status update" RFE in upstream"
---

diff --git a/INFO b/INFO
index 67d08e1d..4bce1d67 100644
--- a/INFO
+++ b/INFO
@@ -13,6 +13,7 @@ Repository: doctor
 Committers:
 Ashiq Khan (NTT DOCOMO, khan@nttdocomo.com)
 Carlos Goncalves (NEC, Carlos.Goncalves@neclab.eu)
+Dong Wenjuan (ZTE, dong.wenjuan@zte.com.cn)
 Gerald Kunzmann (NTT DOCOMO, kunzmann@docomolab-euro.com)
 Mario Cho (hephaex@gmail.com)
 Peter Lee (ClearPath Networks, plee@clearpathnet.com)
@@ -26,3 +27,4 @@ Link to TSC approval of the project: http://meetbot.opnfv.org/meetings/opnfv-mee
 Link(s) to approval of committer update:
 http://lists.opnfv.org/pipermail/opnfv-tsc/2015-June/000905.html
 http://lists.opnfv.org/pipermail/opnfv-tech-discuss/2015-June/003165.html
+http://lists.opnfv.org/pipermail/opnfv-tech-discuss/2016-June/011245.html
diff --git a/docs/requirements/05-implementation.rst b/docs/requirements/05-implementation.rst
index e0753bdd..297f34a3 100644
--- a/docs/requirements/05-implementation.rst
+++ b/docs/requirements/05-implementation.rst
@@ -644,6 +644,95 @@ Parameters:
   resources. For each resource, information about the current state, the
   firmware version, etc. is provided.
 
+
+Detailed southbound interface specification
+-------------------------------------------
+
+This section is specifying the southbound interfaces for fault management
+between the Monitors and the Inspector.
+Although southbound interfaces should be flexible to handle various events from
+different types of Monitors, we define unified event API in order to improve
+interoperability between the Monitors and the Inspector.
+This is not limiting implementation of Monitor and Inspector as these could be
+extended in order to support failures from intelligent inspection like prediction.
+
+Note: The interface definition will be aligned with current work in ETSI NFV IFA
+working group.
+
+Fault event interface
+^^^^^^^^^^^^^^^^^^^^^
+
+This interface allows the Monitors to notify the Inspector about an event which
+was captured by the Monitor and may effect resources managed in the VIM.
+
+EventNotification
+_________________
+
+
+Event notification including fault description.
+The entity of this notification is event, and not fault or error specifically.
+This allows us to use generic event format or framework build out of Doctor project.
+The parameters below shall be mandatory, but keys in 'Details' can be optional.
+
+Parameters:
+
+* Time [1]: Datetime when the fault was observed in the Monitor.
+* Type [1]: Type of event that will be used to process correlation in Inspector.
+* Details [0..1]: Details containing additional information with Key-value pair style.
+  Keys shall be defined depending on the Type of the event.
+
+E.g.:
+
+.. code-block:: bash
+
+    {
+        'event': {
+            'time': '2016-04-12T08:00:00',
+            'type': 'compute.host.down',
+            'details': {
+                'hostname': 'compute-1',
+                'source': 'sample_monitor',
+                'cause': 'link-down',
+                'severity': 'critical',
+                'status': 'down',
+                'monitor_id': 'monitor-1',
+                'monitor_event_id': '123',
+            }
+        }
+    }
+
+Optional parameters in 'Details':
+
+* Hostname: the hostname on which the event occurred.
+* Source: the display name of reporter of this event. This is not limited to monitor, other entity can be specified such as 'KVM'.
+* Cause: description of the cause of this event which could be different from the type of this event.
+* Severity: the severity of this event set by the monitor.
+* Status: the status of target object in which error occurred.
+* MonitorID: the ID of the monitor sending this event.
+* MonitorEventID: the ID of the event in the monitor. This can be used by operator while tracking the monitor log.
+* RelatedTo: the array of IDs which related to this event.
+
+Also, we can have bulk API to receive multiple events in a single HTTP POST
+message by using the 'events' wrapper as follows:
+
+.. code-block:: bash
+
+    {
+        'events': [
+            'event': {
+                'time': '2016-04-12T08:00:00',
+                'type': 'compute.host.down',
+                'details': {},
+            },
+            'event': {
+                'time': '2016-04-12T08:00:00',
+                'type': 'compute.host.nic.error',
+                'details': {},
+            }
+        ]
+    }
+
+
 Blueprints
 ----------
 
@@ -838,6 +927,3 @@ as Doctor will be doing. Also this BP might need enhancement to change
 server and service states correctly.
 
 .. [*] https://blueprints.launchpad.net/nova/+spec/pacemaker-servicegroup-driver
-
-..
-    vim: set tabstop=4 expandtab textwidth=80:
diff --git a/docs/userguide/doctor_scenario_in_functest.rst b/docs/userguide/doctor_scenario_in_functest.rst
new file mode 100644
index 00000000..3a435706
--- /dev/null
+++ b/docs/userguide/doctor_scenario_in_functest.rst
@@ -0,0 +1,118 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+
+Doctor
+^^^^^^
+
+Platform overview
+"""""""""""""""""
+
+Doctor platform provides these features in `Colorado Release `_:
+
+* Immediate Notification
+* Consistent resource state awareness for compute host down
+* Valid compute host status given to VM owner
+
+These features enable high availability of Network Services on top of
+the virtualized infrastructure. Immediate notification allows VNF managers
+(VNFM) to process recovery actions promptly once a failure has occurred.
+
+Consistency of resource state is necessary to execute recovery actions
+properly in the VIM.
+
+Ability to query host status gives VM owner the possibility to get
+consistent state information through an API in case of a compute host
+fault.
+
+The Doctor platform consists of the following components:
+
+* OpenStack Compute (Nova)
+* OpenStack Telemetry (Ceilometer)
+* OpenStack Alarming (Aodh)
+* Doctor Inspector
+* Doctor Monitor
+
+.. note::
+    Doctor Inspector and Monitor are sample implementations for reference.
+
+You can see an overview of the Doctor platform and how components interact in
+:numref:`figure-p1`.
+
+.. figure:: /platformoverview/images/figure-p1.png
+   :name: figure-p1
+   :width: 100%
+
+   Doctor platform and typical sequence (Colorado)
+
+Detailed information on the Doctor architecture can be found in the Doctor
+requirements documentation:
+http://artifacts.opnfv.org/doctor/docs/requirements/05-implementation.html
+
+
+Use case
+""""""""
+
+* A consumer of the NFVI wants to receive immediate notifications about faults
+  in the NFVI affecting the proper functioning of the virtual resources.
+  Therefore, such faults have to be detected as quickly as possible, and, when
+  a critical error is observed, the affected consumer is immediately informed
+  about the fault and can switch over to the STBY configuration.
+
+The faults to be monitored (and at which detection rate) will be configured by
+the consumer. Once a fault is detected, the Inspector in the Doctor
+architecture will check the resource map maintained by the Controller, to find
+out which virtual resources are affected and then update the resources state.
+The Notifier will receive the failure event requests sent from the Controller,
+and notify the consumer(s) of the affected resources according to the alarm
+configuration.
+
+Detailed workflow information is as follows:
+
+* Consumer(VNFM): (step 0) creates resources (network, server/instance) and an
+  event alarm on state down notification of that server/instance
+
+* Monitor: (step 1) periodically checks nodes, such as ping from/to each
+  dplane nic to/from gw of node, (step 2) once it fails to send out event
+  with "raw" fault event information to Inspector
+
+* Inspector: when it receives an event, it will (step 3) mark the host down
+  ("mark-host-down"), (step 4) map the PM to VM, and change the VM status to
+  down
+
+* Controller: (step 5) sends out instance update event to Ceilometer
+
+* Notifier: (step 6) Ceilometer transforms and passes the event to Aodh,
+  (step 7) Aodh will evaluate event with the registered alarm definitions,
+  then (step 8) it will fire the alarm to the "consumer" who owns the
+  instance
+
+* Consumer(VNFM): (step 9) receives the event and (step 10) recreates a new
+  instance
+
+Test case
+"""""""""
+
+Functest will call the "run.sh" script in Doctor to run the test job.
+
+The "run.sh" script will execute the following steps.
+
+Firstly, verify connectivity to target compute host according to different
+installer and prepare image for booting VM. Currently, only 'Apex' and
+'local' installer are supported.
+
+Secondly, the Doctor components are started, and, based on the above
+preparation, a test user (default as demo) will be created for the Doctor
+tests.
+
+Thirdly, the VM is booted, and an alarm event is created in Ceilometer.
+After sleeping for 1 minute in order to wait for the VM launch to complete,
+a failure is injected to the system, i.e. the network of compute host is
+disabled for 3 minutes. To ensure the host is down, the status of the host
+will be checked.
+
+Finally, the notification time, i.e. the time between the execution of step 2
+(Monitor detects failure) and step 9 (Consumer receives failure notification)
+is calculated.
+
+According to the Doctor requirements, the Doctor test is successful if the
+notification time is below 1 second.
diff --git a/tests/run.sh b/tests/run.sh
index 56bacca7..d7240d24 100755
--- a/tests/run.sh
+++ b/tests/run.sh
@@ -166,12 +166,17 @@ stop_consumer() {
 
 wait_for_vm_launch() {
     echo "waiting for vm launch..."
-    while true
+    count=0
+    while [[ ${count} -lt 60 ]]
     do
         state=$(nova list | grep " $VM_NAME " | awk '{print $6}')
         [[ "$state" == "ACTIVE" ]] && return 0
+        [[ "$state" == "ERROR" ]] && echo "vm state is ERROR" && exit 1
+        count=$(($count+1))
         sleep 1
     done
+    echo "ERROR: time out while waiting for vm launch"
+    exit 1
 }
 
 inject_failure() {
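
Review note on the tests/run.sh hunk: the change replaces an unbounded `while true` poll with a loop that gives up after 60 tries and fails fast on `ERROR`. The same pattern can be sketched as a standalone helper for experimenting outside the test job. This is only an illustration under stated assumptions: `wait_for_state`, its argument order, and its return codes are hypothetical and are not part of tests/run.sh.

```shell
#!/bin/bash
# Hypothetical helper (not part of tests/run.sh): poll a command until it
# prints the wanted state, fail fast on ERROR, give up after max_tries polls.
#
# Usage: wait_for_state WANTED MAX_TRIES INTERVAL CMD [ARGS...]
# Returns 0 on success, 1 if the state is ERROR, 2 on timeout.
wait_for_state() {
    local wanted="$1"      # state that counts as success, e.g. ACTIVE
    local max_tries="$2"   # upper bound on the number of polls
    local interval="$3"    # seconds to sleep between polls
    shift 3                # the remaining arguments are the state command
    local count=0
    local state

    while [ "${count}" -lt "${max_tries}" ]
    do
        state=$("$@")
        [ "$state" = "$wanted" ] && return 0
        if [ "$state" = "ERROR" ]; then
            echo "state is ERROR" >&2
            return 1
        fi
        count=$((count + 1))
        sleep "${interval}"
    done
    echo "ERROR: timed out waiting for state ${wanted}" >&2
    return 2
}
```

A call such as `wait_for_state ACTIVE 60 1 get_vm_state` would mirror the patched loop, where `get_vm_state` is an assumed wrapper around the `nova list | grep | awk` pipeline from the hunk. Unlike the patch, the sketch returns a status instead of calling `exit`, so the caller decides whether a timeout aborts the whole run.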