From: Gerald Kunzmann <kunzmann@docomolab-euro.com>
Date: Fri, 1 Jul 2016 10:13:16 +0000 (+0000)
Subject: Merge "Pointer to "Port status update" RFE in upstream"
X-Git-Tag: colorado.1.0~55
X-Git-Url: https://gerrit.opnfv.org/gerrit/gitweb?a=commitdiff_plain;h=d97f8f2c49ef6a81fb5bd504a3f78af1f0f00fc6;hp=642239c778639a0504cd4709237b5eb5d0b6cba5;p=doctor.git

Merge "Pointer to "Port status update" RFE in upstream"
---

diff --git a/INFO b/INFO
index 67d08e1d..4bce1d67 100644
--- a/INFO
+++ b/INFO
@@ -13,6 +13,7 @@ Repository: doctor
 Committers:
 Ashiq Khan (NTT DOCOMO, khan@nttdocomo.com)
 Carlos Goncalves (NEC, Carlos.Goncalves@neclab.eu)
+Dong Wenjuan (ZTE, dong.wenjuan@zte.com.cn)
 Gerald Kunzmann (NTT DOCOMO, kunzmann@docomolab-euro.com)
 Mario Cho (hephaex@gmail.com)
 Peter Lee (ClearPath Networks, plee@clearpathnet.com)
@@ -26,3 +27,4 @@ Link to TSC approval of the project: http://meetbot.opnfv.org/meetings/opnfv-mee
 Link(s) to approval of committer update:
 http://lists.opnfv.org/pipermail/opnfv-tsc/2015-June/000905.html
 http://lists.opnfv.org/pipermail/opnfv-tech-discuss/2015-June/003165.html
+http://lists.opnfv.org/pipermail/opnfv-tech-discuss/2016-June/011245.html
diff --git a/docs/requirements/05-implementation.rst b/docs/requirements/05-implementation.rst
index e0753bdd..297f34a3 100644
--- a/docs/requirements/05-implementation.rst
+++ b/docs/requirements/05-implementation.rst
@@ -644,6 +644,95 @@ Parameters:
   resources. For each resource, information about the current state, the
   firmware version, etc. is provided.
 
+
+Detailed southbound interface specification
+-------------------------------------------
+
+This section is specifying the southbound interfaces for fault management
+between the Monitors and the Inspector.
+Although southbound interfaces should be flexible to handle various events from
+different types of Monitors, we define unified event API in order to improve
+interoperability between the Monitors and the Inspector.
+This is not limiting implementation of Monitor and Inspector as these could be
+extended in order to support failures from intelligent inspection like prediction.
+
+Note: The interface definition will be aligned with current work in ETSI NFV IFA
+working group.
+
+Fault event interface
+^^^^^^^^^^^^^^^^^^^^^
+
+This interface allows the Monitors to notify the Inspector about an event which
+was captured by the Monitor and may effect resources managed in the VIM.
+
+EventNotification
+_________________
+
+
+Event notification including fault description.
+The entity of this notification is event, and not fault or error specifically.
+This allows us to use generic event format or framework build out of Doctor project.
+The parameters below shall be mandatory, but keys in 'Details' can be optional.
+
+Parameters:
+
+* Time [1]: Datetime when the fault was observed in the Monitor.
+* Type [1]: Type of event that will be used to process correlation in Inspector.
+* Details [0..1]: Details containing additional information with Key-value pair style.
+  Keys shall be defined depending on the Type of the event.
+
+E.g.:
+
+.. code-block:: bash
+
+    {
+        'event': {
+            'time': '2016-04-12T08:00:00',
+            'type': 'compute.host.down',
+            'details': {
+                'hostname': 'compute-1',
+                'source': 'sample_monitor',
+                'cause': 'link-down',
+                'severity': 'critical',
+                'status': 'down',
+                'monitor_id': 'monitor-1',
+                'monitor_event_id': '123',
+            }
+        }
+    }
+
+Optional parameters in 'Details':
+
+* Hostname: the hostname on which the event occurred.
+* Source: the display name of reporter of this event. This is not limited to monitor, other entity can be specified such as 'KVM'.
+* Cause: description of the cause of this event which could be different from the type of this event.
+* Severity: the severity of this event set by the monitor.
+* Status: the status of target object in which error occurred.
+* MonitorID: the ID of the monitor sending this event.
+* MonitorEventID: the ID of the event in the monitor. This can be used by operator while tracking the monitor log.
+* RelatedTo: the array of IDs which related to this event.
+
+Also, we can have bulk API to receive multiple events in a single HTTP POST
+message by using the 'events' wrapper as follows:
+
+.. code-block:: bash
+
+    {
+        'events': [
+            'event': {
+                'time': '2016-04-12T08:00:00',
+                'type': 'compute.host.down',
+                'details': {},
+            },
+            'event': {
+                'time': '2016-04-12T08:00:00',
+                'type': 'compute.host.nic.error',
+                'details': {},
+            }
+        ]
+    }
+
+
 Blueprints
 ----------
 
@@ -838,6 +927,3 @@ as Doctor will be doing. Also this BP might need enhancement to change server
 and service states correctly.
 
 .. [*] https://blueprints.launchpad.net/nova/+spec/pacemaker-servicegroup-driver
-
-..
- vim: set tabstop=4 expandtab textwidth=80:
diff --git a/docs/userguide/doctor_scenario_in_functest.rst b/docs/userguide/doctor_scenario_in_functest.rst
new file mode 100644
index 00000000..3a435706
--- /dev/null
+++ b/docs/userguide/doctor_scenario_in_functest.rst
@@ -0,0 +1,118 @@
+.. This work is licensed under a Creative Commons Attribution 4.0 International License.
+.. http://creativecommons.org/licenses/by/4.0
+
+Doctor
+^^^^^^
+
+Platform overview
+"""""""""""""""""
+
+Doctor platform provides these features in `Colorado Release <https://wiki.opnfv.org/display/SWREL/Colorado>`_:
+
+* Immediate Notification
+* Consistent resource state awareness for compute host down
+* Valid compute host status given to VM owner
+
+These features enable high availability of Network Services on top of
+the virtualized infrastructure. Immediate notification allows VNF managers
+(VNFM) to process recovery actions promptly once a failure has occurred.
+
+Consistency of resource state is necessary to execute recovery actions
+properly in the VIM.
+
+Ability to query host status gives VM owner the possibility to get
+consistent state information through an API in case of a compute host
+fault.
+
+The Doctor platform consists of the following components:
+
+* OpenStack Compute (Nova)
+* OpenStack Telemetry (Ceilometer)
+* OpenStack Alarming (Aodh)
+* Doctor Inspector
+* Doctor Monitor
+
+.. note::
+    Doctor Inspector and Monitor are sample implementations for reference.
+
+You can see an overview of the Doctor platform and how components interact in
+:numref:`figure-p1`.
+
+.. figure:: /platformoverview/images/figure-p1.png
+    :name: figure-p1
+    :width: 100%
+
+    Doctor platform and typical sequence (Colorado)
+
+Detailed information on the Doctor architecture can be found in the Doctor
+requirements documentation:
+http://artifacts.opnfv.org/doctor/docs/requirements/05-implementation.html
+
+
+Use case
+""""""""
+
+* A consumer of the NFVI wants to receive immediate notifications about faults
+  in the NFVI affecting the proper functioning of the virtual resources.
+  Therefore, such faults have to be detected as quickly as possible, and, when
+  a critical error is observed, the affected consumer is immediately informed
+  about the fault and can switch over to the STBY configuration.
+
+The faults to be monitored (and at which detection rate) will be configured by
+the consumer. Once a fault is detected, the Inspector in the Doctor
+architecture will check the resource map maintained by the Controller, to find
+out which virtual resources are affected and then update the resources state.
+The Notifier will receive the failure event requests sent from the Controller,
+and notify the consumer(s) of the affected resources according to the alarm
+configuration.
+
+Detailed workflow information is as follows:
+
+* Consumer(VNFM): (step 0) creates resources (network, server/instance) and an
+  event alarm on state down notification of that server/instance
+
+* Monitor: (step 1) periodically checks nodes, such as ping from/to each
+  dplane nic to/from gw of node, (step 2) once it fails to send out event
+  with "raw" fault event information to Inspector
+
+* Inspector: when it receives an event, it will (step 3) mark the host down
+  ("mark-host-down"), (step 4) map the PM to VM, and change the VM status to
+  down
+
+* Controller: (step 5) sends out instance update event to Ceilometer
+
+* Notifier: (step 6) Ceilometer transforms and passes the event to Aodh,
+  (step 7) Aodh will evaluate event with the registered alarm definitions,
+  then (step 8) it will fire the alarm to the "consumer" who owns the
+  instance
+
+* Consumer(VNFM): (step 9) receives the event and (step 10) recreates a new
+  instance
+
+Test case
+"""""""""
+
+Functest will call the "run.sh" script in Doctor to run the test job.
+
+The "run.sh" script will execute the following steps.
+
+Firstly, verify connectivity to target compute host according to different
+installer and prepare image for booting VM. Currently, only 'Apex' and
+'local' installer are supported.
+
+Secondly, the Doctor components are started, and, based on the above
+preparation, a test user (default as demo) will be created for the Doctor
+tests.
+
+Thirdly, the VM is booted, and an alarm event is created in Ceilometer.
+After sleeping for 1 minute in order to wait for the VM launch to complete,
+a failure is injected to the system, i.e. the network of compute host is
+disabled for 3 minutes. To ensure the host is down, the status of the host
+will be checked.
+
+Finally, the notification time, i.e. the time between the execution of step 2
+(Monitor detects failure) and step 9 (Consumer receives failure notification)
+is calculated.
+
+According to the Doctor requirements, the Doctor test is successful if the
+notification time is below 1 second.
diff --git a/tests/run.sh b/tests/run.sh
index 56bacca7..d7240d24 100755
--- a/tests/run.sh
+++ b/tests/run.sh
@@ -166,12 +166,17 @@ stop_consumer() {
 
 wait_for_vm_launch() {
     echo "waiting for vm launch..."
-    while true
+    count=0
+    while [[ ${count} -lt 60 ]]
     do
         state=$(nova list | grep " $VM_NAME " | awk '{print $6}')
         [[ "$state" == "ACTIVE" ]] && return 0
+        [[ "$state" == "ERROR" ]] && echo "vm state is ERROR" && exit 1
+        count=$(($count+1))
         sleep 1
     done
+    echo "ERROR: time out while waiting for vm launch"
+    exit 1
 }
 
 inject_failure() {