doc/03-Functional_Requirements.rst

   1 Functional Requirements
   2 -----------------------
   3
   4 Basic Actions
   5 ~~~~~~~~~~~~~
   6
   7 This section describes the basic functions that may be required by Escalator.
   8
   9 Preparation (offline)
  10 ^^^^^^^^^^^^^^^^^^^^^
  11
  12 This is the design phase when the upgrade plan (or upgrade campaign) is
  13 being designed so that it can be executed automatically with minimal
  14 service outage. It may include the following work:
  15
  16 1. Check the dependencies of the software modules, their impact, and
  17    their backward compatibility to determine the appropriate upgrade
  18    method and ordering.
  19 2. Find out whether a rolling upgrade can be planned in several rolling
  20    steps, to avoid any service outage caused by upgrading some
  21    parts/services at the same time.
  22 3. Collect the proper version files and check the integration for
  23    upgrading.
  24 4. The preparation step should produce an output (i.e. upgrade
  25    campaign/plan), which is executable automatically in an NFV Framework
  26    and which can be validated before execution.
  27
  28    -  The upgrade campaign should not refer to scalable entities
  29       directly, but allow for adaptation to the system configuration and
  30       state at any given moment.
  31    -  The upgrade campaign should describe the ordering of the upgrade
  32       of different entities so that dependencies and redundancies can be
  33       maintained during the upgrade execution.
  34    -  The upgrade campaign should provide information about the
  35       applicable recovery procedures and their ordering.
  36    -  The upgrade campaign should consider information about the
  37       verification/testing procedures to be performed during the upgrade
  38       so that upgrade failures can be detected as soon as possible and
  39       the appropriate recovery procedure can be identified and applied.
  40    -  The upgrade campaign should provide information on the expected
  41       execution time so that hanging execution can be identified.
  42    -  The upgrade campaign should indicate any point in the upgrade when
  43       coordination with the users (VNFs) is required.
  44
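The ordering requirement above can be illustrated with a minimal sketch. It assumes the dependency information is available as a plain mapping; the module names and the ``upgrade_waves`` helper are hypothetical and not part of Escalator. Modules in the same wave do not depend on each other, so they are candidates for parallel upgrade; whether redundancy permits that is a separate check.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def upgrade_waves(depends_on):
    """Group modules into waves; each wave depends only on earlier waves.

    depends_on maps a module name to the set of modules that must be
    upgraded before it (hypothetical input format).
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical example: the API service needs the database and the
# message queue upgraded first; the scheduler needs only the queue.
deps = {
    "api": {"database", "queue"},
    "scheduler": {"queue"},
    "database": set(),
    "queue": set(),
}
print(upgrade_waves(deps))  # [['database', 'queue'], ['api', 'scheduler']]
```

A circular dependency is reported as an error at planning time, which fits the requirement that the plan can be validated before execution.
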
  45 ==[hujie] Depending on the attributes of the object being upgraded, the
  46 upgrade plan may be split into step(s) and/or sub-plan(s), and even
  47 smaller sub-plans, in the design phase. The plan(s) or sub-plan(s) may
  48 include step(s) or sub-plan(s).==
  49
  50 Validation the upgrade plan / Checking the pre-requisites of System( offline / online)
  51 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  52
  53 | The upgrade plan should be validated before execution by testing
  54   it in a test environment which is similar to the production environment.
  55 | ==[MT]However it could also mean that we can identify some properties
  56   that it should satisfy e.g. what operations can or cannot be executed
  57   simultaneously like never take out two VMs of the same VNF.
  58 | Another question is if it requires that the system is in a particular
  59   state when the upgrade is applied. I.e. if there's a certain amount of
  60   redundancy in the system, migration is enabled for VMs, when the NFVI
  61   is upgraded the VIM is healthy, when the VIM is upgraded the NFVI is
  62   healthy, etc.
  63 | I'm not sure what online validation means: Is it the validation of the
  64   upgrade plan/campaign or the validation of the system that it is in a
  65   state that the upgrade can be performed without too much risk?==
  66
  67 | Before the upgrade plan is executed, the health of the online
  68   production environment should be checked and confirmed to satisfy
  69   the requirements described in the upgrade plan. The sysinfo, which
  70   includes e.g. system alarms, performance statistics and diagnostic
  71   logs, will be collected and analyzed. All system faults must be
  72   resolved, or the unhealthy parts excluded, before executing the
  73   upgrade plan.
  74 | ==[hujie] Text merged.==
  75
  76 Backup/Snapshot (online)
  77 ^^^^^^^^^^^^^^^^^^^^^^^^
  78
  79 To avoid loss of data when an upgrade is unsuccessful, the data
  80 should be backed up and a snapshot of the system state should be taken
  81 before the execution of the upgrade plan. This should be considered in the
  82 upgrade plan.
  83
  84 Several backups/snapshots may be generated and stored before the
  85 individual change steps. The following data/files are required to be
  86 considered:
  87
  88 1. Running version files for each node.
  89 2. System components' configuration files and databases.
  90 3. Images and storage, if necessary.
  91    ==[MT] Does 3 imply VNF image and storage? I.e. VNF state and data?==
  92
  93 | ==[hujie] The following text is derived from previous "4. Negotiate
  94   with the VNF if it's ready for the upgrade"==
  95
  96 | Although the upper layer, which includes VNFs and VNFMs, is out of the
  97   scope of Escalator, it is still recommended that it be made ready for a
  98   smooth system upgrade. Escalator cannot guarantee the safety of
  99   VNFs. The upper layer should have a safeguard mechanism in its design,
 100   and be ready to avoid failures during a system upgrade.
 101
 102 Execution (online)
 103 ^^^^^^^^^^^^^^^^^^
 104
 105 | The execution of the upgrade plan should be a dynamic procedure which is
 106   controlled by Escalator.
 107 | ==[hujie] Revised text to be general.==
 108
 109 1. It is required to support execution either in sequence or in
 110    parallel.
 111 2. It is required to check the result of the execution and take
 112    action according to the situation and the policies in the upgrade plan.
 113 3. It is required to execute properly on various configurations of the
 114    system object, e.g. stand-alone, HA, etc.
 115 4. It is required to execute on different designated parts of the
 116    system, e.g. physical server, virtualized server, rack, chassis,
 117    cluster, or even different geographical locations.
 118
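Requirements 1 and 2 above can be sketched as follows. This is only an illustration under stated assumptions: the shape of a step (a name plus a callable returning ``True`` on success), the ``run_steps`` helper and the ``stop_on_failure`` flag (standing in for a policy from the upgrade plan) are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_steps(steps, parallel=False, stop_on_failure=True):
    """Run upgrade steps and return a list of (name, succeeded) results.

    steps is a list of (name, callable) pairs; a callable reports
    success by returning True. stop_on_failure stands in for a policy
    taken from the upgrade plan (hypothetical).
    """
    if parallel:
        with ThreadPoolExecutor() as pool:
            futures = [(name, pool.submit(action)) for name, action in steps]
            return [(name, bool(f.result())) for name, f in futures]
    results = []
    for name, action in steps:
        ok = bool(action())
        results.append((name, ok))
        if not ok and stop_on_failure:
            break  # leave the remaining steps untouched for recovery
    return results


steps = [
    ("node-1", lambda: True),
    ("node-2", lambda: False),  # simulated failure
    ("node-3", lambda: True),
]
print(run_steps(steps))                 # stops after node-2
print(run_steps(steps, parallel=True))  # all three are attempted
```

In sequential mode a failure halts the run so that a recovery procedure can start from a known point; in parallel mode every step is attempted and the results are inspected afterwards.
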
 119 Testing (online)
 120 ^^^^^^^^^^^^^^^^
 121
 122 | Testing is performed after upgrading the whole system or parts of the
 123   system, to make sure the upgraded system (object) is working normally.
 124 | ==[hujie] Revised text to be general.==
 125
 126 1. It is recommended to run the prepared test cases to see if the
 127    functionalities are available without any problem.
 128 2. It is recommended to check the sysinfo, e.g. system alarms,
 129    performance statistics and diagnostic logs, to see if there are any
 130    abnormalities.
 131
 132 Restore/Roll-back (online)
 133 ^^^^^^^^^^^^^^^^^^^^^^^^^^
 134
 135 | If the upgrade fails, a quick system restore or system roll-back
 136   should be performed to recover the system and the services.
 137 | ==[hujie] Revised text to be general.==
 138
 139 1. It is recommended to support system restore from backup when the
 140    upgrade has failed.
 141 2. It is recommended to support graceful roll-back with the steps in
 142    reverse order, if possible.
 143
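The graceful roll-back in reverse order mentioned above can be sketched like this. The step shape (name, apply callable, undo callable) and the ``execute_with_rollback`` helper are assumptions made for illustration only.

```python
def execute_with_rollback(steps):
    """Apply steps in order; on failure undo completed steps in reverse.

    steps is a list of (name, apply, undo) triples of callables
    (a hypothetical shape for plan steps). Returns True on success.
    """
    completed = []
    for name, apply, undo in steps:
        try:
            apply()
        except Exception as exc:
            print(f"step {name} failed: {exc}")
            for done_name, done_undo in reversed(completed):
                done_undo()
                print(f"rolled back {done_name}")
            return False
        completed.append((name, undo))
    return True


actions = []  # records what happened, in order


def fail():
    raise RuntimeError("simulated failure")


plan = [
    ("db", lambda: actions.append("apply db"), lambda: actions.append("undo db")),
    ("api", lambda: actions.append("apply api"), lambda: actions.append("undo api")),
    ("ui", fail, lambda: actions.append("undo ui")),
]
succeeded = execute_with_rollback(plan)
print(succeeded, actions)
```

Only steps that completed are undone, and the most recently applied step is undone first. The failed step itself is not undone in this sketch; a real plan would have to say how a partially applied step is cleaned up.
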
 144 Monitoring (online)
 145 ^^^^^^^^^^^^^^^^^^^
 146
 147 | Escalator should continually monitor the upgrade process. It
 148   keeps the status of each module, each node and each cluster up to date
 149   in a status table during the upgrade.
 150 | ==[hujie] Revised text to be general.==
 151
 152 1. It is required to collect the status of every object being upgraded
 153    and to send alarms on abnormal conditions during the upgrade.
 154 2. It is recommended to reuse the existing monitoring system, e.g. alarms.
 155 3. It is recommended to support proactive querying.
 156 4. It is recommended to support passively waiting for notifications.
 157
 158 | **Two possible ways for monitoring:**
 159 | **Pro-Actively Query** requires that NFVI/VIM provide a proper API or CLI
 160   interface. If Escalator serves as a service, it should pass on these
 161   interfaces.
 162 | **Passively Wait for Notification** requires that Escalator provide a
 163   callback interface, which could be used by NFVI/VIM systems or an upgrade
 164   agent to send back notifications.
 165 | [hujie] I am not sure why we would not subscribe to the notifications.
 166
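The two monitoring modes can be contrasted in a small sketch. Everything here is hypothetical: the ``StatusTable`` class, the state names, and the ``query`` callable, which stands in for whatever API or CLI the NFVI/VIM actually exposes.

```python
class StatusTable:
    """In-memory status table keyed by object id (illustrative only)."""

    def __init__(self):
        self.status = {}

    def poll(self, object_ids, query):
        """Proactive query: ask for the state of each object.

        query is a callable standing in for an NFVI/VIM API or CLI call.
        """
        for object_id in object_ids:
            self.status[object_id] = query(object_id)

    def notify(self, object_id, state):
        """Passive mode: callback invoked by the NFVI/VIM or an agent."""
        self.status[object_id] = state

    def abnormal(self):
        """Return the objects whose state should raise an alarm."""
        return sorted(o for o, s in self.status.items() if s == "failed")


table = StatusTable()
table.poll(["node-1", "node-2"], query=lambda object_id: "upgrading")
table.notify("node-2", "failed")    # pushed by a hypothetical agent
table.notify("node-1", "upgraded")
print(table.status, table.abnormal())
```

Both modes write into the same table, so the rest of the system does not need to know how a status was obtained.
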
 167 Logging (online)
 168 ^^^^^^^^^^^^^^^^
 169
 170 Record the information generated by Escalator in log files. The log
 171 files are used for manual diagnosis of exceptions.
 172
 173 1. It is required to support logging.
 174 2. It is recommended to include time stamp, object id, action name,
 175    error code, etc.
 176
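A log record carrying the recommended fields could look like the sketch below, which uses Python's standard ``logging`` module. The logger name, the field names (``object_id``, ``action``, ``error_code``) and the sample values are assumptions, not a format defined by Escalator.

```python
import io
import logging


def make_logger(stream):
    """Build a logger whose records carry object id, action and error code."""
    logger = logging.getLogger("escalator.sketch")  # hypothetical name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s object=%(object_id)s "
        "action=%(action)s code=%(error_code)s %(message)s"
    ))
    logger.handlers = [handler]
    return logger


stream = io.StringIO()  # a real deployment would log to a file instead
log = make_logger(stream)
log.error(
    "package installation failed",
    extra={"object_id": "node-2", "action": "install", "error_code": 17},
)
print(stream.getvalue(), end="")
```

Passing the fields through ``extra`` keeps them separate from the free-text message, which makes the log easier to filter during manual diagnosis.
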
 177 Administrative Control (online)
 178 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 179
 180 Administrative Control is used to control the privilege to start any
 181 of Escalator's actions, in order to avoid unauthorized operations.
 182
 183 #. It is required to support an administrative control mechanism.
 184 #. It is recommended to reuse the system's own security system.
 185 #. It is required to avoid conflicts when the system's own security system
 186    is being upgraded.
 187
 188 Requirements on Object being upgraded
 189 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 190
 191 | ==We can develop BPs in future from requirements of this section and
 192   gap analysis for upstream projects==
 193 | Escalator focuses on smooth upgrade. In a practical implementation, it
 194   might be combined with an installer/deployer, or act as an independent
 195   tool/service. Either way, it requires that the target systems (NFVI and
 196   VIM) are developed/deployed in such a way that Escalator can perform
 197   upgrades on them.
 198
 199 On the NFVI system, live-migration is likely to be used to maintain
 200 availability, because OPNFV would like to make HA transparent to the end
 201 user. This requires the VIM system to be able to put a compute node into
 202 maintenance mode and isolate it from normal service. Otherwise, new NFVI
 203 instances risk being scheduled onto the node being upgraded.
 204
 205 | On the VIM system, availability is likely to be achieved by redundancy. This
 206   imposes fewer requirements on the system/services being upgraded (see PVA
 207   comments in early version). However, there should be a way to put the
 208   target system into standby mode, because starting the upgrade on the
 209   master node in a cluster is likely a bad idea.
 210 | ==[hujie] Revised text to be general.==
 211
 212 1. It is required for NFVI/VIM to support a **service handover** mechanism
 213    that limits interruption to 0.001% (i.e. 99.999% service
 214    availability). Possible implementations are live-migration, redundant
 215    deployment, etc. (Note: for VIM, the interruption requirement could be
 216    less restrictive.)
 217 2. It is required for NFVI/VIM to be able to restore the earlier version in
 218    an efficient way, such as by **snapshot**.
 219 3. It is required for NFVI/VIM to **migrate data** efficiently between
 220    the base and the upgraded system.
 221    ==[hujie] What is the exact meaning of "base" here?==
 222 4. It is recommended for the NFVI/VIM interface to support upgrade
 223    orchestration, e.g. reading/setting system state.
 224    ==[hujie] I am not sure if it reflects the previous text.==
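
To make the 99.999% figure in requirement 1 concrete, the short calculation below converts an availability target into a downtime budget. The one-year accounting period is an assumption; the text above does not state the period over which availability is measured.

```python
def downtime_budget_minutes(availability, period_days=365):
    """Allowed downtime in minutes for a given availability fraction."""
    return (1.0 - availability) * period_days * 24 * 60


budget = downtime_budget_minutes(0.99999)
print(f"{budget:.2f} minutes per year")  # 5.26 minutes per year
```

Under that assumption, all interruptions in a year, including those caused by upgrades, have to fit into a little over five minutes.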