docs/requirements/103-Background.rst

   1 ==========
   2 Background
   3 ==========
   4
   5 Upgrade Objects
   6 ===============
   7
   8 Physical Resource
   9 ^^^^^^^^^^^^^^^^^
  10
  11 Most cloud infrastructures support the dynamic addition and removal of
  12 hardware. Accordingly, a hardware upgrade could be done by adding the new
  13 piece of hardware and removing the old one. From the perspective of smooth
  14 upgrade, the orchestration/scheduling of these actions is the primary concern.
  15
  16 Upgrading a physical resource may also involve upgrading its firmware
  17 and/or modifying its configuration data. This may require a restart of the
  18 hardware.
  19
  20 Virtual Resources
  21 ^^^^^^^^^^^^^^^^^
  22
  23 Addition and removal of virtual resources may be initiated by the users or be
  24 a result of an elasticity action. Users may also request the upgrade of their
  25 virtual resources using a new VM image.
  26
  27 .. Needs to be moved to requirement section: Escalator should facilitate such an
  28    option and allow for a smooth upgrade.
  29
  30 On the other hand, changes in the infrastructure, namely in the hardware and/or
  31 the virtualization facility resources, may result in the upgrade of the virtual
  32 resources. For example, if for some reason the hypervisor is changed and
  33 the current VMs cannot be migrated to the new hypervisor because they are
  34 incompatible with it, then the VMs need to be upgraded too. This is not
  35 something the NFVI user (i.e. the VNFs) would know about.
  36
  37
  38 Virtualization Facility Resources
  39 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  40
  41 Based on the functionality they provide, virtualization facility
  42 resources can be divided into computing nodes, networking nodes,
  43 storage nodes and management nodes.
  44
  45 The possible upgrade objects in these nodes are considered below.
  46 (Note: hardware-based virtualization may be considered a virtualization
  47 facility resource, but from the Escalator perspective it is better to
  48 consider it as part of the hardware upgrade.)
  49
  50 **Computing node**
  51
  52 1. OS Kernel
  53
  54 2. Hypervisor and virtual switch
  55
  56 3. Other kernel modules, like drivers
  57
  58 4. User space software packages, like nova-compute agents and other
  59    control plane programs.
  60
  61 Updating 1 and 2 will cause the loss of virtualization functionality of
  62 the compute node, which may lead to the interruption of data plane services
  63 if the virtual resource is not redundant.
  64
  65 Updating 3 might have the same result.
  66
  67 Updating 4 might lead to control plane service interruption if the
  68 deployment is not HA.
  69
  70 .. <MT> I'm not sure why would 4 cause control plane interruption on a
  71    compute node. My understanding is that simply the node cannot be managed.
  72    Redundancy won't help in that either.
  73
  74
  75 **Networking node**
  76
  77 1. OS kernel, optional; not all switches/routers allow the upgrade of their
  78    OS, since it is more like firmware than a generic OS.
  79
  80 2. User space software package, like neutron agents and other control
  81    plane programs
  82
  83 Updating 1, if allowed, will cause a node reboot and therefore lead to
  84 data plane service interruption if the virtual resource is not
  85 redundant.
  86
  87 Updating 2 might lead to control plane service interruption if the
  88 deployment is not HA.
  89
  90 **Storage node**
  91
  92 1. OS kernel, optional; not all storage nodes allow the upgrade of their OS,
  93    since it is more like firmware than a generic OS.
  94
  95 2. Kernel modules
  96
  97 3. User space software packages, control plane programs
  98
  99 Updating 1, if allowed, will cause a node reboot and therefore lead to
 100 data plane service interruption if the virtual resource is not
 101 redundant.
 102
 103 Updating 2 might have the same result.
 104
 105 Updating 3 might lead to control plane service interruption if the
 106 deployment is not HA.
 107
 108 **Management node**
 109
 110 1. OS Kernel
 111
 112 2. Kernel modules, like drivers
 113
 114 3. User space software packages, like database, message queue and
 115    control plane programs.
 116
 117 Updating 1 will cause a node reboot and therefore lead to control
 118 plane service interruption if the deployment is not HA. Updating 2 might
 119 have the same result.
 120
 121 Updating 3 might lead to control plane service interruption if the
 122 deployment is not HA.
 123
 124 Upgrade Granularity
 125 ===================
 126
 127 The granularity of an upgrade can be characterized from two perspectives:
 128 - the physical dimension and
 129 - the software dimension
 130
 131 Physical Dimension
 132 ^^^^^^^^^^^^^^^^^^
 133
 134 The physical dimension characterizes the number of similar upgrade objects
 135 targeted by the upgrade, i.e. whether it is a full or partial upgrade of a
 136 data centre, cluster or zone.
 137 Because of its scale, the upgrade of a data centre or a zone may be divided
 138 into several batches. Thus there is a need for efficiency in executing
 139 upgrades of a potentially huge number of upgrade objects while still
 140 maintaining availability to fulfill the requirement of smooth upgrade.
 141
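As an illustration of dividing a large upgrade into batches, the following minimal Python sketch partitions a list of upgrade objects so that only a bounded fraction of them is taken out of service at a time. The function name ``plan_batches``, the ``max_offline_fraction`` parameter and the host names are assumptions made for this example; they are not part of Escalator.

```python
def plan_batches(upgrade_objects, max_offline_fraction):
    """Split upgrade objects into consecutive batches.

    No batch contains more than max_offline_fraction of all objects,
    except that a batch always holds at least one object so that the
    upgrade can make progress.
    """
    if not 0 < max_offline_fraction <= 1:
        raise ValueError("max_offline_fraction must be in (0, 1]")
    batch_size = max(1, int(len(upgrade_objects) * max_offline_fraction))
    return [
        upgrade_objects[i:i + batch_size]
        for i in range(0, len(upgrade_objects), batch_size)
    ]


# Hypothetical zone of ten compute hosts, at most a quarter offline at once.
hosts = [f"compute-{n}" for n in range(10)]
batches = plan_batches(hosts, 0.25)
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

A larger fraction shortens the upgrade campaign but leaves less capacity in service during each batch, so the value would have to be chosen against the availability requirement.
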
 142 The upgrade of a cloud environment (cluster) may also
 143 be partial. For example, in one cloud environment running a number of
 144 VNFs, we may just try to upgrade one of them to check the stability and
 145 performance, before we upgrade all of them.
 146 Thus there is a need for proper organization of the artifacts associated with
 147 the different upgrade objects. Also, the different versions should be able
 148 to coexist beyond the upgrade period.
 149
 150 From this perspective, special attention may be needed when upgrading
 151 objects that collaborate in a redundancy schema, as in this case
 152 different versions not only need to coexist but also collaborate. This
 153 primarily puts requirements on the upgrade objects. If this is not possible,
 154 the upgrade campaign should be designed in such a way that proper
 155 isolation is ensured.
 156
 157 Software Dimension
 158 ^^^^^^^^^^^^^^^^^^
 159
 160 The software dimension of the upgrade characterizes the upgrade object
 161 types targeted and the combinations in which they are upgraded together.
 162
 163 Even though the upgrade may
 164 initially target only one type of upgrade object, e.g. the hypervisor,
 165 the dependency of other upgrade objects on this initial target object may
 166 require their upgrade as well, i.e. the upgrades need to be combined. From this
 167 perspective the main concern is the compatibility of the dependent and
 168 sponsor objects. To take these dependencies into consideration,
 169 they need to be described together with the version compatibility information.
 170 Breaking dependencies is the major cause of outages during upgrades.
 171
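One possible way to describe dependencies together with version compatibility information is sketched below in Python. The data layout, the function name ``dependents_to_upgrade`` and all version labels (``v1``, ``h2`` and so on) are hypothetical and chosen only for illustration; the object names are borrowed from the examples in this document.

```python
# Hypothetical compatibility description: for each dependent object, the
# sponsor it relies on and, per dependent version, the sponsor versions
# it is compatible with.
COMPATIBILITY = {
    "nova-compute": {
        "sponsor": "hypervisor",
        "versions": {
            "v1": {"h1", "h2"},
            "v2": {"h2", "h3"},
        },
    },
}


def dependents_to_upgrade(installed, sponsor, new_sponsor_version):
    """Return dependents that would break if the sponsor is upgraded.

    The result maps each incompatible dependent to the sorted list of its
    versions that are compatible with the new sponsor version.
    """
    result = {}
    for dependent, info in COMPATIBILITY.items():
        if info["sponsor"] != sponsor or dependent not in installed:
            continue
        current = installed[dependent]
        if new_sponsor_version in info["versions"].get(current, set()):
            continue  # the installed version stays compatible
        result[dependent] = sorted(
            version
            for version, sponsors in info["versions"].items()
            if new_sponsor_version in sponsors
        )
    return result


installed = {"nova-compute": "v1"}
print(dependents_to_upgrade(installed, "hypervisor", "h3"))
```

An empty candidate list in the result would signal that no known version of the dependent works with the new sponsor version, so the upgrade cannot be combined safely.
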
 172 In other cases it is more efficient to upgrade a combination of upgrade
 173 objects than to do it one by one. One aspect of the combination is how
 174 the upgrade packages can be combined: whether a new image can be created for
 175 them beforehand, or the different packages can be installed during the upgrade
 176 independently, but activated together.
 177
 178 The combination of upgrade objects may span across
 179 layers (e.g. software stack in the host and the VM of the VNF).
 180 Thus, it may require additional coordination between the management layers.
 181
 182 With respect to each upgrade object type and even stacks, we can
 183 distinguish major and minor upgrades:
 184
 185 **Major Upgrade**
 186
 187 Upgrades between major releases may introduce significant changes in
 188 function, configuration and data, such as the upgrade of OPNFV from
 189 Arno to Brahmaputra.
 190
 191 **Minor Upgrade**
 192
 193 Upgrades within one major release, which would not lead to changes in
 194 the structure of the platform and may not affect the schema of the
 195 system data.
 196
 197 Scope of Impact
 198 ===============
 199
 200 Considering availability and therefore smooth upgrade, one of the major concerns
 201 is the predictability and control of the outcome of the different upgrade
 202 operations. Ideally, an upgrade can be performed without impacting any entity in
 203 the system, which means none of the operations change or potentially change the
 204 behaviour of any entity in the system in an uncontrolled manner. Accordingly,
 205 the operations of such an upgrade can be performed any time while the system is
 206 running, while all the entities are online. No entity needs to be taken offline
 207 to avoid such adverse effects. Hence such upgrade operations are referred to as
 208 online operations. The effects of the upgrade might be activated the next time
 209 the entity is used, or may require a special activation action such as a
 210 restart. Note that the activation action provides more control and predictability.
 211
 212 If an entity's behaviour in the system may change due to the upgrade, it may
 213 be better to take it offline for the time of the relevant upgrade operations.
 214 The main question, however, is which hosted entities are impacted, considering
 215 the hosting relation of an upgrade object. Accordingly, we can identify a scope
 216 which is impacted by taking the given upgrade object offline. The entities
 217 that are in the scope of impact may need to be taken offline or moved out of
 218 this scope, i.e. migrated.
 219
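The scope of impact described above can be thought of as everything reachable from the upgrade object through the hosting relation. The following Python sketch assumes the hosting relation is available as a simple mapping from a host to the entities it directly hosts; the mapping, the entity names and the function name ``scope_of_impact`` are hypothetical.

```python
from collections import deque

# Hypothetical hosting relation: each key directly hosts the listed entities.
HOSTING = {
    "compute-1": ["hypervisor-1"],
    "hypervisor-1": ["vm-a", "vm-b"],
    "vm-a": ["vnf-x"],
    "vm-b": ["vnf-y"],
}


def scope_of_impact(hosting, upgrade_object):
    """Return all entities hosted directly or indirectly by upgrade_object.

    The traversal is breadth-first, so entities closer to the upgrade
    object in the hosting relation appear earlier in the result.
    """
    impacted = []
    seen = {upgrade_object}
    queue = deque([upgrade_object])
    while queue:
        current = queue.popleft()
        for hosted in hosting.get(current, []):
            if hosted not in seen:
                seen.add(hosted)
                impacted.append(hosted)
                queue.append(hosted)
    return impacted


print(scope_of_impact(HOSTING, "hypervisor-1"))
```

Each entity in the returned list is a candidate for being taken offline or migrated before the upgrade object itself is taken offline.
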
 220 If the impacted entity is in a different layer managed by another manager,
 221 this may require coordination, because the infrastructure resources taken out
 222 of service for the time of their upgrade support virtual resources used by
 223 VNFs that should not experience outages. The hosted VNFs may or may not allow
 224 for the hot migration of their VMs. In case of migration, the placement policy
 225 of the VMs should be considered.
 226