docs/requirements/impl_architecture.rst

   1 Detailed architecture and message flows
   2 =======================================
   3
   4 Within the Promise project we consider two different architectural options, i.e.
   5 a *shim-layer* based architecture and an architecture targeting full
   6 OpenStack *integration*.
   7
   8 Shim-layer architecture
   9 -----------------------
  10
  11 The *shim-layer architecture* uses a layer on top of OpenStack to provide
  12 the capacity management, resource reservation, and resource allocation features.
  13
  14
  15 Detailed Message Flows
  16 ^^^^^^^^^^^^^^^^^^^^^^
  17
  18 Note that only selected parameters for the messages are shown. Refer to
  19 :ref:`northbound_API` and Annex :ref:`yang_schema` for a full set of message
  20 parameters.
  21
  22 Resource Capacity Management
  23 """"""""""""""""""""""""""""
  24
  25 .. figure:: images/figure5_new.png
  26     :name: figure5
  27     :width: 90%
  28
  29     Capacity Management Scenario
  30
  31 :numref:`figure5` shows a detailed message flow between the consumers and the
  32 capacity management functional blocks inside the shim-layer. It has the
  33 following steps:
  34
  35     * Step 1a: The Consumer sends a *query-capacity* request to Promise
  36       using a filter such as a time window or resource type. The capacity is
  37       looked up in the shim-layer capacity map.
  38
  39     * Step 1b: The shim-layer will respond with information about the
  40       total, available, reserved, and used (allocated) capacities matching the
  41       filter.
  42
  43     * Step 2a: The Consumer can send *increase/decrease-capacity* requests
  44       to update the capacity available to the reservation system. This can be
  45       100% of the available capacity in the given provider/source or only a
  46       subset, which leaves some "buffer" in the actual NFVI to be
  47       used outside the Promise shim-layer or for a different reservation
  48       service instance. The request can also be used to inform the reservation
  49       system that from a certain time in the future, additional resources can be
  50       reserved (e.g. due to a planned upgrade of the capacity), or the
  51       available capacity will be reduced (e.g. due to a planned downtime of
  52       some of the resources).
  53
  54     * Step 2b: The shim-layer will respond with an ACK/NACK message.
  55
  56     * Step 3a: Consumers can subscribe to capacity-change events using a
  57       filter.
  58
  59     * Step 3b: Each successful subscription is answered with a
  60       subscription_id.
  61
  62     * Step 4: The shim-layer monitors the capacity information for the
  63       various types of resources by periodically querying the relevant
  64       Controllers (e.g. Nova, Neutron, Cinder) or by creating event alarms in
  65       the VIM (e.g. with Ceilometer for OpenStack), and updates the capacity
  66       information in its capacity map.
  67
  68     * Step 5: The Consumer is notified of capacity changes.
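
     Steps 1a to 2b can be illustrated with a minimal Python sketch of a capacity
     map. The class and method names (``CapacityMap``, ``query``, ``increase``,
     ``decrease``), the plain integer capacity unit, and the rule
     ``available = total - reserved - used`` are assumptions made for this
     example only; they are not taken from the Promise API. The time-window
     filter is omitted for brevity.

     .. code-block:: python

         from dataclasses import dataclass


         @dataclass
         class CapacityEntry:
             """Capacity figures for one resource type (hypothetical structure)."""
             resource_type: str
             total: int = 0
             reserved: int = 0
             used: int = 0

             @property
             def available(self):
                 # Assumption: whatever is neither reserved nor used is available.
                 return self.total - self.reserved - self.used


         class CapacityMap:
             """In-memory stand-in for the shim-layer capacity map."""

             def __init__(self):
                 self._entries = {}

             def increase(self, resource_type, amount):
                 # Step 2a: add capacity to the pool known to the reservation system.
                 entry = self._entries.setdefault(
                     resource_type, CapacityEntry(resource_type))
                 entry.total += amount
                 return "ACK"  # Step 2b

             def decrease(self, resource_type, amount):
                 # Step 2a: refuse to remove capacity that is reserved or in use.
                 entry = self._entries.get(resource_type)
                 if entry is None or amount > entry.available:
                     return "NACK"  # Step 2b
                 entry.total -= amount
                 return "ACK"

             def query(self, resource_type=None):
                 # Steps 1a/1b: report the capacities matching the filter.
                 return [
                     {"resource_type": e.resource_type, "total": e.total,
                      "available": e.available, "reserved": e.reserved,
                      "used": e.used}
                     for e in self._entries.values()
                     if resource_type in (None, e.resource_type)
                 ]

     In this sketch a decrease request is answered with a NACK when it would
     remove capacity that is already reserved or in use. That rule is a design
     choice of the example, not a documented Promise behavior.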
  69
  70 Resource Reservation
  71 """"""""""""""""""""
  72
  73 .. figure:: images/figure6_new.png
  74     :name: figure6
  75     :width: 90%
  76
  77     Resource Reservation for Future Use Scenario
  78
  79 :numref:`figure6` shows a detailed message flow between the Consumer and the
  80 resource reservation functional blocks inside the shim-layer. It has the
  81 following steps:
  82
  83     * Step 1a: The Consumer creates a resource reservation request for
  84       future use by setting a start and end time for the reservation as well as
  85       more detailed information about the resources to be reserved. The Promise
  86       shim-layer will check the free capacity in the given time window and in
  87       case sufficient capacity exists to meet the reservation request, will
  88       mark those resources "reserved" in its reservation map.
  89
  90     * Step 1b: If the reservation was successful, a reservation_id and
  91       status of the reservation will be returned to the Consumer. In case the
  92       reservation cannot be met, the shim-layer may return information about
  93       the maximum capacity that could be reserved during the requested time
  94       window and/or a potential time window where the requested (amount of)
  95       resources would be available.
  96
  97     * Step 2a: Reservations can be updated using an *update-reservation*
  98       request, providing the reservation_id and the new reservation_data. The
  99       Promise Reservation Manager will check whether the reservation can be
 100       updated as requested.
 101
 102     * Step 2b: If the reservation was updated successfully, a
 103       reservation_id and status of the reservation will be returned to the
 104       Consumer. Otherwise, an appropriate error message will be returned.
 105
 106     * Step 3a: A *cancel-reservation* request can be used to withdraw an
 107       existing reservation. Promise will update the reservation map by removing
 108       the reservation as well as the capacity map by adding the freed capacity.
 109
 110     * Step 3b: The response message confirms the cancelation.
 111
 112     * Step 4a: Consumers can also issue *query-reservation* requests to
 113       receive a list of reservations. An input filter can be used to narrow down
 114       the query, e.g., to return only reservations in a given time window.
 115       Promise will query its reservation map to identify reservations matching
 116       the input filter.
 117
 118     * Step 4b: The response message contains information about all
 119       reservations matching the input filter. It also provides information
 120       about the utilization in the requested time window.
 121
 122     * Step 5a: Consumers can subscribe to reservation-change events using
 123       a filter.
 124
 125     * Step 5b: Each successful subscription is answered with a
 126       subscription_id.
 127
 128     * Step 6a: Promise synchronizes the available and used capacity with
 129       the underlying VIM.
 130
 131     * Step 6b: In certain cases, e.g., due to a failure in the underlying
 132       hardware, some reservations can no longer be honored and have to be
 133       updated or canceled. The shim-layer will identify affected reservations
 134       among its reservation records.
 135
 136     * Step 7: Subscribed Consumers will be informed about the updated
 137       reservations. The notification contains the updated reservation_data and
 138       new status of the reservation. It is then up to the Consumer to take
 139       appropriate actions in order to ensure high priority reservations are
 140       favored over lower priority reservations.
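
     The capacity check of step 1a and the alternative offered in step 1b can be
     sketched as follows. This is an illustration under stated assumptions, not
     the Promise implementation: ``ReservationMap`` and its methods are
     hypothetical names, times are plain integers, the end time is treated as
     exclusive, and capacity is a single number rather than a set of resource
     types.

     .. code-block:: python

         from dataclasses import dataclass
         from itertools import count


         @dataclass
         class Reservation:
             reservation_id: int
             start: int   # start time, e.g. epoch seconds
             end: int     # end time (exclusive)
             amount: int  # reserved capacity units


         class ReservationMap:
             """In-memory stand-in for the shim-layer reservation map."""

             def __init__(self, total_capacity):
                 self.total_capacity = total_capacity
                 self._reservations = {}
                 self._ids = count(1)

             def query(self, start, end):
                 # Step 4a: reservations overlapping the window [start, end).
                 return [r for r in self._reservations.values()
                         if r.start < end and start < r.end]

             def _peak_reserved(self, start, end):
                 # The reserved total only rises when a reservation starts, so
                 # the peak lies at `start` or at a later reservation start.
                 overlapping = self.query(start, end)
                 points = {start} | {r.start for r in overlapping if r.start > start}
                 return max(
                     sum(r.amount for r in overlapping if r.start <= t < r.end)
                     for t in points
                 )

             def create(self, start, end, amount):
                 # Step 1a: check the free capacity in the requested time window.
                 free = self.total_capacity - self._peak_reserved(start, end)
                 if amount > free:
                     # Step 1b: report the maximum that could be reserved instead.
                     return {"status": "rejected", "max_capacity": free}
                 rid = next(self._ids)
                 self._reservations[rid] = Reservation(rid, start, end, amount)
                 return {"status": "reserved", "reservation_id": rid}

             def cancel(self, reservation_id):
                 # Step 3a: removing the record frees the capacity again.
                 return self._reservations.pop(reservation_id, None) is not None

     Free capacity is derived from the stored reservations on every request, so
     a cancellation needs no separate bookkeeping in this sketch. A real capacity
     map would also have to account for allocated (used) resources.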
 141
 142 Resource Allocation
 143 """""""""""""""""""
 144
 145 .. figure:: images/figure7_new.png
 146     :name: figure7
 147     :width: 90%
 148
 149     Resource Allocation
 150
 151 :numref:`figure7` shows a detailed message flow between the Consumer, the
 152 functional blocks inside the shim-layer, and the VIM. It has the following
 153 steps:
 154
 155     * Step 1a: The Consumer sends a *create-instance* request providing
 156       information about the resources to be allocated, i.e., provider_id
 157       (optional in case of only one provider), name of the instance, the
 158       requested flavor and image, etc. If the allocation is against an
 159       existing reservation, the reservation_id has to be provided.
 160
 161     * Step 1b: If a reservation_id was provided, Promise checks if a
 162       reservation with that ID exists, the reservation start time has arrived
 163       (i.e. the reservation is active), and the required capacity for the
 164       requested flavor is within the available capacity of the reservation. If
 165       those conditions are met, Promise creates a record for the allocation
 166       (VMState="INITIALIZED") and updates its databases. If no reservation_id
 167       was provided in the allocation request, Promise checks whether the
 168       required capacity to meet the request can be provided from the available,
 169       non-reserved capacity. If yes, Promise creates a record for the
 170       allocation and updates its databases. In any other case, Promise rejects
 171       the *create-instance* request.
 172
 173     * Step 2: If the *create-instance* request was rejected,
 174       Promise responds with a "status=rejected" providing the reason for the
 175       rejection. This will help the Consumer to take appropriate actions, e.g.,
 176       send an updated *create-instance* request. The allocation work flow will
 177       terminate at this step and the steps below are not executed.
 178
 179     * Step 3a: If the *create-instance* request was accepted and a related
 180       allocation record has been created, the shim-layer issues a
 181       *createServer* request to the VIM Controller providing all information to
 182       create the server instance.
 183
 184     * Step 3b: The VIM Controller sends an immediate reply with an
 185       instance_id and starts the VIM-internal allocation process.
 186
 187     * Step 4: The Consumer gets an immediate response message with
 188       allocation status "in progress" and the assigned instance_id.
 189
 190     * Step 5a+b: The Consumer subscribes to receive notifications about
 191       allocation events related to the requested instance. Promise responds
 192       with an acknowledgment including a subscribe_id.
 193
 194     * Step 6: In parallel to the previous step, the Promise shim-layer creates
 195       an alarm in Aodh to receive notifications about all changes to the
 196       VMState for instance_id.
 197
 198     * Step 7a: The VIM Controller reports all instance-related events to
 199       Ceilometer. After the allocation has been completed or failed, it sends
 200       an event to Ceilometer. This triggers the OpenStack alarming service Aodh
 201       to notify the shim-layer of the new VMState (e.g. ACTIVE or ERROR); the
 202       shim-layer then updates its internal allocation records.
 203
 204     * Step 7b: Promise sends a notification message to the subscribed
 205       Consumer with information on the allocated resources including their new
 206       VMState.
 207
 208     * Step 8a+b: Allocated instances can be terminated by the Consumer by
 209       sending a *destroy-instance* request to the shim-layer. Promise responds
 210       with an acknowledgment and the new status "DELETING" for the instance.
 211
 212     * Step 9a: Promise sends a *deleteServer* request for the instance_id
 213       to the VIM Controller.
 214
 215     * Step 10a: After the instance has been deleted, an event alarm is
 216       sent to the shim-layer, which updates its internal allocation records
 217       and capacity utilization.
 218
 219     * Step 10b: The shim-layer also notifies the subscribed Consumer about
 220       the successfully destroyed instance.
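
     The decision logic of step 1b can be summarized in a short Python sketch.
     The function name ``check_create_instance``, the dictionary keys, and the
     rejection reasons are hypothetical and chosen for illustration only;
     capacity is again reduced to a single number.

     .. code-block:: python

         def check_create_instance(required, free_unreserved, reservations,
                                   reservation_id=None, now=0):
             """Accept or reject a create-instance request (step 1b).

             required:        capacity needed by the requested flavor
             free_unreserved: available capacity that is not reserved
             reservations:    dict mapping reservation_id to a dict with the
                              hypothetical keys "start", "capacity", "allocated"
             """
             accepted = {"status": "accepted", "VMState": "INITIALIZED"}

             def rejected(reason):
                 return {"status": "rejected", "reason": reason}

             if reservation_id is None:
                 # No reservation: serve from available, non-reserved capacity.
                 if required <= free_unreserved:
                     return accepted
                 return rejected("insufficient non-reserved capacity")

             reservation = reservations.get(reservation_id)
             if reservation is None:
                 return rejected("unknown reservation_id")
             if now < reservation["start"]:
                 return rejected("reservation not yet active")
             remaining = reservation["capacity"] - reservation["allocated"]
             if required > remaining:
                 return rejected("exceeds capacity of the reservation")
             return accepted

     A rejected result corresponds to step 2, in which the reason is returned to
     the Consumer; an accepted result is the precondition for the *createServer*
     request in step 3a.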
 221
 222
 223 Internal operations
 224 ^^^^^^^^^^^^^^^^^^^
 225
 226 .. note:: This section is to be updated
 227
 228 In the following, the internal logic and operations of the shim-layer will be
 229 explained in more detail, e.g. the "check request" (step 1b in
 230 :numref:`figure7` of the allocation work flow).
 231
 232
 233
 234 Integrated architecture
 235 -----------------------
 236
 237 The *integrated architecture* aims at full integration with OpenStack.
 238 This means that the plan is to use the existing OpenStack APIs,
 239 extended with reservation capabilities.
 240
 241 The advantage of this approach is that we don't need to re-model the
 242 complex resource structure we have for the virtual machines and the
 243 corresponding infrastructure.
 244
 245 The atomic item is the virtual machine with the minimum set of resources
 246 it requires to start up. It is important to state that resource
 247 reservation is handled at the VM instance level, as opposed to standalone
 248 resources like CPU, memory and so forth. Because placement is an important
 249 aspect of being able to use the reserved resources, it imposes the
 250 constraint that resources be handled in groups.
 251
 252 The placement constraint also makes it impossible to use a quota management
 253 system to solve the base use case described earlier in this document.
 254
 255 OpenStack had a project called Blazar, which was created in order to provide
 256 resource reservation functionality in cloud environments. It uses the Shelve
 257 API of Nova, which provides a sub-optimal solution: because this feature
 258 blocks the reserved resources, the solution cannot be considered final.
 259 Further work is needed to reach a more optimal stage, in which the Nova
 260 scheduler is used to schedule resources for future use in order to make
 261 the reservations.
 262
 263 Phases of the work
 264 ^^^^^^^^^^^^^^^^^^
 265
 266 The work has two main stages to reach the final solution. The following main work items
 267 are on the roadmap for this approach:
 268
 269 #. Sub-optimal solution by using the shelve API of Nova through the Blazar project:
 270
 271    * Fix the code base of the Blazar project:
 272
 273      Due to integration difficulties the Blazar project was suspended. Since the last
 274      activities in that repository the OpenStack code base and environment have changed
 275      significantly, which means that the project's code base needs to be updated to the
 276      latest standards and has to be able to interact with the latest version of the
 277      other OpenStack services.
 278
 279    * Update the Blazar API:
 280
 281      The REST API needs to be extended to contain the attributes for the reservation
 282      defined in this document. This activity shall include testing against the new API.
 283
 284 #. Use Nova scheduler to avoid blocking the reserved resources:
 285
 286    * Analyze the Nova scheduler:
 287
 288      The status and the possible interface between the resource reservation system and
 289      the Nova scheduler need to be identified. It is crucial to achieve a much more
 290      optimal solution than what the current version of Blazar can provide. The goal is
 291      to be able to use the reserved resources before the reservation starts. To
 292      achieve this, the scheduler needs to schedule for the future, taking into
 293      account the reservation intervals that are specified in the request.
 294
 295    * Define a new design based on the analysis and start the work on it:
 296
 297      The design for the more optimal solution can be defined only after analyzing the
 298      structure and capabilities of the Nova scheduler.
 299
 300    * This phase can be started in parallel with the previous one.
 301
 302 Detailed Message Flows
 303 ^^^^^^^^^^^^^^^^^^^^^^
 304
 305 .. note:: to be done
 306
 307 Resource Reservation
 308 """"""""""""""""""""
 309
 310 .. note:: to be specified