.. This work is licensed under a Creative Commons Attribution 4.0 International License.
.. http://creativecommons.org/licenses/by/4.0
.. (c) OPNFV, Intel Corporation and others.

DPDK Enhancements
==================
This section describes the Barometer features that were integrated into DPDK.

Measuring Telco Traffic and Performance KPIs
--------------------------------------------
This section describes the Barometer features that enable measuring Telco traffic
and performance KPIs.

.. Figure:: stats_and_timestamps.png

   Measuring Telco Traffic and Performance KPIs

* The first thing Barometer enabled was a call-back API in DPDK and an
  associated sample application, rxtx_callbacks, which uses the API to
  demonstrate how to timestamp packets and measure packet latency in DPDK.
  This was upstreamed to DPDK 2.0 and is represented by interfaces 1 and 2
  in Figure 1.2.

* The second thing Barometer implemented in DPDK is the extended NIC statistics API,
  which exposes NIC stats, including error stats, to the DPDK user by reading the
  registers on the NIC. This is represented by interface 3 in Figure 1.2.

  * For DPDK 2.1 this API was only implemented for the ixgbe (10Gb) NIC driver,
    in association with a sample application that runs as a DPDK secondary
    process and retrieves the extended NIC stats.

  * For DPDK 2.2 the API was implemented for igb, i40e and all the Virtual
    Functions (VFs) for all drivers.

  * For DPDK 16.07 the API migrated from using string/value pairs to using
    id/value pairs, improving the overall performance of the API.

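The move from string/value pairs to id/value pairs can be illustrated with a
minimal, DPDK-free sketch. It is not the DPDK API: stat_names, stat_entry,
stat_id_by_name() and stats_get() are hypothetical names, the statistic names
are only examples, and the counters are simulated. The idea shown is that a
name is resolved to an id once, after which each poll only handles integer ids
and values.

.. code:: c

    #include <stdint.h>
    #include <string.h>

    #define NUM_STATS 3

    /* Hypothetical stand-in for a driver's table of statistic names. */
    static const char *const stat_names[NUM_STATS] = {
        "rx_good_packets", "rx_errors", "tx_good_packets"
    };

    /* An id/value pair: the id is an index into the name table. */
    struct stat_entry {
        uint64_t id;
        uint64_t value;
    };

    /* Resolve a name to its id once, outside the polling path; -1 if unknown. */
    static int
    stat_id_by_name(const char *name)
    {
        for (int i = 0; i < NUM_STATS; i++)
            if (strcmp(stat_names[i], name) == 0)
                return i;
        return -1;
    }

    /* Fill id/value pairs from simulated counters; no strings are copied. */
    static void
    stats_get(const uint64_t counters[NUM_STATS], struct stat_entry out[NUM_STATS])
    {
        for (uint64_t i = 0; i < NUM_STATS; i++) {
            out[i].id = i;
            out[i].value = counters[i];
        }
    }

A monitoring application would call stat_id_by_name() once at start-up and then
index the array filled by stats_get() on every poll, so no string comparison is
needed in the polling path.
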
Monitoring DPDK interfaces
--------------------------
With the features Barometer enabled in DPDK for measuring Telco traffic and
performance KPIs, we can now retrieve NIC statistics, including error stats, and
relay them to a DPDK user. The next step is to enable monitoring of the DPDK
interfaces based on the stats retrieved from the NICs, by relaying
the information to a higher-level Fault Management entity. To enable this, Barometer
has been enabling a number of plugins for collectd.

DPDK Keep Alive description
---------------------------
SFQM aims to enable fault detection within DPDK. The first feature to
meet this goal is the DPDK Keep Alive sample app that is part of DPDK 2.2.

DPDK Keep Alive, or KA, is a sample application that acts as a heartbeat/watchdog
for DPDK packet processing cores, to detect application thread failure. The
application supports the detection of ‘failed’ DPDK cores and notification to a
HA/SA middleware. The purpose is to detect packet processing core failures (e.g.
an infinite loop) and ensure that the failure of a core does not result in a fault
that is not detectable by a management entity.

.. Figure:: dpdk_ka.png

   DPDK Keep Alive Sample Application

Essentially, the app demonstrates how to detect 'silent outages' on DPDK packet
processing cores. The application can be decomposed into two specific parts:
detection and notification.

* The detection period is configurable but defaults to 5ms if no
  timeout is specified.
* Notification support is provided by a hook function, which can serve as
  'call back support' for a fault management application with a compliant
  heartbeat mechanism.

DPDK Keep Alive Sample App Internals
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This section explains the Keep-Alive/'Liveliness'
conceptual scheme as well as the DPDK Keep Alive app. The initialization and
run-time paths are very similar to those of the L2 forwarding application (see
`L2 Forwarding Sample Application (in Real and Virtualized Environments)`_ for more
information).

There are two types of cores: a Keep Alive Monitor Agent Core (master DPDK core)
and Worker cores (Tx/Rx/Forwarding cores). The Keep Alive Monitor Agent Core
supervises the worker cores and reports any failure (2 successive missed pings).
The Keep-Alive/'Liveliness' conceptual scheme is:

* DPDK worker cores mark their liveliness as they forward traffic.
* A Keep Alive Monitor Agent Core runs a function every N milliseconds to
  inspect worker core liveliness.
* If the keep-alive agent detects time-outs, it notifies the fault management
  entity through a call-back function.

**Note:**  Only the worker cores' state is monitored. There is no mechanism or agent
to monitor the Keep Alive Monitor Agent Core.

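The scheme above can be sketched without DPDK as follows. This is an
illustration only: ka_monitor, ka_mark_alive(), ka_dispatch_pings() and
record_dead_core() are hypothetical names rather than the DPDK keep-alive API,
MAX_CORES is an arbitrary value, and the threshold of two successive missed
checks is taken from the description above.

.. code:: c

    #include <stddef.h>
    #include <stdint.h>

    #define MAX_CORES 4   /* hypothetical number of worker cores */

    /* Notification hook type: called once when a core is declared failed. */
    typedef void (*failure_cb)(void *data, int core_id);

    struct ka_monitor {
        volatile uint8_t alive[MAX_CORES]; /* set by workers, cleared by monitor */
        uint8_t missed[MAX_CORES];         /* successive missed checks, max 2 */
        failure_cb callback;
        void *callback_data;
    };

    /* Worker side: called from the forwarding loop. */
    static void
    ka_mark_alive(struct ka_monitor *m, int core_id)
    {
        m->alive[core_id] = 1;
    }

    /* Monitor side: called once per check period (every N milliseconds). */
    static void
    ka_dispatch_pings(struct ka_monitor *m)
    {
        for (int core = 0; core < MAX_CORES; core++) {
            if (m->alive[core]) {
                m->alive[core] = 0;   /* worker must set it again before next check */
                m->missed[core] = 0;
                continue;
            }
            if (m->missed[core] < 2) {
                m->missed[core]++;
                /* Notify exactly once, on the second successive miss. */
                if (m->missed[core] == 2 && m->callback != NULL)
                    m->callback(m->callback_data, core);
            }
        }
    }

    /* Example hook: records the id of the failed core for the caller. */
    static void
    record_dead_core(void *data, int core_id)
    {
        *(int *)data = core_id;
    }

In this sketch a worker that stops calling ka_mark_alive() is reported after
two check periods, and the monitor itself is not supervised, which matches the
note above.
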
DPDK Keep Alive Sample App Code Internals
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The following section explains the code aspects that are
specific to the Keep Alive sample application.

The heartbeat functionality is initialized with a struct rte_heartbeat and the
callback function to invoke in the case of a timeout.

.. code:: c

    rte_global_keepalive_info = rte_keepalive_create(&dead_core, NULL);
    if (rte_global_keepalive_info == NULL)
        rte_exit(EXIT_FAILURE, "keepalive_create() failed");

The function that issues the pings, hbeat_dispatch_pings(), is configured to run
every check_period milliseconds.

.. code:: c

    if (rte_timer_reset(&hb_timer,
            (check_period * rte_get_timer_hz()) / 1000,
            PERIODICAL,
            rte_lcore_id(),
            &hbeat_dispatch_pings, rte_global_keepalive_info
            ) != 0 )
        rte_exit(EXIT_FAILURE, "Keepalive setup failure.\n");

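The period argument in the call above converts milliseconds to timer ticks. As
a worked example, assuming a hypothetical timer frequency of 2 GHz (the real
value comes from rte_get_timer_hz()), the 5 ms default gives
(5 * 2,000,000,000) / 1000 = 10,000,000 ticks. The helper name ms_to_ticks() is
illustrative and not part of DPDK.

.. code:: c

    #include <stdint.h>

    /* Convert a period in milliseconds to timer ticks, given ticks per second. */
    static uint64_t
    ms_to_ticks(uint64_t period_ms, uint64_t timer_hz)
    {
        /* Multiply first so that sub-kHz remainders are not lost early. */
        return (period_ms * timer_hz) / 1000;
    }

Multiplying before dividing keeps precision when the frequency is not a
multiple of 1000: ms_to_ticks(5, 1500) yields 7, whereas dividing first would
yield 5.
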
The rest of the initialization and run-time path follows the same paths as
the L2 forwarding application. The only addition to the main processing loop is
the mark alive functionality and the example random failures.

.. code:: c

    rte_keepalive_mark_alive(rte_global_keepalive_info);
    cur_tsc = rte_rdtsc();

    /* Die randomly within 7 secs for demo purposes.. */
    if (cur_tsc - tsc_initial > tsc_lifetime)
        break;

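The loop body above can be modelled in isolation to see when a worker stops
reporting. In this minimal sketch, struct worker and worker_step() are
hypothetical names, and tick values are passed in as arguments instead of being
read from rte_rdtsc().

.. code:: c

    #include <stdint.h>

    /* Hypothetical per-worker state for the sketch. */
    struct worker {
        uint8_t alive;        /* liveliness flag inspected by the monitor */
        uint64_t start_tick;  /* tick count when the worker started */
        uint64_t lifetime;    /* ticks after which the demo worker stops */
    };

    /* One loop iteration: returns 1 to keep running, 0 once the lifetime expired. */
    static int
    worker_step(struct worker *w, uint64_t now)
    {
        w->alive = 1;         /* mark alive first, as in the loop above */
        if (now - w->start_tick > w->lifetime)
            return 0;         /* simulated failure: the loop exits here */
        /* ... packet forwarding would happen here ... */
        return 1;
    }

Once worker_step() returns 0 the worker no longer sets its flag, so a monitor
that clears and re-checks the flag each period would see it stay at zero.
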
The rte_keepalive_mark_alive() function simply sets the core state to alive.

.. code:: c

    static inline void
    rte_keepalive_mark_alive(struct rte_heartbeat *keepcfg)
    {
        keepcfg->state_flags[rte_lcore_id()] = 1;
    }

Keep Alive Monitor Agent Core Monitoring Options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The application can run on either a host or a guest. As such there are a number
of options for monitoring the Keep Alive Monitor Agent Core through a Local
Agent on the compute node:

         ======================  ==========  =============
          Application Location     DPDK KA     LOCAL AGENT
         ======================  ==========  =============
                  HOST               X        HOST/GUEST
                  GUEST              X        HOST/GUEST
         ======================  ==========  =============

For the first implementation of a Local Agent, SFQM will enable:

         ======================  ==========  =============
          Application Location     DPDK KA     LOCAL AGENT
         ======================  ==========  =============
                  HOST               X           HOST
         ======================  ==========  =============

This will be achieved by extending the dpdkstat plugin for collectd with KA
functionality and integrating the extended plugin with Monasca for
high-performing, resilient, and scalable fault detection.

.. _L2 Forwarding Sample Application (in Real and Virtualized Environments): http://dpdk.org/doc/guides/sample_app_ug/l2_forward_real_virtual.html