============
Watch Notify
============

See librados for the watch/notify interface.

Overview
--------
The object_info (see osd/osd_types.h) tracks the set of watchers for
a particular object persistently in the object_info_t::watchers map.
In order to track notify progress, we also maintain some ephemeral
structures associated with the ObjectContext.

Each Watch has an associated Watch object (see osd/Watch.h).  The
ObjectContext for a watched object holds a (strong) reference
to one Watch object per watch, and each Watch object holds a
reference to the corresponding ObjectContext.  This circular reference
is deliberate; it is broken when the Watch state is discarded on
a new peering interval, or when the Watch is removed upon timeout
expiration or an unwatch operation.
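
The deliberate cycle can be illustrated with a minimal sketch.  The
types below are simplified stand-ins, not the real classes: the string
key and the helpers ``add_watch()`` and ``discard_watch()`` are
hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct ObjectContext;  // simplified stand-in, not the real ObjectContext

// Stand-in for Watch: holds a strong reference back to its ObjectContext.
struct Watch {
  std::shared_ptr<ObjectContext> obc;
};

// Stand-in for ObjectContext: one strong reference per watch.
// The string key is an assumption made for this sketch.
struct ObjectContext {
  std::map<std::string, std::shared_ptr<Watch>> watchers;
};

// Hypothetical helper: create a Watch and wire up both halves of the cycle.
inline std::shared_ptr<Watch> add_watch(
    const std::shared_ptr<ObjectContext>& obc, const std::string& key) {
  auto w = std::make_shared<Watch>();
  w->obc = obc;
  obc->watchers[key] = w;
  return w;
}

// Hypothetical helper: break the cycle, as happens on discard, timeout
// expiration, or unwatch.
inline void discard_watch(const std::shared_ptr<Watch>& w,
                          const std::string& key) {
  if (!w->obc) return;
  auto obc = w->obc;         // keep the context alive while we edit it
  obc->watchers.erase(key);  // drop ObjectContext -> Watch
  w->obc.reset();            // drop Watch -> ObjectContext
}
```

In this sketch, neither object can be freed while both references
exist, even after all outside references are dropped; both are released
once ``discard_watch()`` has run.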

A watch tracks the associated connection via a strong
ConnectionRef Watch::conn.  The connection has a WatchConState
stashed in its OSD::Session, which tracks the associated Watches
so that they can be notified upon ms_handle_reset()
(via WatchConState::reset()).

Each Watch object tracks the set of currently un-acked notifies.
start_notify() on a Watch object adds a reference to a new in-progress
Notify to the Watch and either:

* if the Watch is *connected*, sends a Notify message to the client
* if the Watch is *unconnected*, does nothing.

When the Watch becomes connected (in PrimaryLogPG::do_osd_op_effects),
Notify messages are resent for all remaining tracked Notify objects.
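
A minimal sketch of this behaviour follows.  It is a toy model and not
the code in osd/Watch.h: notifies are reduced to integer ids, and the
``sent`` vector, ``connect()``, ``disconnect()`` and ``notify_ack()``
are assumptions standing in for the messenger and the real call paths.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Toy model of a Watch; only the notify bookkeeping is represented.
struct Watch {
  bool connected = false;
  std::set<uint64_t> in_progress;  // ids of currently un-acked notifies
  std::vector<uint64_t> sent;      // stand-in for messages sent to the client

  // Track the new notify; send it only if the client is connected.
  void start_notify(uint64_t notify_id) {
    in_progress.insert(notify_id);
    if (connected) sent.push_back(notify_id);
  }

  // On (re)connect, resend every notify that is still tracked.
  void connect() {
    connected = true;
    for (uint64_t id : in_progress) sent.push_back(id);
  }

  void disconnect() { connected = false; }

  // An ack removes the notify from the tracked set.
  void notify_ack(uint64_t notify_id) { in_progress.erase(notify_id); }
};
```

A notify started while the Watch is unconnected is therefore not lost:
it stays in the tracked set and goes out on the next connect.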

Each Notify object tracks the set of un-notified Watchers, which
shrinks via calls to complete_watcher().  Once the remaining set is
empty or the timeout expires (cb, registered in init()), a notify
completion is sent to the client.

Watch Lifecycle
---------------
A watch may be in one of 5 states:

1. Nonexistent
2. On disk, but not registered with an object context
3. Connected
4. Disconnected, callback registered with timer
5. Disconnected, callback in queue for scrub or is_degraded

Case 2 occurs between when an OSD goes active and when the
ObjectContext for an object with watchers is loaded into memory due to
an access.  During Case 2, no state is registered for the watch.
Case 2 transitions to Case 4 in PrimaryLogPG::populate_obc_watchers()
during PrimaryLogPG::find_object_context.  Case 1 becomes Case 3 via
PrimaryLogPG::do_osd_op_effects due to a watch operation.  Cases 4
and 5 become Case 3 in the same way.  Case 3 becomes Case 4 when the
connection resets on a watcher's session.
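
The transitions named in this paragraph can be summarised as a small
state table.  This is only a sketch: the enum and event names are
hypothetical, and any state/event pair the paragraph does not mention
is left unchanged rather than guessed at.

```cpp
#include <cassert>

// The five cases from the list above, in the same order.
enum class WatchState {
  NonExistent,         // Case 1
  OnDiskOnly,          // Case 2
  Connected,           // Case 3
  DisconnectedTimer,   // Case 4
  DisconnectedQueued   // Case 5
};

// Hypothetical event names for the triggers described in the text.
enum class WatchEvent {
  ObcLoaded,        // ObjectContext loaded (populate_obc_watchers)
  WatchOp,          // watch operation (do_osd_op_effects)
  ConnectionReset   // connection reset on the watcher's session
};

// Return the next state; pairs not described in the text are no-ops.
inline WatchState next_state(WatchState s, WatchEvent e) {
  switch (e) {
    case WatchEvent::ObcLoaded:
      if (s == WatchState::OnDiskOnly) return WatchState::DisconnectedTimer;
      break;
    case WatchEvent::WatchOp:
      if (s == WatchState::NonExistent ||
          s == WatchState::DisconnectedTimer ||
          s == WatchState::DisconnectedQueued)
        return WatchState::Connected;
      break;
    case WatchEvent::ConnectionReset:
      if (s == WatchState::Connected) return WatchState::DisconnectedTimer;
      break;
  }
  return s;
}
```

The move from Case 4 to Case 5 is deliberately not modelled here, since
it depends on the timer callback firing while the pg cannot write.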

Cases 4 and 5 deserve some explanation.  Normally, when a Watch enters
Case 4, a callback is registered with the OSDService::watch_timer to be
called at timeout expiration.  At the time that the callback is
called, however, the pg might be in a state where it cannot write
to the object in order to remove the watch (i.e., during a scrub
or while the object is degraded).  In that case, we use
Watch::get_delayed_cb() to generate another Context for use from
the callbacks_for_degraded_object and Scrubber::callbacks lists.
In either case, Watch::unregister_cb() does the right thing
(SafeTimer::cancel_event() is harmless for contexts not registered
with the timer).
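
Why a single unregister path suffices for both cases can be sketched as
follows.  ``MiniTimer`` and ``Callback`` are hypothetical stand-ins for
SafeTimer and the real Context types, and the member bodies are
assumptions that only mirror the behaviour described above.

```cpp
#include <cassert>
#include <memory>
#include <set>

// Hypothetical stand-in for a timeout callback.
struct Callback {
  bool canceled = false;
};

// Hypothetical stand-in for SafeTimer: cancelling an event that was
// never registered simply reports false and changes nothing.
struct MiniTimer {
  std::set<Callback*> registered;
  void add_event(Callback* cb) { registered.insert(cb); }
  bool cancel_event(Callback* cb) { return registered.erase(cb) > 0; }
};

struct Watch {
  explicit Watch(MiniTimer& t) : timer(t) {}

  // Case 4: the callback lives in the timer.
  void register_cb() {
    cb = std::make_shared<Callback>();
    timer.add_event(cb.get());
  }

  // Case 5: the callback is handed to a scrub/degraded queue instead,
  // so the timer never sees it.
  std::shared_ptr<Callback> get_delayed_cb() {
    cb = std::make_shared<Callback>();
    return cb;
  }

  // One path for both cases: mark the callback canceled, then ask the
  // timer to drop it.  The timer call is a no-op in Case 5.
  void unregister_cb() {
    if (!cb) return;
    cb->canceled = true;
    timer.cancel_event(cb.get());
    cb.reset();
  }

  MiniTimer& timer;
  std::shared_ptr<Callback> cb;
};
```

The queued copy of a delayed callback still exists after
``unregister_cb()``; in this sketch the queue is expected to check the
``canceled`` flag before acting on it.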

Notify Lifecycle
----------------
The notify timeout is simpler: a timeout callback is registered when
the notify is init()'d.  If all watchers ack the notify before the
timeout fires, the timeout is canceled and the client is notified
of the notify completion.  Otherwise, the timeout fires, the Notify
object pings each Watch via cancel_notify to remove itself, and
sends the notify completion to the client early.
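
Both outcomes can be sketched in a few lines.  This is a toy model
under stated assumptions: the timer is reduced to a ``timer_armed``
flag, and ``ack()``, ``on_timeout()`` and the boolean fields are
hypothetical names, not the members of the real Notify class.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>

// Toy Watch: only tracks which notify ids are still outstanding on it.
struct Watch {
  std::set<uint64_t> in_progress;
  void cancel_notify(uint64_t id) { in_progress.erase(id); }
};

// Toy Notify: completes when every watcher has acked, or on timeout.
struct Notify {
  Notify(uint64_t id_, std::set<Watch*> ws)
      : id(id_), unacked(std::move(ws)) {}

  // Register the timeout and attach this notify to each watch.
  void init() {
    timer_armed = true;
    for (Watch* w : unacked) w->in_progress.insert(id);
    maybe_complete();
  }

  // A watcher acked: drop it and complete if it was the last one.
  void ack(Watch* w) {
    if (completion_sent) return;
    w->in_progress.erase(id);
    unacked.erase(w);
    maybe_complete();
  }

  // Timeout fired first: detach from the remaining watches and
  // send the completion early.
  void on_timeout() {
    if (completion_sent) return;
    timer_armed = false;
    timed_out = true;
    for (Watch* w : unacked) w->cancel_notify(id);
    unacked.clear();
    completion_sent = true;
  }

  uint64_t id;
  std::set<Watch*> unacked;
  bool timer_armed = false;
  bool completion_sent = false;
  bool timed_out = false;

 private:
  void maybe_complete() {
    if (!unacked.empty()) return;
    timer_armed = false;  // stands in for cancelling the timeout
    completion_sent = true;
  }
};
```

Whichever path runs first sets ``completion_sent``, so the completion
goes to the client exactly once in this model.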