=========================
Tracing Ceph With Blkin
=========================


Ceph can use Blkin, a library created by Marios Kogias and others,
which enables tracking a specific request from the time it enters
the system at higher levels until it is finally served by RADOS.


In general, Blkin implements the Dapper_ tracing semantics
in order to show the causal relationships between the different
processing phases that an IO request may trigger. The goal is an
end-to-end visualisation of the request's route in the system,
accompanied by information concerning latencies in each processing
phase. Thanks to LTTng this can happen with minimal overhead and
in real time. The LTTng traces can then be visualized with Twitter's
Zipkin_.


.. _Dapper: http://static.googleusercontent.com/media/research.google.com/el//pubs/archive/36356.pdf
.. _Zipkin: http://zipkin.io/


You can install Marios Kogias' upstream Blkin_ by hand::

or build distribution packages using DistroReadyBlkin_, which also comes with
pkg-config support. If you choose the latter, you must generate the
configure and make files first::


.. _Blkin: https://github.com/marioskogias/blkin
.. _DistroReadyBlkin: https://github.com/agshew/blkin


Configuring Ceph with Blkin
===========================


If you built and installed Blkin by hand, rather than building and
installing packages, then set these variables before configuring
Ceph::

  export BLKIN_CFLAGS=-Iblkin/
  export BLKIN_LIBS=-lzipkin-cpp


Since the LTTng and Blkin changes to Ceph are separate, you may
want to configure with something like::

  ./configure --with-blkin --without-lttng --with-debug


It's easy to test Ceph's Blkin tracing. Let's assume you don't have
Ceph already running, and you compiled Ceph with Blkin support but
you didn't install it. Then launch Ceph with the ``vstart.sh`` script
in Ceph's src directory so you can see the possible tracepoints::

  OSD=3 MON=3 RGW=1 ./vstart.sh -n
  lttng list --userspace


You'll see something like the following::

  PID: 8987 - Name: ./ceph-osd
        zipkin:timestamp (loglevel: TRACE_WARNING (4)) (type: tracepoint)
        zipkin:keyval (loglevel: TRACE_WARNING (4)) (type: tracepoint)
        ust_baddr_statedump:soinfo (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)

  PID: 8407 - Name: ./ceph-mon
        zipkin:timestamp (loglevel: TRACE_WARNING (4)) (type: tracepoint)
        zipkin:keyval (loglevel: TRACE_WARNING (4)) (type: tracepoint)
        ust_baddr_statedump:soinfo (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)

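
With several daemons running, this listing gets long. A minimal sketch for
narrowing it down to the Blkin tracepoints follows. The ``zipkin_tracepoints``
helper name is hypothetical (it is not part of Ceph or LTTng), and the filter
only assumes the layout shown above: a ``PID:`` header per process followed by
one tracepoint per line.

```shell
# Hypothetical helper: read `lttng list --userspace` output on stdin and keep
# only the per-process "PID:" headers and the zipkin tracepoint lines.
zipkin_tracepoints() {
    grep -E '^[[:space:]]*PID:|zipkin:'
}

# Usage (needs a running LTTng session daemon and the vstart cluster):
#   lttng list --userspace | zipkin_tracepoints
```

A process that prints only its ``PID:`` header and no ``zipkin:`` lines was
most likely built without Blkin support.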

Next, stop Ceph so that the tracepoints can be enabled::

Start up an LTTng session and enable the tracepoints::

  lttng create blkin-test
  lttng enable-event --userspace zipkin:timestamp
  lttng enable-event --userspace zipkin:keyval


Then start up Ceph again::

  OSD=3 MON=3 RGW=1 ./vstart.sh -n

You may want to check that Ceph is up::


Now store an object using ``rados``, check that it was written, read it
back, and remove it::

  ./rados mkpool test-blkin
  ./rados put test-object-1 ./vstart.sh --pool=test-blkin
  ./rados -p test-blkin ls
  ./ceph osd map test-blkin test-object-1
  ./rados get test-object-1 ./vstart-copy.sh --pool=test-blkin
  ./rados rm test-object-1 --pool=test-blkin

You could also use the example in ``examples/librados/`` or ``rados bench``.

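
For the ``rados`` round trip above, reading the object back is only useful if
the copy matches what was stored. A small sketch of that check with the
standard ``cmp`` utility follows. The ``same_content`` helper name is
hypothetical; the file names in the usage comment are the ones used in the
commands above.

```shell
# Hypothetical helper: succeed (exit status 0) only when both files contain
# exactly the same bytes. `cmp -s` compares silently and reports via status.
same_content() {
    cmp -s "$1" "$2"
}

# Usage, after the `rados get` step:
#   same_content ./vstart.sh ./vstart-copy.sh && echo "round trip OK"
```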

Then stop the LTTng session and see what was collected::

You'll see something like::

  [13:09:07.755054973] (+?.?????????) scruffy zipkin:timestamp: { cpu_id = 5 }, { trace_name = "Main", service_name = "MOSDOp", port_no = 0, ip = "0.0.0.0", trace_id = 7492589359882233221, span_id = 2694140257089376129, parent_span_id = 0, event = "Message allocated" }
  [13:09:07.755071569] (+0.000016596) scruffy zipkin:keyval: { cpu_id = 5 }, { trace_name = "Main", service_name = "MOSDOp", port_no = 0, ip = "0.0.0.0", trace_id = 7492589359882233221, span_id = 2694140257089376129, parent_span_id = 0, key = "Type", val = "MOSDOp" }
  [13:09:07.755074217] (+0.000002648) scruffy zipkin:keyval: { cpu_id = 5 }, { trace_name = "Main", service_name = "MOSDOp", port_no = 0, ip = "0.0.0.0", trace_id = 7492589359882233221, span_id = 2694140257089376129, parent_span_id = 0, key = "Reqid", val = "client.4126.0:1" }

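
Each record starts with a timestamp in brackets, followed by the time elapsed
since the previous record in parentheses, and carries the ``trace_id``,
``span_id`` and ``parent_span_id`` that tie it to a request. In the sample
above all three records share one trace and one span, and the second record
follows the first by 0.000016596 s. A minimal sketch for reducing such output
to those identifiers follows. The ``summarize_zipkin`` helper name is
hypothetical, and the pattern assumes the text layout shown above.

```shell
# Hypothetical helper: for every zipkin record on stdin, print
# "<timestamp> <trace_id> <span_id> <parent_span_id>" on one line.
# Lines that do not match the expected layout are dropped (-n ... p).
summarize_zipkin() {
    sed -n -E 's/^\[([0-9:.]+)\].* zipkin:[a-z]+: .*trace_id = ([0-9]+), span_id = ([0-9]+), parent_span_id = ([0-9]+).*/\1 \2 \3 \4/p'
}

# Usage: save the trace text to a file (trace.txt is a placeholder name), then
#   summarize_zipkin < trace.txt
# Counting records per trace:
#   summarize_zipkin < trace.txt | awk '{print $2}' | sort | uniq -c
```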

One reason to use Blkin is that you can view the traces in Zipkin.
Zipkin has to run both as a tracepoint collector and as a web service,
which means running three services: zipkin-collector, zipkin-query
and zipkin-web.

Download the Zipkin package::

  wget https://github.com/twitter/zipkin/archive/1.1.0.tar.gz
  bin/collector cassandra &
  bin/query cassandra &

  Browse http://${zipkin-web-ip}:8080


Show Ceph's Blkin Traces in Zipkin-web
======================================


Blkin provides a script that translates the LTTng results for Zipkin.


Send the LTTng data to Zipkin::

  python3 babeltrace_zipkin.py ${lttng-traces-dir}/${blkin-test}/ust/uid/0/64-bit/ -p ${zipkin-collector-port(9410 by default)} -s ${zipkin-collector-ip}

For example::

  python3 babeltrace_zipkin.py ~/lttng-traces-dir/blkin-test-20150225-160222/ust/uid/0/64-bit/ -p 9410 -s 127.0.0.1

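
The session directory in the example path ends in a date and time, so its
exact name changes with every run. A minimal sketch for picking the newest one
follows. The ``latest_session_dir`` helper name is hypothetical, and it assumes
directories are named ``<session>-<YYYYMMDD>-<HHMMSS>`` as in the example above,
so that alphabetical order matches chronological order.

```shell
# Hypothetical helper: print the newest "<session>-<date>-<time>" directory
# found under a base directory. Shell globs expand in sorted order, so the
# last match carries the latest timestamp.
latest_session_dir() {
    base=$1
    session=$2
    latest=""
    for dir in "$base"/"$session"-*/; do
        [ -d "$dir" ] || continue   # skip the unexpanded pattern if nothing matched
        latest=${dir%/}             # drop the trailing slash added by the glob
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Usage with the paths from the example above:
#   trace_dir=$(latest_session_dir "$HOME/lttng-traces-dir" blkin-test)
#   python3 babeltrace_zipkin.py "$trace_dir/ust/uid/0/64-bit/" -p 9410 -s 127.0.0.1
```

The function returns a non-zero status and prints nothing when no matching
directory exists, so a caller can stop before invoking the script.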

Check the Ceph traces on the web page::

  Browse http://${zipkin-web-ip}:8080