.. This work is licensed under a Creative Commons Attribution 4.0 International License.
.. http://creativecommons.org/licenses/by/4.0

This blueprint proposes a performance profiler for Doctor scenarios.

In the verification job for notification time, we have encountered some
performance issues, such as:

1. In an environment deployed by APEX, the notification time meets the
   criteria, while in one deployed by Fuel the performance is much poorer.
2. Significant performance degradation was spotted when we increase the total

It takes time to dig through the logs and analyse the cause. People have to
collect timestamps at each checkpoint manually to find the bottleneck. A
performance profiler will automate this process.

The current Doctor scenario covers the inspector and notifier in the whole fault
management cycle::

    |monitor|inspector|notifier|manager|controller|
    +------>+         |        |       |          |
  occurred  +-------->+        |       |          |
    |     detected    +------->+       |          |
    |       |     identified   +------>+          |
    |       |               notified   +--------->+
    |       |                  |    processed  resolved
    |       |                  |                  |
    |       +<-----doctor----->+                  |
    |                                             |
    +<---------------fault management------------>+

The notification time can be split into several parts and visualized as a
timeline::

    0----5---10---15---20---25---30---35---40---45--> (x 10ms)
  0-hostdown |   |   |    |      |   |   |   |   |
    +--->+   |   |   |    |      |   |   |   |   |
    |  1-raw failure |    |      |   |   |   |   |
    |    +-->+   |   |    |      |   |   |   |   |
    |    | 2-found affected      |   |   |   |   |
    |    |   +-->+   |    |      |   |   |   |   |
    |    |     3-marked host down|   |   |   |   |
    |    |       +-->+    |      |   |   |   |   |
    |    |         4-set VM error|   |   |   |   |
    |    |           +--->+      |   |   |   |   |
    |    |           |  5-notified VM error  |   |
    |    |           |    +----->+   |   |   |   |
    |    |           |    |    6-transformed event
    |    |           |    |      +-->+   |   |   |
    |    |           |    |      | 7-evaluated event
    |    |           |    |      |   +-->+   |   |
    |    |           |    |      |     8-fired alarm
    |    |           |    |      |       +-->+   |
    |    |           |    |      |         9-received alarm
    |    |           |    |      |           +-->+
  sample | sample    |    |      |           |10-handled alarm
  monitor| inspector |nova| c/m  |   aodh    |
    |                                        |
    +<----------------doctor---------------->+

Note: c/m = ceilometer

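
As an illustration of how a profiler could derive the parts of the timeline,
the following sketch computes the elapsed time between consecutive checkpoints.
The function name ``step_durations`` and the use of integer millisecond
timestamps are assumptions made for this example; they are not taken from the
Doctor code base.

.. code-block:: python

    # Checkpoint names, in the order shown on the timeline.
    CHECKPOINTS = [
        "hostdown", "raw failure", "found affected", "marked host down",
        "set VM error", "notified VM error", "transformed event",
        "evaluated event", "fired alarm", "received alarm", "handled alarm",
    ]


    def step_durations(timestamps_ms):
        """Return (start, end, elapsed_ms) for consecutive recorded checkpoints.

        ``timestamps_ms`` maps a checkpoint name to a timestamp in
        milliseconds. Checkpoints that were not recorded are skipped, so the
        step that follows a gap spans the missing checkpoints as well.
        """
        recorded = [name for name in CHECKPOINTS if name in timestamps_ms]
        return [
            (start, end, timestamps_ms[end] - timestamps_ms[start])
            for start, end in zip(recorded, recorded[1:])
        ]

With only a limited set of checkpoints instrumented, the same function still
returns a coarser breakdown, which may be enough for a first proof of concept.
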
The same breakdown can also be presented as a table of components, sorted by
time cost from most to least:

+----------+---------+----------+
|Component |Time Cost|Percentage|
+==========+=========+==========+
|inspector |160ms    | 40%      |
+----------+---------+----------+
|monitor   |50ms     | 14%      |
+----------+---------+----------+

Note: the data in the table is for demonstration only, not an actual measurement.

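
A table of this kind could be produced by summing the checkpoint intervals per
component. In the sketch below, the component boundaries in
``COMPONENT_SPANS`` are read off the timeline and should be treated as an
assumption; ``component_costs`` and ``format_table`` are hypothetical helper
names, and timestamps are assumed to be integer milliseconds.

.. code-block:: python

    # (component, first checkpoint, last checkpoint); assumed boundaries.
    COMPONENT_SPANS = [
        ("monitor", "hostdown", "raw failure"),
        ("inspector", "raw failure", "set VM error"),
        ("nova", "set VM error", "notified VM error"),
        ("ceilometer", "notified VM error", "transformed event"),
        ("aodh", "transformed event", "received alarm"),
    ]


    def component_costs(timestamps_ms):
        """Return (component, cost_ms, percentage) rows, highest cost first."""
        costs = []
        for component, start, end in COMPONENT_SPANS:
            # Only report components whose two boundary checkpoints exist.
            if start in timestamps_ms and end in timestamps_ms:
                costs.append((component, timestamps_ms[end] - timestamps_ms[start]))
        total = sum(cost for _, cost in costs)
        rows = [
            (name, cost, 100.0 * cost / total if total else 0.0)
            for name, cost in costs
        ]
        return sorted(rows, key=lambda row: row[1], reverse=True)


    def format_table(rows):
        """Render the rows as plain text suitable for console output."""
        lines = ["%-10s %9s %10s" % ("Component", "Time Cost", "Percentage")]
        for name, cost, percentage in rows:
            lines.append("%-10s %7dms %9.0f%%" % (name, cost, percentage))
        return "\n".join(lines)

Percentages here are relative to the sum of the measured components, so they
add up to 100% even when some checkpoints are missing.
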
Timestamps can be collected from various sources:

2. trace points in code

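
A minimal sketch of how timestamps might be gathered, both by scanning log
lines and through a trace point in code. The log line layout, the marker
strings, and the names ``timestamps_from_log`` and ``trace_point`` are all
hypothetical; real components log in their own formats and would need their
own patterns.

.. code-block:: python

    import re
    import time
    from contextlib import contextmanager
    from datetime import datetime

    # Hypothetical log line layout: "2016-08-01 12:00:00.123 some message".
    LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) (.*)$")


    def timestamps_from_log(lines, markers):
        """Return {checkpoint: datetime} from the first line holding each marker.

        ``markers`` maps a checkpoint name to a substring expected in the log
        message that corresponds to it.
        """
        found = {}
        for line in lines:
            match = LOG_LINE.match(line)
            if match is None:
                continue  # skip lines that do not start with a timestamp
            stamp, message = match.groups()
            for checkpoint, marker in markers.items():
                if checkpoint not in found and marker in message:
                    found[checkpoint] = datetime.strptime(
                        stamp, "%Y-%m-%d %H:%M:%S.%f")
        return found


    @contextmanager
    def trace_point(name, sink):
        """Trace point in code: store the wall-clock time once the block ends."""
        yield
        sink[name] = time.time()

A trace point would wrap the code that reaches a checkpoint, for example
``with trace_point("set VM error", sink): ...``, while the log scanner needs no
change to the components being measured.
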
The performance profiler will be integrated into the verification job to provide
detailed test results. It can also be deployed independently to diagnose
performance issues in a specific environment.

1. PoC with limited checkpoints
2. Integration with the verification job
3. Collect timestamps at all checkpoints
4. Display the profiling result in the console
5. Report the profiling result to the test database
6. Independent package which can be installed in a specified environment