Improve data ingestion reliability and functionality 83/63383/2
author earrage <eddie.arrage@huawei.com>
Wed, 10 Oct 2018 18:54:51 +0000 (11:54 -0700)
committer earrage <eddie.arrage@huawei.com>
Wed, 10 Oct 2018 18:59:49 +0000 (11:59 -0700)
commit c36f82e76df1f2f506f770935093828d3d573e6b
tree e5f8dadb2359c5fc5a68737f21f2e140c1763dcb
parent e716285dfe01e2373984550495fd2cf02dbf959d
Improve data ingestion reliability and functionality

- Change the deployment namespace to clover-system and account
for cassandra moving to the same namespace
- Increase the k8s compute resources assigned to cassandra to deal
with performance issues
- Add additional fields (user-agent, request/response size,
status codes) to span schema definition and modify primary keys
- Improve exception handling to prevent collect process from
crashing
- Minor changes to support tracing/monitoring with Istio 1.0
- Suppress logging of debug messages
- Increase the look-back time and the number of traces fetched in
each sampling interval to deal with the Jaeger REST interface
returning trace data out of order under load
(tested to 300 conn/sec; 12K connections currently)
- Insert traces into cassandra in batch mode
- Read the visibility services to analyze from redis rather than
using defaults (set via cloverctl, the UI or the clover-controller REST
interface)
- Remove local directory copies in docker build, as image is
based on base clover container
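
A minimal sketch of how two of the items above could fit together: a
sampling cycle that looks back over an overlapping time window, skips
traces it has already stored, and survives a failed cycle instead of
crashing. This is not the code in collect.py. The names collect_once,
run, fetch_traces and insert_batch, the dict keys "traceID" and
"startTime", and the default window and limit values are all
assumptions made for illustration.

```python
import logging
import time

log = logging.getLogger("collect-sketch")


def collect_once(fetch_traces, insert_batch, seen_ids, now_us, lookback_us, limit):
    """Fetch one overlapping window and insert only traces not seen before."""
    start_us = now_us - lookback_us
    traces = fetch_traces(start_us, now_us, limit)
    # The tracing backend may return traces out of order, so sort by start
    # time and rely on trace IDs (not arrival order) to skip duplicates.
    ordered = sorted(traces, key=lambda t: t["startTime"])
    fresh = [t for t in ordered if t["traceID"] not in seen_ids]
    if fresh:
        # Hypothetical batch writer; IDs are recorded only after it succeeds,
        # so a failed insert is retried on the next cycle.
        insert_batch(fresh)
        seen_ids.update(t["traceID"] for t in fresh)
    return len(fresh)


def run(fetch_traces, insert_batch, cycles, interval_s=0.0,
        lookback_us=10_000_000, limit=1000, clock=time.time):
    """Run a fixed number of sampling cycles; one bad cycle does not stop the rest."""
    seen_ids = set()
    inserted = 0
    for _ in range(cycles):
        try:
            now_us = int(clock() * 1_000_000)
            inserted += collect_once(fetch_traces, insert_batch, seen_ids,
                                     now_us, lookback_us, limit)
        except Exception:
            # Log the traceback and carry on with the next interval.
            log.exception("sampling cycle failed; continuing")
        time.sleep(interval_s)
    return inserted
```

In this sketch seen_ids grows without bound; a long-running collector
would need to expire old IDs or rely on the database primary key to
absorb repeated inserts.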

Change-Id: Ibae98ef5057e52a6eeddd9ebbcfaeb644caec36c
Signed-off-by: earrage <eddie.arrage@huawei.com>
clover/collector/db/cassops.py
clover/collector/db/redisops.py
clover/collector/docker/Dockerfile
clover/collector/grpc/collector_client.py
clover/collector/grpc/collector_server.py
clover/collector/process/collect.py
clover/collector/yaml/manifest.template
clover/tools/yaml/cassandra.yaml