LMA: Deployment of LMA solution. 42/70842/3
author adi0509 <adiyadav0509@gmail.com>
Fri, 21 Aug 2020 17:42:57 +0000 (23:12 +0530)
committer adi0509 <adiyadav0509@gmail.com>
Fri, 4 Sep 2020 16:41:10 +0000 (22:11 +0530)
Docs for LMA deployment

Signed-off-by: Adarsh Yadav <adiyadav0509@gmail.com>
Change-Id: Ib58bec806ce80c6927b40ddd490d612195bd6d70

docs/lma/devguide.rst [new file with mode: 0644]
docs/lma/logs/images/elasticsearch.png [new file with mode: 0644]
docs/lma/logs/images/fluentd-cs.png [new file with mode: 0644]
docs/lma/logs/images/fluentd-ss.png [new file with mode: 0644]
docs/lma/logs/images/nginx.png [new file with mode: 0644]
docs/lma/logs/images/setup.png [new file with mode: 0644]
docs/lma/logs/userguide.rst [new file with mode: 0644]

diff --git a/docs/lma/devguide.rst b/docs/lma/devguide.rst
new file mode 100644 (file)
index 0000000..c72b8b1
--- /dev/null
@@ -0,0 +1,147 @@
+=================
+Table of Contents
+=================
+.. contents::
+.. section-numbering::
+
+Ansible Client-side
+====================
+
+Ansible File Organisation
+--------------------------
+Files Structure::
+
+    ansible-client
+    ├── ansible.cfg
+    ├── hosts
+    ├── playbooks
+    │   └── setup.yaml
+    └── roles
+        ├── clean-td-agent
+        │   └── tasks
+        │       └── main.yml
+        └── td-agent
+            ├── files
+            │   └── td-agent.conf
+            └── tasks
+                └── main.yml
+
+Summary of roles
+-----------------
+====================== ======================
+Roles                  Description
+====================== ======================
+``td-agent``           Install Td-agent & change configuration file
+``clean-td-agent``     Uninstall Td-agent
+====================== ======================
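+
+For orientation, a minimal sketch of what the ``td-agent`` role's ``tasks/main.yml`` might
+contain is shown below (task names and module arguments are assumptions, not the exact
+contents of the repository)::
+
+    # Hypothetical sketch of ansible-client/roles/td-agent/tasks/main.yml
+    - name: Install td-agent
+      shell: curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent3.sh | sh
+      args:
+        creates: /etc/td-agent            # skip if td-agent is already installed
+
+    - name: Copy the td-agent configuration
+      copy:
+        src: td-agent.conf                # taken from roles/td-agent/files/
+        dest: /etc/td-agent/td-agent.conf
+
+    - name: Restart td-agent to load the new configuration
+      service:
+        name: td-agent
+        state: restarted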
+
+Configurable Parameters
+------------------------
+====================================================== ====================== ======================
+File (ansible-client/roles/)                           Parameter              Description
+====================================================== ====================== ======================
+``td-agent/files/td-agent.conf``                       host                   Fluentd-server IP
+``td-agent/files/td-agent.conf``                       port                   Fluentd-Server Port
+====================================================== ====================== ======================
+
+Ansible Server-side
+====================
+
+Ansible File Organisation
+--------------------------
+Files Structure::
+
+      ansible-server
+      ├── ansible.cfg
+      ├── group_vars
+      │   └── all.yml
+      ├── hosts
+      ├── playbooks
+      │   └── setup.yaml
+      └── roles
+          ├── clean-logging
+          │   └── tasks
+          │       └── main.yml
+          ├── k8s-master
+          │   └── tasks
+          │       └── main.yml
+          ├── k8s-pre
+          │   └── tasks
+          │       └── main.yml
+          ├── k8s-worker
+          │   └── tasks
+          │       └── main.yml
+          ├── logging
+          │   ├── files
+          │   │   ├── elastalert
+          │   │   │   ├── ealert-conf-cm.yaml
+          │   │   │   ├── ealert-key-cm.yaml
+          │   │   │   ├── ealert-rule-cm.yaml
+          │   │   │   └── elastalert.yaml
+          │   │   ├── elasticsearch
+          │   │   │   ├── elasticsearch.yaml
+          │   │   │   └── user-secret.yaml
+          │   │   ├── fluentd
+          │   │   │   ├── fluent-cm.yaml
+          │   │   │   ├── fluent-service.yaml
+          │   │   │   └── fluent.yaml
+          │   │   ├── kibana
+          │   │   │   └── kibana.yaml
+          │   │   ├── namespace.yaml
+          │   │   ├── nginx
+          │   │   │   ├── nginx-conf-cm.yaml
+          │   │   │   ├── nginx-key-cm.yaml
+          │   │   │   ├── nginx-service.yaml
+          │   │   │   └── nginx.yaml
+          │   │   ├── persistentVolume.yaml
+          │   │   └── storageClass.yaml
+          │   └── tasks
+          │       └── main.yml
+          └── nfs
+              └── tasks
+                  └── main.yml
+
+Summary of roles
+-----------------
+====================== ======================
+Roles                  Description
+====================== ======================
+``k8s-pre``            Prerequisites for installing K8s, e.g. installing Docker & K8s packages, disabling swap, etc.
+``k8s-master``         Reset K8s & set up the master node
+``k8s-worker``         Join worker nodes with the token
+``logging``            EFK & Elastalert setup in K8s
+``clean-logging``      Remove the EFK & Elastalert setup from K8s
+``nfs``                Start an NFS server to store Elasticsearch data
+====================== ======================
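+
+For orientation, a simplified sketch of how ``playbooks/setup.yaml`` could apply these roles
+is shown below (the group names ``master`` and ``workers`` are assumptions based on the
+``hosts`` inventory, not the repository's exact contents)::
+
+    # Hypothetical sketch of ansible-server/playbooks/setup.yaml
+    - hosts: master
+      become: yes
+      roles:
+        - k8s-pre
+        - k8s-master
+        - nfs
+        - logging
+
+    - hosts: workers
+      become: yes
+      roles:
+        - k8s-pre
+        - k8s-worker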
+
+Configurable Parameters
+------------------------
+======================================================= ============================================= ======================
+File (ansible-server/roles/)                            Parameter name                                Description
+======================================================= ============================================= ======================
+**Role: logging**
+``logging/files/persistentVolume.yaml``                 storage                                       Increase or decrease the Persistent Volume size for each VM
+``logging/files/kibana/kibana.yaml``                    version                                       To change the Kibana version
+``logging/files/kibana/kibana.yaml``                    count                                         To increase or decrease the number of replicas
+``logging/files/elasticsearch/elasticsearch.yaml``      version                                       To change the Elasticsearch version
+``logging/files/elasticsearch/elasticsearch.yaml``      nodePort                                      To change the service port
+``logging/files/elasticsearch/elasticsearch.yaml``      storage                                       Increase or decrease the Elasticsearch data storage size for each VM
+``logging/files/elasticsearch/elasticsearch.yaml``      nodeAffinity -> values (hostname)             Hostname of the node on which the Elasticsearch master or data pod should run
+``logging/files/elasticsearch/user-secret.yaml``        stringData                                    Add an Elasticsearch user & its roles (`Elastic Docs <https://www.elastic.co/guide/en/cloud-on-k8s/master/k8s-users-and-roles.html#k8s_file_realm>`_)
+``logging/files/fluentd/fluent.yaml``                   replicas                                      To increase or decrease the number of replicas
+``logging/files/fluentd/fluent-service.yaml``           nodePort                                      To change the service port
+``logging/files/fluentd/fluent-cm.yaml``                index_template.json -> number_of_replicas     To increase or decrease the number of index replicas in Elasticsearch
+``logging/files/fluentd/fluent-cm.yaml``                fluent.conf                                   Server port & other Fluentd configuration
+``logging/files/nginx/nginx.yaml``                      replicas                                      To increase or decrease the number of replicas
+``logging/files/nginx/nginx-service.yaml``              nodePort                                      To change the service port
+``logging/files/nginx/nginx-key-cm.yaml``               kibana-access.key, kibana-access.pem          Key & certificate files for the HTTPS connection
+``logging/files/nginx/nginx-conf-cm.yaml``              -                                             Nginx configuration
+``logging/files/elastalert/elastalert.yaml``            replicas                                      To increase or decrease the number of replicas
+``logging/files/elastalert/ealert-key-cm.yaml``         elastalert.key, elastalert.pem                Key & certificate files for the HTTPS connection
+``logging/files/elastalert/ealert-conf-cm.yaml``        run_every                                     How often ElastAlert queries Elasticsearch
+``logging/files/elastalert/ealert-conf-cm.yaml``        alert_time_limit                              If an alert fails, ElastAlert retries sending it until this time period has elapsed
+``logging/files/elastalert/ealert-conf-cm.yaml``        es_host, es_port                              Elasticsearch service name & port in K8s
+``logging/files/elastalert/ealert-rule-cm.yaml``        http_post_url                                 Alert receiver IP (`Elastalert Rule Config <https://elastalert.readthedocs.io/en/latest/ruletypes.html>`_)
+**Role: nfs**
+``nfs/tasks/main.yml``                                  line                                          Path of the NFS storage
+======================================================= ============================================= ======================
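+
+As an example of where these parameters live, a PersistentVolume entry in
+``logging/files/persistentVolume.yaml`` looks roughly like the following (the metadata name,
+StorageClass name and size are placeholders, not the repository's exact values)::
+
+    # Illustrative PersistentVolume backed by the NFS server
+    apiVersion: v1
+    kind: PersistentVolume
+    metadata:
+      name: elasticsearch-data-vm1        # placeholder name
+    spec:
+      capacity:
+        storage: 10Gi                     # "storage" parameter: PV size for each VM
+      accessModes:
+        - ReadWriteOnce
+      storageClassName: nfs-storage       # placeholder StorageClass name
+      nfs:
+        server: 10.10.120.211             # NFS-server IP
+        path: /srv/nfs/data               # exported path on the NFS server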
diff --git a/docs/lma/logs/images/elasticsearch.png b/docs/lma/logs/images/elasticsearch.png
new file mode 100644 (file)
index 0000000..f0b876f
Binary files /dev/null and b/docs/lma/logs/images/elasticsearch.png differ
diff --git a/docs/lma/logs/images/fluentd-cs.png b/docs/lma/logs/images/fluentd-cs.png
new file mode 100644 (file)
index 0000000..513bb3e
Binary files /dev/null and b/docs/lma/logs/images/fluentd-cs.png differ
diff --git a/docs/lma/logs/images/fluentd-ss.png b/docs/lma/logs/images/fluentd-ss.png
new file mode 100644 (file)
index 0000000..4e9ab11
Binary files /dev/null and b/docs/lma/logs/images/fluentd-ss.png differ
diff --git a/docs/lma/logs/images/nginx.png b/docs/lma/logs/images/nginx.png
new file mode 100644 (file)
index 0000000..a0b0051
Binary files /dev/null and b/docs/lma/logs/images/nginx.png differ
diff --git a/docs/lma/logs/images/setup.png b/docs/lma/logs/images/setup.png
new file mode 100644 (file)
index 0000000..267685f
Binary files /dev/null and b/docs/lma/logs/images/setup.png differ
diff --git a/docs/lma/logs/userguide.rst b/docs/lma/logs/userguide.rst
new file mode 100644 (file)
index 0000000..b410ee6
--- /dev/null
@@ -0,0 +1,348 @@
+=================
+Table of Contents
+=================
+.. contents::
+.. section-numbering::
+
+Setup
+======
+
+Prerequisites
+-------------------------
+- Requires 3 VMs to set up K8s
+- ``$ sudo yum install ansible``
+- ``$ pip install openshift pyyaml kubernetes`` (required for the Ansible K8s module)
+- Update the IPs in all of these files (if changed)
+   ====================================================================== ======================
+   Path                                                                   Description
+   ====================================================================== ======================
+   ``ansible-server/group_vars/all.yml``                                  IP of K8s apiserver and VM hostname
+   ``ansible-server/hosts``                                               IP of VMs to install
+   ``ansible-server/roles/logging/files/persistentVolume.yaml``           IP of NFS-Server
+   ``ansible-server/roles/logging/files/elastalert/ealert-rule-cm.yaml``  IP of alert-receiver
+   ====================================================================== ======================
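+
+As a point of reference, ``ansible-server/group_vars/all.yml`` carries the K8s apiserver IP and
+VM hostname mentioned above; a hypothetical example is shown below (the variable names are
+placeholders, check the file itself for the real keys)::
+
+    # Hypothetical ansible-server/group_vars/all.yml
+    k8s_apiserver_ip: 10.10.120.211     # IP of the K8s apiserver (VM1)
+    k8s_master_hostname: pod12-vm1      # hostname of the master VM (placeholder)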
+
+Architecture
+--------------
+.. image:: images/setup.png
+
+Installation - Clientside
+-------------------------
+
+Nodes
+`````
+- **Node1** = 10.10.120.21
+- **Node4** = 10.10.120.24
+
+How is the installation done?
+``````````````````````````````
+- TD-agent installation
+   ``$ curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent3.sh | sh``
+- Copy the TD-agent config file in **Node1**
+   ``$ cp tdagent-client-config/node1.conf /etc/td-agent/td-agent.conf``
+- Copy the TD-agent config file in **Node4**
+   ``$ cp tdagent-client-config/node4.conf /etc/td-agent/td-agent.conf``
+- Restart the service
+   ``$ sudo service td-agent restart``
+
+Installation - Serverside
+-------------------------
+
+Nodes
+`````
+Inside Jumphost - POD12
+   - **VM1** = 10.10.120.211
+   - **VM2** = 10.10.120.203
+   - **VM3** = 10.10.120.204
+
+
+How is the installation done?
+``````````````````````````````
+**Using Ansible:**
+   - **K8s**
+      - **Elasticsearch:** 1 Master & 1 Data node at each VM
+      - **Kibana:** 1 Replica
+      - **Nginx:** 2 Replicas
+      - **Fluentd:** 2 Replicas
+      - **Elastalert:** 1 Replica (duplicate alerts are received if the replica count is increased)
+   - **NFS Server:** on each VM to store Elasticsearch data at the following paths
+      - ``/srv/nfs/master``
+      - ``/srv/nfs/data``
+
+How to setup?
+`````````````
+- **To set up the K8s cluster and EFK:** Run the Ansible playbook ``ansible/playbooks/setup.yaml``
+- **To clean everything:** Run the Ansible playbook ``ansible/playbooks/clean.yaml``
+
+Do we have HA?
+````````````````
+Yes
+
+Configuration
+=============
+
+K8s
+---
+Path of all yamls (Serverside)
+````````````````````````````````
+``ansible-server/roles/logging/files/``
+
+K8s namespace
+`````````````
+``logging``
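+
+All resources are created in this namespace by ``logging/files/namespace.yaml``; a minimal
+manifest for such a namespace is simply::
+
+    apiVersion: v1
+    kind: Namespace
+    metadata:
+      name: logging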
+
+K8s Service details
+````````````````````
+``$ kubectl get svc -n logging``
+
+Elasticsearch Configuration
+---------------------------
+
+Elasticsearch Setup Structure
+`````````````````````````````
+.. image:: images/elasticsearch.png
+
+Elasticsearch service details
+`````````````````````````````
+| **Service Name:** ``logging-es-http``
+| **Service Port:** ``9200``
+| **Service Type:** ``ClusterIP``
+
+How to get elasticsearch default username & password?
+`````````````````````````````````````````````````````
+- User1 (custom user):
+    | **Username:** ``elasticsearch``
+    | **Password:** ``password123``
+- User2 (by default created by Elastic Operator):
+    | **Username:** ``elastic``
+    | To get default password:
+    | ``$ PASSWORD=$(kubectl get secret -n logging logging-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')``
+    | ``$ echo $PASSWORD``
+
+How to increase the replica count of an index?
+``````````````````````````````````````````````````
+::
+
+    $ curl -k -u "elasticsearch:password123" -H 'Content-Type: application/json' \
+      -XPUT "https://10.10.120.211:9200/indexname*/_settings" \
+      -d '{ "index" : { "number_of_replicas" : "2" } }'
+
+Index Life
+```````````
+**30 Days**
+
+Kibana Configuration
+--------------------
+
+Kibana Service details
+````````````````````````
+| **Service Name:** ``logging-kb-http``
+| **Service Port:** ``5601``
+| **Service Type:** ``ClusterIP``
+
+Nginx Configuration
+--------------------
+IP
+````
+https://10.10.120.211:32000
+
+Nginx Setup Structure
+`````````````````````
+.. image:: images/nginx.png
+
+Nginx Service details
+`````````````````````
+| **Service Name:** ``nginx``
+| **Service Port:** ``32000``
+| **Service Type:** ``NodePort``
+
+Why is NGINX used?
+```````````````````
+`Securing ELK using Nginx <https://logz.io/blog/securing-elk-nginx/>`_
+
+Nginx Configuration
+````````````````````
+**Path:** ``ansible-server/roles/logging/files/nginx/nginx-conf-cm.yaml``
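+
+For orientation, a trimmed, hypothetical sketch of such a ConfigMap is shown below, with Nginx
+terminating HTTPS and proxying to the Kibana service (the ConfigMap name, listen port and
+certificate paths are assumptions; refer to ``nginx-conf-cm.yaml`` for the real values)::
+
+    apiVersion: v1
+    kind: ConfigMap
+    metadata:
+      name: nginx-conf                       # placeholder name
+      namespace: logging
+    data:
+      nginx.conf: |
+        events {}
+        http {
+          server {
+            listen 8080 ssl;                                  # placeholder port
+            ssl_certificate     /etc/nginx/ssl/kibana-access.pem;
+            ssl_certificate_key /etc/nginx/ssl/kibana-access.key;
+            location / {
+              proxy_pass https://logging-kb-http:5601;        # Kibana service
+            }
+          }
+        }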
+
+Fluentd Configuration - Clientside (Td-agent)
+---------------------------------------------
+
+Fluentd Setup Structure
+````````````````````````
+.. image:: images/fluentd-cs.png
+
+Log collection paths
+`````````````````````
+- ``/tmp/result*/*.log``
+- ``/tmp/result*/*.dat``
+- ``/tmp/result*/*.csv``
+- ``/tmp/result*/stc-liveresults.dat.*``
+- ``/var/log/userspace*.log``
+- ``/var/log/sriovdp/*.log.*``
+- ``/var/log/pods/**/*.log``
+
+Logs are sent to
+``````````````````
+Another Fluentd instance of the K8s cluster (K8s master: 10.10.120.211), running on the jumphost.
+
+Td-agent logs
+`````````````
+Path of td-agent logs: ``/var/log/td-agent/td-agent.log``
+
+Td-agent configuration
+````````````````````````
+| Path of conf file: ``/etc/td-agent/td-agent.conf``
+| **If any change is made in td-agent.conf, restart the td-agent service:** ``$ sudo service td-agent restart``
+
+Config Description
+````````````````````
+- Get the logs from the collection paths
+- | Convert each log entry to this format:
+  | {
+  |   msg: "log line"
+  |   log_path: "/file/path"
+  |   file: "file.name"
+  |   host: "pod12-node4"
+  | }
+- Send it to the server-side Fluentd
+
+Fluentd Configuration - Serverside
+----------------------------------
+
+Fluentd Setup Structure
+````````````````````````
+.. image:: images/fluentd-ss.png
+
+Fluentd Service details
+````````````````````````
+| **Service Name:** ``fluentd``
+| **Service Port:** ``32224``
+| **Service Type:** ``NodePort``
+
+Logs are sent to
+``````````````````
+The Elasticsearch service (https://logging-es-http:9200)
+
+Config Description
+````````````````````
+- **Step 1**
+   - Get the logs from Node1 & Node4
+- **Step 2**
+   ======================================== ======================
+   log_path                                 add tag (for routing)
+   ======================================== ======================
+   ``/tmp/result.*/.*errors.dat``           errordat.log
+   ``/tmp/result.*/.*counts.dat``           countdat.log
+   ``/tmp/result.*/stc-liveresults.dat.tx`` stcdattx.log
+   ``/tmp/result.*/stc-liveresults.dat.rx`` stcdatrx.log
+   ``/tmp/result.*/.*Statistics.csv``       ixia.log
+   ``/tmp/result.*/vsperf-overall*``        vsperf.log
+   ``/tmp/result.*/vswitchd*``              vswitchd.log
+   ``/var/log/userspace*``                  userspace.log
+   ``/var/log/sriovdp*``                    sriovdp.log
+   ``/var/log/pods*``                       pods.log
+   ======================================== ======================
+
+- **Step 3**
+   Then parse each type using tags.
+    - error.conf: to find any error
+    - time-series.conf: to parse time series data
+    - time-analysis.conf: to calculate the time analysis
+- **Step 4**
+   ================================ ======================
+   host                             add tag (for routing)
+   ================================ ======================
+   ``pod12-node4``                  node4
+   ``worker``                       node1
+   ================================ ======================
+- **Step 5**
+   ================================ ======================
+   Tag                              Elasticsearch index
+   ================================ ======================
+   ``node4``                        index "node4*"
+   ``node1``                        index "node1*"
+   ================================ ======================
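+
+The tag-to-index routing above ends in an Elasticsearch output section of ``fluent-cm.yaml``
+along the following lines (a hedged sketch, not the literal file contents; the plugin options
+shown are illustrative)::
+
+    # Hypothetical extract from logging/files/fluentd/fluent-cm.yaml
+    apiVersion: v1
+    kind: ConfigMap
+    metadata:
+      name: fluent-cm                  # placeholder name
+      namespace: logging
+    data:
+      fluent.conf: |
+        <match node4>
+          @type elasticsearch
+          host logging-es-http         # Elasticsearch service in the logging namespace
+          port 9200
+          scheme https
+          ssl_verify false             # assumption: self-signed certificate inside the cluster
+          user elasticsearch
+          password password123
+          logstash_format true
+          logstash_prefix node4        # logs tagged "node4" end up in the node4* indices
+        </match>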
+
+Elastalert
+----------
+
+Send alert if
+``````````````
+- Blacklist
+    - "Failed to run test"
+    - "Failed to execute in '30' seconds"
+    - "('Result', 'Failed')"
+    - "could not open socket: connection refused"
+    - "Input/output error"
+    - "dpdk|ERR|EAL: Error - exiting with code: 1"
+    - "Failed to execute in '30' seconds"
+    - "dpdk|ERR|EAL: Driver cannot attach the device"
+    - "dpdk|EMER|Cannot create lock on"
+    - "dpdk|ERR|VHOST_CONFIG: * device not found"
+- Time
+    - vswitch_duration > 3 sec
+
+How to configure alert?
+````````````````````````
+- Add your rule in ``ansible/roles/logging/files/elastalert/ealert-rule-cm.yaml`` (`Elastalert Rule Config <https://elastalert.readthedocs.io/en/latest/ruletypes.html>`_)::
+
+    name: anything
+    type: <check-above-link>       # the RuleType to use
+    index: node4*                  # index name
+    realert:
+      minutes: 0                   # to get an alert for every match after each interval
+    alert: post                    # send the alert as an HTTP POST
+    http_post_url: "http://url"
+
+- Mount this ConfigMap into the elastalert pod in ``ansible/roles/logging/files/elastalert/elastalert.yaml``, as sketched below.
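+
+A hedged illustration of that mount in ``elastalert.yaml`` (the ConfigMap name and mount path
+are assumptions)::
+
+    # Hypothetical extract from the elastalert pod spec
+    spec:
+      containers:
+        - name: elastalert
+          volumeMounts:
+            - name: rules
+              mountPath: /opt/elastalert/rules      # assumed rules directory
+      volumes:
+        - name: rules
+          configMap:
+            name: ealert-rule-cm                    # assumed ConfigMap name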
+
+Alert Format
+````````````
+{"type": "pattern-match", "label": "failed", "index": "node4-20200815", "log": "error-log-line", "log-path": "/tmp/result/file.log", "reson": "error-message" }
+
+Data Management
+===============
+
+Elasticsearch
+-------------
+
+Where is the data stored?
+``````````````````````````
+Data is stored on the NFS server with 1 replica of each index (default). The data paths are the following:
+  - ``/srv/nfs/data (VM1)``
+  - ``/srv/nfs/data (VM2)``
+  - ``/srv/nfs/data (VM3)``
+  - ``/srv/nfs/master (VM1)``
+  - ``/srv/nfs/master (VM2)``
+  - ``/srv/nfs/master (VM3)``
+
+Can the user switch from NFS to local storage?
+``````````````````````````````````````````````````
+Yes. Configure the persistent volume accordingly (``ansible-server/roles/logging/files/persistentVolume.yaml``).
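+
+For instance, switching a volume from NFS to node-local storage would mean replacing the
+``nfs:`` source in the PersistentVolume with a local one, roughly as follows (placeholder
+names, sizes and paths)::
+
+    # Hypothetical local-storage PersistentVolume replacing an NFS-backed one
+    apiVersion: v1
+    kind: PersistentVolume
+    metadata:
+      name: elasticsearch-data-local        # placeholder name
+    spec:
+      capacity:
+        storage: 10Gi
+      accessModes:
+        - ReadWriteOnce
+      hostPath:
+        path: /srv/local/elasticsearch      # placeholder local path on the node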
+
+Do we have a backup of the data?
+``````````````````````````````````
+Yes, there is 1 replica of each index.
+
+Is the data still accessible after a K8s restart?
+``````````````````````````````````````````````````````
+Yes (if the data is not deleted from ``/srv/nfs/data``)
+
+Troubleshooting
+===============
+If no logs are received in Elasticsearch
+------------------------------------------
+- Check the IP & port of the server-side Fluentd in the client config.
+- Check the client-side Fluentd (td-agent) logs: ``$ sudo tail -f /var/log/td-agent/td-agent.log``
+- Check the server-side Fluentd logs: ``$ sudo kubectl logs -n logging <fluentd-pod-name>``
+
+If no notification is received
+--------------------------------
+- Search for the expected log line in Elasticsearch.
+- Check the Elastalert configuration.
+- Check the IP of the alert receiver.
+
+Reference
+=========
+- `Elastic cloud on K8s <https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-quickstart.html>`_
+- `HA Elasticsearch on K8s <https://www.elastic.co/blog/high-availability-elasticsearch-on-kubernetes-with-eck-and-gke>`_
+- `Fluentd Configuration <https://docs.fluentd.org/configuration/config-file>`_
+- `Elastalert Rule Config <https://elastalert.readthedocs.io/en/latest/ruletypes.html>`_
\ No newline at end of file