Monitoring Shoot workloads

The Shoot clusters are deployed without a Monitoring and Logging subsystem, but monitoring the workloads is recommended. It is the Customer's responsibility to deploy a suitable Monitoring system, or to integrate the workloads with the Customer's already existing Monitoring system.

Upstream Gardener Shoot Infrastructure Monitoring and Logging

There is a pre-configured Monitoring and Logging subsystem that originates from the upstream Gardener project, but it is not officially supported by the OSC. This Monitoring and Logging can be accessed from within the Gardener or OSC dashboard.

Monitoring and logging tools

There are multiple software tools which work together to gather, store and display metrics and logs, and to produce and send alerts based on them.

Info

You may use the links in the list below to access the home pages of some of these tools with their documentation. Please study them to acquire sufficient knowledge for implementation and usage.

  • Various specialized exporters, which read metrics from a particular source (e.g. MariaDB or Nginx) and publish them in the Prometheus format via HTTP; an example scrape configuration follows this list
  • Prometheus itself, which scrapes these metrics and stores or forwards them
  • Grafana, which displays metrics and logs in the form of dashboards
  • Grafana Mimir, a distributed, multi-tenant metrics storage and alerting system
  • Grafana Loki, a distributed log storage system
  • Grafana Promtail and Fluent Bit, which read and process logs from different sources and send them to a central log storage
  • The Elastic Stack (or ELK Stack), consisting of Elasticsearch (search and analytics), Logstash (processing, transformation and transportation of log data) and Kibana (user interface)
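
As an illustration of how these tools fit together, the following is a minimal sketch of a Prometheus scrape configuration that collects metrics from an exporter's HTTP endpoint. The job name, target address and port are assumptions made for this example:

# prometheus.yml (fragment): scrape an exporter's /metrics endpoint
scrape_configs:
  - job_name: mariadb-exporter
    scrape_interval: 30s
    static_configs:
      # Illustrative service address; replace with the exporter's actual endpoint
      - targets: ["mariadb-exporter.database.svc:9104"]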

Example of a local monitoring setup

The following describes how a local monitoring stack can be deployed, using the Kube Prometheus Stack as an example. This stack deploys Prometheus with a number of exporters and rules, as well as Grafana with a collection of dashboards.

  • Add the Helm repository:

    helm repo add prometheus-community \
        https://prometheus-community.github.io/helm-charts
    helm repo update
    
  • Define the metrics storage: By default the stack stores metrics in the Pod's ephemeral volume. A better approach is to store them on a Persistent Volume. For this, get the default values for the Helm chart:

    helm show values prometheus-community/kube-prometheus-stack > values.yaml
    

    Adding the following to the resulting values.yaml file configures a Persistent Volume for Prometheus:

    prometheus:
      prometheusSpec:
        storageSpec:
          volumeClaimTemplate:
            spec:
              storageClassName: default
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 50Gi
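
    The storageClassName must refer to a storage class that exists in the cluster; the available classes can be listed with:

    kubectl get storageclass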
    
  • Install the stack:

    helm install prometheus prometheus-community/kube-prometheus-stack \
        -f values.yaml --namespace monitoring --create-namespace
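
    Once the installation has finished, it can be verified that all Pods of the stack are running:

    kubectl get pods -n monitoring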
    
  • Obtain the password for Grafana's admin user with the following command:

    kubectl get secret -n monitoring prometheus-grafana \
        -o jsonpath='{.data.admin-password}' | base64 -d
    
  • Open Grafana by forwarding its service's port, then browse to http://localhost:3000 and log in as admin with the password from the previous step:

    kubectl port-forward -n monitoring services/prometheus-grafana 3000:80
    

Accessing pod logs

The logging features of Kubernetes are discussed in its documentation. In its most basic form, logging allows fetching the output of a running pod. If there is a pod named myapp in the app namespace, its log can be fetched with:

kubectl logs -n app myapp

Or, for a specific container of the pod:

kubectl logs -n app myapp -c container_name
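
If the container name is not known, the containers of the pod can be listed with a query like the following:

kubectl get pod -n app myapp -o jsonpath='{.spec.containers[*].name}'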

If a container restarts, the logs of the previous instance are kept and can be accessed with the --previous flag:

kubectl logs -n app myapp --previous
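
To follow the log output live and limit it to the most recent lines, the -f and --tail flags can be combined:

kubectl logs -n app myapp -f --tail=100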

Note

If the pod is removed from the node, all its logs are removed as well. For that reason, this type of basic logging may not be sufficient for production workloads.

Log aggregation

For production workloads it is recommended to deploy a more elaborate log aggregation solution, utilizing the tools mentioned in the section Monitoring and logging tools.

The log scrapers are typically deployed to each node. They read the pods' logs, optionally pre-process them, and send them to a central location for storage and further processing.
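
As a sketch of what such a scraper's configuration can look like, the following is a minimal Promtail configuration that discovers the pods running on a node and ships their logs to Loki. The Loki push URL and the label choices are assumptions made for this example:

# promtail.yaml: a minimal, illustrative configuration
server:
  http_listen_port: 9080

positions:
  filename: /run/promtail/positions.yaml

clients:
  # Assumed Loki endpoint; must match the actual Loki deployment
  - url: http://loki-gateway.monitoring.svc/loki/api/v1/push

scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Attach namespace and pod name as labels for easier querying
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod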

Log storage tools store the logs for a defined time. They can index or label the logs for easier querying.
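
With Grafana Loki, for instance, these labels can then be used in LogQL queries. Using the labels from the sketch above, the following query would return all log lines of the myapp pods in the app namespace that contain the string "error":

{namespace="app", pod=~"myapp.*"} |= "error"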

Logs from several nodes or even clusters can then be displayed via a dashboard tool.