Kubernetes/Logging


Introduction

This page gives a high-level overview of how logging works in our services/main production kubernetes cluster. Other kubernetes installations/clusters might re-use this approach or do different things. While this describes the default and highly encouraged mode, services can always devise other ways of handling their logs if needed.

Control-plane logging

By control plane in kubernetes terminology we usually mean the following components:

  • kube-apiserver
  • kube-controller-manager
  • kube-scheduler
  • kubelet
  • kube-proxy
  • etcd


Of the above components, etcd runs on dedicated VMs on Ganeti and follows standard systemd-journal logging practices. kube-apiserver, kube-controller-manager and kube-scheduler run on the kubernetes master and follow the same standard systemd-journal practices, as do kubelet and kube-proxy, which run on every kubernetes node. Those logs are not yet sent to logstash, but they are sent to centrallog.
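
As a quick illustration, those logs can be inspected on the respective hosts with journalctl; the unit names below are illustrative and may differ slightly from the actual puppetized ones:

  # on a kubernetes master: follow the API server log
  sudo journalctl -u kube-apiserver -f

  # on any kubernetes node: recent kubelet and kube-proxy entries
  sudo journalctl -u kubelet --since "1 hour ago"
  sudo journalctl -u kube-proxy --since "1 hour ago"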

Cluster components logging

Cluster components are workloads that the kubernetes cluster relies on for normal operations, but they aren't part of the Control Plane itself. In our case those run either as DaemonSets or Deployments in specific (privileged) kubernetes namespaces. At the time of this writing (2021-09-29) those are:

  • Calico-node
  • Calico-typha
  • CoreDNS
  • Eventrouter

More will be added over time as the need arises.

All of these components, as far as their logging goes, are treated as usual Workloads/Pods, so please refer to that section.

Workload/Pod logging

There are 5 different logging schemes described in the kubernetes Logging Architecture docs. Of those, we follow, for almost all intents and purposes, the Node logging agent pattern. In the past we followed the Direct logging approach, so one might still find some leftovers from that era (mostly old configurations).

The Node logging agent pattern is best described with the below generic diagram from upstream.

(Diagram: node logging agent, from the upstream kubernetes documentation)

In our infrastructure:

  • We don't use logrotate, but rather have docker rotate logs (keeping no old copies) at 100MB size; see the configuration sketch after this list
  • Our logging agent doesn't run in a pod, but rather directly on the node. It's rsyslogd with specific configuration (more below)
  • The logging backend is the Logstash#Production Logstash Architecture used across the entire infrastructure (the main reason we adopted that approach is to reuse that infrastructure and not reinvent the wheel).
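
As a rough sketch, the rotation behaviour described above corresponds to a docker daemon.json fragment like the following; the exact file contents on our nodes are managed by puppet and may differ:

  {
    "log-driver": "json-file",
    "log-opts": {
      "max-size": "100m",
      "max-file": "1"
    }
  }

Setting "max-file" to "1" means only the current log file is kept, i.e. no rotated copies survive, matching the "no old versions" behaviour above.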

rsyslogd

Kubernetes services all log to stdout/stderr. Kubernetes configures docker to write these to disk using the json-file driver. The resulting log files end up under

/var/lib/docker/containers/<container_id>-json.log
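
Each line in those files is a small JSON document written by the json-file driver; a typical entry looks roughly like this (the values are made up for illustration):

  {"log":"GET /healthz HTTP/1.1 200\n","stream":"stdout","time":"2021-09-29T12:00:00.000000000Z"}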

Since those paths are specific to the container runtime engine (docker in our case), the kubelet sets up another, runtime-independent, set of paths

/var/log/pods/<pod_name>/<container_name>/<number>.log

which are symlinks to the above ones, maintained by the kubelet.

It also sets up paths of the form

/var/log/containers/<pod_container_name>.log

which are symlinks to the /var/log/pods hierarchy.
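
On a node, the whole chain can be followed with readlink, using the placeholder names from above:

  # resolve a /var/log/containers symlink through /var/log/pods down to the
  # docker-managed json file
  readlink -f /var/log/containers/<pod_container_name>.log
  # -> /var/lib/docker/containers/<container_id>-json.log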

rsyslog parses the latter form (/var/log/containers/*.log) and, with the help of the mmkubernetes plugin, enriches the container logs with metadata fetched from the Kubernetes API. Some examples of metadata added to each log entry are:

  • kubernetes_namespace
  • kubernetes_namespace labels
  • pod name
  • pod labels
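
A very rough sketch of the rsyslog side, assuming imfile for tailing and mmkubernetes for the enrichment, is shown below. The API server URL and token path are placeholders; the actual configuration is puppetized and considerably more involved (TLS, shipping to the logging backend, rate limiting, etc.):

  module(load="imfile")
  # both values below are placeholders, not the real ones
  module(load="mmkubernetes"
         kubernetesurl="https://kubernetes.example.org:6443"
         tokenfile="/etc/kubernetes/rsyslog-api-token")

  # tail the runtime-independent symlinks; addmetadata="on" records the file
  # name, which mmkubernetes uses to figure out which pod a line belongs to
  input(type="imfile" file="/var/log/containers/*.log" tag="kubernetes"
        addmetadata="on" ruleset="k8s")

  ruleset(name="k8s") {
      # enrich each message with namespace, pod name, labels, ... from the API
      action(type="mmkubernetes")
      # shipping to the logging backend omitted here
  }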

Exceptions

API_Gateway is a notable exception in that there is also a fluentd running in the pod to provide analytics. See API_Gateway#Logs_and_analytics