Kubernetes/Metrics

If you are here because you want to know how your pod will be scraped, see the Workload/Pod metrics section below.

Introduction

This page gives a high-level overview of how metrics are collected in our wikikube production Kubernetes clusters.

Other Kubernetes installations/clusters currently re-use the approaches defined here, although they may eventually want to do things differently. While this describes the default and strongly encouraged mode, services can always devise other ways of handling their metrics if needed (with enough justification).

Overview

Overall, almost everything is automatically discovered and scraped by our Prometheus infrastructure. On each of the datacenter-specific Prometheus servers, we run one extra Prometheus instance per Kubernetes cluster. This means that metrics are scraped twice, which is by design for availability purposes.

Using the proper form of kubernetes_sd_config, Prometheus is given a read-only, purposefully scoped token to talk to the Kubernetes API; it discovers various resources and adds them as Prometheus targets automatically.

The following "roles" types can be configured:

  • node
  • service
  • pod
  • endpoints
  • endpointslice
  • ingress


The sections below discuss how we use each of these.
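As an illustrative sketch (not our actual configuration; the API server URL, token path and chosen role are placeholders), a discovery-based scrape job is wired up roughly like this:

 # Hypothetical Prometheus scrape job using Kubernetes service discovery.
 scrape_configs:
   - job_name: 'k8s-pods-example'
     kubernetes_sd_configs:
       - role: pod                                   # any of the roles listed above
         api_server: 'https://k8s-master.example.wmnet:6443'
         authorization:
           credentials_file: /srv/prometheus/k8s-example.token  # read-only, scoped token
         tls_config:
           ca_file: /etc/ssl/certs/ca-certificates.crt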

Control-plane metrics

By control plane, in Kubernetes terminology, we usually mean the following components:

  • kube-apiserver
  • kube-controller-manager
  • kube-scheduler
  • kubelet
  • kube-proxy
  • etcd

Of the above components, etcd runs on dedicated VMs on Ganeti; kube-apiserver, kube-controller-manager and kube-scheduler run on the Kubernetes masters; kubelet and kube-proxy run on every Kubernetes node.

All components are automatically discovered and scraped by our Prometheus infrastructure, with the exception of etcd, which is set up manually.

apiserver

  • We use the endpoints role of the kubernetes_sd_config stanza to scrape each Kubernetes master and get metrics from its /metrics endpoint (a hedged sketch follows).
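A hedged sketch of such a job, following the common upstream pattern of keeping only the endpoints that back the kubernetes service in the default namespace (job name, token and CA paths are illustrative):

 - job_name: 'k8s-api-example'
   scheme: https
   kubernetes_sd_configs:
     - role: endpoints
   tls_config:
     ca_file: /etc/ssl/certs/ca-certificates.crt
   authorization:
     credentials_file: /srv/prometheus/k8s-example.token
   relabel_configs:
     # Keep only the endpoints that back the apiserver itself.
     - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
       action: keep
       regex: default;kubernetes;https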

kube-controller-manager

  • We don't scrape this component yet

kube-scheduler

  • We don't scrape this component yet

kubelet and kube-proxy

  • We use the node role of the kubernetes_sd_config stanza to scrape the kubelet on two different endpoints, /metrics and /metrics/cadvisor, because that is how the kubelet exposes them. The first endpoint exposes metrics about the kubelet itself; the second exposes metrics about the containers running on the node.
  • We then use the same role to also scrape each node's kube-proxy on its /metrics endpoint (a sketch of the kubelet jobs follows).
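A minimal sketch of the two kubelet jobs, assuming the node role and omitting scheme and authentication details for brevity (job names are illustrative):

 # Kubelet's own metrics.
 - job_name: 'k8s-node-example'
   kubernetes_sd_configs:
     - role: node
   metrics_path: /metrics
 # Per-container (cAdvisor) metrics exposed by the same kubelet.
 - job_name: 'k8s-node-cadvisor-example'
   kubernetes_sd_configs:
     - role: node
   metrics_path: /metrics/cadvisor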

etcd

  • Etcd doesn't reside on the Kubernetes cluster, so it is not automatically discovered. It is scraped following the standard practices described in Prometheus.
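Since it lives outside the cluster, etcd ends up as a conventionally configured target; a hypothetical sketch with placeholder host names and port:

 - job_name: 'etcd-example'
   static_configs:
     - targets: ['etcd-a.example.wmnet:2379', 'etcd-b.example.wmnet:2379']  # placeholder hosts/port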

Cluster components metrics

Cluster components are workloads that the Kubernetes cluster relies on for normal operations, but they aren't part of the control plane itself. In our case those run either as DaemonSets or Deployments in specific (privileged) Kubernetes namespaces. A non-exhaustive list follows:

  • Calico-node
  • Calico-typha
  • CoreDNS
  • Eventrouter

More will be added over time as needed.

All of these components, as far as their metrics go, are treated as regular workloads/pods, so please refer to the Workload/Pod metrics section below.

Workload/Pod metrics

All workloads residing in a pod will be discovered and scraped automatically, provided they hint that they want that behaviour. The workloads should of course expose their metrics in a Prometheus-compatible way. If that functionality isn't available but statsd support exists, there is an exporter that can be used to convert from statsd to Prometheus. See Prometheus/statsd k8s.

There are two ways to have container metrics scraped in Kubernetes:

  • Explicitly define a port and a metrics endpoint to scrape using the prometheus.io/port annotation
  • Scrape every defined containerPort within a pod for /metrics (or whatever is specified as the path in the prometheus.io/path annotation).

Scraping all containerPorts

To use the scrape-all behaviour, simply do not define the prometheus.io/port annotation in your chart. Prometheus will then scrape every defined containerPort on the prometheus.io/path path (default /metrics). For future-proofing purposes, please end your containerPort name in -metrics to ensure that it is scraped.
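For example, a container in a chart might declare its port like this (names and port number are illustrative only):

 containers:
   - name: exampleapp                 # illustrative container name
     ports:
       - name: app-metrics            # ends in -metrics, so the scrape-all behaviour picks it up
         containerPort: 9090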

Scraping a specific port/path combination

Explicitly scraping a single port/path combination is controlled by the following pod annotations:

  • prometheus.io/port: Integer. The HTTP port on which to scrape the pod. If omitted, defaults to the pod's declared port. This is only useful if the Prometheus port is different from the main pod port (e.g. when using the statsd exporter)
  • prometheus.io/scrape: Boolean. Whether the pod is to be scraped or not. Defaults to false
  • prometheus.io/path: String. The http endpoint under which prometheus metrics are exposed. Defaults to /metrics

Note that since most of these have sane defaults, the only annotation explicitly needed to have a workload scraped is prometheus.io/scrape: "true".
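For reference, on the Prometheus side these annotations are honoured via relabelling rules along the following lines (a common upstream pattern, not necessarily our exact configuration):

 relabel_configs:
   # Only keep pods that opted in via prometheus.io/scrape: "true".
   - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
     action: keep
     regex: "true"
   # Override the metrics path if prometheus.io/path is set.
   - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
     action: replace
     target_label: __metrics_path__
     regex: (.+)
   # Override the target port if prometheus.io/port is set.
   - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
     action: replace
     regex: ([^:]+)(?::\d+)?;(\d+)
     replacement: $1:$2
     target_label: __address__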

A partial example from a helm chart can be found below:

 apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: {{ template "wmf.releasename" . }}
  labels:
    app: {{ template "wmf.chartname" . }}
    chart: {{ template "wmf.chartid" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
 spec:
  selector:
    matchLabels:
      app: {{ template "wmf.chartname" . }}
      release: {{ .Release.Name }}
  replicas: {{ .Values.resources.replicas }}
  template:
    metadata:
      labels:
        app: {{ template "wmf.chartname" . }}
        release: {{ .Release.Name }}
        routed_via: {{ .Values.routed_via | default .Release.Name }}
      annotations:
        checksum/config: {{ include "config.app" . | sha256sum }}
        {{ if .Values.monitoring.enabled -}}
        checksum/prometheus-statsd: {{ .Files.Get "config/prometheus-statsd.conf" | sha256sum }}
        {{ end -}}
        prometheus.io/port: "9102"
        prometheus.io/scrape: "true"
        {{- include "tls.annotations" . | indent 8 }}
    spec:
     blahblah

Note the two prometheus.io annotations. The port annotation is there to accommodate the prometheus-statsd exporter. This is from an actual chart used in production.