Kubernetes/Metrics
Introduction
This page gives a high-level overview of how metrics are collected in our wikikube production kubernetes clusters.
Other kubernetes installations/clusters currently re-use the approaches described here, although they may eventually want to do things differently. While this describes the default and strongly encouraged mode, services can always devise other ways of handling their metrics if needed (with enough justification).
Overview
Overall, almost everything is automatically discovered and scraped by our Prometheus infrastructure. On each of the datacenter-specific prometheus servers, we run 1 extra prometheus instance per kubernetes cluster that we have. This means that every metric is scraped twice, which is by design for availability purposes.
Using the proper form of kubernetes_sd_config, prometheus is given a read-only and purposefully scoped token to talk to the kubernetes API, discovers various resources and automatically adds them as prometheus targets (see the example at the end of this section).
The following role types can be configured:
- node
- service
- pod
- endpoints
- endpointslice
- ingress
In the following sections we discuss how we use each of these.
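For illustration, a pod-discovery scrape job could look roughly like the sketch below; the job name, API server address and token path are placeholders, not our actual puppet-managed configuration:

- job_name: 'k8s-pods'                                   # hypothetical job name
  kubernetes_sd_configs:
    - role: pod                                          # discover every pod via the kubernetes API
      api_server: 'https://k8s-master.example.org:6443'  # placeholder API server address
      bearer_token_file: /srv/prometheus/k8s.token       # read-only, purposefully scoped token
  relabel_configs:
    # keep only pods that opted in via the prometheus.io/scrape annotation
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: 'true'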
Control-plane metrics
By control plane, in kubernetes terminology, we usually mean the following components:
- kube-apiserver
- kube-controller-manager
- kube-scheduler
- kubelet
- kube-proxy
- etcd
Of the above components, etcd runs on dedicated Ganeti VMs; kube-apiserver, kube-controller-manager and kube-scheduler run on the kubernetes masters; kubelet and kube-proxy run on every kubernetes node.
All components are automatically discovered and scraped by our Prometheus infrastructure, with the exception of etcd, which is set up manually.
apiserver
- We use the endpoints role of the kubernetes_sd_config stanza to scrape each kubernetes master and get metrics out of the respective /metrics endpoint.
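A sketch of what such a job might look like, using the common endpoints-filtering pattern (the job name is made up and details differ from the puppet-managed configuration):

- job_name: 'k8s-api'                      # hypothetical job name
  scheme: https
  kubernetes_sd_configs:
    - role: endpoints                      # discover Endpoints objects cluster-wide
  relabel_configs:
    # keep only the apiserver endpoints of the "kubernetes" service in the "default" namespace
    - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
      action: keep
      regex: default;kubernetes;https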
kube-controller-manager
- We don't scrape this component yet
kube-scheduler
- We don't scrape this component yet
kubelet and kube-proxy
- We use the node role of the kubernetes_sd_config stanza to scrape the kubelet on 2 different endpoints, /metrics and /metrics/cadvisor, because the kubelet exposes them that way. The first endpoint exposes metrics about the kubelet itself, the second exposes metrics about the containers running on the node.
- We then use the same role to also scrape the node's kube-proxy on its respective /metrics endpoint.
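As a sketch, the cadvisor half of this can be expressed with a node-role job that overrides the metrics path (the job name and labelmap choice are illustrative):

- job_name: 'k8s-node-cadvisor'           # hypothetical job name; a sibling job scrapes plain /metrics
  kubernetes_sd_configs:
    - role: node                          # one target per kubelet
  metrics_path: /metrics/cadvisor         # container-level metrics exposed by the kubelet
  relabel_configs:
    # copy the kubernetes node labels onto the scraped series
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)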
etcd
- Etcd doesn't reside on the kubernetes cluster, so it is not automatically discovered. It is scraped with the standard practices described in Prometheus
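In practice this might boil down to a statically configured job, roughly like the sketch below (hostnames and port are placeholders):

- job_name: 'etcd'
  scheme: https
  static_configs:
    # the etcd VMs are listed explicitly, since they cannot be discovered through the kubernetes API
    - targets:
        - 'etcd1001.example.org:2379'     # placeholder hostnames
        - 'etcd1002.example.org:2379'
        - 'etcd1003.example.org:2379'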
Cluster components metrics
Cluster components are workloads that the kubernetes cluster relies on for normal operations, but they aren't part of the Control Plane itself. In our case those run either as DaemonSets or Deployments in specific (privileged) kubernetes namespaces. A non-exhaustive list follows:
- Calico-node
- Calico-typha
- CoreDNS
- Eventrouter
More will be added from time to time in order to accomplish various goals.
As far as their metrics go, all of these components are treated as regular Workloads/Pods, so please refer to the section below.
Workload/Pod metrics
All workloads residing in a pod will be discovered and scraped automatically, provided they hint that they want that behavior. The workloads should of course expose their metrics in a Prometheus-compatible way. If that functionality isn't available but statsd support exists, there is an exporter that can be used to convert from statsd to prometheus. See Prometheus/statsd k8s.
There are three ways to have container metrics scraped in Kubernetes:
- Scrape every containerPort whose name ends in -metrics, enabled via the prometheus.io/scrape_by_name annotation.
- Scrape every defined containerPort within a pod on /metrics (or whatever is specified as path in the prometheus.io/path annotation).
- Explicitly define a port and a metrics endpoint to scrape using the prometheus.io/port annotation.
Scraping all containerPorts with a name ending in -metrics
This is the preferred way of configuring scraping, as it gives the developer the ability to define multiple ports that can be scraped, without needing explicit declarations in annotations, and without scraping every exposed port unconditionally.
To enable this behaviour, you need to declare prometheus.io/scrape_by_name: true in your Deployment. In terms of the WMF helm charts, if your Deployment template includes base.meta.annotations, this is enabled by setting monitoring.named_ports: true in your values.yaml file.
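A hypothetical pod template fragment using this mechanism could look like the following (the container and port names are made up):

  template:
    metadata:
      annotations:
        prometheus.io/scrape_by_name: "true"  # scrape every containerPort whose name ends in -metrics
    spec:
      containers:
        - name: my-app                        # hypothetical container
          ports:
            - name: app-metrics               # picked up because of the -metrics suffix
              containerPort: 9090
            - name: http                      # not scraped: name does not end in -metrics
              containerPort: 8080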
Scraping all containerPorts
To use the scrape-all behaviour, simply do not define the prometheus.io/port annotation in your chart. This approach will then scrape all defined containerPorts on the path given by the prometheus.io/path annotation (/metrics by default). For future-proofing purposes, please end your containerPort name in -metrics to ensure that it is scraped.
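For instance, a pod template along these lines (names made up) would have its declared containerPort scraped on /metrics:

  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"   # no prometheus.io/port: every containerPort is scraped
    spec:
      containers:
        - name: my-app                 # hypothetical container
          ports:
            - name: http-metrics       # the -metrics suffix keeps this future-proof
              containerPort: 9361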
Scraping a specific port/path combination
Explicitly scraping a single port is controlled by the following helm chart annotations:
- prometheus.io/port: Integer. The http port on which to scrape the pod. If omitted, defaults to the pod's declared port. This is only useful if the prometheus port is different from the main pod port (e.g. when using the statsd exporter).
- prometheus.io/scrape: Boolean. Whether the pod is to be scraped or not. Defaults to false.
- prometheus.io/path: String. The http endpoint under which prometheus metrics are exposed. Defaults to /metrics.
Note that since most of these have sane defaults, the only annotation explicitly needed to have a workload scraped is prometheus.io/scrape: "true".
An abridged example from a helm chart can be found below:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ template "wmf.releasename" . }}
  labels:
    app: {{ template "wmf.chartname" . }}
    chart: {{ template "wmf.chartid" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
spec:
  selector:
    matchLabels:
      app: {{ template "wmf.chartname" . }}
      release: {{ .Release.Name }}
  replicas: {{ .Values.resources.replicas }}
  template:
    metadata:
      labels:
        app: {{ template "wmf.chartname" . }}
        release: {{ .Release.Name }}
        routed_via: {{ .Values.routed_via | default .Release.Name }}
      annotations:
        checksum/config: {{ include "config.app" . | sha256sum }}
        {{ if .Values.monitoring.enabled -}}
        checksum/prometheus-statsd: {{ .Files.Get "config/prometheus-statsd.conf" | sha256sum }}
        {{ end -}}
        prometheus.io/port: "9102"
        prometheus.io/scrape: "true"
        {{- include "tls.annotations" . | indent 8 }}
    spec:
      blahblah
Note the 2 prometheus.io annotations. The usage of the port annotation is there to accommodate the prometheus-statsd exporter. This is from an actual chart used in production.