
Prometheus/statsd k8s

From Wikitech

Introduction / motivation

In parallel with moving services to Kubernetes (k8s), we're also moving services away from statsd metrics and onto Prometheus. For more information on Prometheus, see https://wikitech.wikimedia.org/wiki/Prometheus and https://phabricator.wikimedia.org/T205870

Not all services support Prometheus out of the box, so as a compatibility layer between statsd and Prometheus we're providing statsd_exporter (https://github.com/prometheus/statsd_exporter): a daemon that receives statsd metrics, maps them onto Prometheus metrics and exposes the result over HTTP for Prometheus to ingest.
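To make concrete what statsd_exporter receives: statsd is a plain-text line protocol, one metric per line in the form name:value|type, optionally followed by a sample rate such as |@0.1. The sketch below parses such a line; StatsdSample and parse_statsd_line are hypothetical names used only for illustration, and tag extensions and multi-value lines are deliberately ignored. Services usually send these lines over UDP; statsd_exporter listens on UDP port 9125 by default, though a given deployment may configure a different port.

```python
from typing import NamedTuple


class StatsdSample(NamedTuple):
    name: str               # flat, dot-separated statsd metric name
    value: float
    kind: str               # e.g. "c" counter, "g" gauge, "ms" timer, "h" histogram
    sample_rate: float = 1.0


def parse_statsd_line(line: str) -> StatsdSample:
    """Parse a single statsd line of the form name:value|type[|@rate].

    Simplified sketch: extra fields other than the sample rate
    (e.g. DogStatsD-style |#tags) are ignored.
    """
    name, sep, rest = line.strip().partition(":")
    fields = rest.split("|")
    if not name or not sep or len(fields) < 2:
        raise ValueError(f"malformed statsd line: {line!r}")
    sample_rate = 1.0
    for extra in fields[2:]:
        if extra.startswith("@"):   # client-side sampling, e.g. |@0.1
            sample_rate = float(extra[1:])
    return StatsdSample(name, float(fields[0]), fields[1], sample_rate)


if __name__ == "__main__":
    print(parse_statsd_line("mathoid.gc.minor:1.5|ms"))
    print(parse_statsd_line("mathoid.requests:1|c|@0.1"))
```

Note that the parsed name is still a single flat string; turning it into a Prometheus metric name plus labels is exactly the job of the mappings described next.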

The conversion done by statsd_exporter is driven by a set of mappings that need to be provided as part of the service's helm chart. This document provides guidelines and best practices for service owners on how to write these mappings.

One of the biggest differences between statsd and Prometheus is the data model: statsd/graphite have a flat hierarchy derived from the metric name, whereas Prometheus associates multiple key/value pairs (labels) with a single metric name. See this guide for more information on the data model: https://prometheus.io/docs/concepts/data_model/
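A small illustration of why labels matter: in Prometheus a series is identified by its metric name plus its set of label pairs, so totals can be computed across any label without encoding every possible grouping into the name. The sketch below is a toy in-memory model; series_key, sum_by, the http_requests_total metric and its values are all hypothetical, and sum_by only mimics the spirit of PromQL's sum by (...).

```python
from collections import defaultdict

# A series is identified by metric name plus label pairs; sorting the
# pairs gives a stable, hashable key regardless of keyword order.
SeriesKey = tuple[str, tuple[tuple[str, str], ...]]


def series_key(name: str, **labels: str) -> SeriesKey:
    return (name, tuple(sorted(labels.items())))


def sum_by(samples: dict[SeriesKey, float], name: str,
           by: tuple[str, ...]) -> dict[tuple[str, ...], float]:
    """Sum all series of one metric, grouped by the given label names."""
    totals: dict[tuple[str, ...], float] = defaultdict(float)
    for (metric, labels), value in samples.items():
        if metric != name:
            continue
        label_map = dict(labels)
        totals[tuple(label_map.get(k, "") for k in by)] += value
    return dict(totals)


# Hypothetical data. In graphite these would be three unrelated flat names,
# e.g. mathoid.requests.GET, mathoid.requests.POST and citoid.requests.GET.
samples = {
    series_key("http_requests_total", service="mathoid", method="GET"): 10.0,
    series_key("http_requests_total", service="mathoid", method="POST"): 5.0,
    series_key("http_requests_total", service="citoid", method="GET"): 7.0,
}

print(sum_by(samples, "http_requests_total", ("service",)))
print(sum_by(samples, "http_requests_total", ("method",)))
```

Grouping by service or by method uses the same data; with flat names the equivalent graphite query has to rely on wildcards at the right positions in the name.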

Choosing metric names and tags

In WMF's deployment we follow Prometheus' naming best practices as closely as possible, so https://prometheus.io/docs/practices/naming/ is a very good starting point on what goes into a Prometheus metric name and how best to pick metric names for services.
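A few of the naming rules can be checked mechanically. The helper below is a hypothetical sketch, not part of any WMF or Prometheus tooling: it checks the metric-name character set from the data model documentation, flags colons (reserved for recording rules), flags a few scaled units where the naming guide recommends base units such as seconds and bytes, and checks that counters end in _total. The list of scaled units is illustrative, not exhaustive.

```python
import re

# Valid metric names per the Prometheus data model documentation.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
# A few scaled units the naming guide suggests replacing with base units
# (seconds, bytes); illustrative only, not an exhaustive list.
SCALED_UNITS_RE = re.compile(r"_(milliseconds|kilobytes|megabytes)(_|$)")


def check_metric_name(name: str, kind: str = "gauge") -> list[str]:
    """Return a list of naming problems (empty if none were found)."""
    problems = []
    if not METRIC_NAME_RE.fullmatch(name):
        problems.append("invalid characters: use [a-zA-Z0-9_:] and do not start with a digit")
    if ":" in name:
        problems.append("colons are reserved for recording rules")
    if SCALED_UNITS_RE.search(name):
        problems.append("use base units (seconds, bytes) instead of scaled units")
    if kind == "counter" and not name.endswith("_total"):
        problems.append("counters should end in _total")
    return problems


if __name__ == "__main__":
    for candidate in ("service_runner_heap_rss_bytes", "mathoid.heap.rss",
                      "request_latency_milliseconds"):
        print(candidate, check_metric_name(candidate))
```

A flat statsd name such as mathoid.heap.rss fails the character check outright, which is one reason statsd names cannot simply be reused as Prometheus names.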

Global aggregation considerations

One of the big paradigm shifts between statsd and Prometheus concerns metric aggregation across hosts and service instances. Currently, services and hosts send their statsd traffic to statsd.eqiad.wmnet, where aggregation is performed centrally.

For services running in k8s with statsd_exporter, aggregation is done at the level of the single service instance: services send their statsd traffic to localhost, statsd_exporter turns it into Prometheus metrics, and Prometheus periodically fetches (scrapes) the result. Moving aggregation to service instances works for most metric data types (see https://prometheus.io/docs/concepts/metric_types/), with one notable exception: summaries (e.g. percentiles/quantiles of how long an operation took) cannot be aggregated in a statistically meaningful way across different service instances.

A similar, aggregatable data type is provided by histograms: each timing observation is counted into "observations that took less than or equal to X" buckets, and Prometheus can calculate percentiles server-side from the aggregated bucket counts. See https://prometheus.io/docs/practices/histograms/ for a more detailed explanation of summaries and histograms.
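The difference between summaries and histograms can be shown with a toy example. Suppose, hypothetically, that instance A served 100 requests in about 10 ms each and instance B served 10 requests in about 1 s each. Their per-instance medians are roughly 0.01 s and 1.0 s, and averaging them gives 0.505 s, even though 100 of the 110 requests took 10 ms. Bucket counts, on the other hand, can simply be added up per bucket and a quantile estimated from the sum. The sketch below models this; merge_histograms and histogram_quantile are hypothetical helpers, the latter a simplified imitation of the linear interpolation performed by PromQL's histogram_quantile() (which in practice is usually applied as histogram_quantile(0.99, sum by (le) (rate(<metric>_bucket[5m])))).

```python
import math

# (upper bound "le", cumulative count) pairs; the last bound is +Inf.
Buckets = list[tuple[float, float]]


def merge_histograms(*instances: Buckets) -> Buckets:
    """Add up cumulative bucket counts across instances, like sum by (le)."""
    bounds = [b for b, _ in instances[0]]
    for inst in instances[1:]:
        if [b for b, _ in inst] != bounds:
            raise ValueError("all instances must use the same bucket bounds")
    return [(b, sum(inst[i][1] for inst in instances)) for i, b in enumerate(bounds)]


def histogram_quantile(q: float, buckets: Buckets) -> float:
    """Estimate quantile q by linear interpolation inside the bucket that
    contains the q-th observation (lower bound of the first bucket is 0)."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Falls in the +Inf bucket: all we know is "above the
                # highest finite bound", so return that bound.
                return prev_bound
            if count == prev_count:
                return prev_bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound


bounds = [0.005, 0.025, 0.1, 0.5, 1.0, 2.5, math.inf]
instance_a = list(zip(bounds, [0, 100, 100, 100, 100, 100, 100]))  # 100 requests of ~10 ms
instance_b = list(zip(bounds, [0, 0, 0, 0, 10, 10, 10]))           # 10 requests of ~1 s
merged = merge_histograms(instance_a, instance_b)

print("estimated median:", histogram_quantile(0.5, merged))
print("estimated p99:", histogram_quantile(0.99, merged))
```

The estimates are only as precise as the bucket layout allows (here the median lands somewhere in the 5 to 25 ms bucket), which is why bucket boundaries should be chosen around the values you care about.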

Example mappings

The statsd_exporter mapping configuration below provides an example common to all service-runner based services, namely GC and heap stats. The mappings can be extended with service-specific metrics (e.g. request latency).

  mappings:
    - match: '*.gc.*'
      name: 'service_runner_gc_${2}_nanoseconds'
      timer_type: histogram
      buckets: [ 5e+5, 1e+6, 5e+6, 10e+6, 15e+6, 30e+6, 50e+6 ]
      labels:
        service: '$1'
    # workaround: service-runner currently sends heap stats as timers
    - match: '*.heap.*'
      name: 'service_runner_heap_${2}_bytes'
      timer_type: histogram
      buckets: [ 1e+6, 1e+7, 1e+8, 1e+9 ]
      labels:
        service: '$1'
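It may help to see how a glob mapping such as '*.gc.*' turns a flat statsd name into a metric name plus labels: each * captures one component of the dot-separated name, and $1 / ${2} in the name and label templates refer to those captures. The sketch below is a simplified model, not statsd_exporter's implementation; Mapping, glob_to_regex, expand and apply_mappings are hypothetical names. It assumes each * matches exactly one non-empty component and returns the first matching rule; unmatched names return None here, whereas the real exporter has its own fallback behaviour described in its README.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mapping:
    match: str               # glob, e.g. "*.gc.*"
    name: str                # name template, e.g. "service_runner_gc_${2}_nanoseconds"
    labels: dict[str, str]   # label templates, e.g. {"service": "$1"}


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    # Simplification: each "*" captures exactly one non-empty dot-separated component.
    return re.compile("([^.]+)".join(re.escape(p) for p in pattern.split("*")))


def expand(template: str, groups: tuple[str, ...]) -> str:
    # Replace ${N} and $N with the N-th captured component (1-based).
    return re.sub(r"\$\{(\d+)\}|\$(\d+)",
                  lambda m: groups[int(m.group(1) or m.group(2)) - 1],
                  template)


def apply_mappings(metric: str, mappings: list[Mapping]) -> Optional[tuple[str, dict[str, str]]]:
    """Return (prometheus_name, labels) for the first matching rule, else None."""
    for rule in mappings:
        m = glob_to_regex(rule.match).fullmatch(metric)
        if m:
            groups = m.groups()
            return expand(rule.name, groups), {k: expand(v, groups) for k, v in rule.labels.items()}
    return None


MAPPINGS = [
    Mapping("*.gc.*", "service_runner_gc_${2}_nanoseconds", {"service": "$1"}),
    Mapping("*.heap.*", "service_runner_heap_${2}_bytes", {"service": "$1"}),
]

print(apply_mappings("mathoid.gc.minor", MAPPINGS))
print(apply_mappings("mathoid.heap.rss", MAPPINGS))
```

The service name, which in graphite is just the first path component, becomes a label, so all service-runner based services share the same few metric names.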

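The resulting metrics, e.g. for mathoid, look something like this (except that the heap stats should really be gauges, not timers). Note also that statsd_exporter treats timer values as milliseconds and exports them as seconds, i.e. divided by 1000; this is why the heap _sum values below are three orders of magnitude smaller than the byte counts service-runner actually reports: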

# curl -s localhost:9112/metrics | grep -i service
# HELP service_runner_gc_minor_nanoseconds Metric autogenerated by statsd_exporter.
# TYPE service_runner_gc_minor_nanoseconds histogram
service_runner_gc_minor_nanoseconds_bucket{service="mathoid",le="500000"} 15
service_runner_gc_minor_nanoseconds_bucket{service="mathoid",le="1e+06"} 15
service_runner_gc_minor_nanoseconds_bucket{service="mathoid",le="5e+06"} 15
service_runner_gc_minor_nanoseconds_bucket{service="mathoid",le="1e+07"} 15
service_runner_gc_minor_nanoseconds_bucket{service="mathoid",le="1.5e+07"} 15
service_runner_gc_minor_nanoseconds_bucket{service="mathoid",le="3e+07"} 15
service_runner_gc_minor_nanoseconds_bucket{service="mathoid",le="5e+07"} 15
service_runner_gc_minor_nanoseconds_bucket{service="mathoid",le="+Inf"} 15
service_runner_gc_minor_nanoseconds_sum{service="mathoid"} 19873.929999999997
service_runner_gc_minor_nanoseconds_count{service="mathoid"} 15
# HELP service_runner_heap_rss_bytes Metric autogenerated by statsd_exporter.
# TYPE service_runner_heap_rss_bytes histogram
service_runner_heap_rss_bytes_bucket{service="mathoid",le="1e+07"} 1
service_runner_heap_rss_bytes_bucket{service="mathoid",le="1e+08"} 1
service_runner_heap_rss_bytes_bucket{service="mathoid",le="1e+09"} 1
service_runner_heap_rss_bytes_bucket{service="mathoid",le="+Inf"} 1
service_runner_heap_rss_bytes_sum{service="mathoid"} 132775.936
service_runner_heap_rss_bytes_count{service="mathoid"} 1
# HELP service_runner_heap_total_bytes Metric autogenerated by statsd_exporter.
# TYPE service_runner_heap_total_bytes histogram
service_runner_heap_total_bytes_bucket{service="mathoid",le="1e+07"} 1
service_runner_heap_total_bytes_bucket{service="mathoid",le="1e+08"} 1
service_runner_heap_total_bytes_bucket{service="mathoid",le="1e+09"} 1
service_runner_heap_total_bytes_bucket{service="mathoid",le="+Inf"} 1
service_runner_heap_total_bytes_sum{service="mathoid"} 98381.824
service_runner_heap_total_bytes_count{service="mathoid"} 1
# HELP service_runner_heap_used_bytes Metric autogenerated by statsd_exporter.
# TYPE service_runner_heap_used_bytes histogram
service_runner_heap_used_bytes_bucket{service="mathoid",le="1e+07"} 1
service_runner_heap_used_bytes_bucket{service="mathoid",le="1e+08"} 1
service_runner_heap_used_bytes_bucket{service="mathoid",le="1e+09"} 1
service_runner_heap_used_bytes_bucket{service="mathoid",le="+Inf"} 1
service_runner_heap_used_bytes_sum{service="mathoid"} 85099.288
service_runner_heap_used_bytes_count{service="mathoid"} 1
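To make the structure of the output above easier to read: each histogram is exported as a set of _bucket series, one per upper bound le, plus a final le="+Inf" bucket, followed by _sum and _count. The buckets are cumulative, so each one includes every observation counted in the smaller buckets, and the +Inf bucket always equals _count. The sketch below reproduces that layout from a list of observations; histogram_lines is a hypothetical helper, it omits HELP/TYPE lines and label escaping, and it takes observation values as given (no unit conversion). Python's format(bound, "g") happens to produce the same le strings as the output above for bounds like these.

```python
import math
from bisect import bisect_left


def histogram_lines(name: str, labels: dict[str, str],
                    bounds: list[float], observations: list[float]) -> list[str]:
    """Render one histogram in the Prometheus text format (simplified:
    no HELP/TYPE lines, no label escaping). `bounds` must be sorted."""
    counts = [0] * (len(bounds) + 1)  # one slot per bucket, plus the +Inf bucket
    for obs in observations:
        # bisect_left finds the first bucket whose upper bound is >= obs,
        # matching the "le" (less than or equal) semantics.
        counts[bisect_left(bounds, obs)] += 1

    label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
    sep = "," if label_str else ""
    lines = []
    cumulative = 0
    for bound, n in zip(bounds + [math.inf], counts):
        cumulative += n  # cumulative: each bucket includes all smaller ones
        le = "+Inf" if math.isinf(bound) else format(bound, "g")
        lines.append(f'{name}_bucket{{{label_str}{sep}le="{le}"}} {cumulative}')
    suffix = f"{{{label_str}}}" if label_str else ""
    lines.append(f"{name}_sum{suffix} {sum(observations)}")
    lines.append(f"{name}_count{suffix} {len(observations)}")
    return lines


# Hypothetical observations, already in the exported unit.
for line in histogram_lines("service_runner_gc_minor_nanoseconds", {"service": "mathoid"},
                            [5e5, 1e6, 5e6], [1e5, 7e5, 2e7]):
    print(line)
```

In the mathoid output above, every GC observation fell below the smallest bound, which is why all gc bucket lines show the same count of 15.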