Grafana/Best practices/Getting Started with Thanos panels

From Wikitech

This is a draft/WIP space for a Getting Started/Recipe/Cookbook page/section for common graphing with Grafana and Prometheus.

Important note: 2m min interval in Query options

In almost all cases, you want to set the Min interval in the Query options to 2m. The background is that the scrape interval, at which Prometheus collects a new data point, is the default 60 seconds[citation needed]. That means a window of at least 2 scrape intervals is required to reliably cover enough data points (rate() needs at least two samples in its range). (Note, though, that the usual recommendation seems to be to use 4 scrape intervals[1].)

In some cases you may wish to set a longer Min interval, for example when you want to show daily or hourly data.
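
The reasoning behind the 2m minimum can be illustrated with a small Python sketch. It is an idealized model, not how Prometheus or Grafana is implemented: it assumes perfectly regular scrapes every 60 seconds, no failed scrapes, and a half-open window. The function names are hypothetical.

```python
def samples_in_window(window_s, scrape_interval_s, first_scrape_s, eval_time_s):
    """Count scrape timestamps in the half-open range (eval_time - window, eval_time]."""
    start_s = eval_time_s - window_s
    # Scrapes happen at first_scrape_s + k * scrape_interval_s for integer k.
    first_k = (start_s - first_scrape_s) // scrape_interval_s + 1  # first scrape after start
    last_k = (eval_time_s - first_scrape_s) // scrape_interval_s   # last scrape at or before end
    return max(0, last_k - first_k + 1)


def worst_case_samples(window_s, scrape_interval_s=60):
    """Fewest samples a window of this length can contain, over all scrape phases."""
    return min(
        samples_in_window(window_s, scrape_interval_s, phase_s, eval_time_s=3600)
        for phase_s in range(scrape_interval_s)
    )


for window_s in (60, 90, 120, 240):
    print(window_s, worst_case_samples(window_s))
```

Under these assumptions, a 60s or 90s window can contain a single sample, which is not enough to compute a rate, while a 120s window always contains two. Real scrapes jitter and occasionally fail, which is one plausible reason for the more conservative 4-interval recommendation.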

Counters

Counter: rate of things per second

Use this for counters that typically count something during a web request.

sum by(label1, label2) (
    rate(mediawiki_YourComponent_your_metric_name_total[$__rate_interval])
)

You can skip by(label1, label2) if you have no labels in your metric.

Unit: This produces a series showing the number of counts per second, averaged over each rate window (whose length depends dynamically on the time range of your panel). You should therefore probably set the y-axis label to something like "things / s".
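
To make the "per second" unit concrete, here is a minimal Python sketch of how a per-second rate can be derived from raw counter samples. It is a simplification: Prometheus's rate() also extrapolates to the edges of the window, which this sketch does not do. The function name `simple_rate` and the sample values are hypothetical.

```python
def simple_rate(samples):
    """Per-second rate from (timestamp_s, counter_value) pairs, oldest first.

    Simplified: no extrapolation to the window edges, unlike Prometheus.
    """
    if len(samples) < 2:
        return None  # not enough data points in the window
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset (e.g. a process restart),
        # so the new value is counted from zero.
        increase += cur - prev if cur >= prev else cur
    elapsed_s = samples[-1][0] - samples[0][0]
    return increase / elapsed_s


# Three scrapes, 60 s apart; the counter grows by 60 per scrape.
print(simple_rate([(0, 100), (60, 160), (120, 220)]))
```

In this example the counter grows by 120 over 120 seconds, so the result is 1 thing per second.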


See also: https://grafana.wikimedia.org/d/e16fec87-72a6-405e-b931-d04a0bc26e48/cdanis-temporal-downsampling-considered-harmful-max-over-time-rate-interval-trick?orgId=1 TODO: add info here when that link is relevant

Counter: number of things per hour/day

TODO: add info here how to summarize per day

Counter from a maintenance script: number of things per hour/day

The per-second rate is not as useful for counters that are incremented only about once per hour by a maintenance script, where the increment itself is the value of interest.
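
A small Python sketch of why a per-second rate looks odd for such a counter. All numbers are made up for illustration: a hypothetical script adds 500 to the counter once per hour, scrapes happen every 60 seconds, and the rate is computed over 5-minute windows without Prometheus's extrapolation.

```python
SCRAPE_S = 60
WINDOW_S = 300


def counter_value(t_s):
    # Hypothetical maintenance script: adds 500 to the counter once, at t = 1800 s.
    return 1000 if t_s < 1800 else 1500


# One hour of scraped samples, keyed by timestamp in seconds.
samples = {t: counter_value(t) for t in range(0, 3600 + 1, SCRAPE_S)}

# Increase and per-second rate for each consecutive 5-minute window.
window_starts = range(0, 3600, WINDOW_S)
increases = [samples[start + WINDOW_S] - samples[start] for start in window_starts]
rates = [inc / WINDOW_S for inc in increases]
hourly_total = sum(increases)

print(rates)
print(hourly_total)
```

Eleven of the twelve windows show a rate of zero and one shows a short spike, whereas the total increase over the hour recovers the value the script actually reported.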

Timings

75th percentile

The following produces something close to the 75th percentile of the collected timings:

histogram_quantile(
  0.75,
  sum by(le, your_label_2) (
    rate(mediawiki_YourComponent_your_metric_seconds_bucket{your_label_1="some-value"}[$__rate_interval])
  )
)
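
The result is only "close to" the 75th percentile because the histogram stores counts per bucket rather than individual timings, so the value has to be estimated within a bucket. The Python sketch below shows the general idea using linear interpolation inside the bucket that contains the target rank. It is a simplified illustration under stated assumptions (0 < q <= 1, buckets sorted by upper bound, last bucket is +Inf, lowest bucket starts at 0), not the actual Prometheus implementation; the bucket values are made up.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by upper_bound,
    with the last upper_bound being +Inf. Assumes 0 < q <= 1.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the target rank.
    for i, (upper, cum) in enumerate(buckets):
        if cum >= rank:
            break
    if math.isinf(upper):
        # No finite upper bound to interpolate to: fall back to the
        # highest finite bucket boundary.
        return buckets[-2][0]
    lower = buckets[i - 1][0] if i > 0 else 0.0
    below = buckets[i - 1][1] if i > 0 else 0.0
    # Assume observations are spread evenly within the bucket.
    return lower + (upper - lower) * (rank - below) / (cum - below)


# Hypothetical bucket boundaries in seconds with cumulative counts.
example = [(0.1, 50), (0.5, 80), (1.0, 95), (math.inf, 100)]
print(histogram_quantile(0.75, example))
```

Here the target rank of 75 falls in the 0.1 to 0.5 second bucket, so the estimate lies somewhere in that range regardless of where the real timings were. The precision of the result therefore depends on how finely the buckets are spaced around the quantile of interest. In the query above the inputs are per-bucket rates rather than raw counts, but the same interpolation idea applies.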
[1] https://grafana.com/events/grafanacon/2020/prometheus-rate-queries-in-grafana/