
Runbook

From Wikitech

A runbook is a set of instructions that tells a human what to do, specifically when a certain monitoring alert triggers.

Prometheus alerting rules link to a service-specific anchor on this page, based on the service name itself. In the service entry you can give more context, list next actions, and link to other resources such as further runbooks and service dashboards.

Pages that contain runbooks are linked from Icinga checks in Puppet using the notes_url parameter, or from Alertmanager rules using the runbook annotation.
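As a rough illustration of how a service-specific anchor and a runbook annotation might fit together, here is a minimal Python sketch. The page URL, the helper names runbook_url and probe_down_rule, and the alert expression with its label are assumptions made for illustration; they are not taken from the production rules.

```python
import json

# Assumption: the runbook page lives at this URL; adjust for your wiki.
RUNBOOK_PAGE = "https://wikitech.wikimedia.org/wiki/Runbook"


def runbook_url(service: str, port: int) -> str:
    """Return the runbook page URL with a service-name:port anchor."""
    return f"{RUNBOOK_PAGE}#{service}:{port}"


def probe_down_rule(service: str, port: int) -> dict:
    """Build an illustrative alerting rule carrying a runbook annotation.

    The expression and its label are hypothetical, not the production rule.
    """
    return {
        "alert": "ProbeDown",
        "expr": f'probe_success{{service="{service}:{port}"}} == 0',
        "annotations": {"runbook": runbook_url(service, port)},
    }


if __name__ == "__main__":
    # Print the sketch rule for one of the entries listed on this page.
    print(json.dumps(probe_down_rule("grafana", 443), indent=2))
```

The point of the sketch is only that the anchor is derived mechanically from the service name and port, so adding an entry with a matching heading on this page is enough for the alert link to resolve.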

There is a Category:Runbooks for pages with runbooks.

Compare with cookbooks, which are programs run with spicerack/cumin to perform maintenance tasks.

service-name:port

This is the entry linked from generic alerts for service-name:port. For example, ProbeDown will link here when network probes for service-name fail. See also bug T312947.

apt:80

See APT_repository

grafana:443

See Grafana.wikimedia.org

graphite:443

See Graphite

helm-charts:443

This service is powered by chartmuseum; see ChartMuseum

jobrunner:443

See Application servers/Runbook#Jobrunners

librenms:443

See LibreNMS

netbox:443

See Netbox

puppetboard:443

See Puppet

puppetdb-api:8090

See Puppet#Micro_Service

releases:443

See Releases.wikimedia.org

thanos-query:443

See Thanos#Alerts

upload-https:443

Check NEL

videoscaler:443

See Application servers/Runbook#Jobrunners

tools-k8s-haproxy-3:30000

See Portal:Toolforge/Admin/Runbooks/k8s-haproxy

tools-k8s-haproxy-4:30000

See Portal:Toolforge/Admin/Runbooks/k8s-haproxy

gerrit1003:443

Check the Apache dashboard for worker saturation. If the workers are saturated, run sudo systemctl restart apache2. See also Gerrit/Operations.