Portal:Cloud VPS/Admin/Runbooks/NovaFullstackStaleStats


Overview

The procedures in this runbook require admin permissions to complete.

Error / Incident

Some of the stats used for the novafullstack alerts have not been reported for some time.

Debugging

Check that the Prometheus series exist. Search for the alert name in the alerts.git repo and look for its expr line, which will look something like:

expr: count(cloudvps_novafullstack_instances_count) == 0 or count(cloudvps_novafullstack_instances_max) == 0

Any of the stats inside a count() function might be the one failing (or all of them!), so query each of them in Thanos or Grafana to find out which series are missing.
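As a minimal sketch of that step, the following Python pulls the metric names out of the count() calls in an expr line and builds one instant-query URL per metric. It assumes the endpoint exposes the standard Prometheus HTTP API path /api/v1/query; the base URL is a hypothetical placeholder and the helper names are made up for illustration.

```python
import re
from urllib.parse import urlencode

# Expression copied from the example above; the real one lives in alerts.git.
ALERT_EXPR = (
    "count(cloudvps_novafullstack_instances_count) == 0 "
    "or count(cloudvps_novafullstack_instances_max) == 0"
)


def metrics_in_count_calls(expr):
    """Return the metric names wrapped in count(...), in order of appearance."""
    return re.findall(r"count\(\s*([A-Za-z_:][A-Za-z0-9_:]*)", expr)


def instant_query_url(base_url, metric):
    """Build a Prometheus-style instant query URL for count(<metric>)."""
    params = urlencode({"query": f"count({metric})"})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


if __name__ == "__main__":
    # Hypothetical base URL: replace it with the Thanos or Prometheus
    # endpoint you actually use.
    base = "https://thanos.example.org"
    for name in metrics_in_count_calls(ALERT_EXPR):
        print(instant_query_url(base, name))
```

Each printed URL can be opened in a browser or fetched with any HTTP client. An empty result list in the response would suggest that the series is not being reported.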

If a series is missing or stale, it might mean:

  • That the novafullstack service is misbehaving
  • That the stats names have changed
  • That the service is down
  • That the Prometheus stats are not being generated (usually under /var/lib/prometheus/node.d/novafullstack.prom on the cloudcontrol that is running the novafullstack service).
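To check the last case on the cloudcontrol host, a small script can report which expected metrics are absent from the .prom file and how old the file is. This is only a sketch: it assumes the file uses the usual Prometheus text exposition format (comment lines starting with #, then one sample per line), the helper names are hypothetical, and the sample text is made up for illustration.

```python
import os
import re
import time

# Metric names taken from the example alert expression above.
EXPECTED = (
    "cloudvps_novafullstack_instances_count",
    "cloudvps_novafullstack_instances_max",
)

# Made-up sample content, used only to demonstrate the parser.
SAMPLE = """\
# TYPE cloudvps_novafullstack_instances_count gauge
cloudvps_novafullstack_instances_count 0
"""


def metric_names(prom_text):
    """Return the set of metric names that have at least one sample line."""
    names = set()
    for line in prom_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = re.match(r"[A-Za-z_:][A-Za-z0-9_:]*", line)
        if match:
            names.add(match.group(0))
    return names


def missing_metrics(prom_text, expected=EXPECTED):
    """Return the expected metric names that are absent from prom_text."""
    present = metric_names(prom_text)
    return [name for name in expected if name not in present]


def file_age_seconds(path, now=None):
    """Return how many seconds ago the file at path was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


if __name__ == "__main__":
    print("missing from sample:", missing_metrics(SAMPLE))
```

On a real host, read the .prom file and pass its contents to missing_metrics(), then call file_age_seconds() with the same path. A large age suggests the service has stopped writing stats, while a fresh file with missing names points to renamed stats.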

Common issues

Add any new issues you find here.

Related information

Support contacts

Communication and support

Support and administration of the WMCS resources is provided by the Wikimedia Foundation Cloud Services team and Wikimedia movement volunteers. Please reach out with questions and join the conversation:

  • Discuss and receive general support
  • Receive mail announcements about critical changes
      • Subscribe to the cloud-announce@ mailing list (all messages are also mirrored to the cloud@ list)
  • Track work tasks and report bugs
      • Use the Phabricator workboard #Cloud-Services for bug reports and feature requests about the Cloud VPS infrastructure itself
  • Learn about major near-term plans
      • Read the News wiki page
  • Read news and stories about Wikimedia Cloud Services
      • Read the Cloud Services Blog (for the broader Wikimedia movement, see the Wikimedia Technical Blog)

Old incidents

Add any tasks for incidents related to this alert here.