Portal:Cloud VPS/Admin/Runbooks/NovaFullstackStaleStats

Overview

The procedures in this runbook require admin permissions to complete.

Error / Incident

Some of the stats used by the novafullstack alerts have not been reported for some time.

Debugging

Check that the Prometheus series exist. Search for the alert name in the alerts.git repo and look for its expr line, which should look something like:

expr: count(cloudvps_novafullstack_instances_count) == 0 or count(cloudvps_novafullstack_instances_max) == 0

Any of the stats inside a count() call might be the one failing (or all of them!), so query each of them separately in Thanos or Grafana.
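To narrow down which series is missing, the metric names can be pulled out of the alert expression and queried one at a time. The following is a minimal sketch, assuming a Prometheus-compatible HTTP API (Thanos Query serves the same /api/v1/query endpoint). BASE_URL is a placeholder and the function names are hypothetical; neither comes from the WMCS setup.

```python
import json
import re
import urllib.parse
import urllib.request

# Placeholder (assumption): replace with the Thanos/Prometheus endpoint you use.
BASE_URL = "http://localhost:9090"


def metrics_in_count(expr: str) -> list:
    """Return the metric names wrapped in count(...), in order, without duplicates."""
    names = re.findall(r"count\(\s*([a-zA-Z_:][a-zA-Z0-9_:]*)", expr)
    return list(dict.fromkeys(names))


def query_url(base_url: str, metric: str) -> str:
    """Build the instant-query URL for a single metric."""
    query = urllib.parse.urlencode({"query": metric})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def series_present(base_url: str, metric: str, timeout: float = 10.0) -> bool:
    """True if the instant query returns at least one series for the metric."""
    with urllib.request.urlopen(query_url(base_url, metric), timeout=timeout) as resp:
        payload = json.load(resp)
    return bool(payload["data"]["result"])


if __name__ == "__main__":
    expr = (
        "count(cloudvps_novafullstack_instances_count) == 0 "
        "or count(cloudvps_novafullstack_instances_max) == 0"
    )
    # Only prints the URLs; call series_present() once BASE_URL is set correctly.
    for name in metrics_in_count(expr):
        print(name, query_url(BASE_URL, name))
```

A metric for which series_present() returns False is a likely culprit. Instant queries stop returning a series a few minutes after its last sample, so an empty result can also mean the series exists but has gone stale.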

A missing series might mean:

  • That the novafullstack service is misbehaving
  • That the stat names have changed
  • That the service is down
  • That the Prometheus stats are not being generated (usually under /var/lib/prometheus/node.d/novafullstack.prom on the cloudcontrol host that is running the novafullstack service).
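The last possibility can be checked directly on the cloudcontrol host by looking at how old the stats file is and which metric names it contains. This is a minimal sketch, assuming the file uses the standard Prometheus text exposition format. The function names and the 15-minute threshold are illustrative assumptions, not values taken from the service configuration.

```python
import os
import sys
import time
from typing import Optional

# Assumption: treat the file as stale after 15 minutes without a rewrite.
STALE_AFTER_SECONDS = 15 * 60


def prom_file_age_seconds(path: str, now: Optional[float] = None) -> float:
    """Seconds since the file was last modified (FileNotFoundError if missing)."""
    current = time.time() if now is None else now
    return current - os.path.getmtime(path)


def metric_names(path: str) -> set:
    """Collect the metric names found in a Prometheus text-format file."""
    names = set()
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            # Skip blank lines and HELP/TYPE/comment lines.
            if not line or line.startswith("#"):
                continue
            # The name is everything before the first '{' or whitespace.
            head = line.split("{", 1)[0].split()
            if head:
                names.add(head[0])
    return names


if __name__ == "__main__" and len(sys.argv) > 1:
    prom_path = sys.argv[1]  # path to the .prom file mentioned above
    age = prom_file_age_seconds(prom_path)
    state = "STALE" if age > STALE_AFTER_SECONDS else "fresh"
    print(f"{prom_path}: {age:.0f}s old ({state})")
    for name in sorted(metric_names(prom_path)):
        print("  ", name)
```

Run it with the path of the stats file as the only argument. A missing file, an old modification time, or metric names that differ from the ones in the alert expression each point at one of the causes listed above.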

Common issues

Add any new issues you find here.

Related information

Support contacts

Communication and support

Support and administration of the WMCS resources is provided by the Wikimedia Foundation Cloud Services team and Wikimedia movement volunteers. Please reach out with questions and join the conversation:

Discuss and receive general support
Stay aware of critical changes and plans
Track work tasks and report bugs

Use a subproject of the #Cloud-Services Phabricator project to track confirmed bug reports and feature requests about the Cloud Services infrastructure itself

Read stories and WMCS blog posts

Read the Cloud Services Blog (for the broader Wikimedia movement, see the Wikimedia Technical Blog)

Old incidents

Add any tasks for incidents related to this alert here.