Portal:Toolforge/Admin/Runbooks
This page contains basic resources for developers who want to author and publish runbooks for Toolforge on Wikitech.
Before you begin
Permissions: Procedures documented in Toolforge runbooks require varying levels of permission. Make sure you have the appropriate permissions to complete your task.
What is a runbook?
A runbook is a detailed set of instructions that explains how to perform a common task or procedure, so that others can repeat it easily and accurately. Runbooks are particularly useful for incident response operations. Creating a runbook in response to a specific incident makes it possible for people to repeat the same steps when a similar incident occurs. Because runbooks are often used to help people respond to incidents quickly, they should be easy to read and follow, consistent, and accurate.
When should you use a runbook?
A runbook should be used whenever a common task or procedure may need to be repeated by multiple people.
Tips for building useful runbooks
- Keep them up-to-date. Revisit the instructions after each incident to make sure they are clear and accurate.
- Keep them simple. These are sets of detailed instructions, and they do not require extra history or context.
- Publish a separate runbook for each issue or incident.
- Test them. Make sure the instructions are repeatable by others by asking others to follow them and provide feedback.
- Follow a template. Make sure your runbooks follow a template, so that people can find and update information easily.
Cloud VPS runbook templates
Each issue or alert should have its own page of instructions, which keeps unnecessary information out of the way. For an example, see the runbook for the issue Check for VMs leaked by the nova-fullstack test. Note that the page addresses one issue only.
You can find a template/outline for creating runbooks here: Portal:Cloud_VPS/Admin/Runbooks/Runbook_template
Where to publish runbooks for Toolforge
Where to publish Toolforge runbooks |
---|
Portal:Toolforge/Admin/Runbooks/(PAGE NAME) |
Note: This is a subpage of admin documentation. Some procedures may require advanced admin permissions to complete. |
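The naming convention in the table above can be sketched in a few lines of Python. This is only an illustration: the helper `runbook_title` and the constant `RUNBOOK_PREFIX` are hypothetical names introduced here, not part of any Wikitech tooling, and the example page name is taken from the list of existing runbooks on this page.

```python
# Hypothetical helper illustrating the subpage naming convention.
# RUNBOOK_PREFIX and runbook_title are assumptions made for this sketch.
RUNBOOK_PREFIX = "Portal:Toolforge/Admin/Runbooks"


def runbook_title(page_name: str) -> str:
    """Return the full page title for a Toolforge runbook subpage."""
    name = page_name.strip()
    if not name:
        raise ValueError("page name must not be empty")
    return f"{RUNBOOK_PREFIX}/{name}"


if __name__ == "__main__":
    print(runbook_title("HarborDown"))
```

Called with `"HarborDown"`, the helper returns the title of the subpage under the Toolforge admin runbooks portal.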
Note: Many runbooks for Toolforge will include procedures that can only be followed by individuals with admin access. In order to avoid confusion and frustration for general users, you should note at the content level when a procedure will require admin permissions.
When an entire runbook requires admin permissions to complete procedures, mark it with the following template:
When runbooks include information for general users and special instructions for admins, mark any instructions (inline) for admins with the following template: Requires admin permissions
Existing runbooks
- BuildsApiDown
- BuildsApiUpMetricUnknown
- EnvvarsAdmissionDown
- EnvvarsApiDown
- EnvvarsApiUpMetricUnknown
- HarborComponentDown
- HarborDown
- HarborProbeUnknown
- JobsApiDown
- JobsApiUpMetricUnknown
- Kyverno
- MaintainKubeusersDown
- PrometheusK8sCertExpirySoon
- Redis
- TektonDown
- TektonUpMetricUnknown
- ToolforgeKubernetesCapacity
- ToolforgeKubernetesHAproxyServerDown
- ToolforgeKubernetesHAproxyUnknown
- ToolforgeKubernetesNodeNotReady
- ToolforgeKubernetesWorkerTooManyDProcesses
- Toolforge Kyverno low policy resources
- Toolforge Kyverno no policy resources
- Toolforge Kyverno unknown state
- ToolsDBReplication
- ToolsNFSDown
- ToolsNfsAlmostFull
- ToolsToolsDBWritableState
- k8s-haproxy
Communication and support
Support and administration of the WMCS resources are provided by the Wikimedia Foundation Cloud Services team and Wikimedia movement volunteers. Please reach out with questions and join the conversation:
- Chat in real time in the IRC channel #wikimedia-cloud or the bridged Telegram group
- Discuss via email after you have subscribed to the cloud@ mailing list
- Subscribe to the cloud-announce@ mailing list (all messages are also mirrored to the cloud@ list)
- Read the News wiki page
- Use a subproject of the #Cloud-Services Phabricator project to track confirmed bug reports and feature requests about the Cloud Services infrastructure itself
- Read the Cloud Services Blog (for the broader Wikimedia movement, see the Wikimedia Technical Blog)
See Also
- Category:Runbooks - Wikitech pages marked as runbooks.
- WMCS Admin documentation - Advanced documentation for WMCS administrators.
- Runbooks entry on Wikipedia