Portal:Data Services/Admin/Runbooks

Runbooks for Data Services

Overview

This page contains basic resources for developers who want to author and publish runbooks for Data Services on Wikitech.

Before you begin

Permissions: Procedures documented in Data Services runbooks require varying levels of permission. Before you start a task, make sure you have the permissions it needs.

What is a runbook?

A runbook is a detailed set of instructions that explains how to perform a common task or procedure, so that others can repeat it easily and accurately. Runbooks are particularly useful for incident response. Creating a runbook in response to a specific incident makes it possible for people to repeat the same steps when a similar incident occurs. Because runbooks are often used to help people respond to incidents quickly, they should be easy to read and follow, consistent, and accurate.

When should you use a runbook?

Use a runbook whenever a common task or procedure may need to be repeated by multiple people.

Tips for building useful runbooks

  • Keep them up to date. Revisit the instructions after each incident to make sure they are still clear and accurate.
  • Keep them simple. Runbooks are sets of detailed instructions; they do not need extra history or context.
    • Publish a separate runbook for each issue or incident.
  • Test them. Make sure the instructions are repeatable by others by asking others to follow them and provide feedback.
  • Follow a template. Make sure your runbooks follow a template, so that people can find and update information easily.

Data Services runbook templates

Each issue or alert should have its own page of instructions, so that readers do not have to wade through unrelated information. For an example, see the runbook for the issue Failover an NFS cluster. Note that the page addresses one issue only.

You can find a template/outline for creating runbooks here: Portal:Data_Services/Admin/Runbooks/Runbook_template

Where to publish runbooks for Data Services

Publish each Data Services runbook as a subpage at:
Portal:Data_Services/Admin/Runbooks/(PAGE NAME)
Note: This is a subpage of the admin documentation. Some procedures may require advanced admin permissions to complete.
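The naming convention above can be sketched as a small helper. This is a minimal illustration, not part of any Wikitech tooling: the function name runbook_page_title is hypothetical, and the example page name is simply the issue mentioned earlier on this page, so the title of the real runbook may differ. It assumes only that spaces in a page name are written as underscores in the title, as in the path shown above.

```python
# Prefix taken from the publishing location given on this page.
RUNBOOK_PREFIX = "Portal:Data_Services/Admin/Runbooks/"


def runbook_page_title(page_name: str) -> str:
    """Build the full wiki title for a runbook subpage (hypothetical helper).

    Runs of whitespace are collapsed and replaced with underscores so the
    result matches the underscore style of the prefix.
    """
    cleaned = "_".join(page_name.split())
    if not cleaned:
        raise ValueError("page name must not be empty")
    return RUNBOOK_PREFIX + cleaned


if __name__ == "__main__":
    # One runbook per issue, so one title per issue.
    print(runbook_page_title("Failover an NFS cluster"))
```

Deriving the title from the issue name helps keep each runbook on its own page under the same parent, which matches the one-issue-per-page guidance.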

Note: Most Data Services runbooks include procedures that only individuals with admin access can follow. To avoid confusion and frustration for general users, state in the page content when a procedure requires admin permissions.

When an entire runbook requires admin permissions to complete procedures, mark it with the following template:

The procedures in this runbook require admin permissions to complete.

When runbooks include information for general users and special instructions for admins, mark any instructions (inline) for admins with the following template:

Requires admin permissions

Existing runbooks

Communication and support

Support and administration of the WMCS resources are provided by the Wikimedia Foundation Cloud Services team and Wikimedia movement volunteers. Please reach out with questions and join the conversation:

Discuss and receive general support
Stay aware of critical changes and plans
Track work tasks and report bugs

Use a subproject of the #Cloud-Services Phabricator project to track confirmed bug reports and feature requests about the Cloud Services infrastructure itself

Read stories and WMCS blog posts

Read the Cloud Services Blog (for the broader Wikimedia movement, see the Wikimedia Technical Blog)

See Also

Category:Runbooks