Portal:Cloud VPS/Admin/Cookbooks

Cloud VPS administrators routinely perform tasks such as adding users to or removing them from a project, or adding a node to or removing it from a cluster. We have created a set of cookbooks to automate these tasks and to make sure they are performed in a consistent, repeatable and traceable way. Our cookbooks are implemented using the Wikimedia Spicerack library and are maintained in the wmcs-cookbooks repository.

Cookbooks vs Runbooks

In WMCS, we use the word cookbook for the automation scripts described on this wiki page, which are implemented using the Spicerack library. We use the word runbook for manual procedures documented in Wikitech. A runbook might require you to run a cookbook as one of its steps, or it might not involve any cookbook at all.

How to run a cookbook

You can run a cookbook from a cloudcumin host, or from your laptop.

Running from a cloudcumin host is recommended because:

  • It ensures you are using the tip of the "main" branch
  • Logs are stored on the cloudcumin host, where other admins can see them if they need to
  • If the cookbook takes a long time to complete, you can use screen or tmux to let the cookbook continue to run even if you lose your internet connection, or if you turn your computer off.

Some reasons why you might want to run a cookbook from your laptop instead:

  • The cookbook does not work correctly when run from a cloudcumin host. If this happens, please open a bug in Phabricator as ideally all cookbooks should work fine from a cloudcumin host.
  • You don't have "sudo" privileges on cloudcumin hosts. This is something we would like to solve in the future, and is tracked in phab:T343330.
  • You are developing a new cookbook and need quicker feedback than pushing to the repo and fetching allows.
  • You are developing a patch for Spicerack or any dependent library and need to test it or adapt cookbooks to it.

Running a cookbook from a cloudcumin host

We have two cloudcumin hosts, cloudcumin1001.eqiad.wmnet and cloudcumin2001.codfw.wmnet.

They are almost identical, but 1001 is configured to target the eqiad1 OpenStack cluster, and 2001 is configured to target the codfw1dev OpenStack cluster (see /etc/cumin/config.yaml).
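The host-to-cluster mapping described above can be sketched as a small shell function. This is only an illustration: target_cluster is a hypothetical helper that is not part of the cookbooks repository, and the authoritative setting remains /etc/cumin/config.yaml on each host.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not an existing tool):
# maps a cloudcumin host name to the OpenStack cluster it is configured to target.
target_cluster() {
    case "$1" in
        cloudcumin1001*) echo "eqiad1" ;;
        cloudcumin2001*) echo "codfw1dev" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

target_cluster cloudcumin1001.eqiad.wmnet
target_cluster cloudcumin2001.codfw.wmnet
```

Checking the mapping before running a cookbook helps avoid targeting the wrong OpenStack cluster by accident.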

Both hosts are Ganeti virtual machines, managed by the Infra Foundation team.

At the moment, some cookbooks fail when run from cloudcumin hosts. These issues are tracked as subtasks of phab:T343330.

Listing available cookbooks

fnegri@cloudcumin1001:~$ sudo cookbook -l

Showing documentation for a single cookbook

fnegri@cloudcumin1001:~$ sudo cookbook wmcs.cookbook.name -h

Running a cookbook

fnegri@cloudcumin1001:~$ sudo cookbook wmcs.cookbook.name --project PROJECT_NAME --task-id PHAB_TASK_ID

Using Screen/tmux

If you are running a cookbook that you expect will take a long time to complete, you should run it inside a Screen or tmux session, so you can detach from the session and let the cookbook continue to run.

Screen example:

fnegri@cloudcumin1001:~$ screen
fnegri@cloudcumin1001:~$ sudo cookbook ...
# Press "Ctrl+a d" to detach while the cookbook is running
fnegri@cloudcumin1001:~$ screen -x # To reattach
fnegri@cloudcumin1001:~$ exit # To terminate the session

Tmux example:

fnegri@cloudcumin1001:~$ tmux
fnegri@cloudcumin1001:~$ sudo cookbook ...
# Press "Ctrl+b d" to detach while the cookbook is running
fnegri@cloudcumin1001:~$ tmux attach # To reattach
fnegri@cloudcumin1001:~$ exit # To terminate the session
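Before starting a long-running cookbook, it can be useful to confirm that the current shell is really inside a Screen or tmux session. The sketch below relies on the fact that tmux sets the TMUX environment variable and Screen sets STY inside their sessions. The function name in_multiplexer is a hypothetical helper, not something provided by the cookbooks.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the current shell runs inside tmux or Screen.
# tmux exports TMUX and Screen exports STY to processes started inside a session.
in_multiplexer() {
    [ -n "${TMUX:-}" ] || [ -n "${STY:-}" ]
}

if in_multiplexer; then
    echo "Inside a Screen/tmux session: the cookbook keeps running if you detach."
else
    echo "Not inside Screen or tmux: start a session before running a long cookbook."
fi
```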

Running a cookbook from your laptop

Local setup

To run the cookbooks locally, follow the docs at https://gerrit.wikimedia.org/g/cloud/wmcs-cookbooks

Common cookbook options

All WMCS cookbooks have some common options:

  • --task-id to indicate a related Phabricator task. This ID will be included in SAL messages, and SAL messages will be displayed in Phab comments.
  • --project to indicate the related Cloud VPS project. This will make sure that SAL messages are displayed under the correct project in sal.toolforge.org. If the option is not included, it defaults to admin.
  • --no-dologmsg to disable sending messages to SAL. Note: this only affects messages sent with the SALLogger class and not the ones sent with sal_logger. For more information, read phab:T343528.

Some additional common options are provided by Spicerack itself and are documented at Spicerack/Cookbooks#Cookbook_Operations.
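As a rough sketch of how the WMCS options above combine on one command line, the following shell function assembles the invocation and prints it instead of running it, so it is safe to try anywhere. build_cookbook_cmd is a hypothetical helper, and wmcs.cookbook.name and PHAB_TASK_ID are the same placeholders used earlier on this page.

```shell
#!/bin/sh
# Hypothetical helper (not part of wmcs-cookbooks): prints the command line
# rather than executing it.
build_cookbook_cmd() {
    name="$1"     # cookbook name, e.g. the placeholder wmcs.cookbook.name
    project="$2"  # Cloud VPS project; if empty, the cookbook default (admin) applies
    task="$3"     # Phabricator task ID; if empty, no task is referenced
    nolog="$4"    # "yes" adds --no-dologmsg

    cmd="sudo cookbook $name"
    if [ -n "$project" ]; then cmd="$cmd --project $project"; fi
    if [ -n "$task" ]; then cmd="$cmd --task-id $task"; fi
    if [ "$nolog" = "yes" ]; then cmd="$cmd --no-dologmsg"; fi
    printf '%s\n' "$cmd"
}

build_cookbook_cmd wmcs.cookbook.name toolsbeta PHAB_TASK_ID no
```

Leaving the project argument empty omits --project, in which case SAL messages go to the admin project as described above.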

Cumin

The cloudcumin hosts are also capable of running Cumin commands. Example:

user@cloudcumin1001:~$ sudo cumin 'O{project:toolsbeta name:toolsbeta-test-k8s-*}' 'apt-get update'
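The quoting in the example above matters: the query contains a "*" that the local shell would otherwise try to expand. The sketch below builds a query of the same shape and prints the resulting command without running it. cumin_query is a hypothetical helper, and it only reproduces the project and name keys shown in the example, not the full Cumin query grammar.

```shell
#!/bin/sh
# Hypothetical helper: builds a query in the same shape as the example above.
cumin_query() {
    printf 'O{project:%s name:%s}\n' "$1" "$2"
}

# The name pattern is single-quoted so the local shell does not expand the "*".
query="$(cumin_query toolsbeta 'toolsbeta-test-k8s-*')"

# Print the command instead of executing it.
printf "sudo cumin '%s' '%s'\n" "$query" "apt-get update"
```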


SRE cookbooks and WMCS cookbooks

WMCS admins can also run some of the SRE cookbooks (the ones with a name starting with sre.). At the moment all SRE cookbooks (from the operations/cookbooks repo) are installed on cloudcumin hosts, but some of them will not work because they require specific permissions.

TODO: split "shared" cookbooks into a separate collection that is installed in both production and cloud cumin hosts, while the "sre" cookbooks should only be installed in production cumin hosts. (phab:T343894)

Roadmap

These are some of the enhancements we would like to make to WMCS Cookbooks. We don't have a timeline for them, but please leave a comment on the Phab tasks if a feature is particularly useful or relevant to you.

Additional reading