Release Engineering/Runbook/Puppet patches

This is a runbook for testing and staging Puppet patches on servers in the Beta Cluster.

Writing a patch

Testing a patch

When you submit a patch for operations/puppet.git, Jenkins typically reports back within a minute or two with the results of the syntax, coding convention, and unit tests.

Staging a patch

Before we deploy a patch to production, there are two kinds of tests we apply:

Puppet compiler tests. These ask Puppet to simulate what would happen given all the production realm variables. The result is identical to what would happen in actual production if the patch were applied to a fresh server with a clean install of the HEAD-1 state and no private overrides.
Beta Cluster testing. This actually applies the patch to a real server in the Beta Cluster, so it catches everything that would happen on a real server. However, it runs with the betacluster realm variables instead of the production ones, so there may be intentional differences.

Puppet compiler

Beta Cluster testing

Once the patch passes the Puppet compiler without errors, and the effective changes are what you want them to be, it's time to cherry-pick the patch to the Beta Cluster.

Prerequisites:

Wikimedia Developer account (same as wikitech.wikimedia.org account).
Shell access to Wikimedia Cloud VPS (see Help:Accessing Cloud VPS instances).
In user group "Administrators" for the "Beta Cluster" VPS project in Wikimedia Cloud (existing admins can add you in Horizon).

Steps:

Connect with SSH to the current puppetmaster in Beta Cluster (deployment-puppetmaster04.deployment-prep.eqiad.wmflabs).
Enter sudo mode (sudo -i).
Navigate to /var/lib/git/operations/puppet.
Ensure git status is clean.
From the change page on Gerrit, under "Download", copy the "Cherry Pick" command (using anonymous http).
Run the command on the puppetmaster in the operations/puppet directory.
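
For orientation, the "Cherry Pick" command offered by Gerrit usually has the shape "git fetch <repository URL> <change ref> && git cherry-pick FETCH_HEAD". The sketch below shows how such a change ref is built and how the clean-tree check and the cherry-pick fit together. The helper names and the example change number and patch set are hypothetical; always copy the real command from the change page.

```shell
# Sketch only: helper names and example numbers are hypothetical.

# Build the Gerrit ref for a change:
# refs/changes/<last two digits of change>/<change>/<patch set>
gerrit_change_ref() {
  change="$1"
  patchset="$2"
  printf 'refs/changes/%02d/%s/%s\n' "$((change % 100))" "$change" "$patchset"
}

# Cherry-pick a change into the repository at $1, refusing to start
# when the working tree is not clean.
# $2 is the repository URL shown on the Gerrit change page (anonymous http).
cherry_pick_change() {
  repo_dir="$1"
  url="$2"
  tree_state="$(git -C "$repo_dir" status --porcelain)" || return 1
  if [ -n "$tree_state" ]; then
    echo "working tree in $repo_dir is not clean; aborting" >&2
    return 1
  fi
  git -C "$repo_dir" fetch "$url" "$(gerrit_change_ref "$3" "$4")" &&
    git -C "$repo_dir" cherry-pick FETCH_HEAD
}
```

On the puppetmaster, this could be called from the root shell as "cherry_pick_change /var/lib/git/operations/puppet <URL from Gerrit> 123456 7", where 123456 and 7 stand in for the real change number and patch set.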

Now, in a separate terminal (so that you can easily undo or fixup if something goes wrong):

Connect with SSH to the Beta Cluster server you want to apply the change to. For example, if the change affects webperf1001 in production, connect to deployment-webperf11.deployment-prep.eqiad.wmflabs. If the change affects multiple hosts, carefully consider whether it should really be a single commit. If the in-between state is harmless, go ahead and apply it to the other hosts at roughly the same time from a third terminal.
Trigger a Puppet agent run on this host: sudo run-puppet-agent.
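
When several hosts are affected, the agent runs can also be scripted from your own machine. The following is a minimal sketch, assuming your SSH configuration already reaches Beta Cluster hosts and that sudo does not prompt for a password there. The helper names beta_fqdn and run_agent_on are hypothetical. Unlike the concurrent approach described above, it visits hosts one at a time and stops at the first failure, which keeps the output easy to read.

```shell
# Sketch only: beta_fqdn and run_agent_on are hypothetical helper names.

# Expand a short Beta Cluster host name to the FQDN form used in this runbook.
beta_fqdn() {
  printf '%s.deployment-prep.eqiad.wmflabs\n' "$1"
}

# Run the Puppet agent on each given host in turn; stop at the first failure.
run_agent_on() {
  for host in "$@"; do
    echo "== $host"
    ssh "$(beta_fqdn "$host")" sudo run-puppet-agent || return 1
  done
}
```

For example, "run_agent_on deployment-webperf11" triggers a run on the host used in the example above.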

If Puppet fails with an error about compilation of the Puppet catalog, the puppetmaster is now unable to serve any host in the Beta Cluster, not just the one you are testing. In that case, undo your change on the puppetmaster right away by running git rebase -i and removing your cherry-pick from the list.
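
Before starting the interactive rebase, it can help to list the commits that exist only on the puppetmaster, so you know which line to delete. This is a minimal sketch; list_local_commits is a hypothetical helper, and the upstream ref you pass to it (origin/production in the usage note) is an assumption to verify against the actual checkout.

```shell
# Sketch only: list_local_commits is a hypothetical helper name.

# Print the commits in the repository at $1 that are not part of the
# upstream ref $2, newest first. These are the same commits that
# `git rebase -i "$2"` offers for editing.
list_local_commits() {
  git -C "$1" log --oneline "$2..HEAD"
}
```

For example, run "list_local_commits /var/lib/git/operations/puppet origin/production", then "git rebase -i origin/production" in that directory and delete the line for your change. Note that the rebase list shows the oldest commit first, the reverse of the order printed here.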

Once any Puppet compilation error or other error has been addressed with an amended version of the patch, confirm that the host now has the new behaviour your patch is intended to create.

Report back to Gerrit and ask SRE to merge it:

Leave a link to the clean Puppet compiler result in a Gerrit comment.
Mention in a comment that it's live on Beta Cluster and working as intended.

If the patch is needed for Beta Cluster to work properly, leave it in place and add the hashtag "beta-cherry-picked" to the Gerrit change. Otherwise, remove the cherry-pick from the puppetmaster once you are done testing the patch.
