How to deploy code

This article is mainly about deployment of changes to MediaWiki code to the Wikimedia cluster.

Introduction

  • All configuration and utilities are in version control (in the operations/mediawiki-config.git repository)
  • Each version of MediaWiki (e.g. 1.27.0-wmf.1) is in a branch of the mediawiki/core.git repository, with submodules for the extensions, skins, etc. deployed in that version.
  • This mediawiki-config repository is checked out on the deployment host at /srv/mediawiki-staging, with each branch of the MediaWiki codebase and its extensions checked out in /srv/mediawiki-staging/php-1.XXX subdirectories
  • Scap synchronizes that working copy on the deployment host onto /srv/mediawiki on hundreds of servers.

Basic tips

  • Be careful. Breaking the site is surprisingly easy!
    • Don't make deployment changes from a development directory; use a separate, clean git clone just for deployments.
    • Check git status constantly (or set your shell prompt to show the info).
  • If you're deploying code written by someone else, ask them to be around during deployment so they can troubleshoot if necessary.
  • Make sure you know about anything hairy, such as additional prerequisites (e.g. schema changes) or potential complications when rolling back.
  • Perform operations in the right order. For example, if you're deploying code affecting the databases, you should create or edit SQL tables before deploying a change requiring these tables.
  • Join the IRC channels #wikimedia-operations and #wikimedia-tech on libera.chat and be available before and after all changes.

Deployment requirements

Access/rights needed to deploy

  • Production shell access (in particular, the deployment group)
  • Access to merge changes in wmf deploy branches (including mediawiki-config) by being added to the wmf-deployments gerrit group (requires Production shell access, including deployment access, first)
    • Ask any existing wmf-deployments group member to do this.
  • Join (and read) the operations mailing list (ops@lists.wikimedia.org)
    • This is because announcements that could impact how and/or when to deploy things are primarily sent there.
  • Join (and read) the #wikimedia-operations IRC channel
    • This is where real-time communications about the state of production happen

Other Prerequisites

  • Ask an experienced deployer to tag along once or twice before attempting your own.
  • Remember the basic tips listed above.
  • Some shiny code
  • A window of time to deploy during (that doesn't overlap with anyone else's window). Deployments is the calendar for planning and recording activities in these windows.
  • A clean local git repository of mediawiki/core (use ssh for speed), in which you have set up git review using git review -s
  • Be present on IRC. #wikimedia-tech and #wikimedia-operations are two places where people will come to yell at you if something goes wrong, so you should be able to hear them.

Step 1: get the code in the deployment branch

Before you can deploy anything, it has to be in the deployment branch(es). Our deployment branches are named wmf/1.MAJOR-wmf.MINOR (e.g. wmf/1.27.0-wmf.7) where MAJOR and MINOR are numbers that increase over time as new branches are cut. A new branch with an incremented MINOR number is cut at the start of each deployment cycle, and after each tarball release MAJOR is incremented and MINOR is reset to 1. Strict access control is enforced on the deployment branches, but you should have access to them if you are a deployer. On the deployment host, the checkout of each deployment branch is in /srv/mediawiki-staging/php-1.MAJOR-wmf.MINOR .
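
The mapping between a branch name and its checkout directory can be sketched as a small shell function. This is only an illustration: the function name branch_to_staging_dir is hypothetical, and the accepted pattern is an assumption derived from the example name wmf/1.27.0-wmf.7.

```shell
# Root of the staging checkout on the deployment host (from the text above).
STAGING_ROOT=/srv/mediawiki-staging

# Print the checkout directory for a deployment branch.
# Hypothetical helper; the pattern is assumed from the example wmf/1.27.0-wmf.7.
branch_to_staging_dir() {
    if ! printf '%s\n' "$1" | grep -Eq '^wmf/1\.[0-9]+\.[0-9]+-wmf\.[0-9]+$'; then
        echo "not a deployment branch: $1" >&2
        return 1
    fi
    # Strip the leading "wmf/" and prepend the staging path.
    printf '%s/php-%s\n' "$STAGING_ROOT" "${1#wmf/}"
}
```

For example, branch_to_staging_dir wmf/1.27.0-wmf.7 prints /srv/mediawiki-staging/php-1.27.0-wmf.7, while a name such as master is rejected.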

Note that in most cases the cluster will be running on two deployment branches, with some wikis running version N and some running version N+1. To see what versions the cluster is currently running, on the deployment host execute:

$ scap wikiversions-inuse

To see which wiki is running which version, inspect /srv/mediawiki-staging/wikiversions.json (public mirror), consult the versions tool, or look at Special:Version on a particular wiki.
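
If you only need the version of a single wiki, a lookup along these lines may be quicker than reading the whole file. It is a rough sketch that assumes wikiversions.json is a flat JSON object with one "dbname": "php-VERSION" pair per line; the function name wiki_version is hypothetical.

```shell
# Print the version directory recorded for one wiki.
# Arguments: database name (e.g. enwiki), optional path to wikiversions.json.
# Assumes one "dbname": "php-VERSION" pair per line (not verified here).
wiki_version() {
    sed -n "s/^[[:space:]]*\"$1\":[[:space:]]*\"\\([^\"]*\\)\".*/\\1/p" \
        "${2:-/srv/mediawiki-staging/wikiversions.json}"
}
```

The function prints nothing when the wiki is not listed, so check for empty output before relying on the result.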

If your code or change needs to go live to all wikis, you will need to change all deployment branches that are in use. An easy way to see all of the versions currently in use is to log onto the deployment host and run mwversionsinuse from the command line. You can also run mwversionsinuse --withdb to see a wiki that is running each version.

NOTE: All examples on this page assume there is a single deployment branch called wmf/1.27.0-wmf.1 checked out on the cluster in php-1.27.0-wmf.1. You need to change this to a current branch name when you run the commands. If you are updating multiple deployment branches, simply repeat the steps for each deployment branch separately.

NOTE: Also, all git examples assume you have a clean working copy, that is, you have no uncommitted changes. To verify this, run git status; it should say nothing added to commit (working directory clean) or nothing added to commit but untracked files present. If you are doing git-fu with a dirty working copy, there is a high probability you will screw things up, so don't do that unless you know what you're doing.
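
The clean-working-copy check can also be scripted. The sketch below uses git status --porcelain and ignores untracked files, mirroring the two acceptable messages above; the function name worktree_is_clean is hypothetical.

```shell
# Succeed when tracked files have no staged or unstaged changes.
# Untracked files are ignored. Argument: path to the checkout (default: .).
# Returns 2 if git itself fails, e.g. when the path is not a repository.
worktree_is_clean() {
    changes=$(git -C "${1:-.}" status --porcelain --untracked-files=no) || return 2
    [ -z "$changes" ]
}
```

You might call it as worktree_is_clean . || echo "uncommitted changes, stop" before starting a cherry-pick.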

Case 1a: core changes

You are deploying changes to MediaWiki core. This should be rare because core is updated from master every week, but in some cases it might be necessary. For core changes, you will simply need to push or submit changes to the wmf/1.27.0-wmf.1 branch in core. The most common use case is to take a commit that is already in the repository somewhere (usually in master, sometimes a commit that's still pending review) and cherry-pick it into the deployment branch, so only that case is documented below.

To cherry-pick a commit into the deployment branch, do the following things locally:

$ cd mediawiki/core      # go to your checkout of mediawiki/core.git

# Set up a local wmf/1.27.0-wmf.1 branch that tracks the remote
# You only need to do this once for each branch; if you've already got a wmf/1.27.0-wmf.1 branch, you can skip this step
# If you get an error, try 'git remote update' or 'git fetch' first
$ git branch --track wmf/1.27.0-wmf.1 origin/wmf/1.27.0-wmf.1
Branch wmf/1.27.0-wmf.1 set up to track remote branch wmf/1.27.0-wmf.1 from origin.

# Switch to the wmf/1.27.0-wmf.1 branch and update it from the remote
$ git checkout wmf/1.27.0-wmf.1
$ git pull
$ git submodule update --init --recursive

# Cherry-pick a commit from master, identified by its patch set hash
$ git cherry-pick ffb1b38ad83927606c539ac941e9f3eb2653a840

# If there are conflicts, this is how you fix them:
# - run 'git status' to see which files are conflicted
# - start fixing conflicted files using your favorite editor
# - use 'git add filename' to tell git you've fixed the conflicts in a file
# - once all conflicts are resolved, commit the result using 'git commit'

# Submit your cherry-pick commit for review
$ git review
# If you don't want or need this to be reviewed, you can +2 your own
# commit if you are in the wmf-deployments group

Commit message for cherry picks

If a cherry-pick does not result in a merge conflict, then the commit message is automatically amended to reference the original patch. A line will be added in the commit message, after the Change-Id line, with the following content only: (cherry picked from ...) where ... will be the commit hash of the original commit that is being cherry picked. For example, this original commit has the hash 93758c4 and in this cherry-pick the last line of the commit message references the original commit by its hash.

If a cherry-pick does result in a merge conflict, then you will have to resolve the conflict using git commands (typically, by editing the conflicting file(s), followed by git add filename and git cherry-pick --continue). However, this process may not add the (cherry picked from ...) line to the commit message. Without that line, submitting your cherry-picked patch to Gerrit with git review fails with an error message stating ! [remote rejected] ...)[1]. To resolve that, you must manually add the (cherry picked from ...) line to the commit message before retrying git review. As an example, this cherry pick of the same example from above was in a branch that caused a merge conflict, so the user first resolved the conflict, then updated the commit message, and finally submitted it to Gerrit for review.
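
Adding the missing line by hand is easy to get wrong, so here is a minimal sketch of a filter that appends it only when it is absent. The function name ensure_cherry_pick_line is hypothetical, and the exact wording "(cherry picked from commit HASH)" is the form written by git cherry-pick -x, assumed here to be what Gerrit expects.

```shell
# Read a commit message on stdin and print it, appending a
# "(cherry picked from commit HASH)" line if no such line exists yet.
# Argument: hash of the original commit.
ensure_cherry_pick_line() {
    hash=$1
    msg=$(cat)
    if printf '%s\n' "$msg" | grep -q '^(cherry picked from '; then
        printf '%s\n' "$msg"
    else
        printf '%s\n(cherry picked from commit %s)\n' "$msg" "$hash"
    fi
}
```

One possible use is git log -1 --format=%B | ensure_cherry_pick_line ORIGINAL_HASH | git commit --amend -F -, where ORIGINAL_HASH stands for the hash of the commit you cherry-picked.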

Case 1b: extension/skin/vendor changes

You are deploying changes to an extension, but you don't just want to deploy master. Instead, you want to deploy the code that is currently deployed, plus your change. (If you do actually want to deploy master, see How to deploy current master branch of an extension.)

Starting with 1.27.0-wmf.1, all deployed extensions have automatically-created wmf/1.xx.0-wmf.yy branches (there are also branches with a different naming format going back to 1.23wmf10). Each of these extension branches should be in sync with the corresponding submodule pointer in the corresponding core branch. To deploy an extension update, you make changes to this branch, and Gerrit will update the submodule pointer in core.

Updating the deployment branch

Just like in core, the most common use case for updating a deployment branch is to cherry-pick changes from master. You can do this using the Cherry Pick To button in Gerrit, or from the command line as follows:

$ cd mediawiki/extensions/MyCoolExtension      # go to your extension checkout

# Set up a local wmf/1.27.0-wmf.1 branch that tracks the remote
# You only need to do this once for each branch; if you've already got a wmf/1.27.0-wmf.1 branch, you can skip this step
# If you get an error, try 'git remote update' or 'git fetch' first
$ git branch --track wmf/1.27.0-wmf.1 origin/wmf/1.27.0-wmf.1
Branch wmf/1.27.0-wmf.1 set up to track remote branch wmf/1.27.0-wmf.1 from origin.

# Switch to the wmf/1.27.0-wmf.1 branch and update it from the remote
$ git checkout wmf/1.27.0-wmf.1
$ git pull

# Cherry-pick a commit from master, identified by its patch set hash
$ git cherry-pick 176ffdd3b71e463d3ebaa881a6e77b82acba635d
# If there are conflicts, this is how you fix them:
# run 'git status' to see which files are conflicted
# start fixing conflicted files
# use 'git add filename' to tell git you've fixed the conflicts in a file
# once all conflicts are resolved, commit the result using 'git commit'

# Submit your commit for review
# Note: 'wmf/1.27.0-wmf.1' is the name of the remote branch you are pushing to, not the name of your local tracking
# branch (although in this example they are the same).
$ git review wmf/1.27.0-wmf.1
# If you don't want or need this to be reviewed, you can +2 your own
# commit if you are in the wmf-deployments group and will be deploying this immediately

You can repeat this process multiple times to commit or cherry-pick multiple changes. After you have submitted the updates to Gerrit, you can either have them deployed in a Backport window by adding the updates to the deployment schedule, or deploy them yourself by following the other steps below.

Updating the submodule

This is no longer necessary: a submodule update commit is automatically created and merged when you merge a commit to an extension's wmf/* branch. The exceptions are commits to the VisualEditor extension and updates to a submodule that is not on a branch with the same name as the core branch it is included in. In those cases, see /Core submodule update.

Case 1c: new submodule (extension, skin, etc.)

You already deployed your new extension to the beta cluster (read instructions) and tested it for weeks, right? Otherwise, STOP and talk to experts. All extensions and skins (in master branch) are automatically available on the beta cluster.

You are adding an entirely new extension that wasn't deployed before, and you're deploying from master (if you need to deploy something other than the master state, that's possible, but it generally shouldn't be done for an initial deployment; master should just be clean and deployable). The easiest way to do this is to update config.json in the release tool (see #Add new extension to extension-list and release tools) and wait two weeks so the two latest deployment branches pick up the change. If you can't do that, or the submodule uses some nonstandard setup, see /Adding_a_new_submodule.

Beta feature

If your extension creates a new beta feature, please refer to this checklist before deploying it.

Step 2: get the code on the deployment host

$ ssh deployment.eqiad.wmnet

Once the code is merged in the deployment branch in Gerrit, we pull it down on the deployment host. Don't use a plain git pull, which can bring in unexpected changes (see #Problem: undeployed code).

deployment-host:~$ cd /srv/mediawiki-staging/php-X
# Make sure there are no uncommitted changes. Submodule changes are OK (they usually mean security patches)
deployment-host:/srv/mediawiki-staging/php-X/$ git status
# Fetch remote git commits without updating working directory yet
deployment-host:/srv/mediawiki-staging/php-X/$ git fetch
# View local log
deployment-host:/srv/mediawiki-staging/php-X/$ git log -n25 --oneline --decorate --graph
# View remote log
deployment-host:/srv/mediawiki-staging/php-X/$ git log HEAD..origin/wmf/X

View the local log to ensure there are no local patches that shouldn't be there (e.g. security patches). You may want to alias that command to something convenient like git lg (see mw:Git/aliases).

The remote log shows the commits that would be added to the working copy when we rebase the local branch (e.g. "git pull"). If there are other changes besides yours, go yell at the culprit. Otherwise you're OK to pull your changes into the deployment directory. You must always rebase in case there are security patches locally committed on the deployment host.

/srv/mediawiki-staging/php-X/$ git rebase origin/wmf/X
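
Before rebasing, it can help to know exactly how many commits are about to arrive. The following is a small sketch built on git rev-list and git log; the function names are hypothetical, and you need to run git fetch first so the remote-tracking branch is current.

```shell
# Count the commits a rebase onto the remote branch would bring in.
# Arguments: path to the checkout, remote-tracking branch (e.g. origin/wmf/X).
incoming_commit_count() {
    git -C "$1" rev-list --count "HEAD..$2"
}

# List those commits, one per line (abbreviated hash and subject).
incoming_commits() {
    git -C "$1" log --oneline "HEAD..$2"
}
```

If the count is higher than the number of changes you merged, stop and find out who the other commits belong to.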

If you are deploying a change for an extension, you can now simply update the extension submodule with:

deployment-host:/srv/mediawiki-staging/php-1.27.0-wmf.1/$ git submodule update --init --recursive extensions/MyCoolExtension

You should see the commit ID from your work in your local deployment.

Pre-deployment testing in production

Part of the deployment process (for e.g. backports) is to first deploy changes to an mwdebug host. HTTP requests for any production site can be routed to this server. See WikimediaDebug#Staging changes for how to deploy and test changes there.

Step 3: configuration and other prep work

In certain cases, you'll have to change how Wikimedia sites are configured. We generally have the same codebase everywhere, but with different configurations for each wiki.

Maybe you are just changing one configuration variable. Or, perhaps you are adding a brand-new extension, or activating an extension on some wiki where it's never been before. For all of these cases, and more, you'll have to make the changes to the config files to get the desired results.

Configuration files live in their own revision-controlled repository, operations/mediawiki-config. The big difference is that the configuration files are not tied to releases: there is no 1.27.0-wmf.1 branch for configuration. This means you cannot commit a configuration change and have it "roll out" across wikis on the release train; it has to work with all branches in use. In general, if you're not in operations, you should make changes to a local copy of this repository (as explained in How to do a configuration change#In your own repo via gerrit), submit them for Gerrit review with a -1 comment to avoid early deployment, and then, during your deployment window (often a backport window), +2 them and get them on the deployment host.

Everything that follows is just a convenient way to make config changes.

If you're deploying an extension or feature that can be switched off, it's usually best to leave it switched off while you deploy and carefully switch it on after that using a simple configuration change (this is called a dark launch). Even if you do this, you should build any configuration infrastructure (e.g. $wmg variable, adding entry in InitialiseSettings with default false) at this time so all you'll have to do later is flip a switch.

For specific preparations, see the sections below as well as Creating new tables, How to do a schema change, and How to do a configuration change. It is best to perform schema changes before making config changes.

Add a configuration switch for an extension

In /srv/mediawiki-staging/wmf-config/CommonSettings.php, add:

if ( $wmgEnableMyExtension ) {
  require_once( "$IP/extensions/MyExtension/MyExtension.php" );
  // Set config vars if needed

  // If you want to export config vars through InitialiseSettings.php, you need to set $wmgMyExtensionThingy there and do
  #$wgMyExtensionThingy = $wmgMyExtensionThingy;
}

In /srv/mediawiki-staging/wmf-config/InitialiseSettings.php, add something like:

'wmgEnableMyExtension' => array(
  'default' => false,
  'eswikibooks' => true,
  // etc.
),
// If needed, set $wmgMyExtensionWhatever vars here too

If your extension requires a large-ish amount of configuration, consider putting it in a separate file instead. Currently, AbuseFilter, LiquidThreads and FlaggedRevs do this.

For more documentation on these files and their formats, see Configuration files.

Add new extension to extension-list and release tools

Before enabling a new extension, you need to make sure the extension code is present on the servers, i.e. it is a submodule in the current and previous deployment branch. Normally you can do this by adding the extension to the wmf_core list in make-release/settings.yaml in repos/releng/release, and then waiting for two new branches to be created (i.e. you have to do this two weeks before deployment). When in an exceptional hurry, you can also just create the submodule by hand, as described in /Adding_a_new_submodule (you should of course still update settings.yaml), but you must clear this with Release Engineering.

When adding a new extension, you need to add it to the extension-list file in mediawiki-config. This ensures that its i18n messages get picked up. For more information about this setup, see Configuration files. You must not do this until the code is in both the current and former branch, otherwise you will break production deployments. After this is done, you can add your extension to the Beta Cluster.

Disabling an extension

Conversely, when you disable an extension, remove it from wmf-config/extension-list and make-wmf-branch/config.json.

Reedy commented:

If you’re wanting to disable an extension on the cluster, please DO NOT remove it from current deployment branches. Git gets upset and breaks things like git submodule update.
Per Stackoverflow, there isn’t a “git submodule rm foo”, and it’s just a pain for other people to have to clean up their working copies.
So in future, if you’re wanting to disable and remove an extension from production, it’s fine to do so in InitialiseSettings.php/CommonSettings.php, and even remove it from extension-list, but do not remove it from the core deployment branch. Instead, remove it from make-release/settings.yaml in repos/releng/release, and as long as the commit is merged before the deployment branch is created, it won’t be branched for further usage.

When you disable a default extension, make sure it's gone from addWiki.php so that it doesn't cause issues next time someone creates a database.

Getting configuration changes on the deployment host

If you made configuration changes to your local mediawiki-config repository, then once they are merged in gerrit you need to get them on the deployment host. This is similar to step 2, but there's no deployment branch. It's covered in How to do a configuration change#In your own repo via gerrit.

Step 4: synchronize the changes to the cluster

Small changes: sync individual files or directories

If your change only touches one or a few files or directories and does not change i18n messages, you can sync the files/dirs individually with scap sync-file, rather than having to run scap sync-world. This is preferable because a scap sync-world run always shakes the cluster up a bit and takes longer to complete, while a scap sync-file run is very lightweight. However, scap sync-file is only capable of synchronizing files within directories that already exist on the cluster, so it won't work with newly added directories. Also, scap sync-file only synchronizes one file or directory at a time, and creates a log entry each time. Using it repeatedly (e.g. with a for loop) to sync multiple files is fine, as long as there are not too many of them (say, not more than ~5).

To sync a single file or a directory, run scap sync-file [path to file or directory] [summary]. The IRC logmsgbot uses the summary to log your sync in #wikimedia-operations, from where it'll go to the server admin log and the identi.ca and Twitter feeds.

  • PITFALL: The path argument has to be relative to the common directory, not to the current directory. To preserve your sanity (and tab-completion functionality), always cd to /srv/mediawiki-staging before running scap sync-file.
  • PITFALL: If the summary argument contains spaces, you'll have to put it in quotes or only the first word is used. If your summary contains a $, you'll either have to escape it or put your summary in single quotes, to prevent bash's variable expansion from messing it up.
  • PITFALL: scap sync-file does not work correctly for syncing i18n changes. They will appear to work, but the i18n changes won't take effect. To sync i18n changes, you must use scap sync-world.
  • PITFALL: If you change a file that's accessed via a symlink you also need to touch -h the symlink and deploy the symlink or your changes will only show up from cli and not from web (T126306).
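
The first pitfall above (paths must be relative to the staging directory) can be guarded against with a helper like the one below. It is a sketch only; staging_relpath is a hypothetical name, not part of scap.

```shell
# Root of the staging checkout on the deployment host (from the text above).
STAGING_ROOT=/srv/mediawiki-staging

# Turn an absolute path under the staging directory into the relative
# path that scap sync-file expects; reject anything outside it.
staging_relpath() {
    case $1 in
        "$STAGING_ROOT"/*) printf '%s\n' "${1#"$STAGING_ROOT"/}" ;;
        *) echo "not under $STAGING_ROOT: $1" >&2; return 1 ;;
    esac
}
```

For example, staging_relpath /srv/mediawiki-staging/php-1.27.0-wmf.1/api.php prints php-1.27.0-wmf.1/api.php. When you pass the result to scap sync-file, keep the summary in single quotes so that spaces and $ signs survive.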

When syncing multiple files, they are not synced at the exact same moment, which might result in transient errors. Sometimes it makes sense to do multiple syncs to avoid that. The typical example of this is adding a new configuration variable, where you should sync InitialiseSettings.php first and CommonSettings.php second.

When running scap sync-file, you'll occasionally see errors from a broken server (sample output with multiple broken servers below). If you see unexpected output, ask in #wikimedia-operations. scap sync-file usually completes within a few seconds, but in cases where it has trouble connecting to hosts, it may hang for 1 or 2 minutes.

catrope@deployment-host:/srv/mediawiki-staging$ scap sync-file php-1.27.0-wmf.1/api.php 'API security fix'
No syntax errors detected in /srv/mediawiki-staging/php-1.27.0-wmf.1/api.php
copying to apaches
mw60: ssh: connect to host mw60 port 22: Connection timed out
srv189: ssh: connect to host srv189 port 22: Connection timed out
srv174: ssh: connect to host srv174 port 22: Connection timed out
srv266: ssh: connect to host srv266 port 22: Connection timed out

More complex changes: sync everything

If you're adding directories, changing many files, changing i18n messages, or otherwise have a reason why scap sync-file wouldn't work or would be impractical, you'll have to run scap sync-world, which syncs everything and rebuilds caches. scap sync-world logs to the server admin log, and reports in #wikimedia-operations (without !log) when it finishes.

awjrichards@deployment-host:/srv/mediawiki-staging$ scap sync-world 'Log message here'
Checking syntax...
Copying to deployment-host...Done.
Updating serialized data files...
Warning: messages are no longer serialized by this makefile.
Updating ExtensionMessages-1.26.php...
Updating ExtensionMessages-1.27.0-wmf.1.php...
Updating LocalisationCache for 1.26...
Updating LocalisationCache for 1.27.0-wmf.1...
...snip...

Running scap sync-world takes at least 6 or 7 minutes (but potentially upwards of 45 minutes depending how much i18n changed and on a new branch); the LocalisationCache rebuilds (usually two of them, one for each deployed wmf version) cause most of this delay.
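
The choice between the two commands described in this step can be summarised as a small decision helper. This is a sketch of the rules of thumb above, not a scap feature: the function name is hypothetical and the limit of 5 paths is the rough figure mentioned earlier, not a hard rule.

```shell
# Suggest a scap subcommand from three facts about a change.
# Arguments: number of changed paths, yes/no for i18n changes,
# yes/no for newly added directories.
choose_sync_command() {
    paths=$1
    i18n=$2
    new_dirs=$3
    if [ "$i18n" = yes ] || [ "$new_dirs" = yes ] || [ "$paths" -gt 5 ]; then
        echo "scap sync-world"
    else
        echo "scap sync-file"
    fi
}
```

For instance, choose_sync_command 2 no no suggests scap sync-file, while any i18n change or new directory suggests scap sync-world.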

Add / remove a dblist

scap sync-file dblists/ can be used to sync the addition/removal of a dblist. Referencing a non-existent dblist in the wiki tags in CommonSettings.php will result in an error; make sure that the dblist is synced first when adding, and last when removing.

Changing files in /static

These may have to be purged from the CDN. See also Backport windows/Deployers#Purging.

Test and monitor your live code

Is it doing what you expected? Unfortunately, testwiki is not like a real wiki: extensions respond to trigger hooks, CentralNotice or Common.js might affect the browser environment, etc. No one environment can simulate all the wikis that we operate, so test your change afterwards on a live wiki to confirm. test2.wikipedia.org is a test wiki that operates as a member of the cluster. Keep in mind also that different projects are configured differently, have different extensions enabled, use different alphabets, etc.; it can be worthwhile to double check your changes on multiple projects, particularly to ensure that character encoding and right-to-left formatting are behaving as expected. Also remember that the caching infrastructure on the cluster is likely different from your local or testing environments; keep the different production caching layers/strategies in mind as you're assessing your changes in production.

We use open-source tools such as Grafana, Prometheus, and Icinga to monitor our production cluster; you should review their output post-deploy for unexpected spikes.

Use Logstash to query, aggregate and visualise runtime exceptions.

All PHP error logs are routed to the server mwlog1002 in /srv/mw-log. Exceptions and fatals happen constantly, so you need to get a sense of changes over time. For example, to see trends in "Maximum execution time exceeded" errors this month, you might run:

mwlog1002$ cd /srv/mw-log/archive
mwlog1002$ zgrep -c 'Maximum execution time' fatal.log-201304*

You can also run the logspam-watch script from mwlog1002 to watch for spikes in errors or warnings.
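
The zgrep -c command shown above prints one filename:count line per log file. To compare totals between periods, you could pipe that output through a small summing filter such as the sketch below (sum_grep_counts is a hypothetical name).

```shell
# Sum the per-file counts printed by `zgrep -c PATTERN FILE...`.
# Each input line is "filename:count", or just "count" for a single file;
# the last colon-separated field is treated as the count.
sum_grep_counts() {
    awk -F: '{ total += $NF } END { print total + 0 }'
}
```

For example, zgrep -c 'Maximum execution time' fatal.log-201304* | sum_grep_counts prints a single total for the month.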

For a summary of all of the logs in use, see Logs.

Don't leave town

Even if your deploy appears to be working, it's important to be reachable in the hours immediately following your deploy. Ideally, stay online and in IRC channels like #wikimedia-tech and #wikimedia-operations for a couple of hours. Update Deployments with what happened in your deployment window.

If you must go offline, let people know how to reach you (and keep your mobile phone or other communications device on your person). You can use /away messages on IRC, or perhaps send a short email to the ops list.

If you are on Wikimedia staff, now might be a great time to check if your contact info is up to date. If you aren't on staff, ask a staffer to add your contact info to that page, under "Important volunteers".

A note on JavaScript and CSS

Since we have ResourceLoader, there is no need to, e.g., manually do a "build" (to re-minify/re-cache static files). ResourceLoader does this automatically on-demand. Depending on when the timestamp cache gets a cache-miss, it can take up to five minutes for that to occur.

ResourceLoader and l10n messages

In the case of a localization update that affects JavaScript and is loaded via ResourceLoader, the live string may remain unchanged after running scap.

  • Check to see if the message is present at /wiki/MediaWiki:[message-string]/en, e.g. https://en.wikipedia.org/wiki/MediaWiki:Popups-send-feedback/en
  • If it is correct there, but absent/outdated in a JS response, then the following can be used to force ResourceLoader to recache a message:
    you@tin:~$ mwscript eval.php enwiki
    > $rl = new ResourceLoader;
    > $mbs = $rl->getMessageBlobStore();
    > $mbs->updateMessage('popups-send-feedback');
    

Security patches

The last step in fixing security issues in MediaWiki before releasing the fixes publicly is deploying the patches on the cluster. When this happens:

  • All patches / fixes will be committed changes in the local repo
  • An email will be sent to the Ops list to notify everyone that the patches are there, and where the raw patches live on the deployment host in case they need to be modified or reapplied

Please do not revert these. If you are unsure if local, committed changes are security related, please ask someone in platform privately. Please do not discuss the patches publicly (including IRC). In most cases the commit message and knowing the files the commit affected would be enough for a malicious person to figure out the vulnerability.

When there are security patches in deployment, please rebase them on top of any changes you are deploying. This makes it easier to see what's been deployed (no more "Merge branch...." commits), and makes the fact that security patches are live immediately clear.

The only times that these should interfere with your deployment is if the changes conflict. In this case, please contact someone from the platform team to work out the best way to handle the situation.

Creating a Security Patch

See also previous documentation How to perform security fixes

Before

  1. Create a Phabricator security report if one does not already exist.
  2. Create and test your patch locally (preferably on a branch); then commit locally. Do not commit the patch to Gerrit at all. Drafts are not secure. Prefix your commit message with SECURITY: (not [SECURITY]).
  3. Create the patch by running git format-patch HEAD^ --stdout > Txxxxx.patch (where "Txxxxx" is the task id) which will produce a patch file in your working directory.
  4. Upload the patch by attaching it to the Phabricator task. Coordinate with other developers to review your patch.
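
The naming conventions in the steps above (SECURITY: prefix, Txxxxx.patch filename) can be checked before you attach anything. The helpers below are a minimal sketch with hypothetical names; T12345 in the usage note is a placeholder task ID.

```shell
# Succeed if a commit subject starts with "SECURITY:".
subject_is_security() {
    case $1 in
        "SECURITY:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the patch filename for a Phabricator task ID such as T12345.
patch_name_for_task() {
    if ! printf '%s\n' "$1" | grep -Eq '^T[0-9]+$'; then
        echo "not a task ID: $1" >&2
        return 1
    fi
    printf '%s.patch\n' "$1"
}
```

A possible use: subject_is_security "$(git log -1 --format=%s)" && git format-patch HEAD^ --stdout > "$(patch_name_for_task T12345)".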

Guidelines for creating patches

As security changes initially bypass CI, they should be written and reviewed especially carefully. Run any relevant tests (including new ones you add) locally, and preferably also run PHPCS against all modified files – otherwise, the patch will take longer to get merged once it is published on Gerrit.

Try to keep security patches small: they will have to be applied many times – rebased onto the master branch every week until the next security release, and then backported to supported release branches on Gerrit. Avoid refactoring affected files beyond what’s currently necessary; if it keeps the patch smaller, it may be worth temporarily skipping best practices such as Dependency Injection. (Such things can always be cleaned up after the patch has been published.)

Avoid touching i18n files in security changes if at all possible. A security change that touches i18n files will make the patch deployment very slow (~40 minutes), as it causes an i18n rebuild. If you are adding new messages, consider temporarily hard-coding them; after all, a security patch message isn’t going to be translated on translatewiki.net anyway (until the patch has been published).

Deployment: Manual

  1. Apply the patch in the current/affected wmf branches on the deployment host:
    • Check that the patch applies with git apply --check /path/to/patchfile
    • Apply patch with git am < /path/to/patchfile
    Note that some config files are made public via noc.wikimedia.org; don't put anything non-public in those.
  2. Deploy as usual with scap but use --no-log-message to prevent the automatic logging from revealing too much information about which file or component. Log the deployment manually by typing "!log Deployed patch for Txxxxx" in the #wikimedia-operations IRC channel.
  3. Ensure the security patch will be applied to Kubernetes as well as future deployment branches:
    • The .patch file should be stored on the deployment host under /srv/patches/<branch>/, for whichever branches have the security patch applied.
      • Patches to MediaWiki core should be stored in /srv/patches/<branch>/core/
      • Patches to extensions should be stored under /srv/patches/<branch>/extensions/ExtensionName.
        • Filenames should be prefixed with a 2-digit number to indicate the order in which patches should be applied in the repo. Your file should be prefixed with '01-' if it is the first patch in the directory, or the next highest number if other patches already exist.
        • Files should be git committed to the local repository.
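
The numbering rule for patch files can be sketched as a small shell helper. This is only an illustration, assuming existing files follow the NN-name pattern described above; the function name next_patch_prefix is hypothetical and is not part of scap or any deployment tooling.

```shell
# Print the next two-digit prefix for a new patch in a patches directory.
# Assumption: existing patches are named NN-something, as described above.
# next_patch_prefix is a hypothetical helper, not an existing tool.
next_patch_prefix() (
    dir=$1
    max=0
    for f in "$dir"/[0-9][0-9]-*; do
        [ -e "$f" ] || continue      # the glob matched nothing
        n=${f##*/}                   # strip the directory part
        n=${n%%-*}                   # keep only the numeric prefix
        n=${n#0}                     # drop a leading zero (08 -> 8)
        if [ "$n" -gt "$max" ]; then
            max=$n
        fi
    done
    printf '%02d\n' $((max + 1))
)
```

Called on an empty directory it prints 01; called on a directory that already holds 01- and 02- files it prints 03. The result can then be used as the prefix when copying the .patch file into place, before committing it to the local repository.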

Deployment: via script

Copy the patch to your home directory on deployment.eqiad.wmnet:

scp /path/to/patch/TXXXXXX.patch deployment.eqiad.wmnet:

Download the security deployment script:

wget https://gitlab.wikimedia.org/repos/releng/release/-/raw/main/deploy_security.py

Then run it like this:

python3 deploy_security.py /path/to/patch/T1234.patch REPO 
  • By default, this does a dry-run. Do this first, and make sure the output makes sense. When you're ready to run it for real, add the --run parameter.
    • A dry run can fail partway through. For example, if a step needs to create a directory, the dry run skips creating it, so the following step errors. That is fine.
  • REPO is either "core" for mediawiki core or "extensions/EXTENSIONNAME" (e.g. "extensions/Wikibase") for extensions (similar for skins)
  • You can run it on one branch only if you want. Use "--branch". For example:
python3 deploy_security.py --branch 1.38.0-wmf.12 --run /path/to/patch/T1234.patch core 
  • While it runs, the script may sometimes look stuck. Don't worry, it is still working, and it reports the result once it is done.
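
As a rough sketch of the dry-run-first workflow above, the following helper checks the REPO argument and prints both invocations for review without running anything. The function names are hypothetical, and the accepted REPO shapes ("core", "extensions/NAME", "skins/NAME") are an assumption based on the description above, not the script's actual validation.

```shell
# Return 0 if the argument looks like a REPO value as described above.
# Assumption: skins use "skins/NAME", by analogy with extensions.
is_valid_repo() {
    case $1 in
        core) return 0 ;;
        extensions/?*|skins/?*)
            # Reject a second slash, e.g. extensions/Foo/bar
            case ${1#*/} in
                */*) return 1 ;;
                *) return 0 ;;
            esac ;;
        *) return 1 ;;
    esac
}

# Print (do not run) the dry-run command followed by the real one.
print_deploy_commands() {
    patch=$1
    repo=$2
    is_valid_repo "$repo" || { echo "unexpected REPO: $repo" >&2; return 1; }
    echo "python3 deploy_security.py $patch $repo"
    echo "python3 deploy_security.py --run $patch $repo"
}
```

For example, print_deploy_commands /path/to/patch/T1234.patch extensions/Wikibase prints the two commands in the order they should be run: first the dry run, then the same command with --run.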

After

    • Add a note to the relevant ticket saying that you deployed the patch
    • If the Security Team isn't already aware of the deployment, be sure to inform them that you deployed the patch, to prevent duplicate effort.
    • The patches are directly copied off the active deployment server with no delay. The listed patches are in the build's output, so you can verify your patch is included.
    • You may have to ask a releng/SRE person to check that the build worked correctly if you don’t have the necessary access yourself.
  1. Work with the Security Team to make sure the vulnerability is resolved and that your patch makes it into the next security release.
  2. Perform any necessary backports within Gerrit to supported release branches (assuming the patch applies to previous versions). As an extra step, once the patch has been merged into master, it can be removed from /srv/patches/ as it will no longer apply to future production release branches. This will make Release Engineering's life a little easier.
  3. Request a CVE, if appropriate. Note the CVE ID on the task and within any relevant security release tasks.

Problem: Submodule security patches not committed

Sometimes you may find a security patch for an extension that has not been committed:

[you@deploy1002 php-1.999.0-wmf.1 (wmf/1.999.0-wmf.1 * u)]$ git status
On branch wmf/1.999.0-wmf.1
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   extensions/InsecurityExtension (new commits)

Submodules changed but not updated:

* extensions/InsecurityExtension fffffff...eeeeeee (1):
  > SECURITY: Make the Insecurity extension secure

no changes added to commit (use "git add" and/or "git commit -a")

This is normal! The reason for this state is that subsequent updates to the submodules require human intervention to fix merge/rebase conflicts.

Security mitigations

Sometimes it is necessary to deploy application-layer security mitigations and secrets to Wikimedia production environments. At the MediaWiki layer, this is typically done via various configuration files that live within /private under the mediawiki-staging directory on the deployment hosts. The most common injection point for this code is a file named PrivateSettings.php. If one finds themselves needing to edit this or similar files under the /private directory, the following best practices should be used:

  1. When working on a security mitigation to be deployed via PrivateSettings.php or similar files, only discuss and review relevant code within private channels, such as a secure Phabricator task, secure IRC channel, etc.
  2. Do not download PrivateSettings.php or similar files to any non-Wikimedia-deployment environment for testing. These files contain sensitive production code and secrets and should not ever be copied from secure deployment environments. Instead, test proposed security mitigations or secrets via LocalSettings.php within a local MediaWiki-docker environment. Please do not attempt to test these changes within patchdemo, gerrit CI or other public environments.
  3. One should carefully review their code prior to committing and deploying it. Since there are no automated tests for anything under /private, one must be extra careful about changes made to files in that directory. At minimum, one should double-check their code and run php -l against the files one has edited.
  4. If one has Wikimedia deployment rights and they are happy with their code changes, they should commit their changes to the local git repository. A minimal commit message and Bug: TXXXXXX reference should be included with the commit. The git log can be reviewed for any additional guidance. If one does not have deployment rights, they will need to seek assistance from Release Engineering, SRE or the Security-Team.
  5. If one has Wikimedia deployment rights, they can check in with Release Engineering and/or the #wikimedia-operations IRC channel to confirm whether it is OK to deploy their changes via scap. If one does not have deployment rights, they should coordinate such a deployment in advance with Release Engineering, SRE or the Security-Team. Deployments via scap should use the --no-log-message option (e.g., scap sync-file --no-log-message private/<file>.php '<message>') and include minimal, benign log messages.
  6. Once deployed, monitor mediawiki-errors within logstash to ensure no unintended errors occur via the deployed changes.
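
The syntax check in step 3 can be wrapped in a small loop so that no edited file is skipped. This is a minimal sketch, not an established tool: lint_files and the LINT variable are hypothetical names. LINT defaults to php -l (PHP's built-in syntax check) and can be overridden only so that the loop itself can be tried on a machine without PHP.

```shell
# Syntax-check every file given as an argument; stop at the first failure.
# lint_files and LINT are hypothetical names used for illustration.
lint_files() {
    for f in "$@"; do
        # ${LINT:-php -l} is left unquoted on purpose so "php -l" splits into words
        ${LINT:-php -l} "$f" >/dev/null || {
            echo "syntax check failed: $f" >&2
            return 1
        }
    done
}
```

Run from the staging directory, a call such as lint_files private/PrivateSettings.php returns non-zero if the file does not parse. A clean result only means the file parses; it says nothing about whether the mitigation behaves as intended.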

Problem: undeployed code

If you need to deploy something but you find undeployed changes or local changes that are not security fixes[2], revert all of them and !log your revert, then proceed to your deploy.

If they are uncommitted live hacks (as in, not even in Gerrit), the polite thing is to stash them, so you don't erase someone's work forever.
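
Since security fixes are recognisable by a commit message beginning with "SECURITY", a list of commit subjects can be filtered mechanically before deciding what to revert. The sketch below is illustrative only: non_security_subjects is a hypothetical name, and it assumes one subject per line on standard input, for example from git log --format=%s over whatever commit range you are inspecting.

```shell
# Read commit subjects on stdin, one per line, and print only those that
# do NOT begin with "SECURITY" (the candidates for a revert).
# non_security_subjects is a hypothetical helper name.
non_security_subjects() {
    grep -v '^SECURITY' || true   # grep exits 1 when it prints nothing
}
```

Anything this prints still needs a human look before reverting; the filter only separates out the commits that must be left in place.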

Background

Roan commented in October 2012:

The problem is that sometimes, people merge things into a deployment branch and then don't deploy them. This is a terrible habit that should be squashed. If you merge something into a wmf branch, you have a responsibility to either deploy it yourself very soon, make sure that someone deploys it very soon, or revert it if you can't make those things happen. The deployment branch should reflect the current state of the cluster, except during those brief moments where something is about to be deployed or in the process of being deployed.

If you are concerned about other commits being pulled in (which should never happen, unless someone has been naughty), then in Step 2 you can run git fetch followed by git log HEAD..@{upstream}. This will list the commits that would be pulled by 'git pull'. In that list, it should be easy to spot commits that aren't yours and identify the person to yell at. If you run git pull and it ends up pulling things you didn't expect, you can use git log to examine what happened, and git reflog (or the output of git pull) to find the hash of the commit you were at before pulling, so you can roll back to it if needed. But if this happens to you, feel free to start yelling at people and/or asking for help.
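
To make other people's commits easier to spot in that list, the log can be printed as author and subject separated by a tab, then filtered by author. This is a sketch under assumptions: others_commits is a hypothetical name, and the input is expected to come from something like git fetch followed by git log --format='%an%x09%s' HEAD..@{upstream}.

```shell
# Read "author<TAB>subject" lines on stdin and print those whose author
# is not the name given as the first argument.
# others_commits is a hypothetical helper name.
others_commits() {
    me=$1
    awk -F '\t' -v me="$me" '$1 != me'
}
```

The comparison is an exact match on the author name, so it only works if you pass the name exactly as git records it for your commits.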

Problem: file permissions errors

If you encounter permission denied errors (errno 13) on local disk files under /srv/mediawiki-staging when cutting the branch, cleaning up old branches, running syncs, etc., you can run /usr/local/sbin/fix-staging-perms.

Emergency runbook for production config changes

This is the procedure for making an emergency MediaWiki production config change; it should rarely be necessary. Please liaise in security IRC if you think this is needed.

  1. Shell to the production deployment server: ssh deployment.eqiad.wmnet (always redirects to the right cluster despite saying 'eqiad')
  2. Move to the deployment config directory: cd /srv/mediawiki-staging/.
  3. Find the config variable you wish to edit; this will generally be in wmf-config/InitialiseSettings.php if it's a per-wiki flag, or wmf-config/CommonSettings.php if it's a global flag.
  4. Double-check the changes you have made with git diff to make sure they're sensible.
  5. Sync your uncommitted change to production via scap sync-file wmf-config/filename 'Summary of change'.
  6. Commit your change locally and push to gerrit for posterity: git add wmf-config/filename && git commit && git push origin HEAD:refs/for/master.
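
The command sequence in the steps above can be written out for a specific file so it is ready to review and paste one line at a time. The helper below only prints the commands and executes nothing; print_emergency_steps is a hypothetical name, and the file path and summary are whatever the caller supplies.

```shell
# Print the runbook's commands for one config file. Nothing is executed.
# print_emergency_steps is a hypothetical helper name.
print_emergency_steps() {
    file=$1
    summary=$2
    cat <<EOF
cd /srv/mediawiki-staging/
git diff -- $file
scap sync-file $file '$summary'
git add $file && git commit && git push origin HEAD:refs/for/master
EOF
}
```

For example, print_emergency_steps wmf-config/CommonSettings.php 'Summary of change' prints four lines. Editing the file happens between the first and second of them, and the git diff output should be read before the scap line is run.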

Footnotes

  1. Specifically, Gerrit doesn't allow a second patch with an identical change ID and identical commit message; however, once you change the commit message for the secondary patch, you can have as many secondary patches with that commit message, i.e. so long as the commit message is different from the primary patch, it doesn't matter if the cherry picks for several branches have the same commit message as each other. In the example provided here, the two cherry picks (for REL1_36 and REL1_37 branches) have commit messages that are identical to each other, but different from the original commit and Gerrit accepts this.
    Also note that you can circumvent this by changing the commit message in any other way, including by removing the change ID so that a new one is assigned by Gerrit; however, this will break the link between multiple related commits on Gerrit, so it is best to use the "standard" commit message addendum of (cherry picked from ...) and preserve the original change ID.
  2. How do I know whether they are security fixes or uncommitted live hacks, etc.? The git commit message will begin with "SECURITY" for security fixes.