SWAT deploys/Deployers

From Wikitech

This document is intended to provide detailed instructions for SWAT deployers. Hopefully, this document will prove useful to new SWAT deployers as well as provide a place for more experienced deployers to take notes on any tips and tricks they have discovered in the course of doing deployments.

General Advice

  • Claim the SWAT window early to avoid confusion.
    When jouncebot pings deployers in #wikimedia-operations and you want to run that SWAT, say so ("I can SWAT today!").
  • Think out loud and be explicit.
    If you are nervous about deploying a particular patch, mention it to the patch owner. It's better to have a conversation than to quietly fret over patches.
  • Be prepared.
    Open all your SSH connections and error logs before you start deploying code.

SSH Connections and Error Logs

When running SWAT, it is helpful to watch error logs as you deploy so that you can be sure nothing you have just deployed is broken. Also, there are several machines on which you may need to run commands depending on the nature of the SWAT; it's good to open all SSH connections before you have to think about them.

Browser Tabs

SSH Connections/Commands

you@laptop$ ssh mwlog1001.eqiad.wmnet                    
INFO:ssh:SSH_AUTH_SOCK=/run/user/1000/e5bc46d0d0eeda42c05e6ad845544b58.sock
Linux mwlog1001 4.4.0-3-amd64 #1 SMP Debian 4.4.2-3+wmf8 (2016-12-22) x86_64
Debian GNU/Linux 8.7 (jessie)
mwlog1001 is a MediaWiki log collector (role::logging:mediawiki::udp2log)
mwlog1001 is a udp2log data collection server (udp2log::logger)
The last Puppet run was at Wed Mar  8 18:22:21 UTC 2017 (4 minutes ago). 
Debian GNU/Linux 8 auto-installed on Thu Dec 15 23:24:40 UTC 2016.
you@mwlog1001:~$ fatalmonitor
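A minimal sketch for opening every connection up front, assuming you use tmux on your laptop. The host list is taken from the examples on this page and is an assumption about what your SWAT needs; the function names open_swat_session and run are hypothetical helpers, and DRY_RUN=1 only prints the commands.

```shell
#!/usr/bin/env bash
# Hosts taken from the examples on this page; adjust for your SWAT.
SWAT_HOSTS=(
  mwlog1001.eqiad.wmnet    # error logs (fatalmonitor)
  deployment.eqiad.wmnet   # git fetch, scap sync-file
  mwdebug1002.eqiad.wmnet  # scap pull, testing
  terbium.eqiad.wmnet      # maintenance scripts
)

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Create a detached tmux session with one window per host, named after
# the short hostname, each running ssh to that host.
open_swat_session() {
  local session=${1:-swat} host
  run tmux new-session -d -s "$session" -n "${SWAT_HOSTS[0]%%.*}" "ssh ${SWAT_HOSTS[0]}"
  for host in "${SWAT_HOSTS[@]:1}"; do
    run tmux new-window -t "$session" -n "${host%%.*}" "ssh $host"
  done
}
```

After running open_swat_session, attach with tmux attach -t swat and start fatalmonitor in the mwlog1001 window.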

Merging and Fetching Patches

For SWAT, you will be merging code for operations/mediawiki-config, mediawiki/core, or any of the deployed MediaWiki extensions.

Merging

When +2ing patches, it's often helpful to keep the Zuul dashboard open to confirm that (a) Zuul is picking up your changes and (b) roughly how long a patch will take to merge.

It's good practice to enter SWAT as the comment when you +2, before you click Publish Comments, so there is a record of why you merged the code.

Fetching Patches

After code has merged, you need to fetch it down to deployment.eqiad.wmnet.

Make sure that the code you fetch down to deployment.eqiad.wmnet is the code you expected to fetch down.

Use git log -p HEAD..@{u} after git fetch to check that the patches you fetched are the same ones you +2'd. If they aren't, ping the author of the unexpected patch in #wikimedia-operations to work out what to do with the fetched code. It's always better to ask than to act silently and unilaterally.

you@laptop$ ssh deployment.eqiad.wmnet
Linux tin 3.13.0-91-generic #138-Ubuntu SMP Fri Jun 24 17:00:34 UTC 2016 x86_64
Ubuntu 14.04 LTS
The last Puppet run was at Tue Jul 12 14:06:19 UTC 2016 (16 minutes ago). 
Ubuntu 14.04 LTS auto-installed on Tue Feb 2 08:05:43 UTC 2016.
Last login: Mon Jul 11 20:34:29 2016 from bast4001.wikimedia.org
you@tin:~ $ cd /srv/mediawiki-staging
you@tin:/srv/mediawiki-staging $ git fetch
From https://gerrit.wikimedia.org/r/p/operations/mediawiki-config
   e08446d..08a2ec5  master     -> origin/master
you@tin:/srv/mediawiki-staging $ git log -p HEAD..@{u}
commit 08a2ec5a9fb09a25dc3bbaf228d42d48b27444bd
Author: Martin Urbanec <martin@urbanec.cz>
Date:   Tue Jul 12 13:02:44 2016 +0200

    Logo update for trwikimedia
    
    Bug: T140015
    Change-Id: I767f4949c44775db9639cb729b9c7f9eb2ce2b0c

diff --git a/static/images/project-logos/trwikimedia.png b/static/images/project-logos/trwikimedia.png
index f5de55a..070bfda 100644
Binary files a/static/images/project-logos/trwikimedia.png and b/static/images/project-logos/trwikimedia.png differ
you@tin:/srv/mediawiki-staging $ git rebase
First, rewinding head to replay your work on top of it...
Fast-forwarded master to refs/remotes/origin/master.
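The check described above can be partly automated by comparing Change-Id trailers. This is a minimal sketch, assuming every fetched commit carries a Gerrit Change-Id line (as in the example output) and that you pass the Change-Ids of the patches you +2'd; check_fetched_change_ids is a hypothetical helper, not part of scap or git.

```shell
#!/usr/bin/env bash
# Read `git log HEAD..@{u}` output on stdin and compare its Change-Id
# trailers with the Change-Ids passed as arguments (the patches you +2'd).
# Prints any mismatch and returns 1; returns 0 when the sets are identical.
check_fetched_change_ids() {
  local fetched expected unexpected missing
  # Commit message lines in `git log` output are indented by four spaces.
  fetched=$(sed -n 's/^ *Change-Id: \(I[0-9a-f]\{40\}\)$/\1/p' | sort -u)
  expected=$(printf '%s\n' "$@" | sort -u)
  # comm needs sorted input: -23 keeps lines only in the first list,
  # -13 keeps lines only in the second.
  unexpected=$(comm -23 <(printf '%s\n' "$fetched") <(printf '%s\n' "$expected"))
  missing=$(comm -13 <(printf '%s\n' "$fetched") <(printf '%s\n' "$expected"))
  if [ -n "$unexpected" ] || [ -n "$missing" ]; then
    if [ -n "$unexpected" ]; then
      printf 'Fetched but not in your list:\n%s\n' "$unexpected"
    fi
    if [ -n "$missing" ]; then
      printf 'Expected but not fetched:\n%s\n' "$missing"
    fi
    return 1
  fi
}
```

Usage, from /srv/mediawiki-staging: git log HEAD..@{u} | check_fetched_change_ids I767f4949c44775db9639cb729b9c7f9eb2ce2b0c && git rebase. If it reports a mismatch, stop and ask before rebasing.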

If you are deploying a change to an extension, don't forget to also run git submodule update.

Deploying Changes

Test Canary

After the changes have been fetched (and otherwise git-wrangled) on deployment.eqiad.wmnet, pull them to mwdebug1002.eqiad.wmnet with scap pull and test them via the X-Wikimedia-Debug header.

you@laptop:~$ ssh mwdebug1002.eqiad.wmnet
Linux mwdebug1002 4.4.0-3-amd64 #1 SMP Debian 4.4.2-3+wmf7 (2016-11-04) x86_64
Debian GNU/Linux 8.6 (jessie)                                       
mwdebug1002 is role::mediawiki::appserver                      
The last Puppet run was at Tue Nov 22 18:55:38 UTC 2016 (10 minutes ago). 
Debian GNU/Linux 8 auto-installed on Tue Nov 22 12:50:32 UTC 2016.
Last login: Thu Jul  7 15:04:56 2016 from bast4001.wikimedia.org
you@mwdebug1002:~$ scap pull
16:47:12 Copying to mwdebug1002.eqiad.wmnet from deployment.eqiad.wmnet
16:47:12 Started rsync common
16:47:19 Finished rsync common (duration: 00m 07s)
you@mwdebug1002:~$

After the changes have been pulled, ask the patch owner to test them on mwdebug1002.
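For a quick command-line check, a request can be routed to mwdebug1002 by sending the X-Wikimedia-Debug header with curl. This is a sketch that assumes the header value format backend=<host>; check the X-Wikimedia-Debug page on Wikitech for the current syntax and options. debug_header and fetch_via_debug are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Print the debug header that routes a request to the given backend
# (default mwdebug1002). ASSUMPTION: "backend=<host>" is the value format.
debug_header() {
  printf 'X-Wikimedia-Debug: backend=%s\n' "${1:-mwdebug1002.eqiad.wmnet}"
}

# Fetch only the response headers of URL through the debug backend,
# which is usually enough to spot a 5xx before syncing everywhere.
fetch_via_debug() {
  local url=$1 host=${2:-mwdebug1002.eqiad.wmnet}
  curl -s -I -H "$(debug_header "$host")" "$url"
}
```

Example: fetch_via_debug https://en.wikipedia.org/wiki/Special:Version. The patch owner will usually test in a browser instead, but the header is the same.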

Full deployment

After a change has been tested on mwdebug1002, it can be deployed to all servers. From /srv/mediawiki-staging on the deployment host, run: scap sync-file <file> [message for SAL]. The path passed to scap sync-file must be relative to /srv/mediawiki-staging.

The message you type after the file or directory name appears in the Server Admin Log (SAL). Wikitext is allowed and can be useful; you can copy the wikitext for the SWAT item from the Deployments calendar. If the Gerrit change has an associated Phabricator task, mention the task ID in the message: Stashbot will then comment on the task to note that the associated change was synced.

you@tin:/srv/mediawiki-staging $ scap sync-file static/images/project-logos/trwikimedia.png 'SWAT: [[gerrit:298441|Logo update for trwikimedia (T140015)]]'                                                          
           ___ ____
         ⎛   ⎛ ,----
          \  //==--'
     _//|,.·//==--'    ____________________________
    _OO≣=-  ︶ ᴹw ⎞_§ ______  ___\ ___\ ,\__ \/ __ \
   (∞)_, )  (     |  ______/__  \/ /__ / /_/ / /_/ /
     ¨--¨|| |- (  / ______\____/ \___/ \__^_/  .__/
         ««_/  «_/ jgs/bd808                /_/

15:08:50 Started sync-masters
sync-masters: 100% (ok: 1; fail: 0; left: 0)                                    
15:09:00 Finished sync-masters (duration: 00m 09s)
15:09:00 Started sync-proxies
sync-proxies: 100% (ok: 9; fail: 0; left: 0)                                    
15:09:03 Finished sync-proxies (duration: 00m 02s)
15:09:03 Started sync-apaches
sync-apaches: 100% (ok: 383; fail: 0; left: 0)                                  
15:09:24 Finished sync-apaches (duration: 00m 21s)
15:09:24 Synchronized static/images/project-logos/trwikimedia.png: SWAT: [[gerrit:298441|Logo update for trwikimedia (T140015)]] (duration: 00m 33s)
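The sync-file call in the session above can be assembled from the Gerrit change number, a summary, and an optional task ID. A minimal sketch, assuming the SAL message pattern used in the example; sal_message and swat_sync_file are hypothetical helpers, and DRY_RUN=1 prints the command instead of running scap.

```shell
#!/usr/bin/env bash
STAGING=/srv/mediawiki-staging

# sal_message CHANGE SUMMARY [TASK]
# Build the wikitext SAL message, e.g. SWAT: [[gerrit:N|Summary (T123)]].
sal_message() {
  local change=$1 summary=$2 task=${3:-}
  if [ -n "$task" ]; then
    summary="$summary ($task)"
  fi
  printf 'SWAT: [[gerrit:%s|%s]]' "$change" "$summary"
}

# swat_sync_file PATH CHANGE SUMMARY [TASK]
# PATH may be absolute or relative; the staging prefix is stripped
# because scap sync-file expects a path relative to $STAGING.
swat_sync_file() {
  local path=${1#"$STAGING"/} msg
  msg=$(sal_message "$2" "$3" "${4:-}")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf "scap sync-file %s '%s'\n" "$path" "$msg"
  else
    (cd "$STAGING" && scap sync-file "$path" "$msg")
  fi
}
```

Example: swat_sync_file static/images/project-logos/trwikimedia.png 298441 'Logo update for trwikimedia' T140015 runs the same command as the session above.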

After the sync has completed, monitor the logs and ask the patch owner to confirm that the patch was deployed successfully.

Reverting

If a patch doesn't work as expected, or causes errors, it will have to be reverted.

Process

1. Revert the commit causing errors

git revert [SHA1]. If the commit being reverted is a merge commit, you also have to choose the mainline parent with -m: git revert -m 1 [SHA1]

2. Push the code live

Sync the affected file with scap sync-file [affected-file] [message for SAL] BEFORE pushing the revert patch to Gerrit.

3. Push the revert patch to Gerrit

On the deployment host, push the patch to Gerrit with: git push origin HEAD:refs/for/[branch]/revert-[SHA1]. You will be prompted for your Gerrit HTTPS password.
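The three revert steps, in the order they must happen, can be sketched as a single function. This assumes it is run from /srv/mediawiki-staging on the deployment host; swat_revert and the "Revert <SHA1>" SAL message are hypothetical, and DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# swat_revert SHA1 FILE BRANCH [merge]
# FILE is relative to /srv/mediawiki-staging. Pass "merge" as the fourth
# argument when the commit being reverted is a merge commit.
swat_revert() {
  local sha=$1 file=$2 branch=$3 kind=${4:-}
  # 1. Revert locally (git opens an editor for the revert message).
  if [ "$kind" = merge ]; then
    run git revert -m 1 "$sha" || return
  else
    run git revert "$sha" || return
  fi
  # 2. Push the code live before touching Gerrit.
  run scap sync-file "$file" "Revert $sha" || return
  # 3. Send the revert to Gerrit for review.
  run git push origin "HEAD:refs/for/$branch/revert-$sha"
}
```

Each step stops the function if it fails, so a failed revert is never synced and a failed sync is never pushed.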

Maintenance Scripts

During the course of SWAT, you may encounter a patch that needs a maintenance script to be run as part of deployment. Maintenance scripts are run from terbium.eqiad.wmnet or wasat.codfw.wmnet.

Long-running scripts should be run inside screen, so that they keep running if your SSH connection drops: screen [command listed below]
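A minimal sketch of running a script in a named screen session, with screen's -L output logging turned on so the output is kept in a screenlog file after the session ends. The session name is only a suggestion and run_in_screen is a hypothetical helper; DRY_RUN=1 prints the command instead of starting screen.

```shell
#!/usr/bin/env bash
# run_in_screen NAME COMMAND [ARGS...]
# Start COMMAND in a new screen session called NAME. -L writes the
# window's output to screenlog.N in the current directory.
run_in_screen() {
  local name=$1
  shift
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf 'screen -L -S %s %s\n' "$name" "$*"
  else
    screen -L -S "$name" "$@"
  fi
}

# Example on terbium:
#   run_in_screen namespaceDupes-ptwikinews mwscript namespaceDupes.php ptwikinews --fix
# Detach with Ctrl-a d, list sessions with `screen -ls`, and reattach
# with `screen -r namespaceDupes-ptwikinews`.
```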

For convenience, the most frequently run maintenance scripts are presented below:

namespaceDupes

When a new namespace is added to an existing wiki, the namespaceDupes maintenance script should be run for that wiki:

you@terbium:~ $ mwscript namespaceDupes.php ptwikinews --fix

Image Cache Purges

See also Multicast HTCP purging#One-off purge

When a project logo is updated, it's best to purge the old logo from cache. This purge is done on en.wikipedia.org regardless of the wiki for which the logo is being purged:

you@terbium:~$ echo "https://en.wikipedia.org/static/images/project-logos/newikibooks.png" | mwscript purgeList.php --wiki=enwiki
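When several logos change in one SWAT, the purge list can be generated rather than typed. A minimal sketch, assuming logo files are named <dbname>.png as in the examples on this page; as described above, every URL goes through en.wikipedia.org. logo_purge_urls is a hypothetical helper.

```shell
#!/usr/bin/env bash
# logo_purge_urls DBNAME...
# Print one en.wikipedia.org logo URL per wiki, for purgeList.php.
logo_purge_urls() {
  local wiki
  for wiki in "$@"; do
    printf 'https://en.wikipedia.org/static/images/project-logos/%s.png\n' "$wiki"
  done
}

# Usage on terbium:
#   logo_purge_urls newikibooks trwikimedia | mwscript purgeList.php --wiki=enwiki
```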

updateCollation

When the default collation changes for a wiki, the updateCollation maintenance script will need to be run:

you@terbium:~$ mwscript updateCollation.php --wiki=iswiki --previous-collation=uppercase

(--previous-collation=uppercase is only correct for wikis that were using the default uppercase collation. If the wiki had already set a different category collation and is changing it to another, set --previous-collation to that earlier value instead.)
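A minimal sketch that makes the previous collation an explicit argument, defaulting to uppercase (MediaWiki's default $wgCategoryCollation). update_collation is a hypothetical helper; DRY_RUN=1 prints the command instead of running mwscript.

```shell
#!/usr/bin/env bash
# update_collation WIKI [PREVIOUS_COLLATION]
# PREVIOUS_COLLATION defaults to "uppercase"; pass the wiki's old
# collation if it had already been changed from the default.
update_collation() {
  local wiki=$1 previous=${2:-uppercase}
  local cmd=(mwscript updateCollation.php --wiki="$wiki" --previous-collation="$previous")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For long-running wikis, start it inside screen as recommended for other maintenance scripts.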