
Obsolete:Eqiad Migration Planning/Steps

From Wikitech

Day 1: Tue Jan 22

Preparation (before maintenance window)

Check the LVS pools apaches, api and rendering for down/depooled machines. A few machines may be broken (and should be removed from the config for the time being), but all others should be up and passing health checks.

# ipvsadm -l
# less /var/log/pybal.log
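
The depooled machines can be picked out mechanically. A minimal sketch, assuming the standard `ipvsadm -l` table layout (`-> host:port  Forward  Weight  ActiveConn  InActConn`):

```shell
# List realservers with LVS weight 0 (depooled/drained).
# Assumes the standard ipvsadm table layout; weight is field 4.
list_depooled() {
  awk '$1 == "->" && $4 == 0 { print $2 }'
}
# Usage (as root on the LVS host):
#   ipvsadm -l | list_depooled
```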

Check whether the Nagios check for these LVS pools exists and is up.

Check whether all pooled application servers have the right LVS service IPs bound to loopback.
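
A sketch of that check; the service IP below and the `apaches` dsh group name are placeholders for the real per-pool values:

```shell
# Succeeds if the given LVS service IP is bound to the loopback
# interface, reading `ip addr show dev lo` output on stdin.
# 10.2.1.1 is a placeholder service IP; substitute per pool.
SERVICE_IP="10.2.1.1"
has_service_ip() {
  grep -q "inet ${SERVICE_IP}/"
}
# Usage from fenari, one result line per host:
#   dsh -g apaches -M "ip addr show dev lo | grep -c '${SERVICE_IP}/'"
```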

Check deployed MediaWiki revision / git status on all application servers
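
One way to spot-check this across the cluster; the checkout path and dsh group below are assumptions, not the canonical deployment layout:

```shell
# Prints DIRTY if `git status --porcelain` output (read on stdin)
# shows any uncommitted changes, CLEAN otherwise.
flag_dirty() {
  if grep -q . ; then echo DIRTY; else echo CLEAN; fi
}
# Usage from fenari (path is an assumption):
#   dsh -g apaches -M "cd /usr/local/apache/common && git status --porcelain"
```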

The new eqiad MySQL databases can be warmed up at this point. Asher has a script which replays recorded SELECT (read-only) queries against them to warm up their caches.

Ensure media writes to the NetApp are disabled

Migrate bits apaches to eqiad

Check whether the 4 bits apaches are healthy according to a bits Varnish server:

# varnishlog -i Backend_health -O

Test a few top bits URLs manually from the new bits app servers to see if valid content is being returned. To retrieve the most requested URLs, on a bits Varnish server:

# varnishtop -i RxURL

To test such a URL, use curl, or:

fenari: $ /home/mark/firstbyte.py apache_host_name 80 bits.wikimedia.org URI
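
Alternatively, a direct curl invocation; it prints just the HTTP status code returned by a given backend. The backend host name and URI in the usage line are placeholders:

```shell
# Fetch a URI from a specific apache backend while sending the
# production Host header; prints only the HTTP status code.
check_url() {
  local backend=$1 uri=$2
  curl -s -o /dev/null -w '%{http_code}\n' \
       -H 'Host: bits.wikimedia.org' "http://${backend}${uri}"
}
# Usage (placeholder host and URI):
#   check_url mw1151.eqiad.wmnet '/w/load.php?modules=startup&only=scripts'
```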

Run varnishtop for a histogram of HTTP status codes, and compare before/after migration:

# varnishtop -i TxStatus

Deploy Gerrit patch set 44251 and run Puppet for node group XXX. This will change the apache backends for the eqiad Varnish servers only, giving us a chance to fall back on pmtpa bits Varnish servers quickly if needed.

Check if the distribution of HTTP status codes changes drastically, esp. HTTP 2xx vs. 4xx/5xx.
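
To make the before/after comparison concrete, capture one-shot varnishtop output and compute the error share. The `count TxStatus status` line layout is assumed from varnishtop's usual output:

```shell
# Computes the percentage of 4xx/5xx responses from captured
# `varnishtop -i TxStatus` output (lines of "count TxStatus status").
error_share() {
  awk '$2 == "TxStatus" {
         total += $1
         if ($3 >= 400) errors += $1
       }
       END { if (total > 0) printf "%d%%\n", 100 * errors / total }'
}
# Usage: capture ~30s before and after the switch, then compare:
#   varnishtop -1 -i TxStatus > before.txt
#   error_share < before.txt
```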

If bits@eqiad is confirmed to work correctly, after a while deploy Gerrit patchset 44252 and run Puppet for node group XXX. This will switch the pmtpa bits Varnish servers to use the eqiad bits appservers as well.

Mobile Varnish backend changes

Merge Gerrit patch set 45091 (set $::mw_primary = eqiad) and Gerrit patch set 44257, and run Puppet on hosts cp1041 .. cp1044.

Set MediaWiki to read-only in pmtpa

$wgReadOnly is already set in db-eqiad.php; it still needs to be set in db-pmtpa.php.

Merge Gerrit patch set 44845, sync-file wmf-config/db-pmtpa.php

Redis

Make eqiad redis instances masters:

root@fenari:# dsh -cM -F8 -g mc_eqiad "redis-cli SLAVEOF NO ONE"

Make the pmtpa redis instances slaves of eqiad. This step doesn't have to happen directly after the above and can be done last, after the $mw_primary Puppet change is merged:

root@fenari:# dsh -cM -F8 -g mc_pmtpa "puppetd -t ; /etc/init.d/redis-server restart"
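
A verification step after both changes: confirm each instance's replication role with `redis-cli INFO replication` (the dsh group names come from the commands above):

```shell
# Succeeds if the INFO replication output on stdin reports the
# expected role ("master" or "slave").
assert_role() {
  grep -q "^role:$1"
}
# Usage:
#   dsh -cM -g mc_eqiad "redis-cli INFO replication | grep ^role"   # expect role:master
#   dsh -cM -g mc_pmtpa "redis-cli INFO replication | grep ^role"   # expect role:slave
```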

Memcached

Note that because memcached content is not replicated between the data center sites, Tampa's memcached servers will need to be cleared prior to switching back.
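
When that time comes, the flush can be scripted with memcached's standard `flush_all` protocol command, which invalidates all cached items. The dsh group, port, and use of nc below are assumptions:

```shell
# Builds the memcached protocol payload that invalidates all cached
# items on one instance ("flush_all" is a standard memcached command).
flush_payload() {
  printf 'flush_all\r\nquit\r\n'
}
# Usage (later, before switching back -- not during this migration):
#   dsh -cM -g mc_pmtpa "printf 'flush_all\r\nquit\r\n' | nc -q1 localhost 11211"
```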

Text Squids backend changes

This is the actual switchover: directing client traffic to the eqiad Apaches.

The Squid configuration resides in /home/w/conf/squid on fenari, and is backed by a git repository nowadays. Mark has prepared 3 commits in branch eqiad-switchover, that migrate the image scalers, the API application servers and the regular application servers to eqiad.

For each of these commits, use the following sequence:

Merge the commit onto master:

$ git merge XXX
As root, run make to generate the new configuration files. Make sure there are no permission errors.

# make

Now, run a diff of all new configurations against the configurations currently deployed. Make sure the differences reflect the backend changes you expect.

# diff -ru deployed/ generated/ | less

Finally, deploy the configurations to all Squids. Make sure you have ssh agent forwarding enabled for this step. The configurations will be deployed directly and become active immediately, but will also be pushed to Puppet's volatile file module.

# ./deploy cache

(you can deploy to just pmtpa.text and eqiad.text if you prefer, as long as you do both.)

First migrate the image scalers. They run a limited subset of MediaWiki, and any problems are unlikely to cause harm.

Next, the API application servers.

Finally, normal clients: the regular application servers.

Master switch on all database + es shards

root@db1001:/etc/mha# /usr/local/bin/mha_site_switch -s eqiad -y

Swap the $shard-primary and $shard-secondary DNS records. This is required for heartbeat monitoring, which will erroneously show CRIT until the records are swapped.

MediaWiki read-write in eqiad

Application servers in pmtpa need to stay in read-only mode, but eqiad application servers can go read-write at this point.

Merge Gerrit patch set 44847, sync-file wmf-config/db-eqiad.php