Switch Datacenter/MediaWiki

From Wikitech

Introduction

This page describes how to use the Switchdc software to perform the datacenter switchover. In DRY-RUN mode the software prints to stderr all the steps it would perform, and actually executes the read-only tasks that are safe to run, so that the dry run is as realistic as possible. Hence some lines saying Executing commands... are not followed by any output, because the commands were not executed, while others are followed by the output of a real execution, because running them is safe in DRY-RUN mode too.
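
The DRY-RUN behaviour described above can be pictured with a minimal sketch. This is not the Switchdc implementation: the run helper and its safe parameter are hypothetical names used only for illustration, and the message format merely imitates the output shown on this page.

```python
import subprocess
import sys


def run(command, dry_run, safe=False):
    """Run a shell command, honouring a hypothetical dry-run flag.

    Returns the command's stdout as text, or None if it was skipped.
    """
    prefix = 'DRY-RUN: ' if dry_run else ''
    sys.stderr.write('{}Executing commands {!r}\n'.format(prefix, command))
    if dry_run and not safe:
        # Not safe in dry-run mode: announce the command but do not run it.
        return None
    # Read-only (safe) commands are executed even in dry-run mode.
    return subprocess.check_output(command, shell=True).decode()
```

With this sketch, run('echo hi', dry_run=True) only logs the command, while run('echo hi', dry_run=True, safe=True) logs it and also returns its output.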

The code for each task is in the switchdc/stages directory. Each task file name includes the stage it belongs to, following the convention tNN_description.py, where NN is the stage number, left-padded with zeros.
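
As an illustration of the naming convention, the following sketch extracts the stage number and description from a task file name. The regular expression and the parse_task_filename function are assumptions made for this example (including the assumption that NN is exactly two digits); they are not taken from the Switchdc code.

```python
import re

# Matches names such as "t00_reduce_ttl.py": "t", two digits, "_", description.
TASK_FILE_RE = re.compile(r'^t(?P<stage>\d{2})_(?P<description>\w+)\.py$')


def parse_task_filename(filename):
    """Return (stage_number, description) for a task file name, or None."""
    match = TASK_FILE_RE.match(filename)
    if match is None:
        return None
    return int(match.group('stage')), match.group('description')
```

For example, parse_task_filename('t00_reduce_ttl.py') yields stage 0 with description reduce_ttl, while a file that does not follow the convention yields None.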

To make it easier for several people to follow the execution, logging goes to two different files: /var/log/switchdc.log, at INFO level and above, has enough detail to follow the execution, while /var/log/switchdc-extended.log, at DEBUG level and above, has all the details needed to audit and debug any issue that might arise.
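
A two-file setup of this kind can be sketched with the standard Python logging module. The setup_logging function, the logger name, and the message format below are assumptions for illustration; only the two file names and their levels come from the description above, and the directory is a parameter so the sketch can be tried outside /var/log.

```python
import logging
import os


def setup_logging(log_dir, name='switchdc-sketch'):
    """Attach an INFO-level and a DEBUG-level file handler to one logger."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # let the handlers do the filtering
    formatter = logging.Formatter('%(asctime)s [%(levelname)s] %(message)s')

    for filename, level in (('switchdc.log', logging.INFO),
                            ('switchdc-extended.log', logging.DEBUG)):
        handler = logging.FileHandler(os.path.join(log_dir, filename))
        handler.setLevel(level)
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

A DEBUG message sent to this logger ends up only in switchdc-extended.log, while INFO messages and above are written to both files.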

Switch from EQIAD to CODFW

Here is a description of the menu items, each with its dry-run output and rollback instructions. All rollback steps assume that any Puppet and MediaWiki patches that were applied have been reverted.

Switchdc Menu

$ sudo switchdc --dry-run -f eqiad -t codfw
DRY-RUN: Config file /etc/switchdc/stages.d/t03_cache_wipe/config.yaml not found, using defaults
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Datacenter switchover automation
  0 [0/2] Stage 00
  1 [0/1] Stage 01
  2 [0/1] Stage 02
  3 [0/1] Stage 03
  4 [0/1] Stage 04
  5 [0/2] Stage 05
  6 [0/1] Stage 06
  7 [0/1] Stage 07
  8 [0/1] Stage 08
  9 [0/4] Stage 09
  q - Quit
  >>>

Stage 00

rollback

  1. cumin 'R:class = profile::mediawiki::jobrunner' 'puppet-enable --force' ; cumin 'R:class = role::mediawiki::maintenance' 'puppet-enable --force'
  2. switchdc -f $dc_to -t $dc_from --task t09_restore_ttl

output

t00_disable_puppet
>>> 0
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 00
  1 [TODO] Disabling puppet on MediaWiki jobrunners, videoscalers and maintenace systems
  2 [TODO] Reduce the TTL of all the MediaWiki discovery records
  b - Back to parent menu
  q - Quit
>>> 1
DRY-RUN: START TASK - switchdc.stages.t00_disable_puppet(eqiad, codfw) Disabling puppet on MediaWiki jobrunners, videoscalers and maintenace systems
DRY-RUN: Fetched hosts for query: R:class = profile::mediawiki::jobrunner
DRY-RUN: Fetched hosts for query: R:class = role::mediawiki::maintenance
DRY-RUN: Executing commands ('disable-puppet "MediaWiki Switch datacenter"',) on '40' hosts: mw[2118-2119,2152-2162,2243,2246-2250].codfw.wmnet,mw[1161-1169,1259-1260,1299-1306].eqiad.wmnet,terbium.eqiad.wmnet,wasat.codfw.wmnet
DRY-RUN: END TASK - switchdc.stages.t00_disable_puppet(eqiad, codfw) Successfully completed
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 00
  1 [PASS] Disabling puppet on MediaWiki jobrunners, videoscalers and maintenace systems
  2 [TODO] Reduce the TTL of all the MediaWiki discovery records
  b - Back to parent menu
  q - Quit
t00_reduce_ttl
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 00
  1 [PASS] Disabling puppet on MediaWiki jobrunners, videoscalers and maintenace systems
  2 [TODO] Reduce the TTL of all the MediaWiki discovery records
  b - Back to parent menu
  q - Quit
>>> 2
DRY-RUN: START TASK - switchdc.stages.t00_reduce_ttl(eqiad, codfw) Reduce the TTL of all the MediaWiki discovery records
DRY-RUN: Fetched hosts for query: R:class = role::authdns::server
/usr/lib/python2.7/dist-packages/urllib3/connection.py:337: SubjectAltNameWarning: Certificate for conf1003.eqiad.wmnet has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
DRY-RUN: Updating the TTL of (appservers-rw|api-rw|imagescaler-rw) to 10 seconds
DRY-RUN: Updating conftool matching tags: {'dnsdisc': '(appservers-rw|api-rw|imagescaler-rw)'}
DRY-RUN: Updating conftool: {"eqiad": {"pooled": true, "references": [], "ttl": 300}, "tags": "dnsdisc=imagescaler-rw"} -> {'ttl': 10}
DRY-RUN: Updating conftool: {"codfw": {"pooled": false, "references": [], "ttl": 300}, "tags": "dnsdisc=imagescaler-rw"} -> {'ttl': 10}
DRY-RUN: Updating conftool: {"eqiad": {"pooled": true, "references": [], "ttl": 300}, "tags": "dnsdisc=api-rw"} -> {'ttl': 10}
DRY-RUN: Updating conftool: {"codfw": {"pooled": false, "references": [], "ttl": 300}, "tags": "dnsdisc=api-rw"} -> {'ttl': 10}
DRY-RUN: Updating conftool: {"codfw": {"pooled": false, "references": [], "ttl": 300}, "tags": "dnsdisc=appservers-rw"} -> {'ttl': 10}
DRY-RUN: Updating conftool: {"eqiad": {"pooled": true, "references": [], "ttl": 300}, "tags": "dnsdisc=appservers-rw"} -> {'ttl': 10}
DRY-RUN: Checking that TTL=10 for ('appservers-rw', 'api-rw', 'imagescaler-rw').discovery.wmnet records
DRY-RUN: eeden.wikimedia.org:appservers-rw: 10.2.2.1 TTL 300
DRY-RUN: eeden.wikimedia.org:api-rw: 10.2.2.22 TTL 300
DRY-RUN: eeden.wikimedia.org:imagescaler-rw: 10.2.2.21 TTL 300
DRY-RUN: radon.wikimedia.org:appservers-rw: 10.2.2.1 TTL 300
DRY-RUN: radon.wikimedia.org:api-rw: 10.2.2.22 TTL 300
DRY-RUN: radon.wikimedia.org:imagescaler-rw: 10.2.2.21 TTL 300
DRY-RUN: baham.wikimedia.org:appservers-rw: 10.2.2.1 TTL 300
DRY-RUN: baham.wikimedia.org:api-rw: 10.2.2.22 TTL 300
DRY-RUN: baham.wikimedia.org:imagescaler-rw: 10.2.2.21 TTL 300
DRY-RUN: END TASK - switchdc.stages.t00_reduce_ttl(eqiad, codfw) Successfully completed
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 00
  1 [PASS] Disabling puppet on MediaWiki jobrunners, videoscalers and maintenace systems
  2 [PASS] Reduce the TTL of all the MediaWiki discovery records
  b - Back to parent menu
  q - Quit
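
In the dry-run output above the TTL check still reports 300 for every record, because the conftool update is only announced and not applied. A check of this kind can be sketched as follows. The line format is taken from the output above, while the regular expression and the records_with_wrong_ttl function are hypothetical and not part of Switchdc.

```python
import re

# Format taken from the log lines above, e.g.
# "DRY-RUN: eeden.wikimedia.org:appservers-rw: 10.2.2.1 TTL 300"
TTL_LINE_RE = re.compile(
    r'(?P<nameserver>[\w.-]+):(?P<record>[\w-]+): (?P<ip>[\d.]+) TTL (?P<ttl>\d+)$')


def records_with_wrong_ttl(lines, expected_ttl):
    """Return (nameserver, record, ttl) for each line whose TTL differs."""
    wrong = []
    for line in lines:
        match = TTL_LINE_RE.search(line)
        if match is None:
            continue  # not a TTL check line
        ttl = int(match.group('ttl'))
        if ttl != expected_ttl:
            wrong.append((match.group('nameserver'), match.group('record'), ttl))
    return wrong
```

Fed the nine check lines above with an expected TTL of 10, this sketch would report all nine records, which is the expected result of a dry run.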

Stage 01

rollback

  1. switchdc -f $dc_to -t $dc_from --task t09_start_maintenance

output

>>> 1
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 01
  1 [TODO] Stop MediaWiki maintenance in the old master DC
  b - Back to parent menu
  q - Quit
>>> 1
DRY-RUN: START TASK - switchdc.stages.t01_stop_maintenance(eqiad, codfw) Stop MediaWiki maintenance in the old master DC
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = eqiad
DRY-RUN: Filtering host selection for site: eqiad
DRY-RUN: Stopping jobrunners in eqiad
DRY-RUN: Fetched hosts for query: R:class = role::mediawiki::jobrunner
DRY-RUN: Fetched hosts for query: R:class = role::mediawiki::videoscaler
DRY-RUN: Executing commands ('service jobrunner stop', 'service jobchron stop', 'service hhvm restart') on '19' hosts: mw[1161-1169,1259-1260,1299-1306].eqiad.wmnet
DRY-RUN: Executing commands ('! service jobrunner status > /dev/null', '! service jobchron status > /dev/null') on '15' hosts: mw[1161-1167,1299-1306].eqiad.wmnet
===== NO OUTPUT =====
PASS |                                                                                                         |   0% (0/15) [00:00<?, ?hosts/s]
FAIL |████████████████████████████████████████████████████████████████████████████████████████████████| 100% (15/15) [00:00<00:00, 22.85hosts/s]
100.0% (15/15) of nodes failed to execute command '! service jobrun...atus > /dev/null': mw[1161-1167,1299-1306].eqiad.wmnet
0.0% (0/15) success ratio (< 100.0% threshold) of nodes successfully executed all commands. Aborting.
DRY-RUN: Executing commands ('status jobrunner | grep -qv running', 'status jobchron | grep -qv running') on '4' hosts: mw[1168-1169,1259-1260].eqiad.wmnet
DRY-RUN: Disabling MediaWiki cronjobs in eqiad
DRY-RUN: Fetched hosts for query: R:class = role::mediawiki::maintenance
DRY-RUN: Executing commands ('crontab -u www-data -r', 'killall -r php', 'sleep 5', 'killall -9 -r php') on '1' hosts: terbium.eqiad.wmnet
DRY-RUN: Executing commands ('test -z "$(crontab -u www-data -l | sed -r  \'/^(#|$)/d\')"',) on '1' hosts: terbium.eqiad.wmnet
===== NO OUTPUT =====
PASS |                                                                                                          |   0% (0/1) [00:00<?, ?hosts/s]
FAIL |██████████████████████████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00,  1.63hosts/s]
100.0% (1/1) of nodes failed to execute command 'test -z "$(cront...r  '/^(#|$)/d')"': terbium.eqiad.wmnet
0.0% (0/1) success ratio (< 100.0% threshold) for command: 'test -z "$(cront...r  '/^(#|$)/d')"'. Aborting.
0.0% (0/1) success ratio (< 100.0% threshold) of nodes successfully executed all commands. Aborting.
DRY-RUN: Executing commands ('pgrep -c php',) on '1' hosts: terbium.eqiad.wmnet
===== NODE GROUP =====
(1) terbium.eqiad.wmnet
----- OUTPUT of 'pgrep -c php' -----
14
================
PASS |██████████████████████████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00,  1.60hosts/s]
FAIL |                                                                                                          |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'pgrep -c php'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
DRY-RUN: Stray php processes still present on the maintenance host, please check
DRY-RUN: END TASK - switchdc.stages.t01_stop_maintenance(eqiad, codfw) Successfully completed
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 01
  1 [PASS] Stop MediaWiki maintenance in the old master DC
  b - Back to parent menu
  q - Quit
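
The host lists in the output above use a compact range notation, for example mw[1161-1169,1259-1260,1299-1306].eqiad.wmnet for 19 hosts. The following simplified sketch expands such an expression to check the host count. It handles only a single bracketed group of numeric ranges and is not the parser used by the tooling; expand_hosts is a hypothetical helper.

```python
import re

# One optional "[...]" group of comma-separated numbers or ranges.
RANGE_RE = re.compile(
    r'^(?P<prefix>[^\[\]]*)\[(?P<ranges>[\d,-]+)\](?P<suffix>[^\[\]]*)$')


def expand_hosts(expression):
    """Expand e.g. 'mw[1161-1162,1259].eqiad.wmnet' into a list of host names."""
    match = RANGE_RE.match(expression)
    if match is None:
        return [expression]  # plain host name, nothing to expand
    hosts = []
    for part in match.group('ranges').split(','):
        start, _, end = part.partition('-')
        end = end or start  # a single number is a range of one
        for number in range(int(start), int(end) + 1):
            # Keep the zero padding of the range start, if any.
            padded = str(number).zfill(len(start))
            hosts.append(match.group('prefix') + padded + match.group('suffix'))
    return hosts
```

Expanding the expression from the jobrunner stop command above gives 19 host names, matching the count reported in the log.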

Stage 02

rollback

  1. First revert the MediaWiki patch, then run: switchdc -f $dc_to -t $dc_from --task t08_stop_mediawiki_readonly

output

#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 02
  1 [TODO] Set MediaWiki in read-only mode (db_from config already merged and git pulled)
  b - Back to parent menu
  q - Quit
>>> 1
DRY-RUN: START TASK - switchdc.stages.t02_start_mediawiki_readonly(eqiad, codfw) Set MediaWiki in read-only mode (db_from config already merged and git pulled)
DRY-RUN: Fetched hosts for query: R:Class = Role::Noc::Site
DRY-RUN: Checked message (found=False) in MediaWiki config http://terbium.eqiad.wmnet/conf/db-eqiad.php.txt:
'readOnlyBySection' => [
	's1'      => 'MediaWiki is in read-only mode for maintenance. Please try again in 15 minutes',
	's2'      => 'MediaWiki is in read-only mode for maintenance. Please try again in 15 minutes',
	'DEFAULT' => 'MediaWiki is in read-only mode for maintenance. Please try again in 15 minutes', # s3
	's4'      => 'MediaWiki is in read-only mode for maintenance. Please try again in 15 minutes',
	's5'      => 'MediaWiki is in read-only mode for maintenance. Please try again in 15 minutes',
	's6'      => 'MediaWiki is in read-only mode for maintenance. Please try again in 15 minutes',
	's7'      => 'MediaWiki is in read-only mode for maintenance. Please try again in 15 minutes',
],
DRY-RUN: MediaWiki read-only period starts at: 2017-04-11 08:29:49.707385
DRY-RUN: Syncing MediaWiki wmf-config/db-eqiad.php
DRY-RUN: Fetched hosts for query: R:Class = Deployment::Rsync and R:Class%cron_ensure = absent
DRY-RUN: Executing commands ('su - volans -c \'scap sync-file --force wmf-config/db-eqiad.php "Set MediaWiki in read-only mode in datacenter eqiad"\'',) on '1' hosts: tin.eqiad.wmnet
DRY-RUN: Fetched hosts for query: R:Class = Role::Noc::Site
DRY-RUN: Checked message (found=False) in MediaWiki config http://terbium.eqiad.wmnet/conf/db-eqiad.php.txt:
'readOnlyBySection' => [
	's1'      => 'MediaWiki is in read-only mode for maintenance. Please try again in 15 minutes',
	's2'      => 'MediaWiki is in read-only mode for maintenance. Please try again in 15 minutes',
	'DEFAULT' => 'MediaWiki is in read-only mode for maintenance. Please try again in 15 minutes', # s3
	's4'      => 'MediaWiki is in read-only mode for maintenance. Please try again in 15 minutes',
	's5'      => 'MediaWiki is in read-only mode for maintenance. Please try again in 15 minutes',
	's6'      => 'MediaWiki is in read-only mode for maintenance. Please try again in 15 minutes',
	's7'      => 'MediaWiki is in read-only mode for maintenance. Please try again in 15 minutes',
],
DRY-RUN: Read-only mode not changed in the MediaWiki config db-eqiad?
DRY-RUN: END TASK - switchdc.stages.t02_start_mediawiki_readonly(eqiad, codfw) Failed to execute
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 02
  1 [FAIL] Set MediaWiki in read-only mode (db_from config already merged and git pulled)
  b - Back to parent menu
  q - Quit

Stage 03

rollback

  1. switchdc -f $dc_to -t $dc_from --task t07_coredb_masters_readwrite

output

>>> 3
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 03
  1 [TODO] set core DB masters in read-only mode
  b - Back to parent menu
  q - Quit
>>> 1
DRY-RUN: START TASK - switchdc.stages.t03_coredb_masters_readonly(eqiad, codfw) set core DB masters in read-only mode
DRY-RUN: Setting core DB masters in eqiad to have read-only=True
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = eqiad
DRY-RUN: Filtering host selection for site: eqiad
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core"
DRY-RUN: Executing commands ('mysql --skip-ssl --skip-column-names --batch -e "SET GLOBAL read_only=True"',) on '10' hosts: db[1018,1031,1040-1041,1049-1050,1052,1075].eqiad.wmnet,es[1011,1014].eqiad.wmnet
DRY-RUN: Verifying core DB masters in eqiad have read-only=True
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = eqiad
DRY-RUN: Filtering host selection for site: eqiad
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core"
DRY-RUN: Executing commands ('mysql --skip-ssl --skip-column-names --batch -e "SELECT @@global.read_only"',) on '10' hosts: db[1018,1031,1040-1041,1049-1050,1052,1075].eqiad.wmnet,es[1011,1014].eqiad.wmnet
===== NODE GROUP =====
(10) db[1018,1031,1040-1041,1049-1050,1052,1075].eqiad.wmnet,es[1011,1014].eqiad.wmnet
----- OUTPUT of 'mysql --skip-ssl...lobal.read_only"' -----
0
================
PASS |████████████████████████████████████████████████████████████████████████████████████████████████| 100% (10/10) [00:00<00:00, 10.38hosts/s]
FAIL |                                                                                                         |   0% (0/10) [00:00<?, ?hosts/s]
100.0% (10/10) success ratio (>= 100.0% threshold) for command: 'mysql --skip-ssl...lobal.read_only"'.
100.0% (10/10) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
DRY-RUN: Expected output to be '1', got '0' for hosts ['db1018.eqiad.wmnet', 'db1031.eqiad.wmnet', 'db1040.eqiad.wmnet', 'db1041.eqiad.wmnet', 'db1049.eqiad.wmnet', 'db1050.eqiad.wmnet', 'db1052.eqiad.wmnet', 'db1075.eqiad.wmnet', 'es1011.eqiad.wmnet', 'es1014.eqiad.wmnet']
DRY-RUN: Verifying core DB masters in codfw have read-only=True
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = codfw
DRY-RUN: Filtering host selection for site: codfw
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core"
DRY-RUN: Executing commands ('mysql --skip-ssl --skip-column-names --batch -e "SELECT @@global.read_only"',) on '10' hosts: db[2016-2019,2023,2028-2029,2033].codfw.wmnet,es[2016,2018].codfw.wmnet
===== NODE GROUP =====
(10) db[2016-2019,2023,2028-2029,2033].codfw.wmnet,es[2016,2018].codfw.wmnet
----- OUTPUT of 'mysql --skip-ssl...lobal.read_only"' -----
1
================
PASS |████████████████████████████████████████████████████████████████████████████████████████████████| 100% (10/10) [00:00<00:00, 13.40hosts/s]
FAIL |                                                                                                         |   0% (0/10) [00:00<?, ?hosts/s]
100.0% (10/10) success ratio (>= 100.0% threshold) for command: 'mysql --skip-ssl...lobal.read_only"'.
100.0% (10/10) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
DRY-RUN: END TASK - switchdc.stages.t03_coredb_masters_readonly(eqiad, codfw) Successfully completed
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 03
  1 [PASS] set core DB masters in read-only mode
  b - Back to parent menu
  q - Quit
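
In the dry-run output above the eqiad masters still report 0, because the SET GLOBAL read_only command is only announced, so the verification logs a mismatch while the codfw masters report the expected 1. The comparison can be sketched as follows, assuming a mapping from host name to the raw output of SELECT @@global.read_only. The check_read_only function is hypothetical and not taken from Switchdc.

```python
def check_read_only(outputs, expected):
    """Return the sorted hosts whose read_only value differs from expected.

    outputs maps host name -> raw output of "SELECT @@global.read_only".
    expected is a boolean; the query prints 1 for read-only and 0 otherwise.
    """
    expected_output = '1' if expected else '0'
    return sorted(host for host, output in outputs.items()
                  if output.strip() != expected_output)
```

An empty result means every host is in the expected state; a non-empty result corresponds to the "Expected output to be '1', got '0'" line in the log above.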

Stage 04

rollback

  1. Only if you got all the way to flipping the read-only switch: cumin -f $dc_to -t $dc_from --stage 04. Otherwise, nothing is needed.

output

>>> 4
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 04
  1 [TODO] wipe and warmup caches
  b - Back to parent menu
  q - Quit
>>> 1
DRY-RUN: START TASK - switchdc.stages.t04_cache_wipe(eqiad, codfw) wipe and warmup caches
DRY-RUN: Waiting for the core DB masters in codfw to catch up
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = eqiad
DRY-RUN: Filtering host selection for site: eqiad
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "s1"
DRY-RUN: Executing commands ('mysql --skip-ssl --skip-column-names --batch -e "SELECT @@GLOBAL.gtid_binlog_pos"',) on '1' hosts: db1052.eqiad.wmnet
===== NODE GROUP =====
(1) db1052.eqiad.wmnet
----- OUTPUT of 'mysql --skip-ssl...gtid_binlog_pos"' -----
171970637-171970637-133627133,0-180359172-5444649212
================
PASS |██████████████████████████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00,  1.60hosts/s]
FAIL |                                                                                                          |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'mysql --skip-ssl...gtid_binlog_pos"'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = codfw
DRY-RUN: Filtering host selection for site: codfw
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "s1"
DRY-RUN: Executing commands ('mysql --skip-ssl --skip-column-names --batch -e "SELECT MASTER_GTID_WAIT(\'171970637-171970637-133627133,0-180359172-5444649212\', 30)"',) on '1' hosts: db2016.codfw.wmnet
===== NODE GROUP =====
(1) db2016.codfw.wmnet
----- OUTPUT of 'mysql --skip-ssl...444649212', 30)"' -----
0
================
PASS |██████████████████████████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00,  3.81hosts/s]
FAIL |                                                                                                          |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'mysql --skip-ssl...444649212', 30)"'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = eqiad
DRY-RUN: Filtering host selection for site: eqiad
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "s2"
DRY-RUN: Executing commands ('mysql --skip-ssl --skip-column-names --batch -e "SELECT @@GLOBAL.gtid_binlog_pos"',) on '1' hosts: db1018.eqiad.wmnet
===== NODE GROUP =====
(1) db1018.eqiad.wmnet
----- OUTPUT of 'mysql --skip-ssl...gtid_binlog_pos"' -----
171970567-171970567-343852787,0-180359173-4856363526
================
PASS |██████████████████████████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00,  1.65hosts/s]
FAIL |                                                                                                          |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'mysql --skip-ssl...gtid_binlog_pos"'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = codfw
DRY-RUN: Filtering host selection for site: codfw
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "s2"
DRY-RUN: Executing commands ('mysql --skip-ssl --skip-column-names --batch -e "SELECT MASTER_GTID_WAIT(\'171970567-171970567-343852787,0-180359173-4856363526\', 30)"',) on '1' hosts: db2017.codfw.wmnet
===== NODE GROUP =====
(1) db2017.codfw.wmnet
----- OUTPUT of 'mysql --skip-ssl...856363526', 30)"' -----
0
================
PASS |██████████████████████████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00,  3.64hosts/s]
FAIL |                                                                                                          |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'mysql --skip-ssl...856363526', 30)"'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = eqiad
DRY-RUN: Filtering host selection for site: eqiad
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "s3"
DRY-RUN: Executing commands ('mysql --skip-ssl --skip-column-names --batch -e "SELECT @@GLOBAL.gtid_binlog_pos"',) on '1' hosts: db1075.eqiad.wmnet
===== NODE GROUP =====
(1) db1075.eqiad.wmnet
----- OUTPUT of 'mysql --skip-ssl...gtid_binlog_pos"' -----
171966669-171966669-135540332,0-180359174-4030168881
================
PASS |██████████████████████████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00,  1.66hosts/s]
FAIL |                                                                                                          |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'mysql --skip-ssl...gtid_binlog_pos"'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = codfw
DRY-RUN: Filtering host selection for site: codfw
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "s3"
DRY-RUN: Executing commands ('mysql --skip-ssl --skip-column-names --batch -e "SELECT MASTER_GTID_WAIT(\'171966669-171966669-135540332,0-180359174-4030168881\', 30)"',) on '1' hosts: db2018.codfw.wmnet
===== NODE GROUP =====
(1) db2018.codfw.wmnet
----- OUTPUT of 'mysql --skip-ssl...030168881', 30)"' -----
0
================
PASS |██████████████████████████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00,  3.65hosts/s]
FAIL |                                                                                                          |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'mysql --skip-ssl...030168881', 30)"'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = eqiad
DRY-RUN: Filtering host selection for site: eqiad
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "s4"
DRY-RUN: Executing commands ('mysql --skip-ssl --skip-column-names --batch -e "SELECT @@GLOBAL.gtid_binlog_pos"',) on '1' hosts: db1040.eqiad.wmnet
===== NODE GROUP =====
(1) db1040.eqiad.wmnet
----- OUTPUT of 'mysql --skip-ssl...gtid_binlog_pos"' -----
171970589-171970589-152076896,0-180359175-3365971595
================
PASS |██████████████████████████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00,  1.67hosts/s]
FAIL |                                                                                                          |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'mysql --skip-ssl...gtid_binlog_pos"'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = codfw
DRY-RUN: Filtering host selection for site: codfw
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "s4"
DRY-RUN: Executing commands ('mysql --skip-ssl --skip-column-names --batch -e "SELECT MASTER_GTID_WAIT(\'171970589-171970589-152076896,0-180359175-3365971595\', 30)"',) on '1' hosts: db2019.codfw.wmnet
===== NODE GROUP =====
(1) db2019.codfw.wmnet
----- OUTPUT of 'mysql --skip-ssl...365971595', 30)"' -----
0
================
PASS |██████████████████████████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00,  3.81hosts/s]
FAIL |                                                                                                          |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'mysql --skip-ssl...365971595', 30)"'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = eqiad
DRY-RUN: Filtering host selection for site: eqiad
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "s5"
DRY-RUN: Executing commands ('mysql --skip-ssl --skip-column-names --batch -e "SELECT @@GLOBAL.gtid_binlog_pos"',) on '1' hosts: db1049.eqiad.wmnet
===== NODE GROUP =====
(1) db1049.eqiad.wmnet
----- OUTPUT of 'mysql --skip-ssl...gtid_binlog_pos"' -----
0-180359179-5732174236,171970704-171970704-294688680
================
PASS |██████████████████████████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00,  1.54hosts/s]
FAIL |                                                                                                          |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'mysql --skip-ssl...gtid_binlog_pos"'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = codfw
DRY-RUN: Filtering host selection for site: codfw
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "s5"
DRY-RUN: Executing commands ('mysql --skip-ssl --skip-column-names --batch -e "SELECT MASTER_GTID_WAIT(\'0-180359179-5732174236,171970704-171970704-294688680\', 30)"',) on '1' hosts: db2023.codfw.wmnet
===== NODE GROUP =====
(1) db2023.codfw.wmnet
----- OUTPUT of 'mysql --skip-ssl...294688680', 30)"' -----
0
================
PASS |██████████████████████████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00,  2.58hosts/s]
FAIL |                                                                                                          |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'mysql --skip-ssl...294688680', 30)"'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = eqiad
DRY-RUN: Filtering host selection for site: eqiad
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "s6"
DRY-RUN: Executing commands ('mysql --skip-ssl --skip-column-names --batch -e "SELECT @@GLOBAL.gtid_binlog_pos"',) on '1' hosts: db1050.eqiad.wmnet
===== NODE GROUP =====
(1) db1050.eqiad.wmnet
----- OUTPUT of 'mysql --skip-ssl...gtid_binlog_pos"' -----
171970705-171970705-214392224,0-180359184-3046938522
================
PASS |██████████████████████████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00,  1.53hosts/s]
FAIL |                                                                                                          |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'mysql --skip-ssl...gtid_binlog_pos"'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = codfw
DRY-RUN: Filtering host selection for site: codfw
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "s6"
DRY-RUN: Executing commands ('mysql --skip-ssl --skip-column-names --batch -e "SELECT MASTER_GTID_WAIT(\'171970705-171970705-214392224,0-180359184-3046938522\', 30)"',) on '1' hosts: db2028.codfw.wmnet
===== NODE GROUP =====
(1) db2028.codfw.wmnet
----- OUTPUT of 'mysql --skip-ssl...046938522', 30)"' -----
0
================
PASS |██████████████████████████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00,  3.72hosts/s]
FAIL |                                                                                                          |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'mysql --skip-ssl...046938522', 30)"'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = eqiad
DRY-RUN: Filtering host selection for site: eqiad
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "s7"
DRY-RUN: Executing commands ('mysql --skip-ssl --skip-column-names --batch -e "SELECT @@GLOBAL.gtid_binlog_pos"',) on '1' hosts: db1041.eqiad.wmnet
===== NODE GROUP =====
(1) db1041.eqiad.wmnet
----- OUTPUT of 'mysql --skip-ssl...gtid_binlog_pos"' -----
0-180359185-3357123512,171970590-171970590-161648476
================
PASS |██████████████████████████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00,  1.53hosts/s]
FAIL |                                                                                                          |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'mysql --skip-ssl...gtid_binlog_pos"'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = codfw
DRY-RUN: Filtering host selection for site: codfw
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "s7"
DRY-RUN: Executing commands ('mysql --skip-ssl --skip-column-names --batch -e "SELECT MASTER_GTID_WAIT(\'0-180359185-3357123512,171970590-171970590-161648476\', 30)"',) on '1' hosts: db2029.codfw.wmnet
===== NODE GROUP =====
(1) db2029.codfw.wmnet
----- OUTPUT of 'mysql --skip-ssl...161648476', 30)"' -----
0
================
PASS |██████████████████████████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00,  3.64hosts/s]
FAIL |                                                                                                          |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'mysql --skip-ssl...161648476', 30)"'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = eqiad
DRY-RUN: Filtering host selection for site: eqiad
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "x1"
DRY-RUN: Executing commands ('mysql --skip-ssl --skip-column-names --batch -e "SELECT @@GLOBAL.gtid_binlog_pos"',) on '1' hosts: db1031.eqiad.wmnet
===== NODE GROUP =====
(1) db1031.eqiad.wmnet
----- OUTPUT of 'mysql --skip-ssl...gtid_binlog_pos"' -----
1-171970580-1,180363268-180363268-432499,0-171970580-681400508,171970580-171970580-41416716
================
PASS |██████████████████████████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00,  1.55hosts/s]
FAIL |                                                                                                          |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'mysql --skip-ssl...gtid_binlog_pos"'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = codfw
DRY-RUN: Filtering host selection for site: codfw
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "x1"
DRY-RUN: Executing commands ('mysql --skip-ssl --skip-column-names --batch -e "SELECT MASTER_GTID_WAIT(\'1-171970580-1,180363268-180363268-432499,0-171970580-681400508,171970580-171970580-41416716\', 30)"',) on '1' hosts: db2033.codfw.wmnet
===== NODE GROUP =====
(1) db2033.codfw.wmnet
----- OUTPUT of 'mysql --skip-ssl...-41416716', 30)"' -----
0
================
PASS |██████████████████████████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00,  4.52hosts/s]
FAIL |                                                                                                          |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'mysql --skip-ssl...-41416716', 30)"'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = eqiad
DRY-RUN: Filtering host selection for site: eqiad
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "es2"
DRY-RUN: Executing commands ('mysql --skip-ssl --skip-column-names --batch -e "SELECT @@GLOBAL.gtid_binlog_pos"',) on '1' hosts: es1011.eqiad.wmnet
===== NODE GROUP =====
(1) es1011.eqiad.wmnet
----- OUTPUT of 'mysql --skip-ssl...gtid_binlog_pos"' -----
180367401-180367401-1642880,0-171966470-405836271
================
PASS |██████████████████████████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00,  1.51hosts/s]
FAIL |                                                                                                          |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'mysql --skip-ssl...gtid_binlog_pos"'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = codfw
DRY-RUN: Filtering host selection for site: codfw
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "es2"
DRY-RUN: Executing commands ('mysql --skip-ssl --skip-column-names --batch -e "SELECT MASTER_GTID_WAIT(\'180367401-180367401-1642880,0-171966470-405836271\', 30)"',) on '1' hosts: es2016.codfw.wmnet
===== NODE GROUP =====
(1) es2016.codfw.wmnet
----- OUTPUT of 'mysql --skip-ssl...405836271', 30)"' -----
0
================
PASS |██████████████████████████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00,  2.17hosts/s]
FAIL |                                                                                                          |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'mysql --skip-ssl...405836271', 30)"'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = eqiad
DRY-RUN: Filtering host selection for site: eqiad
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "es3"
DRY-RUN: Executing commands ('mysql --skip-ssl --skip-column-names --batch -e "SELECT @@GLOBAL.gtid_binlog_pos"',) on '1' hosts: es1014.eqiad.wmnet
===== NODE GROUP =====
(1) es1014.eqiad.wmnet
----- OUTPUT of 'mysql --skip-ssl...gtid_binlog_pos"' -----
0-171970747-404916406
================
PASS |██████████████████████████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00,  1.65hosts/s]
FAIL |                                                                                                          |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'mysql --skip-ssl...gtid_binlog_pos"'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = codfw
DRY-RUN: Filtering host selection for site: codfw
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "es3"
DRY-RUN: Executing commands ('mysql --skip-ssl --skip-column-names --batch -e "SELECT MASTER_GTID_WAIT(\'0-171970747-404916406\', 30)"',) on '1' hosts: es2018.codfw.wmnet
===== NODE GROUP =====
(1) es2018.codfw.wmnet
----- OUTPUT of 'mysql --skip-ssl...404916406', 30)"' -----
0
================
PASS |██████████████████████████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00,  1.74hosts/s]
FAIL |                                                                                                          |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'mysql --skip-ssl...404916406', 30)"'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
DRY-RUN: Wiping out the MediaWiki caches in codfw
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = codfw
DRY-RUN: Filtering host selection for site: codfw
DRY-RUN: Fetched hosts for query: R:class = role::memcached
DRY-RUN: Executing commands ('service memcached restart',) on '18' hosts: mc[2019-2036].codfw.wmnet
DRY-RUN: Fetched hosts for query: R:class = role::mediawiki::webserver
DRY-RUN: Executing commands ('service hhvm restart',) on '146' hosts: mw[2017,2097-2117,2120-2151,2163-2242,2244-2245,2251-2260].codfw.wmnet
DRY-RUN: Running the global warmup job in codfw
DRY-RUN: Executing commands ('nodejs /var/lib/mediawiki-cache-warmup/warmup.js /var/lib/mediawiki-cache-warmup/urls-cluster.txt spread appservers.svc.codfw.wmnet', 'nodejs /var/lib/mediawiki-cache-warmup/warmup.js /var/lib/mediawiki-cache-warmupurls-server.txt clone codfw appserver', 'nodejs /var/lib/mediawiki-cache-warmup/warmup.js /var/lib/mediawiki-cache-warmupurls-server.txt clone codfw api_appserver') on '1' hosts: wasat.codfw.wmnet
DRY-RUN: END TASK - switchdc.stages.t04_cache_wipe(eqiad, codfw) Successfully completed
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 04
  1 [PASS] wipe and warmup caches
  b - Back to parent menu
  q - Quit
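
The replication check in the log above reads the GTID position from each eqiad master and then asks the matching codfw master to wait until it has reached that position. The following is a minimal Python sketch of that pattern, not the switchdc implementation: the helper names build_gtid_wait_command and replica_caught_up are hypothetical, and only the mysql flags and the SQL are copied from the log. It relies on MariaDB's MASTER_GTID_WAIT returning 0 once the position is reached and -1 on timeout.

```python
# Flags copied from the dry-run log; everything else is illustrative.
MYSQL = 'mysql --skip-ssl --skip-column-names --batch -e'


def build_gtid_wait_command(gtid, timeout=30):
    """Return the mysql command that waits up to `timeout` seconds for `gtid`."""
    query = "SELECT MASTER_GTID_WAIT('{gtid}', {timeout})".format(
        gtid=gtid, timeout=timeout)
    return '{mysql} "{query}"'.format(mysql=MYSQL, query=query)


def replica_caught_up(output):
    """Interpret the command output: '0' means the position was reached."""
    return output.strip() == '0'


# GTID position reported by the es3 master in the log above.
print(build_gtid_wait_command('0-171970747-404916406'))
```

In the log every wait returns 0, which is why each shard reports a 100% success ratio.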

Stage 05

rollback

Run switchdc -f $dc_to -t $dc_from and execute stages 5 through 9

output

t05_switch_datacenter
>>> 5
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 05
  1 [TODO] Switch MediaWiki configuration to the new datacenter
  2 [TODO] Switch traffic flow to the appservers in the new datacenter
  b - Back to parent menu
  q - Quit
>>> 1
DRY-RUN: START TASK - switchdc.stages.t05_switch_datacenter(eqiad, codfw) Switch MediaWiki configuration to the new datacenter
/usr/lib/python2.7/dist-packages/urllib3/connection.py:337: SubjectAltNameWarning: Certificate for conf1001.eqiad.wmnet has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
DRY-RUN: Updating conftool matching tags: {'dnsdisc': '(appservers|api|imagescaler)-rw', 'name': 'codfw'}
DRY-RUN: Updating conftool: {"codfw": {"pooled": false, "references": [], "ttl": 300}, "tags": "dnsdisc=api-rw"} -> {'pooled': True}
DRY-RUN: Updating conftool: {"codfw": {"pooled": false, "references": [], "ttl": 300}, "tags": "dnsdisc=imagescaler-rw"} -> {'pooled': True}
DRY-RUN: Updating conftool: {"codfw": {"pooled": false, "references": [], "ttl": 300}, "tags": "dnsdisc=appservers-rw"} -> {'pooled': True}
DRY-RUN: Selected conftool object: {"codfw": {"pooled": false, "references": [], "ttl": 300}, "tags": "dnsdisc=appservers-rw"}
DRY-RUN: Selected conftool object: {"codfw": {"pooled": false, "references": [], "ttl": 300}, "tags": "dnsdisc=api-rw"}
DRY-RUN: Selected conftool object: {"codfw": {"pooled": false, "references": [], "ttl": 300}, "tags": "dnsdisc=imagescaler-rw"}
DRY-RUN: Fetched hosts for query: R:Class = Role::Noc::Site
DRY-RUN: Checked message (found=False) in MediaWiki config http://terbium.eqiad.wmnet/conf/CommonSettings.php.txt:
$wmfMasterDatacenter = 'codfw';
DRY-RUN: Syncing MediaWiki wmf-config/CommonSettings.php
DRY-RUN: Fetched hosts for query: R:Class = Deployment::Rsync and R:Class%cron_ensure = absent
DRY-RUN: Executing commands ('su - volans -c \'scap sync-file --force wmf-config/CommonSettings.php "Switch MediaWiki active datacenter to codfw"\'',) on '1' hosts: tin.eqiad.wmnet
DRY-RUN: Fetched hosts for query: R:Class = Role::Noc::Site
DRY-RUN: Checked message (found=False) in MediaWiki config http://terbium.eqiad.wmnet/conf/CommonSettings.php.txt:
$wmfMasterDatacenter = 'codfw';
DRY-RUN: Datacenter not changed in the MediaWiki config?
DRY-RUN: END TASK - switchdc.stages.t05_switch_datacenter(eqiad, codfw) Failed to execute
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 05
  1 [FAIL] Switch MediaWiki configuration to the new datacenter
  2 [TODO] Switch traffic flow to the appservers in the new datacenter
  b - Back to parent menu
  q - Quit
t05_switch_traffic
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 05
  1 [FAIL] Switch MediaWiki configuration to the new datacenter
  2 [TODO] Switch traffic flow to the appservers in the new datacenter
  b - Back to parent menu
  q - Quit
>>> 2
DRY-RUN: START TASK - switchdc.stages.t05_switch_traffic(eqiad, codfw) Switch traffic flow to the appservers in the new datacenter
DRY-RUN: Fetched hosts for query: R:class = profile::cumin::target and R:class%site = codfw and R:class%cluster = cache_text
DRY-RUN: Fetched hosts for query: R:class = profile::cumin::target and R:class%site = eqiad and R:class%cluster = cache_text
DRY-RUN: Executing commands ('disable-puppet "MediaWiki Switch datacenter"',) on '17' hosts: cp[2001,2004,2007,2010,2013,2016,2019,2023].codfw.wmnet,cp[1052-1055,1065-1068].eqiad.wmnet,cp1008.wikimedia.org
Please puppet-merge the varnish change, and type "merged"
> merged
DRY-RUN: Running puppet in codfw
DRY-RUN: Executing commands ('run-puppet-agent --enable "MediaWiki Switch datacenter"',) on '8' hosts: cp[2001,2004,2007,2010,2013,2016,2019,2023].codfw.wmnet
DRY-RUN: Varnish traffic is now active-active, running now puppet in eqiad
DRY-RUN: Executing commands ('run-puppet-agent --enable "MediaWiki Switch datacenter"',) on '9' hosts: cp[1052-1055,1065-1068].eqiad.wmnet,cp1008.wikimedia.org
DRY-RUN: Varnish traffic is now active only in codfw
DRY-RUN: END TASK - switchdc.stages.t05_switch_traffic(eqiad, codfw) Successfully completed
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 05
  1 [FAIL] Switch MediaWiki configuration to the new datacenter
  2 [PASS] Switch traffic flow to the appservers in the new datacenter
  b - Back to parent menu
  q - Quit
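
Task 1 fails in DRY-RUN mode because the scap sync is not executed, so the published CommonSettings.php.txt still lacks the line $wmfMasterDatacenter = 'codfw'; (found=False both before and after the sync step). A minimal Python sketch of such a check follows. The function names are hypothetical, and the regular expression is an assumption: the log suggests switchdc looks for the exact message text instead.

```python
import re

# Matches a line such as: $wmfMasterDatacenter = 'codfw';
MASTER_DC = re.compile(r"^\s*\$wmfMasterDatacenter\s*=\s*'([^']+)';", re.MULTILINE)


def master_datacenter(config_text):
    """Return the datacenter assigned to $wmfMasterDatacenter, or None."""
    match = MASTER_DC.search(config_text)
    return match.group(1) if match else None


def is_switched(config_text, dc_to):
    """True when the config already names dc_to as the master datacenter."""
    return master_datacenter(config_text) == dc_to


print(is_switched("$wmfMasterDatacenter = 'eqiad';\n", 'codfw'))
```

With a config that still names eqiad, is_switched returns False, which corresponds to the "Datacenter not changed in the MediaWiki config?" message above.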

Stage 06

rollback

Run switchdc -f $dc_to -t $dc_from and execute stages 5 through 9

output

>>> 6
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 06
  1 [TODO] Switch the Redis replication
  b - Back to parent menu
  q - Quit
>>> 1
DRY-RUN: START TASK - switchdc.stages.t06_redis(eqiad, codfw) Switch the Redis replication
DRY-RUN: Stopping replication for all instances in codfw, cluster jobqueue
DRY-RUN: Stopping replica on 10.192.0.119:6378
DRY-RUN: Stopping replica on 10.192.16.122:6380
DRY-RUN: Stopping replica on 10.192.32.133:6480
DRY-RUN: Stopping replica on 10.192.16.122:6381
DRY-RUN: Stopping replica on 10.192.32.133:6478
DRY-RUN: Stopping replica on 10.192.0.119:6379
DRY-RUN: Stopping replica on 10.192.32.133:6481
DRY-RUN: Stopping replica on 10.192.0.119:6380
DRY-RUN: Stopping replica on 10.192.0.119:6381
DRY-RUN: Stopping replica on 10.192.16.122:6378
DRY-RUN: Stopping replica on 10.192.32.133:6380
DRY-RUN: Stopping replica on 10.192.32.133:6381
DRY-RUN: Stopping replica on 10.192.32.133:6378
DRY-RUN: Stopping replica on 10.192.16.122:6379
DRY-RUN: Stopping replica on 10.192.32.133:6379
DRY-RUN: Stopping replica on 10.192.32.133:6479
DRY-RUN: Starting replication for all instances in eqiad, cluster jobqueue
DRY-RUN: Starting replica 10.192.0.119:6378 => 10.64.32.76:6378
DRY-RUN: Starting replica 10.192.16.122:6380 => 10.64.0.201:6380
DRY-RUN: Starting replica 10.192.32.133:6480 => 10.64.32.18:6380
DRY-RUN: Starting replica 10.192.16.122:6381 => 10.64.0.201:6381
DRY-RUN: Starting replica 10.192.32.133:6478 => 10.64.32.18:6378
DRY-RUN: Starting replica 10.192.0.119:6379 => 10.64.32.76:6379
DRY-RUN: Starting replica 10.192.32.133:6481 => 10.64.32.18:6381
DRY-RUN: Starting replica 10.192.0.119:6380 => 10.64.32.76:6380
DRY-RUN: Starting replica 10.192.0.119:6381 => 10.64.32.76:6381
DRY-RUN: Starting replica 10.192.16.122:6378 => 10.64.0.201:6378
DRY-RUN: Starting replica 10.192.32.133:6380 => 10.64.0.24:6380
DRY-RUN: Starting replica 10.192.32.133:6381 => 10.64.0.24:6381
DRY-RUN: Starting replica 10.192.32.133:6378 => 10.64.0.24:6378
DRY-RUN: Starting replica 10.192.16.122:6379 => 10.64.0.201:6379
DRY-RUN: Starting replica 10.192.32.133:6379 => 10.64.0.24:6379
DRY-RUN: Starting replica 10.192.32.133:6479 => 10.64.32.18:6379
DRY-RUN: Stopping replication for all instances in codfw, cluster sessions
DRY-RUN: Stopping replica on 10.192.0.86:6379
DRY-RUN: Stopping replica on 10.192.48.80:6379
DRY-RUN: Stopping replica on 10.192.32.159:6379
DRY-RUN: Stopping replica on 10.192.16.194:6379
DRY-RUN: Stopping replica on 10.192.16.62:6379
DRY-RUN: Stopping replica on 10.192.16.61:6379
DRY-RUN: Stopping replica on 10.192.16.60:6379
DRY-RUN: Stopping replica on 10.192.48.79:6379
DRY-RUN: Stopping replica on 10.192.32.160:6379
DRY-RUN: Stopping replica on 10.192.0.84:6379
DRY-RUN: Stopping replica on 10.192.0.83:6379
DRY-RUN: Stopping replica on 10.192.32.161:6379
DRY-RUN: Stopping replica on 10.192.48.76:6379
DRY-RUN: Stopping replica on 10.192.32.162:6379
DRY-RUN: Stopping replica on 10.192.48.77:6379
DRY-RUN: Stopping replica on 10.192.32.163:6379
DRY-RUN: Stopping replica on 10.192.0.85:6379
DRY-RUN: Stopping replica on 10.192.48.78:6379
DRY-RUN: Starting replication for all instances in eqiad, cluster sessions
DRY-RUN: Starting replica 10.192.0.86:6379 => 10.64.0.183:6379
DRY-RUN: Starting replica 10.192.48.80:6379 => 10.64.48.96:6379
DRY-RUN: Starting replica 10.192.32.159:6379 => 10.64.32.163:6379
DRY-RUN: Starting replica 10.192.16.194:6379 => 10.64.32.162:6379
DRY-RUN: Starting replica 10.192.16.62:6379 => 10.64.32.161:6379
DRY-RUN: Starting replica 10.192.16.61:6379 => 10.64.0.185:6379
DRY-RUN: Starting replica 10.192.16.60:6379 => 10.64.0.184:6379
DRY-RUN: Starting replica 10.192.48.79:6379 => 10.64.48.95:6379
DRY-RUN: Starting replica 10.192.32.160:6379 => 10.64.32.164:6379
DRY-RUN: Starting replica 10.192.0.84:6379 => 10.64.0.181:6379
DRY-RUN: Starting replica 10.192.0.83:6379 => 10.64.0.180:6379
DRY-RUN: Starting replica 10.192.32.161:6379 => 10.64.32.165:6379
DRY-RUN: Starting replica 10.192.48.76:6379 => 10.64.48.102:6379
DRY-RUN: Starting replica 10.192.32.162:6379 => 10.64.32.166:6379
DRY-RUN: Starting replica 10.192.48.77:6379 => 10.64.48.103:6379
DRY-RUN: Starting replica 10.192.32.163:6379 => 10.64.48.101:6379
DRY-RUN: Starting replica 10.192.0.85:6379 => 10.64.0.182:6379
DRY-RUN: Starting replica 10.192.48.78:6379 => 10.64.48.104:6379
DRY-RUN: END TASK - switchdc.stages.t06_redis(eqiad, codfw) Successfully completed
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 06
  1 [PASS] Switch the Redis replication
  b - Back to parent menu
  q - Quit

Stage 07

rollback

Run switchdc -f $dc_to -t $dc_from and execute stages 3 through 9

output

>>> 7
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 07
  1 [TODO] set core DB masters in read-write mode
  b - Back to parent menu
  q - Quit
>>> 1
DRY-RUN: START TASK - switchdc.stages.t07_coredb_masters_readwrite(eqiad, codfw) set core DB masters in read-write mode
DRY-RUN: Setting core DB masters in codfw to have read-only=False
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = codfw
DRY-RUN: Filtering host selection for site: codfw
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core"
DRY-RUN: Executing commands ('mysql --skip-ssl --skip-column-names --batch -e "SET GLOBAL read_only=False"',) on '10' hosts: db[2016-2019,2023,2028-2029,2033].codfw.wmnet,es[2016,2018].codfw.wmnet
DRY-RUN: Verifying core DB masters in codfw have read-only=False
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = codfw
DRY-RUN: Filtering host selection for site: codfw
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core"
DRY-RUN: Executing commands ('mysql --skip-ssl --skip-column-names --batch -e "SELECT @@global.read_only"',) on '10' hosts: db[2016-2019,2023,2028-2029,2033].codfw.wmnet,es[2016,2018].codfw.wmnet
===== NODE GROUP =====
(10) db[2016-2019,2023,2028-2029,2033].codfw.wmnet,es[2016,2018].codfw.wmnet
----- OUTPUT of 'mysql --skip-ssl...lobal.read_only"' -----
1
================
PASS |████████████████████████████████████████████████████████████████████████████████████████████████| 100% (10/10) [00:00<00:00,  6.61hosts/s]
FAIL |                                                                                                         |   0% (0/10) [00:00<?, ?hosts/s]
100.0% (10/10) success ratio (>= 100.0% threshold) for command: 'mysql --skip-ssl...lobal.read_only"'.
100.0% (10/10) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
DRY-RUN: Expected output to be '0', got '1' for hosts ['db2016.codfw.wmnet', 'db2017.codfw.wmnet', 'db2018.codfw.wmnet', 'db2019.codfw.wmnet', 'db2023.codfw.wmnet', 'db2028.codfw.wmnet', 'db2029.codfw.wmnet', 'db2033.codfw.wmnet', 'es2016.codfw.wmnet', 'es2018.codfw.wmnet']
DRY-RUN: Verifying core DB masters in eqiad have read-only=True
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = eqiad
DRY-RUN: Filtering host selection for site: eqiad
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core"
DRY-RUN: Executing commands ('mysql --skip-ssl --skip-column-names --batch -e "SELECT @@global.read_only"',) on '10' hosts: db[1018,1031,1040-1041,1049-1050,1052,1075].eqiad.wmnet,es[1011,1014].eqiad.wmnet
===== NODE GROUP =====
(10) db[1018,1031,1040-1041,1049-1050,1052,1075].eqiad.wmnet,es[1011,1014].eqiad.wmnet
----- OUTPUT of 'mysql --skip-ssl...lobal.read_only"' -----
0
================
PASS |████████████████████████████████████████████████████████████████████████████████████████████████| 100% (10/10) [00:00<00:00,  2.39hosts/s]
FAIL |                                                                                                         |   0% (0/10) [00:00<?, ?hosts/s]
100.0% (10/10) success ratio (>= 100.0% threshold) for command: 'mysql --skip-ssl...lobal.read_only"'.
100.0% (10/10) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
DRY-RUN: Expected output to be '1', got '0' for hosts ['db1018.eqiad.wmnet', 'db1031.eqiad.wmnet', 'db1040.eqiad.wmnet', 'db1041.eqiad.wmnet', 'db1049.eqiad.wmnet', 'db1050.eqiad.wmnet', 'db1052.eqiad.wmnet', 'db1075.eqiad.wmnet', 'es1011.eqiad.wmnet', 'es1014.eqiad.wmnet']
DRY-RUN: END TASK - switchdc.stages.t07_coredb_masters_readwrite(eqiad, codfw) Successfully completed
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 07
  1 [PASS] set core DB masters in read-write mode
  b - Back to parent menu
  q - Quit
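
In DRY-RUN mode the SET GLOBAL read_only command is not executed, so both verifications report a mismatch ("Expected output to be '0', got '1'" for codfw and the reverse for eqiad) while the task still completes. A minimal Python sketch of that comparison follows; the function name mismatching_hosts and the dictionary input are hypothetical and do not come from switchdc.

```python
def mismatching_hosts(outputs, read_only):
    """Return the sorted hosts whose @@global.read_only differs from the expectation.

    `outputs` maps a hostname to the raw output of
    SELECT @@global.read_only ('0' or '1').
    """
    expected = '1' if read_only else '0'
    return sorted(host for host, out in outputs.items() if out.strip() != expected)


# Values as seen in the dry-run: the codfw masters are still read-only.
codfw = {'db2016.codfw.wmnet': '1', 'es2016.codfw.wmnet': '1'}
print(mismatching_hosts(codfw, read_only=False))
```

An empty list means every master is in the expected state.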

Stage 08

rollback

Run switchdc -f $dc_to -t $dc_from and execute stages 2 through 9

output

>>> 8
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 08
  1 [TODO] Set MediaWiki in read-write mode (db_to config already merged and git pulled)
  b - Back to parent menu
  q - Quit
>>> 1
DRY-RUN: START TASK - switchdc.stages.t08_stop_mediawiki_readonly(eqiad, codfw) Set MediaWiki in read-write mode (db_to config already merged and git pulled)
DRY-RUN: Fetched hosts for query: R:Class = Role::Noc::Site
DRY-RUN: Checked message (found=False) in MediaWiki config http://terbium.eqiad.wmnet/conf/db-codfw.php.txt:
'readOnlyBySection' => [
#	's1'      => 'MediaWiki is in read-only mode for maintenance. Please try again in 15 minutes',
#	's2'      => 'MediaWiki is in read-only mode for maintenance. Please try again in 15 minutes',
#	'DEFAULT' => 'MediaWiki is in read-only mode for maintenance. Please try again in 15 minutes', # s3
#	's4'      => 'MediaWiki is in read-only mode for maintenance. Please try again in 15 minutes',
#	's5'      => 'MediaWiki is in read-only mode for maintenance. Please try again in 15 minutes',
#	's6'      => 'MediaWiki is in read-only mode for maintenance. Please try again in 15 minutes',
#	's7'      => 'MediaWiki is in read-only mode for maintenance. Please try again in 15 minutes',
],
DRY-RUN: Syncing MediaWiki wmf-config/db-codfw.php
DRY-RUN: Fetched hosts for query: R:Class = Deployment::Rsync and R:Class%cron_ensure = absent
DRY-RUN: Executing commands ('su - volans -c \'scap sync-file --force wmf-config/db-codfw.php "Set MediaWiki in read-write mode in datacenter codfw"\'',) on '1' hosts: tin.eqiad.wmnet
DRY-RUN: Fetched hosts for query: R:Class = Role::Noc::Site
DRY-RUN: Checked message (found=False) in MediaWiki config http://terbium.eqiad.wmnet/conf/db-codfw.php.txt:
'readOnlyBySection' => [
#	's1'      => 'MediaWiki is in read-only mode for maintenance. Please try again in 15 minutes',
#	's2'      => 'MediaWiki is in read-only mode for maintenance. Please try again in 15 minutes',
#	'DEFAULT' => 'MediaWiki is in read-only mode for maintenance. Please try again in 15 minutes', # s3
#	's4'      => 'MediaWiki is in read-only mode for maintenance. Please try again in 15 minutes',
#	's5'      => 'MediaWiki is in read-only mode for maintenance. Please try again in 15 minutes',
#	's6'      => 'MediaWiki is in read-only mode for maintenance. Please try again in 15 minutes',
#	's7'      => 'MediaWiki is in read-only mode for maintenance. Please try again in 15 minutes',
],
DRY-RUN: Read-only mode not changed in the MediaWiki config db-codfw?
DRY-RUN: END TASK - switchdc.stages.t08_stop_mediawiki_readonly(eqiad, codfw) Failed to execute
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 08
  1 [FAIL] Set MediaWiki in read-write mode (db_to config already merged and git pulled)
  b - Back to parent menu
  q - Quit
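
The task expects every entry of readOnlyBySection in db-codfw.php to be commented out. In DRY-RUN mode the config is not synced, so that block is not found and the task fails. The log suggests switchdc compares against the whole expected block. The Python sketch below takes a different, purely illustrative approach and lists the sections that still have an active entry; the function name and the line-by-line parsing are assumptions.

```python
import re

# An active entry looks like: 's1' => 'message',
# Commented entries start with '#', so they do not match.
ACTIVE_ENTRY = re.compile(r"^\s*'([^']+)'\s*=>")


def active_read_only_sections(config_text):
    """Return the sections with an uncommented readOnlyBySection entry."""
    sections = []
    inside = False
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("'readOnlyBySection'"):
            inside = True  # entries start on the next line
            continue
        if inside:
            if stripped.startswith(']'):
                break  # end of the readOnlyBySection array
            match = ACTIVE_ENTRY.match(line)
            if match:
                sections.append(match.group(1))
    return sections


sample = "'readOnlyBySection' => [\n#\t's1' => 'msg',\n\t's2' => 'msg',\n],\n"
print(active_read_only_sections(sample))
```

An empty result corresponds to the read-write state this stage is meant to reach.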

Stage 09

rollback

Rolling back at this point is effectively a full switchover in the opposite direction.

output

t09_restart_parsoid
>>> 9
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 09
  1 [TODO] Rolling restart parsoid in eqiad and codfw
  2 [TODO] Restore the TTL of all the MediaWiki discovery records
  3 [TODO] Start MediaWiki maintenance in the new master DC
  4 [TODO] Update Tendril configuration for the new masters
  b - Back to parent menu
  q - Quit
>>> 1
DRY-RUN: START TASK - switchdc.stages.t09_restart_parsoid(eqiad, codfw) Rolling restart parsoid in eqiad and codfw
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = eqiad
DRY-RUN: Filtering host selection for site: eqiad
DRY-RUN: Fetched hosts for query: R:class = role::parsoid
DRY-RUN: Executing commands ('restart-parsoid',) on '24' hosts: wtp[1001-1024].eqiad.wmnet
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = codfw
DRY-RUN: Filtering host selection for site: codfw
DRY-RUN: Fetched hosts for query: R:class = role::parsoid
DRY-RUN: Executing commands ('restart-parsoid',) on '20' hosts: wtp[2001-2020].codfw.wmnet
DRY-RUN: END TASK - switchdc.stages.t09_restart_parsoid(eqiad, codfw) Successfully completed
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 09
  1 [PASS] Rolling restart parsoid in eqiad and codfw
  2 [TODO] Restore the TTL of all the MediaWiki discovery records
  3 [TODO] Start MediaWiki maintenance in the new master DC
  4 [TODO] Update Tendril configuration for the new masters
  b - Back to parent menu
  q - Quit
t09_restore_ttl
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 09
  1 [PASS] Rolling restart parsoid in eqiad and codfw
  2 [TODO] Restore the TTL of all the MediaWiki discovery records
  3 [TODO] Start MediaWiki maintenance in the new master DC
  4 [TODO] Update Tendril configuration for the new masters
  b - Back to parent menu
  q - Quit
>>> 2
DRY-RUN: START TASK - switchdc.stages.t09_restore_ttl(eqiad, codfw) Restore the TTL of all the MediaWiki discovery records
DRY-RUN: Fetched hosts for query: R:class = role::authdns::server
/usr/lib/python2.7/dist-packages/urllib3/connection.py:337: SubjectAltNameWarning: Certificate for conf1003.eqiad.wmnet has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
DRY-RUN: Updating the TTL of (appservers-rw|api-rw|imagescaler-rw) to 300 seconds
DRY-RUN: Updating conftool matching tags: {'dnsdisc': '(appservers-rw|api-rw|imagescaler-rw)'}
DRY-RUN: Updating conftool: {"codfw": {"pooled": false, "references": [], "ttl": 300}, "tags": "dnsdisc=imagescaler-rw"} -> {'ttl': 300}
DRY-RUN: Updating conftool: {"eqiad": {"pooled": true, "references": [], "ttl": 300}, "tags": "dnsdisc=imagescaler-rw"} -> {'ttl': 300}
DRY-RUN: Updating conftool: {"eqiad": {"pooled": true, "references": [], "ttl": 300}, "tags": "dnsdisc=api-rw"} -> {'ttl': 300}
DRY-RUN: Updating conftool: {"codfw": {"pooled": false, "references": [], "ttl": 300}, "tags": "dnsdisc=api-rw"} -> {'ttl': 300}
DRY-RUN: Updating conftool: {"codfw": {"pooled": false, "references": [], "ttl": 300}, "tags": "dnsdisc=appservers-rw"} -> {'ttl': 300}
DRY-RUN: Updating conftool: {"eqiad": {"pooled": true, "references": [], "ttl": 300}, "tags": "dnsdisc=appservers-rw"} -> {'ttl': 300}
DRY-RUN: Checking that TTL=300 for ('appservers-rw', 'api-rw', 'imagescaler-rw').discovery.wmnet records
DRY-RUN: eeden.wikimedia.org:appservers-rw: 10.2.2.1 TTL 300
DRY-RUN: eeden.wikimedia.org:api-rw: 10.2.2.22 TTL 300
DRY-RUN: eeden.wikimedia.org:imagescaler-rw: 10.2.2.21 TTL 300
DRY-RUN: radon.wikimedia.org:appservers-rw: 10.2.2.1 TTL 300
DRY-RUN: radon.wikimedia.org:api-rw: 10.2.2.22 TTL 300
DRY-RUN: radon.wikimedia.org:imagescaler-rw: 10.2.2.21 TTL 300
DRY-RUN: baham.wikimedia.org:appservers-rw: 10.2.2.1 TTL 300
DRY-RUN: baham.wikimedia.org:api-rw: 10.2.2.22 TTL 300
DRY-RUN: baham.wikimedia.org:imagescaler-rw: 10.2.2.21 TTL 300
DRY-RUN: END TASK - switchdc.stages.t09_restore_ttl(eqiad, codfw) Successfully completed
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 09
  1 [PASS] Rolling restart parsoid in eqiad and codfw
  2 [PASS] Restore the TTL of all the MediaWiki discovery records
  3 [TODO] Start MediaWiki maintenance in the new master DC
  4 [TODO] Update Tendril configuration for the new masters
  b - Back to parent menu
  q - Quit
t09_start_maintenance
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 09
  1 [PASS] Rolling restart parsoid in eqiad and codfw
  2 [PASS] Restore the TTL of all the MediaWiki discovery records
  3 [TODO] Start MediaWiki maintenance in the new master DC
  4 [TODO] Update Tendril configuration for the new masters
  b - Back to parent menu
  q - Quit
>>> 3
DRY-RUN: START TASK - switchdc.stages.t09_start_maintenance(eqiad, codfw) Start MediaWiki maintenance in the new master DC
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = codfw
DRY-RUN: Filtering host selection for site: codfw
DRY-RUN: Fetched hosts for query: R:class = role::mediawiki::jobrunner
DRY-RUN: Fetched hosts for query: R:class = role::mediawiki::videoscaler
DRY-RUN: Fetched hosts for query: R:class = role::mediawiki::maintenance
DRY-RUN: Executing commands ('run-puppet-agent --enable "MediaWiki Switch datacenter"',) on '20' hosts: mw[2118-2119,2152-2162,2243,2246-2250].codfw.wmnet,wasat.codfw.wmnet
DRY-RUN: Executing commands ('service jobrunner status > /dev/null', 'service jobchron status > /dev/null') on '15' hosts: mw[2153-2162,2243,2247-2250].codfw.wmnet
===== NO OUTPUT =====
PASS |                                                                                                         |   0% (0/15) [00:00<?, ?hosts/s]
FAIL |████████████████████████████████████████████████████████████████████████████████████████████████| 100% (15/15) [00:00<00:00, 52.57hosts/s]
100.0% (15/15) of nodes failed to execute command 'service jobrunne...atus > /dev/null': mw[2153-2162,2243,2247-2250].codfw.wmnet
0.0% (0/15) success ratio (< 100.0% threshold) of nodes successfully executed all commands. Aborting.
DRY-RUN: Executing commands ('status jobrunner | grep -q running', 'status jobchron | grep -q running') on '4' hosts: mw[2118-2119,2152,2246].codfw.wmnet
DRY-RUN: Executing commands ('test "$(crontab -u www-data -l | sed -r \'/^(#|$)/d\')"',) on '1' hosts: wasat.codfw.wmnet
===== NODE GROUP =====
(1) wasat.codfw.wmnet
----- OUTPUT of 'test "$(crontab ...-r '/^(#|$)/d')"' -----
no crontab for www-data
================
PASS |                                                                                                          |   0% (0/1) [00:00<?, ?hosts/s]
FAIL |██████████████████████████████████████████████████████████████████████████████████████████████████| 100% (1/1) [00:00<00:00,  4.06hosts/s]
100.0% (1/1) of nodes failed to execute command 'test "$(crontab ...-r '/^(#|$)/d')"': wasat.codfw.wmnet
0.0% (0/1) success ratio (< 100.0% threshold) for command: 'test "$(crontab ...-r '/^(#|$)/d')"'. Aborting.
0.0% (0/1) success ratio (< 100.0% threshold) of nodes successfully executed all commands. Aborting.
DRY-RUN: END TASK - switchdc.stages.t09_start_maintenance(eqiad, codfw) Successfully completed
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 09
  1 [PASS] Rolling restart parsoid in eqiad and codfw
  2 [PASS] Restore the TTL of all the MediaWiki discovery records
  3 [PASS] Start MediaWiki maintenance in the new master DC
  4 [TODO] Update Tendril configuration for the new masters
  b - Back to parent menu
  q - Quit
t09_tendril
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 09
  1 [PASS] Rolling restart parsoid in eqiad and codfw
  2 [PASS] Restore the TTL of all the MediaWiki discovery records
  3 [PASS] Start MediaWiki maintenance in the new master DC
  4 [TODO] Update Tendril configuration for the new masters
  b - Back to parent menu
  q - Quit
>>> 4
DRY-RUN: START TASK - switchdc.stages.t09_tendril(eqiad, codfw) Update Tendril configuration for the new masters
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Tendril
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = codfw
DRY-RUN: Filtering host selection for site: codfw
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "s1"
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = codfw
DRY-RUN: Filtering host selection for site: codfw
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "s2"
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = codfw
DRY-RUN: Filtering host selection for site: codfw
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "s3"
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = codfw
DRY-RUN: Filtering host selection for site: codfw
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "s4"
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = codfw
DRY-RUN: Filtering host selection for site: codfw
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "s5"
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = codfw
DRY-RUN: Filtering host selection for site: codfw
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "s6"
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = codfw
DRY-RUN: Filtering host selection for site: codfw
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "s7"
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = codfw
DRY-RUN: Filtering host selection for site: codfw
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "x1"
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = codfw
DRY-RUN: Filtering host selection for site: codfw
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "es2"
DRY-RUN: Fetched hosts for query: R:Ganglia::Cluster%site = codfw
DRY-RUN: Filtering host selection for site: codfw
DRY-RUN: Fetched hosts for query: R:Class = Role::Mariadb::Groups and R:Class%mysql_role = "master" and R:Class%mysql_group = "core" and R:Class%mysql_shard = "es3"
DRY-RUN: Executing commands ('mysql --skip-ssl --skip-column-names --batch -e "UPDATE shards SET master_id = (SELECT id FROM servers WHERE host = \'db2016.codfw.wmnet\') WHERE name = \'s1\'" tendril', 'mysql --skip-ssl --skip-column-names --batch -e "UPDATE shards SET master_id = (SELECT id FROM servers WHERE host = \'db2017.codfw.wmnet\') WHERE name = \'s2\'" tendril', 'mysql --skip-ssl --skip-column-names --batch -e "UPDATE shards SET master_id = (SELECT id FROM servers WHERE host = \'db2018.codfw.wmnet\') WHERE name = \'s3\'" tendril', 'mysql --skip-ssl --skip-column-names --batch -e "UPDATE shards SET master_id = (SELECT id FROM servers WHERE host = \'db2019.codfw.wmnet\') WHERE name = \'s4\'" tendril', 'mysql --skip-ssl --skip-column-names --batch -e "UPDATE shards SET master_id = (SELECT id FROM servers WHERE host = \'db2023.codfw.wmnet\') WHERE name = \'s5\'" tendril', 'mysql --skip-ssl --skip-column-names --batch -e "UPDATE shards SET master_id = (SELECT id FROM servers WHERE host = \'db2028.codfw.wmnet\') WHERE name = \'s6\'" tendril', 'mysql --skip-ssl --skip-column-names --batch -e "UPDATE shards SET master_id = (SELECT id FROM servers WHERE host = \'db2029.codfw.wmnet\') WHERE name = \'s7\'" tendril', 'mysql --skip-ssl --skip-column-names --batch -e "UPDATE shards SET master_id = (SELECT id FROM servers WHERE host = \'db2033.codfw.wmnet\') WHERE name = \'x1\'" tendril', 'mysql --skip-ssl --skip-column-names --batch -e "UPDATE shards SET master_id = (SELECT id FROM servers WHERE host = \'es2016.codfw.wmnet\') WHERE name = \'es2\'" tendril', 'mysql --skip-ssl --skip-column-names --batch -e "UPDATE shards SET master_id = (SELECT id FROM servers WHERE host = \'es2018.codfw.wmnet\') WHERE name = \'es3\'" tendril') on '1' hosts: db1011.eqiad.wmnet
DRY-RUN: END TASK - switchdc.stages.t09_tendril(eqiad, codfw) Successfully completed
#--- DATACENTER SWITCHOVER FROM eqiad TO codfw ---#
Stage 09
  1 [PASS] Rolling restart parsoid in eqiad and codfw
  2 [PASS] Restore the TTL of all the MediaWiki discovery records
  3 [PASS] Start MediaWiki maintenance in the new master DC
  4 [PASS] Update Tendril configuration for the new masters
  b - Back to parent menu
  q - Quit
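
Note on the Tendril update above: the task runs one `UPDATE` per core shard, pointing `master_id` at the master in the new datacenter. The following is a minimal Python sketch of how such a command list could be built. It is not the code of `t09_tendril`; the function name and the `MASTERS` list are hypothetical. The real task discovers the masters through the host queries shown in the output, while here the shard and host pairs are simply copied from the dry-run output above.

```python
# Shard -> master pairs as they appear in the dry-run output above.
# Hard-coded here for illustration only.
MASTERS = [
    ("s1", "db2016"), ("s2", "db2017"), ("s3", "db2018"), ("s4", "db2019"),
    ("s5", "db2023"), ("s6", "db2028"), ("s7", "db2029"), ("x1", "db2033"),
    ("es2", "es2016"), ("es3", "es2018"),
]

# Command template matching the commands printed in the dry-run output.
TEMPLATE = (
    'mysql --skip-ssl --skip-column-names --batch -e '
    '"UPDATE shards SET master_id = '
    "(SELECT id FROM servers WHERE host = '{host}.{dc}.wmnet') "
    "WHERE name = '{shard}'\" tendril"
)


def tendril_update_commands(masters, dc):
    """Build one mysql command per (shard, master host) pair."""
    return [
        TEMPLATE.format(host=host, dc=dc, shard=shard)
        for shard, host in masters
    ]


if __name__ == "__main__":
    for command in tendril_update_commands(MASTERS, "codfw"):
        print(command)
```

Each generated string corresponds to one element of the command tuple that the dry run reports it would execute on the Tendril host.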