Server Admin Log/Archive 44

2021-04-30

21:54 mutante: people1003 - rsyncing /home from people1002
15:30 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host
15:29 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host
15:25 bstorm: hard rebooting cloudmetrics1002 T275605
11:40 ladsgroup@deploy1002: Synchronized static/favicon/wikitech.ico: Config: Update wikitech logo (duration: 00m 56s)
11:36 ladsgroup@deploy1002: Synchronized static/images/project-logos/wikitech-1.5x.png: Config: Update wikitech logo (duration: 00m 56s)
11:34 ladsgroup@deploy1002: Synchronized static/images/project-logos/wikitech-2x.png: Config: Update wikitech logo (duration: 00m 57s)
11:33 ladsgroup@deploy1002: Synchronized static/images/project-logos/wikitech.png: Config: Update wikitech logo (duration: 00m 57s)
11:31 ladsgroup@deploy1002: Synchronized logos/config.yaml: Config: Update wikitech logo (duration: 00m 57s)
09:04 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: primary nic disconnected
09:03 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: primary nic disconnected
08:11 moritzm: remove mc1027 from debmonitor; server is broken and won't return (T276415)
07:38 moritzm: installing iputils updates from the Buster point release
06:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P15667 and previous config saved to /var/cache/conftool/dbconfig/20210430-061549-root.json
06:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P15666 and previous config saved to /var/cache/conftool/dbconfig/20210430-060046-root.json
05:51 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
05:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P15665 and previous config saved to /var/cache/conftool/dbconfig/20210430-054542-root.json
05:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P15664 and previous config saved to /var/cache/conftool/dbconfig/20210430-053038-root.json
05:16 marostegui: Upgrade kernel on db1114
05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114 to enable report_host T266483', diff saved to https://phabricator.wikimedia.org/P15663 and previous config saved to /var/cache/conftool/dbconfig/20210430-051558-marostegui.json
05:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1080.eqiad.wmnet
04:57 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1080.eqiad.wmnet
04:56 ryankemper: [WDQS] `ryankemper@wdqs1006:~$ sudo systemctl restart wdqs-blazegraph`
04:43 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563` on `ryankemper@cumin1001` tmux session `elastic_restarts`
04:43 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
04:42 ryankemper: T261239 `elastic2033`, which is known to be in a state of hardware failure (we have a ticket open), is holding up the reboot of codfw. I don't think we have a good way to exclude a node currently. Going to just proceed to `eqiad` for now
04:41 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
04:39 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1003.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
04:39 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
04:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
04:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1010.eqiad.wmnet with reason: REIMAGE
04:03 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1010.eqiad.wmnet with reason: REIMAGE
03:50 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1010.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
03:47 ryankemper: T280563 about half of the codfw nodes were rebooted before the failure, which was caused by the write queue not emptying fast enough; kicking it off again: `sudo -i cookbook sre.elasticsearch.rolling-operation search_codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563` on `ryankemper@cumin1001` tmux session `elastic_restarts`
03:45 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
01:08 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563

2021-04-29

23:36 thcipriani@deploy1002: Synchronized README: Config: Revert "DEMO: Add newline to README" (duration: 00m 56s)
23:18 ryankemper: T280563 successful reboot of `relforge100[3,4]`; `relforge` cluster is back to green status.
23:16 thcipriani@deploy1002: Synchronized README: Config: DEMO: Add newline to README (duration: 00m 56s)
23:08 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563` on `ryankemper@cumin1001` tmux session `elastic_restarts` (amended command)
23:06 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563` on `ryankemper@cumin1001` tmux session `elastic_restarts`
23:05 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
22:46 ryankemper: T280563 Current master is `relforge1003-relforge-eqiad`; will reboot `1004` first, then `1003`
22:44 ryankemper: T280563 Bleh, we never moved the new config into spicerack, so it's trying to talk to the old relforge hosts, which no longer exist. Will reboot relforge manually, use the cookbook for codfw/eqiad, and circle back later for the spicerack change
22:37 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - T280563
22:36 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - T280563
22:32 ryankemper: T280563 Spotted the issue; forgot to set `--without-lvs` for relforge reboot
22:27 ryankemper: T280563 `urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7fbe4bb8a518>: Failed to establish a new connection: [Errno -2] Name or service not known`
22:26 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge restart - ryankemper@cumin1001 - T280563
22:26 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge restart - ryankemper@cumin1001 - T280563
22:21 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - T280563
22:21 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - T280563
22:21 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - T280563
22:20 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - T280563
21:36 mutante: icinga - enabling disabled notifications for random an-worker nodes where the mgmt interface had alerts enabled but the actual host didn't
21:32 mutante: icinga - enabled notifications for checks on ms-backup1001 - they had all been manually disabled, but none of the checks had any status change in 50 days, which indicates someone forgot to turn them back on (a common issue with disabling notifications)
21:16 mutante: backup1001 - sudo check_bacula.py --icinga
20:54 marostegui: Stop mysql on tendril for the UTC night; dbtree and tendril will remain down for a few hours T281486
20:16 marostegui: Restart tendril database - T281486
20:00 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.3 refs T278347
19:46 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.3 refs T278347 (duration: 01m 08s)
19:45 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3 refs T278347
19:32 dpifke@deploy1002: Finished deploy [performance/navtiming@e7ad939]: Deploy https://gerrit.wikimedia.org/r/c/performance/navtiming/+/683484 (duration: 00m 05s)
19:32 dpifke@deploy1002: Started deploy [performance/navtiming@e7ad939]: Deploy https://gerrit.wikimedia.org/r/c/performance/navtiming/+/683484
19:01 Krinkle: graphite1004/2003: prune /var/lib/carbon/whisper/MediaWiki/wanobjectcache/revision_row_1/ (bad data from Sep 2019)
18:59 Krinkle: graphite1004/2003: prune /var/lib/carbon/whisper/rl-minify-* (bad data from Aug 2018)
18:58 Krinkle: graphite1004/2003: prune /var/lib/carbon/whisper/MediaWiki_ExternalGuidance_init_Google_tr_fr (bad data from Nov 2019)
18:38 krinkle@deploy1002: Synchronized php-1.37.0-wmf.1/includes/libs/objectcache/MemcachedBagOStuff.php: I926797, T281480 (duration: 01m 08s)
18:33 mutante: LDAP - added mmandere to wmf group (T281344)
18:10 krinkle@deploy1002: Synchronized php-1.37.0-wmf.3/includes/libs/objectcache/MemcachedBagOStuff.php: I926797, T281480 (duration: 01m 09s)
17:13 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:10 pt1979@cumin2001: START - Cookbook sre.dns.netbox
17:01 pt1979@cumin2001: START - Cookbook sre.dns.netbox
16:29 ryankemper: T281498 `sudo -E cumin 'C:role::lvs::balancer' 'sudo run-puppet-agent'`
16:28 liw@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.1"
16:27 liw@deploy1002: sync-wikiversions aborted: Revert "group[0|1] wikis to [VERSION]" (duration: 00m 01s)
16:22 ryankemper: T281498 `ryankemper@wdqs2004:~$ sudo depool`
16:20 ryankemper: T281498 `ryankemper@wdqs2004:~$ sudo run-puppet-agent`
16:18 otto@deploy1002: Finished deploy [analytics/refinery@b3c5820] (hadoop-test): update event_sanitized_main allowlist on an-launcher1002 - T273789 (duration: 02m 39s)
16:15 otto@deploy1002: Started deploy [analytics/refinery@b3c5820] (hadoop-test): update event_sanitized_main allowlist on an-launcher1002 - T273789
16:12 papaul: powerdown thanos-fe2001 for memory swap
15:44 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage` (trying to reimage this host one final time; if this fails again, we will need a deeper investigation into what's going wrong here)
15:43 ryankemper: [WDQS] `wdqs2001` is high on update lag but otherwise functioning; will repool when lag is caught up
15:37 ryankemper: [WDQS] `sudo systemctl restart wdqs-blazegraph` && `sudo systemctl restart wdqs-updater` on `wdqs2001`
15:35 ryankemper: [WDQS] ^ scratch that, depooled `wdqs2001`
15:34 ryankemper: [WDQS] pooled `wdqs2001`
14:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog[1002-1003].eqiad.wmnet with reason: eventlog1003 migration
14:35 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog[1002-1003].eqiad.wmnet with reason: eventlog1003 migration
13:44 moritzm: installing Java security updates on stat* hosts
13:43 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1003.eqiad.wmnet with reason: eventlog1003 migration
13:43 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1003.eqiad.wmnet with reason: eventlog1003 migration
13:42 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1002.eqiad.wmnet with reason: eventlog1003 migration
13:42 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1002.eqiad.wmnet with reason: eventlog1003 migration
13:40 otto@deploy1002: Finished deploy [analytics/refinery@b3c5820]: update event_sanitized_main allowlist on an-launcher1002 - T273789 (duration: 02m 59s)
13:37 otto@deploy1002: Started deploy [analytics/refinery@b3c5820]: update event_sanitized_main allowlist on an-launcher1002 - T273789
13:11 moritzm: installing postgresql-11 security updates
13:08 jbond42: merge netbase change to manage /etc/services
13:07 liw@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.3 (duration: 01m 07s)
13:06 liw@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3
12:36 Amir1: upgrading Quiddity to admin in mailman3
12:36 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1002.eqiad.wmnet with reason: Testing migration of processors to eventlog1003
12:36 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1002.eqiad.wmnet with reason: Testing migration of processors to eventlog1003
12:26 moritzm: installing grub2 updates from the Buster point release
12:06 jbond42: update debmonitor.discover.wmnet ssl cert
11:59 ladsgroup@deploy1002: Synchronized wmf-config/extension-list: Config: Undeploy JADE from production, Part III (T281418) (duration: 01m 07s)
11:54 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Undeploy JADE from production, Part II (T281418), Part I (duration: 01m 06s)
11:49 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Undeploy JADE from production, Part I (T281418) (duration: 01m 07s)
11:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
11:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
11:38 mbsantos@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Enable suggested values in TemplateData and VisualEditor CommonSettings (T273857) (duration: 01m 07s)
11:34 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/ContentTranslation/specials/SpecialContentTranslation.php: Backport: Another fix for token cookie handling (T281346) (duration: 01m 07s)
11:32 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/ContentTranslation/specials/SpecialContentTranslation.php: Backport: Another fix for token cookie handling (T281346) (duration: 01m 08s)
11:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15658 and previous config saved to /var/cache/conftool/dbconfig/20210429-113211-root.json
11:24 mbsantos@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable suggested values in TemplateData and VisualEditor InitialiseSettings (T273857) (duration: 01m 07s)
11:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15657 and previous config saved to /var/cache/conftool/dbconfig/20210429-111708-root.json
11:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15656 and previous config saved to /var/cache/conftool/dbconfig/20210429-110204-root.json
10:59 moritzm: updating apt on buster (SUA 198), which eases bullseye upgrades T275873
10:56 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/ContentTranslation/modules/base/mw.cx.SiteMapper.js: Backport: Fix CX token cookie (T281346) (duration: 01m 08s)
10:54 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/ContentTranslation/modules/base/mw.cx.SiteMapper.js: Backport: Fix CX token cookie (T281346) (duration: 01m 09s)
10:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15655 and previous config saved to /var/cache/conftool/dbconfig/20210429-104700-root.json
10:27 marostegui: Upgrade kernel on db1110
10:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15654 and previous config saved to /var/cache/conftool/dbconfig/20210429-102447-marostegui.json
09:42 volans: uploaded pynetbox 5.3.0-2 to bullseye-wikimedia on apt.w.o
09:39 volans@deploy1002: Finished deploy [homer/deploy@e394769]: Release v0.2.8 (duration: 03m 30s)
09:35 volans@deploy1002: Started deploy [homer/deploy@e394769]: Release v0.2.8
09:01 jynus: stopping replication and checking data of db2100:s7
08:57 marostegui: Upgrade kernel on db2133
08:51 marostegui: Upgrade kernel on db2125
08:50 marostegui: Upgrade kernel on db2124
08:46 marostegui: Upgrade kernel on db2122
08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 100%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15652 and previous config saved to /var/cache/conftool/dbconfig/20210429-084011-root.json
08:39 marostegui: Upgrade kernel on db2121
08:33 marostegui: Upgrade kernel on db2120
08:28 volans@deploy1002: Finished deploy [homer/deploy@89cd07c]: Release v0.2.7 (duration: 03m 08s)
08:27 marostegui: Upgrade kernel on db2115
08:25 volans@deploy1002: Started deploy [homer/deploy@89cd07c]: Release v0.2.7
08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 80%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15651 and previous config saved to /var/cache/conftool/dbconfig/20210429-082507-root.json
08:19 marostegui: Upgrade kernel on db2114
08:12 marostegui: Upgrade kernel on db2109
08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 70%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15649 and previous config saved to /var/cache/conftool/dbconfig/20210429-081004-root.json
07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 60%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15648 and previous config saved to /var/cache/conftool/dbconfig/20210429-075500-root.json
07:54 marostegui: Upgrade kernel on db2089
07:48 jynus: rolling restart of bacula hosts T273182
07:48 marostegui@deploy1002: Synchronized wmf-config/db-eqiad.php: Repool pc1007 (duration: 01m 07s)
07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15647 and previous config saved to /var/cache/conftool/dbconfig/20210429-074625-root.json
07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 50%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15646 and previous config saved to /var/cache/conftool/dbconfig/20210429-073956-root.json
07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 90%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15645 and previous config saved to /var/cache/conftool/dbconfig/20210429-073122-root.json
07:28 marostegui: Stop mysql and upgrade kernel on pc1007
07:28 marostegui@deploy1002: Synchronized wmf-config/db-eqiad.php: Depool pc1007 (duration: 01m 08s)
07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 40%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15644 and previous config saved to /var/cache/conftool/dbconfig/20210429-072453-root.json
07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 80%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15643 and previous config saved to /var/cache/conftool/dbconfig/20210429-071618-root.json
07:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 25%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15642 and previous config saved to /var/cache/conftool/dbconfig/20210429-070949-root.json
07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15641 and previous config saved to /var/cache/conftool/dbconfig/20210429-070114-root.json
06:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 10%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15640 and previous config saved to /var/cache/conftool/dbconfig/20210429-065445-root.json
06:53 godog: add 100G to prometheus/ops in eqiad
06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 60%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15639 and previous config saved to /var/cache/conftool/dbconfig/20210429-064611-root.json
06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15637 and previous config saved to /var/cache/conftool/dbconfig/20210429-063107-root.json
06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 40%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15636 and previous config saved to /var/cache/conftool/dbconfig/20210429-061603-root.json
06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 30%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15635 and previous config saved to /var/cache/conftool/dbconfig/20210429-060100-root.json
05:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15634 and previous config saved to /var/cache/conftool/dbconfig/20210429-054556-root.json
05:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 20%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15633 and previous config saved to /var/cache/conftool/dbconfig/20210429-053052-root.json
05:22 marostegui: Check tables on db1121 (this will cause lag for s4 commonswiki on the wikireplicas)
05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 for tables checking', diff saved to https://phabricator.wikimedia.org/P15632 and previous config saved to /var/cache/conftool/dbconfig/20210429-052146-marostegui.json
05:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 15%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15631 and previous config saved to /var/cache/conftool/dbconfig/20210429-051549-root.json
05:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15630 and previous config saved to /var/cache/conftool/dbconfig/20210429-050045-root.json
04:55 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1156 into s2 for the first time with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P15629 and previous config saved to /var/cache/conftool/dbconfig/20210429-045557-marostegui.json
04:50 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1156 into s2 for the first time with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P15627 and previous config saved to /var/cache/conftool/dbconfig/20210429-045015-marostegui.json
04:44 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1156 into s2 for the first time with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P15626 and previous config saved to /var/cache/conftool/dbconfig/20210429-044458-marostegui.json
04:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1118.eqiad.wmnet with reason: REIMAGE
04:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1118.eqiad.wmnet with reason: REIMAGE
04:38 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1156 into s2 for the first time with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P15625 and previous config saved to /var/cache/conftool/dbconfig/20210429-043857-marostegui.json
04:38 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1156 to dbctl T258361', diff saved to https://phabricator.wikimedia.org/P15624 and previous config saved to /var/cache/conftool/dbconfig/20210429-043812-marostegui.json
04:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118 for reimage', diff saved to https://phabricator.wikimedia.org/P15623 and previous config saved to /var/cache/conftool/dbconfig/20210429-042757-marostegui.json
02:59 milimetric@deploy1002: Finished deploy [analytics/refinery@740226b] (thin): Hotfix for referrer job (duration: 00m 06s)
02:59 milimetric@deploy1002: Started deploy [analytics/refinery@740226b] (thin): Hotfix for referrer job
02:58 milimetric@deploy1002: Finished deploy [analytics/refinery@740226b]: Hotfix for referrer job (duration: 14m 40s)
02:44 milimetric@deploy1002: Started deploy [analytics/refinery@740226b]: Hotfix for referrer job
01:44 krinkle@deploy1002: Synchronized wmf-config/mc.php: I5869b3c3ba4a (duration: 01m 08s)
01:23 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
01:21 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
01:21 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
01:20 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
01:20 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
01:19 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
01:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
01:19 ryankemper: T280382 Aborted data transfer; `wdqs2007` is hosed (see https://phabricator.wikimedia.org/T281437)
01:18 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
00:40 tstarling@deploy1002: Synchronized php-1.37.0-wmf.3/includes/specials/pagers/ImageListPager.php: T281405 (duration: 01m 08s)
00:11 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
00:06 ryankemper: T280382 `wdqs1013.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv 2.7T 998G 1.6T 39% /srv`

2021-04-28

23:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
23:38 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
23:36 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
23:36 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
23:34 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
23:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
23:32 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
23:06 dpifke@deploy1002: Finished deploy [performance/navtiming@cf8b2e9]: Deploying https://gerrit.wikimedia.org/r/c/performance/navtiming/+/682886 (duration: 00m 05s)
23:06 dpifke@deploy1002: Started deploy [performance/navtiming@cf8b2e9]: Deploying https://gerrit.wikimedia.org/r/c/performance/navtiming/+/682886
22:44 dwisehaupt: civiproxy revision changed to 99cecb924a - initial rollout of code for testing
22:26 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1013.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
22:26 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
22:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
22:18 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1013.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
22:18 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
22:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
22:15 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
21:49 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
21:49 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
21:47 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
21:46 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
21:44 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
21:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
21:41 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1013.eqiad.wmnet with reason: REIMAGE
21:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1013.eqiad.wmnet with reason: REIMAGE
21:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
21:39 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
21:38 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
21:37 ryankemper: T280382 `wdqs2007` is reachable again; glancing at `/srv/wdqs`, its `wikidata.jnl` is `839G` when it should be `975G`, so I'll redo the wikidata journal transfer
21:32 ryankemper: T280382 [WDQS] `wdqs2007` ssh is unreachable; power cycling via `racadm>>racadm serveraction powercycle`
21:24 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs1013.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage` (previous reimage timed out, instance appears to have rebooted)
21:07 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
21:05 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
21:04 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
21:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
21:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
21:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
21:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
21:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
20:00 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:57 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.3"
19:56 robh@cumin1001: START - Cookbook sre.dns.netbox
19:13 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.3 refs T278347 (duration: 01m 07s)
19:12 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3 refs T278347
18:21 legoktm: added mvolz as listadmin for services@ and reset admin pw (T278516)
17:12 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Wikibase/client/includes/DataAccess/Scribunto/WikibaseLanguageIndependentLuaBindings.php: b392dba: Fix incorrect ItemId typehint in Lua bindings (T281361) (duration: 01m 09s)
16:52 papaul: powerdown logstash2034 for relocation
16:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: REIMAGE
16:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: REIMAGE
16:29 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:29 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: REIMAGE
16:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: REIMAGE
16:27 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: REIMAGE
16:27 pt1979@cumin2001: START - Cookbook sre.dns.netbox
16:26 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: REIMAGE
16:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: REIMAGE
16:24 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: REIMAGE
16:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: REIMAGE
16:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: REIMAGE
16:21 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: REIMAGE
16:19 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: REIMAGE
16:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
16:12 pt1979@cumin2001: START - Cookbook sre.dns.netbox
15:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on sessionstore2001.codfw.wmnet with reason: Server relocation
15:25 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on sessionstore2001.codfw.wmnet with reason: Server relocation
15:24 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:20 jayme@cumin1001: START - Cookbook sre.dns.netbox
15:19 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts conf[2001-2003].codfw.wmnet
15:12 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
15:09 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on sessionstore2001.codfw.wmnet with reason: Server relocation
15:09 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on sessionstore2001.codfw.wmnet with reason: Server relocation
15:03 pt1979@cumin2001: START - Cookbook sre.dns.netbox
15:00 moritzm: imported python-poolcounter 0.0.2-1+deb11u1 to apt.wikimedia.org T275873
14:53 jayme@cumin1001: START - Cookbook sre.hosts.decommission for hosts conf[2001-2003].codfw.wmnet
14:44 moritzm: imported gitlab-ce 13.9.7-ce.0 to apt.wikimedia.org
14:40 milimetric@deploy1002: Finished deploy [analytics/refinery@559d98d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@559d98d] (duration: 04m 59s)
14:35 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@559d98d]
14:34 milimetric@deploy1002: Finished deploy [analytics/refinery@559d98d] (thin): Regular analytics weekly train THIN [analytics/refinery@559d98d] (duration: 00m 06s)
14:34 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d] (thin): Regular analytics weekly train THIN [analytics/refinery@559d98d]
14:34 milimetric@deploy1002: Finished deploy [analytics/refinery@559d98d]: Regular analytics weekly train [analytics/refinery@559d98d] (duration: 03m 07s)
14:32 moritzm: installing iproute2 updates from buster point release
14:31 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d]: Regular analytics weekly train [analytics/refinery@559d98d]
14:30 milimetric@deploy1002: deploy aborted: - (duration: 00m 00s)
14:30 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d]: -
14:30 milimetric@deploy1002: Finished deploy [analytics/refinery@559d98d]: Regular analytics weekly train [analytics/refinery@559d98d] (duration: 12m 31s)
14:26 moritzm: installing net-snmp updates from buster point release
14:17 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d]: Regular analytics weekly train [analytics/refinery@559d98d]
13:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
13:57 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
13:15 jayme: restarting pybal on lvs5001,lvs4005,lvs2007 - T271573
13:14 liw@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.3"
13:10 jayme: restarting pybal on lvs5002,lvs4006,lvs2008 - T271573
13:04 liw@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.3 (duration: 01m 07s)
13:03 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
13:03 liw@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3
13:02 moritzm: upgrading deployment servers to PHP 7.4.32
12:55 moritzm: upgrading snapshot hosts to PHP 7.4.32
12:48 jayme: restarting pybal on lvs2009 - T271573
12:45 moritzm: upgrading labweb to PHP 7.4.32
12:43 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
12:42 jayme: restarting pybal on lvs5003,lvs4007 - T271573
12:39 jayme: restarting pybal on lvs2010 - T271573
12:36 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
12:28 apergos: manually edited /srv/deployment/dumps/dumps-cache/config on snapshot1011,12,13 to change deploy1001 to deploy1002 (where did it get the old value from? these are new installs!)
12:16 moritzm: rolling restart of cassandra in restbase-dev to pick up Java security updates
12:15 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
12:15 jmm@cumin2001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99)
12:15 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
11:53 jayme: switching SRV record _etcd._tcp to new etcd cluster (for codfw, eqsin, ulsfo)
11:22 Urbanecm: EU B&C window done
11:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Popups/: 8d0ae5e: Separate reference preview settings in beta & non-beta (T281235) (duration: 01m 08s)
11:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ddbc378: Enable partial action blocks on testwiki (T280528) (duration: 01m 07s)
11:05 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
11:03 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
11:03 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
11:01 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
10:44 jbond42: updated the check-raid nrpe script to python3
09:40 moritzm: restarting Tomcat on idp-test1001 to pick up Java security updates
09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15618 and previous config saved to /var/cache/conftool/dbconfig/20210428-092103-root.json
09:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint1001.wikimedia.org
09:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host contint1001.wikimedia.org
09:09 moritzm: restarting jenkins* on releases to pick up Java security updates
09:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint2001.wikimedia.org
09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15617 and previous config saved to /var/cache/conftool/dbconfig/20210428-090559-root.json
08:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host contint2001.wikimedia.org
08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15616 and previous config saved to /var/cache/conftool/dbconfig/20210428-085056-root.json
08:42 urbanecm@deploy1002: Synchronized wmf-config/InterwikiSortOrders.php: 96ad0d4: Add alt, bcl, diq, mad, mni, mnw, nia, skr, tay and trv to InterwikiSortOrders (duration: 01m 08s)
08:41 urbanecm@deploy1002: sync-file aborted: 96ad0d4: Add alt, bcl, diq, mad, mni, mnw, nia, skr, tay and trv to InterwikiSortOrders (duration: 00m 02s)
08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15615 and previous config saved to /var/cache/conftool/dbconfig/20210428-083625-marostegui.json
08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15614 and previous config saved to /var/cache/conftool/dbconfig/20210428-083552-root.json
08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15613 and previous config saved to /var/cache/conftool/dbconfig/20210428-083458-root.json
08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15612 and previous config saved to /var/cache/conftool/dbconfig/20210428-082625-root.json
08:25 effie: update php7.2 on jobrunners and parsoid servers && rolling php7.2-fpm restarts
08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15611 and previous config saved to /var/cache/conftool/dbconfig/20210428-081121-root.json
07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15610 and previous config saved to /var/cache/conftool/dbconfig/20210428-075618-root.json
07:52 effie: update php7.2 on api servers && rolling php7.2-fpm restarts
07:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15609 and previous config saved to /var/cache/conftool/dbconfig/20210428-074114-root.json
07:40 marostegui: Deploy schema change on db1098:3316 and db1098:3317 T266486 T268392 T273360
07:27 effie: update php7.2 on appservers && rolling php7.2-fpm restarts
07:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098 for schema change and kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15608 and previous config saved to /var/cache/conftool/dbconfig/20210428-072609-marostegui.json
07:19 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:14 elukey@cumin1001: START - Cookbook sre.dns.netbox
07:12 elukey: add AAAA record for kafka-main200[3,4,5].codfw.wmnet
07:10 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:05 elukey@cumin1001: START - Cookbook sre.dns.netbox
07:04 elukey: add AAAA record for kafka-main2002.codfw.wmnet
07:03 marostegui: Deploy schema change on db2089:3316 and db1098:3316 T266486 T268392 T273360
06:26 legoktm: created mailman3 superusers for Administrator (noc@), Ladsgroup and Legoktm
06:23 legoktm: legoktm@lists1001:~$ sudo mailman-web set_default_site --name lists.wikimedia.org --domain lists.wikimedia.org
06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15607 and previous config saved to /var/cache/conftool/dbconfig/20210428-061426-root.json
06:00 marostegui: Stop MySQL on db2096 (x1 codfw) T281135
05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15606 and previous config saved to /var/cache/conftool/dbconfig/20210428-055922-root.json
05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1167 in s8 T258361', diff saved to https://phabricator.wikimedia.org/P15605 and previous config saved to /var/cache/conftool/dbconfig/20210428-055144-marostegui.json
05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15604 and previous config saved to /var/cache/conftool/dbconfig/20210428-054419-root.json
05:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15603 and previous config saved to /var/cache/conftool/dbconfig/20210428-052915-root.json
05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change', diff saved to https://phabricator.wikimedia.org/P15602 and previous config saved to /var/cache/conftool/dbconfig/20210428-051526-marostegui.json
05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 (old s1 master) for schema change', diff saved to https://phabricator.wikimedia.org/P15601 and previous config saved to /var/cache/conftool/dbconfig/20210428-050754-marostegui.json
05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1163 to s1 master and remove read-only from s1 T278214', diff saved to https://phabricator.wikimedia.org/P15600 and previous config saved to /var/cache/conftool/dbconfig/20210428-050138-marostegui.json
05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s1 as read-only for maintenance T278214', diff saved to https://phabricator.wikimedia.org/P15599 and previous config saved to /var/cache/conftool/dbconfig/20210428-050041-marostegui.json
05:00 marostegui: Starting s1 eqiad failover from db1083 to db1163 - T278214
04:14 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
04:14 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
04:13 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
04:08 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
04:08 marostegui: Start replication changes, connect everything to db1163 T278214
04:08 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
04:07 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1163 with weight 0 before the switchover T278214', diff saved to https://phabricator.wikimedia.org/P15598 and previous config saved to /var/cache/conftool/dbconfig/20210428-040718-marostegui.json
03:53 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
03:51 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
03:49 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs2007.codfw.wmnet
03:48 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1013.eqiad.wmnet
03:33 ryankemper: `sudo systemctl restart wdqs-blazegraph` on `wdqs1012` to clear the `WDQS SPARQL` warning
03:32 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2007.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
03:32 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1013.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
02:33 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
02:28 robh@cumin1001: START - Cookbook sre.dns.netbox
01:06 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:00 robh@cumin1001: START - Cookbook sre.dns.netbox
00:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on snapshot1015.eqiad.wmnet with reason: REIMAGE
00:01 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1014.eqiad.wmnet with reason: REIMAGE

2021-04-27

23:58 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1015.eqiad.wmnet with reason: REIMAGE
23:57 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1013.eqiad.wmnet with reason: REIMAGE
23:57 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1014.eqiad.wmnet with reason: REIMAGE
23:55 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1012.eqiad.wmnet with reason: REIMAGE
23:54 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1013.eqiad.wmnet with reason: REIMAGE
23:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1011.eqiad.wmnet with reason: REIMAGE
23:52 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1012.eqiad.wmnet with reason: REIMAGE
23:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1011.eqiad.wmnet with reason: REIMAGE
21:07 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rdb[2005-2006].codfw.wmnet
20:55 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts rdb[2005-2006].codfw.wmnet
20:54 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rdb[2003-2004].codfw.wmnet
20:42 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts rdb[2003-2004].codfw.wmnet
20:32 bblack: re-pooling codfw public traffic - T279457
20:11 jhuneidi@deploy1002: Synchronized php-1.37.0-wmf.3/includes/rcfeed/IRCColourfulRCFeedFormatter.php: Backport rcfeed: Remove reference assignment (T281226) to 1.37.0-wmf.3 (duration: 01m 12s)
20:08 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2005.codfw.wmnet with reason: REIMAGE
20:06 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2005.codfw.wmnet with reason: REIMAGE
19:44 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host people1003.eqiad.wmnet
19:37 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2004.codfw.wmnet with reason: REIMAGE
19:35 papaul: powerdown ms-backup2001 for maintenance
19:35 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2004.codfw.wmnet with reason: REIMAGE
19:07 papaul: powerdown logstash2035 for maintenance
19:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people1003.eqiad.wmnet
19:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts people1003.eqiad.wmnet
18:50 mutante: people1003 - destroying VM and recreating it from scratch to test whether the issue of no console and no access is repeatable
18:50 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts people1003.eqiad.wmnet
18:37 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: REIMAGE
18:35 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: REIMAGE
18:33 mutante: people1003 - rebooting, trying to get new VM to work
18:33 Urbanecm: Morning B&C window done
18:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 91a85f2: ac770bf: Enable language in header for office and testwiki users (T280526) (duration: 01m 19s)
18:32 bblack: lvs2009 - restart pybal + re-run puppet agent - T279457
18:23 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:20 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp203[56].codfw.wmnet
18:20 bblack: cp203[56] - repooling in etcd - T279457
18:19 robh@cumin1001: START - Cookbook sre.dns.netbox
18:17 robh@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
18:17 robh@cumin1001: START - Cookbook sre.dns.netbox
18:16 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
18:12 robh@cumin1001: START - Cookbook sre.dns.netbox
18:11 bblack: dns2001 - restarting bird to repool, then re-enabling puppet - T279457
18:04 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
18:02 pt1979@cumin2001: START - Cookbook sre.dns.netbox
18:02 ejegg: update payments-wiki from 9a4eef1375 to 44570561f2
18:00 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1004.eqiad.wmnet with reason: REIMAGE
17:58 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1004.eqiad.wmnet with reason: REIMAGE
17:34 papaul: powerdown moss-fe2001 for maintenance
17:32 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
17:29 robh@cumin1001: START - Cookbook sre.dns.netbox
17:25 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
17:23 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
17:21 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
17:19 ryankemper: T281215 Banned `elastic2043` from codfw cirrussearch cluster
17:16 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
17:14 papaul: powerdown kafka-logging2003 for maintenance
17:14 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
17:10 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
17:09 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
17:07 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
17:04 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
16:52 papaul: powerdown elastic2045 for maintenance
16:49 papaul: powerdown ms-be2042 for maintenance
16:39 dcaro: reprepro updating packages on thirdparty/ceph-nautilus-buster
16:34 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
16:29 pt1979@cumin2001: START - Cookbook sre.dns.netbox
16:23 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 39 hosts with reason: upgrading openstack
16:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 39 hosts with reason: upgrading openstack
16:22 effie: upgrading scap to 3.17.1-1 on mediawiki canaries - T279695
16:18 effie: uploading scap_3.17.1-1
15:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1026.eqiad.wmnet
14:48 moritzm: installing file/libmagic updates from buster point release
14:47 bblack: lvs2009 - disable puppet + stop pybal (internal services will move to lvs2010, please avoid LVS service definition changes for now!) - T279457
14:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2003.codfw.wmnet
14:36 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp203[56].codfw.wmnet
14:36 bblack: cp203[56] - depool all etcd services via confctl - T279457
14:33 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2003.codfw.wmnet
14:33 bblack: dns2001 - depooling for T279457 (disable puppet + stop bird)
14:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2002.codfw.wmnet
14:31 moritzm: installing imagemagick security updates
14:28 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2002.codfw.wmnet
14:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2001.codfw.wmnet
14:23 jayme@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
14:20 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2001.codfw.wmnet
14:20 jayme@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
14:19 moritzm: installing xen security updates
14:17 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
14:17 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
14:16 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
14:16 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
14:15 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
14:15 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
14:14 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1003.eqiad.wmnet
14:09 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1003.eqiad.wmnet
14:08 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
14:08 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
14:04 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1002.eqiad.wmnet
14:01 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 105 hosts with reason: upgrading openstack
14:01 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 105 hosts with reason: upgrading openstack
14:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 9 hosts with reason: upgrading openstack
14:00 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 9 hosts with reason: upgrading openstack
13:58 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1002.eqiad.wmnet
13:56 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
13:55 moritzm: imported jenkins 2.277.3 to thirdparty/ci
13:50 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
13:48 moritzm: uploaded openjdk-8 8u292-b10-0~deb10u1 (buster forward port of latest Java 8 security release)
13:46 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
13:46 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
13:45 akosiaris: switchover api-gateway, changeprop, cpjobqueue to use the new redis cluster servers (rdb2007-rdb2010)
13:45 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
13:45 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
13:44 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
13:44 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
13:34 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
13:34 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
13:33 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
13:33 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
13:30 hashar: Upgrading CI Jenkins from 2.263.3 to 2.277.2
13:23 jayme@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
13:21 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[1020-1026].eqiad.wmnet
13:19 jayme@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
13:13 liw@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.3
13:08 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments/includes/Config/WikiPageConfigValidation.php: fe2a042: WikiPageConfigValidation: Mentor lists and help desk can be null (T281229) (duration: 01m 06s)
13:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on conf[2004-2006].codfw.wmnet with reason: for zookeeper migration
13:07 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on conf[2004-2006].codfw.wmnet with reason: for zookeeper migration
13:06 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[1020-1026].eqiad.wmnet
13:05 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be1019.eqiad.wmnet
12:55 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be1019.eqiad.wmnet
12:46 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Revert "URGENT: Disable GlobalUsage" (T281242) (duration: 01m 08s)
12:44 hashar: Restarted CI Jenkins for plugins upgrade
12:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P15592 and previous config saved to /var/cache/conftool/dbconfig/20210427-122619-root.json
12:20 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/GlobalUsage: Backport: Avoid reading primary unless absolutely necessary (T281238) (duration: 01m 09s)
12:12 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GlobalUsage: Backport: Avoid reading primary unless absolutely necessary (T281238) (duration: 01m 09s)
12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P15591 and previous config saved to /var/cache/conftool/dbconfig/20210427-121115-root.json
12:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on labstore1007.wikimedia.org with reason: T281045
12:00 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on labstore1007.wikimedia.org with reason: T281045
11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P15590 and previous config saved to /var/cache/conftool/dbconfig/20210427-115612-root.json
11:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P15589 and previous config saved to /var/cache/conftool/dbconfig/20210427-114108-root.json
11:36 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
11:30 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
11:10 marostegui@cumin1001: dbctl commit (dc=all): 'Remove RW from commonswiki', diff saved to https://phabricator.wikimedia.org/P15588 and previous config saved to /var/cache/conftool/dbconfig/20210427-111016-marostegui.json
11:09 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Disable GlobalUsage (duration: 01m 08s)
10:40 volans@cumin1001: dbctl commit (dc=all): 'S4 RO, outage', diff saved to https://phabricator.wikimedia.org/P15585 and previous config saved to /var/cache/conftool/dbconfig/20210427-104057-volans.json
10:18 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - T272836
10:06 XioNoX: standardize management routers ACLs with Capirca - mr1-eqiad (last one)
10:01 ayounsi@deploy1002: Finished deploy [homer/deploy@759f82c]: Homer release v0.2.7 (duration: 02m 16s)
09:59 ayounsi@deploy1002: Started deploy [homer/deploy@759f82c]: Homer release v0.2.7
09:56 ayounsi@deploy1002: Finished deploy [homer/deploy@759f82c]: Homer release v0.2.7 (duration: 00m 22s)
09:56 ayounsi@deploy1002: Started deploy [homer/deploy@759f82c]: Homer release v0.2.7
09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1157 for schema change', diff saved to https://phabricator.wikimedia.org/P15584 and previous config saved to /var/cache/conftool/dbconfig/20210427-093536-marostegui.json
09:35 XioNoX: standardize management routers ACLs with Capirca - mr1-eqsin
09:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P15583 and previous config saved to /var/cache/conftool/dbconfig/20210427-093501-root.json
09:34 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1012.eqiad.wmnet
09:34 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1011.eqiad.wmnet
09:33 moritzm: rolling restart of elastic in relforge* to pick up Java updates
09:32 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2010.codfw.wmnet
09:31 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2009.codfw.wmnet
09:31 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2008.codfw.wmnet
09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P15582 and previous config saved to /var/cache/conftool/dbconfig/20210427-091957-root.json
09:19 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1012.eqiad.wmnet
09:19 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1011.eqiad.wmnet
09:17 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
09:16 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host rdb2010.codfw.wmnet
09:16 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
09:16 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2009.codfw.wmnet
09:16 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2008.codfw.wmnet
09:16 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2007.codfw.wmnet
09:11 jayme@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
09:11 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on rdb2010.codfw.wmnet with reason: REIMAGE
09:09 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on rdb2009.codfw.wmnet with reason: REIMAGE
09:07 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb1012.eqiad.wmnet with reason: REIMAGE
09:06 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2010.codfw.wmnet with reason: REIMAGE
09:05 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb1011.eqiad.wmnet with reason: REIMAGE
09:05 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb1012.eqiad.wmnet with reason: REIMAGE
09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P15581 and previous config saved to /var/cache/conftool/dbconfig/20210427-090454-root.json
09:04 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2009.codfw.wmnet with reason: REIMAGE
09:04 jayme@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
09:03 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb1011.eqiad.wmnet with reason: REIMAGE
09:01 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2007.codfw.wmnet
08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P15580 and previous config saved to /var/cache/conftool/dbconfig/20210427-084950-root.json
08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175 for schema change', diff saved to https://phabricator.wikimedia.org/P15579 and previous config saved to /var/cache/conftool/dbconfig/20210427-084651-marostegui.json
08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P15578 and previous config saved to /var/cache/conftool/dbconfig/20210427-084630-root.json
08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1114 into main and api', diff saved to https://phabricator.wikimedia.org/P15577 and previous config saved to /var/cache/conftool/dbconfig/20210427-083910-marostegui.json
08:36 XioNoX: standardize management routers ACLs with Capirca
08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114 into main and traffic', diff saved to https://phabricator.wikimedia.org/P15576 and previous config saved to /var/cache/conftool/dbconfig/20210427-083145-marostegui.json
08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P15575 and previous config saved to /var/cache/conftool/dbconfig/20210427-083126-root.json
08:24 hashar: Restarting CI Jenkins for plugins upgrade
08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114 into main and traffic', diff saved to https://phabricator.wikimedia.org/P15574 and previous config saved to /var/cache/conftool/dbconfig/20210427-081911-marostegui.json
08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 100%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15573 and previous config saved to /var/cache/conftool/dbconfig/20210427-081846-root.json
08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P15572 and previous config saved to /var/cache/conftool/dbconfig/20210427-081623-root.json
08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 100%: Repool db1087', diff saved to https://phabricator.wikimedia.org/P15571 and previous config saved to /var/cache/conftool/dbconfig/20210427-081325-root.json
08:12 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2008.codfw.wmnet with reason: REIMAGE
08:11 jayme@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
08:10 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2007.codfw.wmnet with reason: REIMAGE
08:10 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2008.codfw.wmnet with reason: REIMAGE
08:08 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2007.codfw.wmnet with reason: REIMAGE
08:03 jayme@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 90%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15570 and previous config saved to /var/cache/conftool/dbconfig/20210427-080342-root.json
08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P15569 and previous config saved to /var/cache/conftool/dbconfig/20210427-080119-root.json
07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 75%: Repool db1087', diff saved to https://phabricator.wikimedia.org/P15568 and previous config saved to /var/cache/conftool/dbconfig/20210427-075822-root.json
07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 for schema change', diff saved to https://phabricator.wikimedia.org/P15567 and previous config saved to /var/cache/conftool/dbconfig/20210427-075759-marostegui.json
07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P15566 and previous config saved to /var/cache/conftool/dbconfig/20210427-075738-root.json
07:52 liw@deploy1002: Pruned MediaWiki: 1.36.0-wmf.38 (duration: 03m 17s)
07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 80%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15565 and previous config saved to /var/cache/conftool/dbconfig/20210427-074839-root.json
07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 50%: Repool db1087', diff saved to https://phabricator.wikimedia.org/P15564 and previous config saved to /var/cache/conftool/dbconfig/20210427-074318-root.json
07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P15563 and previous config saved to /var/cache/conftool/dbconfig/20210427-074234-root.json
07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 75%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15562 and previous config saved to /var/cache/conftool/dbconfig/20210427-073335-root.json
07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 25%: Repool db1087', diff saved to https://phabricator.wikimedia.org/P15561 and previous config saved to /var/cache/conftool/dbconfig/20210427-072814-root.json
07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P15560 and previous config saved to /var/cache/conftool/dbconfig/20210427-072731-root.json
07:26 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - T272836
07:24 liw@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.3 (duration: 30m 54s)
07:21 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on conf[2004-2006].codfw.wmnet with reason: for zookeeper migration
07:21 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on conf[2004-2006].codfw.wmnet with reason: for zookeeper migration
07:19 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on conf[2002-2003].codfw.wmnet with reason: for zookeeper migration
07:19 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on conf[2002-2003].codfw.wmnet with reason: for zookeeper migration
07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 60%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15559 and previous config saved to /var/cache/conftool/dbconfig/20210427-071831-root.json
07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P15558 and previous config saved to /var/cache/conftool/dbconfig/20210427-071227-root.json
07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 50%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15557 and previous config saved to /var/cache/conftool/dbconfig/20210427-070328-root.json
06:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179 for schema change', diff saved to https://phabricator.wikimedia.org/P15556 and previous config saved to /var/cache/conftool/dbconfig/20210427-065628-marostegui.json
06:55 elukey: upgrade mariadb to 10.4.18-1 + reboot on db1108 - T279281
06:54 liw@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.3
06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 40%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15555 and previous config saved to /var/cache/conftool/dbconfig/20210427-064824-root.json
06:37 liw: version 1.37.0-wmf.3 was branched at 20ab303 for T278347
06:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 30%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15554 and previous config saved to /var/cache/conftool/dbconfig/20210427-063320-root.json
06:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 25%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15553 and previous config saved to /var/cache/conftool/dbconfig/20210427-061817-root.json
06:11 elukey: powercycle elastic2043 - no ssh, no tty remote console available
06:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 20%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15552 and previous config saved to /var/cache/conftool/dbconfig/20210427-060313-root.json
05:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 15%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15551 and previous config saved to /var/cache/conftool/dbconfig/20210427-054809-root.json
05:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 10%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15550 and previous config saved to /var/cache/conftool/dbconfig/20210427-053306-root.json
05:30 XioNoX: push pfw fw policies - T281137
05:27 legoktm: imported hyperkitty_1.3.4-2~bpo10+2 to apt.wm.o (T281213)
05:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15549 and previous config saved to /var/cache/conftool/dbconfig/20210427-052236-root.json
05:21 marostegui: Stop mysql on db1087 to clone db1167 (lag will appear on wikidata on wikireplicas) T258361
05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1114 temporarily as db1087 will be depooled', diff saved to https://phabricator.wikimedia.org/P15547 and previous config saved to /var/cache/conftool/dbconfig/20210427-052026-marostegui.json
05:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 5%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15546 and previous config saved to /var/cache/conftool/dbconfig/20210427-051802-root.json
05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1124 with minimal weight for the first time in s7 T258361', diff saved to https://phabricator.wikimedia.org/P15545 and previous config saved to /var/cache/conftool/dbconfig/20210427-050826-marostegui.json
05:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15544 and previous config saved to /var/cache/conftool/dbconfig/20210427-050732-root.json
05:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1077.eqiad.wmnet
04:53 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1077.eqiad.wmnet
04:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15543 and previous config saved to /var/cache/conftool/dbconfig/20210427-045229-root.json
04:46 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1124 with minimal weight for the first time in s7 T258361', diff saved to https://phabricator.wikimedia.org/P15541 and previous config saved to /var/cache/conftool/dbconfig/20210427-044609-marostegui.json
04:45 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1124 to dbctl, depooled, T258361', diff saved to https://phabricator.wikimedia.org/P15540 and previous config saved to /var/cache/conftool/dbconfig/20210427-044520-marostegui.json
04:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15539 and previous config saved to /var/cache/conftool/dbconfig/20210427-043725-root.json
04:25 legoktm: upgrading lists-next.wikimedia.org to mailman3-from-bullseye (T280887)
04:19 marostegui: Set phabricator on read only T279625
03:37 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
03:37 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
03:37 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
03:36 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@08ad17a]: 0.3.70 (duration: 08m 18s)
03:28 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.70` on canary `wdqs1003`; proceeding to rest of fleet
03:28 ryankemper@deploy1002: Started deploy [wdqs/wdqs@08ad17a]: 0.3.70
03:27 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.70`. Pre-deploy tests passing on canary `wdqs1003`
03:17 ryankemper: T280382 `wdqs1006` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to raid0: `/dev/md2 2.6T 998G 1.5T 40% /srv`
02:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
01:29 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1006.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph --task-id T280382` on `ryankemper@cumin1001` tmux session `reimage`
01:29 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
01:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
01:21 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1006.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
01:21 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer

2021-04-26

23:28 mutante: renewing TLS cert for peopleweb.discovery.wmnet, adding *3 hosts
23:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on people1003.eqiad.wmnet with reason: new host
23:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on people1003.eqiad.wmnet with reason: new host
22:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1006.eqiad.wmnet with reason: REIMAGE
22:24 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1006.eqiad.wmnet with reason: REIMAGE
22:11 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs1006.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
21:21 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host people1003.eqiad.wmnet
20:48 twentyafterfour: restarting php-fpm on phab1001 to deploy phabricator hotfix d238db8
20:35 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people1003.eqiad.wmnet
20:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts planet1003.eqiad.wmnet
20:15 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts planet1003.eqiad.wmnet
19:45 legoktm: uploaded python3-falcon, python3-mimeparse, python3-mujson, openstack-pkg-tools to mailman3 component on apt.wm.o
18:51 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: REIMAGE
18:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1002.eqiad.wmnet with reason: REIMAGE
18:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: REIMAGE
18:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1001.eqiad.wmnet with reason: REIMAGE
18:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1002.eqiad.wmnet with reason: REIMAGE
18:45 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1001.eqiad.wmnet with reason: REIMAGE
18:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 2d16f62: elwiki: Update Growth experiments configuration (T280172) (duration: 00m 58s)
18:06 urbanecm@deploy1002: Synchronized multiversion/MWScript.php: 5ace4e1: Fix error message if MWScript.php is run without arguments (duration: 00m 58s)
17:28 dduvall@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
17:26 dduvall@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
17:18 dduvall@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
17:06 legoktm: imported postorius_1.3.4-2~bpo10+2 to apt.wm.o
16:49 mutante: gerrit - restarted apache (hard) to remove time out from gerrit:682502
16:40 mutante: gerrit1001 - reload apache2
16:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1025.eqiad.wmnet
16:30 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1025.eqiad.wmnet
15:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
15:24 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
15:21 elukey: restart zookeeper on conf2004 to pick up the -javaagent setting for the prometheus exporter
15:06 moritzm: installing jquery security updates on stretch
15:01 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
15:01 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
14:54 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
14:54 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
14:48 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
14:47 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
14:28 moritzm: installing ldap-replica1003/1004
14:03 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on conf2001.codfw.wmnet with reason: for zookeeper migration
14:03 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on conf2001.codfw.wmnet with reason: for zookeeper migration
13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 100%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P15537 and previous config saved to /var/cache/conftool/dbconfig/20210426-133922-root.json
13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15536 and previous config saved to /var/cache/conftool/dbconfig/20210426-133905-root.json
13:28 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: for zookeeper migration
13:27 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: for zookeeper migration
13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P15535 and previous config saved to /var/cache/conftool/dbconfig/20210426-132533-root.json
13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 75%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P15534 and previous config saved to /var/cache/conftool/dbconfig/20210426-132417-root.json
13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15533 and previous config saved to /var/cache/conftool/dbconfig/20210426-132402-root.json
13:14 moritzm: installing ldap-replica2005/2006
13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P15532 and previous config saved to /var/cache/conftool/dbconfig/20210426-131029-root.json
13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 50%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P15531 and previous config saved to /var/cache/conftool/dbconfig/20210426-130913-root.json
13:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15530 and previous config saved to /var/cache/conftool/dbconfig/20210426-130858-root.json
12:57 moritzm: installing gst-plugins-base1.0 security updates
12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P15529 and previous config saved to /var/cache/conftool/dbconfig/20210426-125526-root.json
12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 25%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P15528 and previous config saved to /var/cache/conftool/dbconfig/20210426-125409-root.json
12:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15527 and previous config saved to /var/cache/conftool/dbconfig/20210426-125354-root.json
12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15526 and previous config saved to /var/cache/conftool/dbconfig/20210426-124141-marostegui.json
12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P15525 and previous config saved to /var/cache/conftool/dbconfig/20210426-124022-root.json
12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15524 and previous config saved to /var/cache/conftool/dbconfig/20210426-123020-marostegui.json
12:28 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,service=nginx,name=mw1338.eqiad.wmnet
12:27 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,service=nginx,name=mw1338.eqiad.wmnet
12:24 Amir1: cleaning watchlist of QuickStatementsBot in wikidatawiki
12:06 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,service=nginx,name=mw1338.eqiad.wmnet
12:05 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,service=nginx,name=mw1338.eqiad.wmnet
12:00 marostegui@deploy1002: Synchronized wmf-config/db-eqiad.php: Enable writes on es4 T279281 (duration: 00m 56s)
11:57 marostegui: Restart es4 primary master - T279281
11:55 marostegui@deploy1002: Synchronized wmf-config/db-eqiad.php: Disable writes on es4 T279281 (duration: 00m 56s)
11:51 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:49 hashar@deploy1002: Finished deploy [integration/docroot@c2e48c9]: doc: Explain that VE is both stand-alone and integrated into MediaWiki (duration: 00m 13s)
11:49 hashar@deploy1002: Started deploy [integration/docroot@c2e48c9]: doc: Explain that VE is both stand-alone and integrated into MediaWiki
11:46 Urbanecm: EU B&C done
11:45 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/TemplateData/modules/ext.templateDataGenerator.editTemplatePage/Dialog.js: a347517: Fix suggested values not being shown when the params type isnt specified (T280688) (duration: 00m 57s)
11:31 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "Set wgPageImagesAPIDefaultLicense to 'any' for wikidata" (duration: 00m 57s)
11:30 aborrero@cumin1001: START - Cookbook sre.dns.netbox
11:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 2b5b640: Enable ContentTranslation as a default tool for 11 Wikipedias (T279422) (duration: 00m 57s)
10:58 effie: restarting php-fpm in mw* clusters in codfw to pick up php7.2 update
10:46 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 57s)
10:45 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 57s)
10:38 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica1004.wikimedia.org
10:37 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Setup wmgUseFooterCodeOfConductLink for later usage (duration: 00m 57s)
10:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1125.eqiad.wmnet with reason: REIMAGE
10:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1125.eqiad.wmnet with reason: REIMAGE
10:26 effie: upgrading mw* servers php7.2 in codfw
10:25 marostegui: Deploy schema change on s4 codfw, lag will appear T276292
10:24 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: Use wmgUseFooterTechCodeOfConductLink instead of wmgUseFooterCodeOfConductLink (duration: 00m 57s)
10:24 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica1004.wikimedia.org
10:22 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add wmgUseFooterTechCodeOfConductLink (duration: 00m 59s)
10:22 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica1003.wikimedia.org
10:18 moritzm: installing systemd updates from buster 10.9 point release
10:07 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica1003.wikimedia.org
10:00 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
09:53 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica2006.wikimedia.org
09:42 moritzm: installing clamav security updates on otrs1001
09:38 godog: reboot ms-be1062, kernel backtrace saved
09:26 filippo@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=eqiad
09:26 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica2006.wikimedia.org
09:24 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica2005.wikimedia.org
09:15 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication
09:15 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication
09:13 jayme: imported etcd-mirror_0.0.6-2 to buster-wikimedia
09:10 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica2005.wikimedia.org
09:07 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ldap-replica2005failoid1002.wikimedia.org
09:04 jayme: imported etcd-mirror_0.0.6-1 to buster-wikimedia
08:55 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica2005failoid1002.wikimedia.org
08:49 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: NOOP: f01a6da: GrowthExperiments: Enable community configuration on testwiki (T274520) (duration: 00m 57s)
08:42 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: NOOP: 88da822: GrowthExperiments: Do not enable community configuration outside of beta wikis (T274520) (duration: 00m 59s)
08:28 moritzm: update debmonitor to 0.2.9 on remaining hosts T281090
08:13 moritzm: installing lxml security updates on stretch
07:54 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication
07:54 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication
07:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE
07:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE
07:32 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - T272836
07:24 moritzm: installing pear security updates
07:09 moritzm: removed rawdog from bullseye-wikimedia, needs Py2 T280989
06:24 elukey: reboot an-coord1001 to pick up kernel security settings (after reimage)
05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1158 to dbctl, depooled, T258361', diff saved to https://phabricator.wikimedia.org/P15521 and previous config saved to /var/cache/conftool/dbconfig/20210426-054700-marostegui.json
05:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1124.eqiad.wmnet with reason: REIMAGE
05:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1124.eqiad.wmnet with reason: REIMAGE
03:43 kart_: Updated cxserver to 2021-04-21-044024-production (T279045)
03:41 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
03:37 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
03:32 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .

2021-04-25

15:23 Amir1: sudo -u list /var/lib/mailman/bin/change_pw -l wikica-l -p $(pwgen -c1 -s 12) (T281066)

2021-04-24

22:24 bstorm: Rebooting labstore1007 from ilo after crash

2021-04-23

21:36 foks: removing 1 file for legal compliance
20:15 mutante: [apt1001:~] $ sudo -i reprepro -C main includedeb bullseye-wikimedia /home/dzahn/rawdog_2.23-2_all.deb (T280989)
19:41 mutante: [apt1001:~] $ sudo -i reprepro copy bullseye-wikimedia buster-wikimedia envoyproxy - copy envoy package from buster to bullseye T280989
19:09 ebernhardson: closing duplicate/wrong cluster indices in cloudelastic
17:02 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1087.eqiad.wmnet
16:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:32 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
16:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
14:59 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on theemin.codfw.wmnet with reason: REIMAGE
14:59 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on theemin.codfw.wmnet with reason: REIMAGE
14:25 moritzm: revert back bullseye image to daily build from last week (to rule out potential reimage issue)
13:33 elukey: roll restart of all thanos-swift proxies to pick up new ML account - T280773
12:50 jbond42: upload new debmonitor-client packages
11:50 moritzm: installing perf updates from Buster 10.9 point release
10:06 moritzm: installing Linux 4.19.181 updates from Buster 10.9 point release (no reboots, just updating the packages)
09:54 moritzm: installing xen security updates
09:49 moritzm: installing xorg-server security updates
09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15512 and previous config saved to /var/cache/conftool/dbconfig/20210423-093723-root.json
09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15511 and previous config saved to /var/cache/conftool/dbconfig/20210423-092220-root.json
09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15510 and previous config saved to /var/cache/conftool/dbconfig/20210423-090716-root.json
08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15509 and previous config saved to /var/cache/conftool/dbconfig/20210423-085212-root.json
08:27 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1020.eqiad.wmnet
08:21 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1020.eqiad.wmnet
08:19 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1021.eqiad.wmnet
08:13 moritzm: upgrading d-i image for bullseye to RC1 release T275873
08:12 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1021.eqiad.wmnet
08:12 moritzm: upgrading d-i image for bullseye to RC1 release
08:12 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be1019.eqiad.wmnet
07:59 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1019.eqiad.wmnet
07:56 jynus: deleting db1156 s2 database and reloading it from logical backups T280492
07:22 Amir1: removing junk bounced email addresses from yahoo from all mailing lists
05:40 marostegui: Stop db1079 to clone db1158 (lag will appear on s7 on wiki replicas)
05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 to clone db1158 T258361', diff saved to https://phabricator.wikimedia.org/P15506 and previous config saved to /var/cache/conftool/dbconfig/20210423-053907-marostegui.json

2021-04-22

17:26 marostegui: Stop mysql on tendril/dbtree database
16:33 volker-e@deploy1002: Finished deploy [design/style-guide@e914e8a]: Deploy design/style-guide: e914e8a icons: Add 'share' icon (#455) (duration: 00m 06s)
16:32 volker-e@deploy1002: Started deploy [design/style-guide@e914e8a]: Deploy design/style-guide: e914e8a icons: Add 'share' icon (#455)
13:23 marostegui: Tendril and dbtree are up but on a degraded status (slow response)
13:19 marostegui: Tendril and dbtree are down at the moment
12:46 Urbanecm: Start server-side upload for 2 video files (T280763, T280524)
12:31 marostegui: Restart mysql on db1115 (tendril/dbtree will fail)
04:55 eileen: civicrm revision changed from 42ca3cf65a to 33a63d5789, config revision is cf07e7ba0b
02:47 krinkle@deploy1002: Finished deploy [integration/docroot@010e445]: (no justification provided) (duration: 00m 09s)
02:47 krinkle@deploy1002: Started deploy [integration/docroot@010e445]: (no justification provided)
01:34 eileen: civicrm revision changed from 35a8dd33ba to 42ca3cf65a, config revision is cf07e7ba0b
00:28 legoktm: legoktm@deneb:/var/cache/pbuilder/aptcache$ sudo rm -rf * # Cleaned up 8GB more
00:27 legoktm: legoktm@deneb:/var/cache/apt/archives$ sudo rm -rf * # cleaned up 6GB
00:03 legoktm: subscribed all list admins to the listadmins@ mailing list (T280716)

2021-04-21

23:58 eileen: tools revision changed from 3d950fffbd to c26a8c0cb6
23:49 legoktm: made myself and Amir1 list admins for the listadmins@lists.wikimedia.org mailing list
20:32 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1017.eqiad.wmnet
20:21 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1017.eqiad.wmnet
20:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1016.eqiad.wmnet
20:03 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1016.eqiad.wmnet
19:59 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host planet1003.eqiad.wmnet
19:52 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:48 robh@cumin1001: START - Cookbook sre.dns.netbox
19:48 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:46 mutante: creating a ganeti VM to test bullseye install
19:46 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host planet1003.eqiad.wmnet
19:45 bstorm: manually kicking off a run of update-openstack-mirror on sodium to capture an upstream package update
19:15 robh@cumin1001: START - Cookbook sre.dns.netbox
18:46 Urbanecm: Morning B&C done
18:42 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/WikibaseMediaInfo/: f831d16: Make the logistic regression image search default (T271799) (duration: 00m 58s)
18:38 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: f6d076a: Update $wgGEHomepageNewAccountVariants (T278123) (duration: 00m 58s)
18:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1ae5ca5: Set wgGEMentorshipMigrationStage to WRITE_BOTH/READ_NEW everywhere (T279853) (duration: 00m 59s)
18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e252de0: eswiki: Push Growth features out of dark mode (T278235) (duration: 01m 00s)
17:43 jynus: deploy grant changes on m5 backup sources (db1117 and db2078) T278614
15:54 legoktm: T280744: legoktm@lists1001:~$ sudo chmod 644 /etc/aliases
15:15 Urbanecm: urbanecm@mwmaint1002:~$ foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php # T279853
15:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15503 and previous config saved to /var/cache/conftool/dbconfig/20210421-151526-root.json
15:02 moritzm: installing jquery security updates on buster
15:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15502 and previous config saved to /var/cache/conftool/dbconfig/20210421-150023-root.json
14:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15501 and previous config saved to /var/cache/conftool/dbconfig/20210421-144519-root.json
14:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15500 and previous config saved to /var/cache/conftool/dbconfig/20210421-143015-root.json
14:25 jbond42: upload new version of debmonitor-client to apt
13:54 Urbanecm: [urbanecm@mwmaint1002 ~]$ time mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=fawiki # T279853
13:39 moritzm: upgrading mw1262-1265,mw1277-1279 to PHP 7.2.34
13:18 Urbanecm: [urbanecm@mwmaint1002 ~]$ time mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=frwiki # T279853
13:01 moritzm: upgrading mw1262-1265,mw1277-1279 to PHP 7.2.34
12:21 moritzm: installing failoid2002
12:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
12:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
11:49 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:46 jbond@cumin1001: START - Cookbook sre.dns.netbox
11:32 awight: EU backport window complete
11:31 moritzm: installing failoid1002
11:29 awight@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/WikimediaEvents: Backport: Send 0 edits userEditCountBucket for anons (T210106) (duration: 00m 59s)
10:41 jbond42: switch debmonitor-client to cfssl (second try)
10:37 jbond42: upload golang-cfssl packages for jessie and stretch
10:33 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host failoid1002.eqiad.wmnet
10:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host eventlog1002.eqiad.wmnet
10:23 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host failoid1002.eqiad.wmnet
10:22 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host eventlog1002.eqiad.wmnet
10:21 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host failoid2002.codfw.wmnet
10:21 hnowlan: rebooting eventlog1002 for kernel update
10:06 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host failoid2002.codfw.wmnet
09:56 jbond42: switch debmonitor-clients to use cfssl
09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15496 and previous config saved to /var/cache/conftool/dbconfig/20210421-093109-root.json
09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15495 and previous config saved to /var/cache/conftool/dbconfig/20210421-091605-root.json
09:08 elukey: upgrade hue on an-tool1009 to 4.9
09:05 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - T266987 (duration: 00m 05s)
09:05 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - T266987
09:03 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=mw2280.codfw.wmnet,service=nginx
09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15494 and previous config saved to /var/cache/conftool/dbconfig/20210421-090100-root.json
09:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1009.eqiad.wmnet
08:58 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - T266987 (duration: 00m 05s)
08:58 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - T266987
08:58 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - T266987 (duration: 00m 05s)
08:58 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - T266987
08:56 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - T266987 (duration: 00m 05s)
08:55 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - T266987
08:55 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1009.eqiad.wmnet
08:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1008.eqiad.wmnet
08:53 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - T266987 (duration: 00m 05s)
08:52 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - T266987
08:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1007.eqiad.wmnet
08:50 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - T266987 (duration: 00m 10s)
08:50 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - T266987
08:47 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1008.eqiad.wmnet
08:47 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1007.eqiad.wmnet
08:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1005.eqiad.wmnet
08:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1006.eqiad.wmnet
08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15493 and previous config saved to /var/cache/conftool/dbconfig/20210421-084555-root.json
08:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1006.eqiad.wmnet
08:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1005.eqiad.wmnet
08:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1004.eqiad.wmnet
08:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1003.eqiad.wmnet
08:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1004.eqiad.wmnet
08:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1003.eqiad.wmnet
08:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1002.eqiad.wmnet
08:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1001.eqiad.wmnet
08:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1002.eqiad.wmnet
08:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1001.eqiad.wmnet
08:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2009.codfw.wmnet
08:05 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2009.codfw.wmnet
08:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2008.codfw.wmnet
08:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2007.codfw.wmnet
07:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2008.codfw.wmnet
07:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2007.codfw.wmnet
07:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2006.codfw.wmnet
07:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2005.codfw.wmnet
07:53 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2006.codfw.wmnet
07:52 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2005.codfw.wmnet
07:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-coord1001.eqiad.wmnet with reason: REIMAGE
07:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2003.codfw.wmnet
07:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2004.codfw.wmnet
07:50 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-coord1001.eqiad.wmnet with reason: REIMAGE
07:46 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2004.codfw.wmnet
07:46 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2003.codfw.wmnet
07:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2002.codfw.wmnet
07:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2001.codfw.wmnet
07:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2002.codfw.wmnet
07:38 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2001.codfw.wmnet
06:49 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
06:49 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
06:42 elukey: upload hue_4.9.0-2+deb10u1 to buster-wikimedia
06:11 marostegui: Stop MySQL on db1074 to clone db1156 (there will be lag in s2 in wiki replicas) T258361
06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 to clone db1156 T258361', diff saved to https://phabricator.wikimedia.org/P15491 and previous config saved to /var/cache/conftool/dbconfig/20210421-061019-marostegui.json
06:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2082.codfw.wmnet with reason: REIMAGE
06:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2077.codfw.wmnet with reason: REIMAGE
06:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2082.codfw.wmnet with reason: REIMAGE
06:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2077.codfw.wmnet with reason: REIMAGE
05:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1086.eqiad.wmnet
05:33 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1086.eqiad.wmnet
00:38 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
00:36 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
00:15 ryankemper: [WDQS] Pooled `wdqs1003`
00:14 ryankemper: [WDQS] Pooled `wdqs2008`
00:07 ryankemper: `sudo -i wmf-auto-reimage-host -p T280382 wdqs1006.eqiad.wmnet`
00:04 ryankemper: [WDQS] pooled `wdqs1004`

2021-04-20

23:46 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 73544cc: urwiki: Enable Growth team features in stealth mode (T280067) (duration: 00m 57s)
23:44 urbanecm@deploy1002: Synchronized wmf-config/config/urwiki.yaml: 73544cc: urwiki: Enable Growth team features in stealth mode (T280067) (duration: 00m 57s)
23:42 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 73544cc: urwiki: Enable Growth team features in stealth mode (T280067) (duration: 00m 58s)
23:38 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=urwiki GrowthExperiments # T280067
23:38 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 314367b: elwiki: Enable Growth team features in stealth mode (T280172; 3/3) (duration: 00m 56s)
23:36 urbanecm@deploy1002: Synchronized wmf-config/config/elwiki.yaml: 314367b: elwiki: Enable Growth team features in stealth mode (T280172; 2/3) (duration: 00m 57s)
23:35 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 314367b: elwiki: Enable Growth team features in stealth mode (T280172; 1/3) (duration: 00m 57s)
23:34 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript deleteEqualMessages.php --wiki=hrwiki --delete
23:32 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=elwiki GrowthExperiments # T280172
23:31 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 425d77b: cawiki: Enable Growth team features in stealth mode (T280673; 3/3) (duration: 00m 57s)
23:28 Urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist growthexperiments sql.php --cluster=extension1 /srv/mediawiki/php-1.37.0-wmf.1/extensions/GrowthExperiments/maintenance/schemas/mysql/growthexperiments_mentee_data.sql # T279587
23:28 urbanecm@deploy1002: Synchronized wmf-config/config/cawiki.yaml: 425d77b: cawiki: Enable Growth team features in stealth mode (T280673; 2/3) (duration: 00m 57s)
23:27 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 425d77b: cawiki: Enable Growth team features in stealth mode (T280673; 1/3) (duration: 00m 57s)
23:24 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=cawiki GrowthExperiments # T280673
23:11 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on furud.codfw.wmnet with reason: REIMAGE
23:09 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on furud.codfw.wmnet with reason: REIMAGE
23:05 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on flerovium.eqiad.wmnet with reason: REIMAGE
23:03 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on flerovium.eqiad.wmnet with reason: REIMAGE
22:14 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:10 robh@cumin1001: START - Cookbook sre.dns.netbox
21:46 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
21:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
20:52 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=ruwiki # T279853
20:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1020.wikimedia.org
20:41 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=viwiki # T279853
20:36 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1020.wikimedia.org
20:36 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=ukwiki # T279853
20:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd[1017-1019].wikimedia.org
20:34 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=tewiki # T279853
20:32 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=svwiki # T279853
20:30 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=srwiki # T279853
20:29 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=rowiki # T279853
20:27 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=hywiki # T279853
20:22 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=huwiki # T279853
20:21 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=hrwiki # T279853
20:18 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=hewiki # T279853
20:16 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=frwiktionary # T279853
20:16 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd[1017-1019].wikimedia.org
20:15 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=euwiki # T279853
20:13 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=bnwiki # T279853
20:08 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:03 robh@cumin1001: START - Cookbook sre.dns.netbox
19:58 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:56 robh@cumin1001: START - Cookbook sre.dns.netbox
19:28 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudcephosd1016.wikimedia.org
18:34 Urbanecm: mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=idwiki # T279853
18:33 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1016.wikimedia.org
18:29 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/GrowthExperiments/: 4d1969d: 1fbb8e9: MentorStore: Set wasPosted to true in command line mode (T275773) (duration: 00m 59s)
17:26 XioNoX: boot cr1-codfw:fpc1 - T277341
17:16 papaul: Adding a MPC7E to cr1-codfw
16:32 arturo: merging change to core route firewall https://gerrit.wikimedia.org/r/c/operations/homer/public/+/681316 (T272587)
16:15 andrewbogott: updating core routers config with https://gerrit.wikimedia.org/r/c/operations/homer/public/+/681315
15:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host eventlog1003.eqiad.wmnet
15:22 urbanecm@deploy1002: Synchronized docroot/noc/conf/debug.json: dc6647b: remove mwdebug1003 from list of debug servers (T267248) (duration: 00m 58s)
15:20 urbanecm@deploy1002: Synchronized debug.json: dc6647b: remove mwdebug1003 from list of debug servers (T267248) (duration: 00m 57s)
15:14 hnowlan@cumin1001: START - Cookbook sre.ganeti.makevm for new host eventlog1003.eqiad.wmnet
15:08 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:02 pt1979@cumin2001: START - Cookbook sre.dns.netbox
14:59 volker-e@deploy1002: Finished deploy [design/style-guide@c4d8314]: Deploy design/style-guide: c4d8314 “Components”: Fix “Buttons” active states (#460) (duration: 00m 07s)
14:58 volker-e@deploy1002: Started deploy [design/style-guide@c4d8314]: Deploy design/style-guide: c4d8314 “Components”: Fix “Buttons” active states (#460)
14:40 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
14:38 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
14:37 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
14:35 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
14:34 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
14:31 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
14:30 moritzm: installing exim updates from Buster point release
14:27 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
14:27 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
14:25 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@fc6767a] (duration: 04m 56s)
14:25 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
14:24 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
14:22 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
14:20 otto@deploy1002: Started deploy [analytics/refinery@fc6767a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@fc6767a]
14:18 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
14:18 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
14:17 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a] (thin): Regular analytics weekly train THIN [analytics/refinery@fc6767a] (duration: 00m 07s)
14:17 otto@deploy1002: Started deploy [analytics/refinery@fc6767a] (thin): Regular analytics weekly train THIN [analytics/refinery@fc6767a]
14:16 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry [analytics/refinery@fc6767a] (duration: 00m 03s)
14:16 otto@deploy1002: Started deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry [analytics/refinery@fc6767a]
14:16 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
14:16 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
14:16 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry [analytics/refinery@fc6767a] (duration: 00m 03s)
14:15 otto@deploy1002: Started deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry [analytics/refinery@fc6767a]
14:15 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry [analytics/refinery@fc6767a] (duration: 00m 03s)
14:14 otto@deploy1002: Started deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry [analytics/refinery@fc6767a]
14:14 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a]: Regular analytics weekly train [analytics/refinery@fc6767a] (duration: 14m 50s)
14:11 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
14:09 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
14:09 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
14:06 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
14:06 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
14:04 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
14:01 jiji@cumin1001: conftool action : set/pooled=no; selector: name=mw2280.codfw.wmnet,cluster=videoscaler
13:59 otto@deploy1002: Started deploy [analytics/refinery@fc6767a]: Regular analytics weekly train [analytics/refinery@fc6767a]
13:42 moritzm: upgrading mw1276 to PHP 7.2.34
13:40 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
13:40 ayounsi@deploy1002: Finished deploy [homer/deploy@759f82c]: Homer release v0.2.7 (duration: 00m 13s)
13:40 ayounsi@deploy1002: Started deploy [homer/deploy@759f82c]: Homer release v0.2.7
13:38 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
13:36 otto@deploy1002: Finished deploy [analytics/aqs/deploy@ad170d4]: deploy Refactor pageviews per-article endpoint (duration: 05m 17s)
13:35 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
13:35 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
13:33 moritzm: upgrading mw1261 to PHP 7.2.34
13:31 otto@deploy1002: Started deploy [analytics/aqs/deploy@ad170d4]: deploy Refactor pageviews per-article endpoint
13:27 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'similar-users' for release 'main' .
13:26 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'similar-users' for release 'main' .
13:25 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
13:22 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
13:21 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
13:19 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
13:15 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
13:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cumin2002.codfw.wmnet with reason: REIMAGE
13:13 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/includes/actions/RollbackAction.php: ccbfcf2: Do not mark rollbacks as bot edits (T280655) (duration: 00m 57s)
13:12 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
13:12 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cumin2002.codfw.wmnet with reason: REIMAGE
13:09 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
13:09 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
13:07 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
13:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2076.codfw.wmnet with reason: REIMAGE
13:03 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
13:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2076.codfw.wmnet with reason: REIMAGE
12:58 moritzm: reimaging cumin2002 to bullseye T276589
12:55 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
12:54 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
12:52 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
12:51 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
12:49 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
12:47 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
12:42 moritzm: uploaded PHP 7.2.34-18+0~20210223.60+debian10~1.gbpb21322+wmf1 to component/php72
12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165 to check its tables T280492', diff saved to https://phabricator.wikimedia.org/P15483 and previous config saved to /var/cache/conftool/dbconfig/20210420-124118-marostegui.json
12:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5003.eqsin.wmnet
12:28 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
12:27 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
12:25 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
12:23 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti5003.eqsin.wmnet
12:21 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
12:21 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
12:18 CFisch_WMDE: European mid-day backport window done
12:05 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add NS_PROJECT alias for azwiki (T280577) (duration: 00m 57s)
12:04 moritzm: drain ganeti5003
11:55 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
11:55 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
11:54 wmde-fisch@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/DiscussionTools/includes/CommentFormatter.php: Backport: CommentFormatter: Add ext-discussiontools-section class instead of overwriting (T280433) (duration: 00m 57s)
11:47 moritzm: failover ganeti master in eqsin to ganeti5001
11:46 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
11:38 wmde-fisch@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/VisualEditor/modules/ve-mw/ui/pages/ve.ui.MWParameterPage.js: Backport: Add filtering for the suggested values combo box (T271898) (duration: 00m 58s)
11:15 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add default import sources (T214139) (duration: 00m 58s)
11:11 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
11:09 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
11:07 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
10:49 _joe_: temporarily installing some Python packages on deploy1002 for testing
10:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5001.eqsin.wmnet
10:34 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti5001.eqsin.wmnet
10:20 moritzm: drain ganeti5001
10:11 hnowlan: opening access to Cassandra on new AQS hosts (aqs101*) in the analytics-in4 filter
10:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aphlict1001.eqiad.wmnet
10:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host aphlict1001.eqiad.wmnet
09:42 volans@cumin2001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cumin2001.codfw.wmnet,cumin1001.eqiad.wmnet
09:42 volans@cumin2001: START - Cookbook sre.hosts.remove-downtime for cumin2001.codfw.wmnet,cumin1001.eqiad.wmnet
09:42 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
09:40 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
09:38 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
09:38 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
09:20 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
09:20 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
08:58 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
08:58 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
08:54 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: REIMAGE
08:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: REIMAGE
08:50 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
08:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter1003.eqiad.wmnet
08:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter1003.eqiad.wmnet
08:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter1004.eqiad.wmnet
08:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter1004.eqiad.wmnet
08:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2128.codfw.wmnet with reason: REIMAGE
08:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter2004.codfw.wmnet
08:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2128.codfw.wmnet with reason: REIMAGE
08:09 dcaro: reprepro updating thirdparty/ceph-octopus repo
08:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter2004.codfw.wmnet
08:07 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: REIMAGE
08:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter2003.codfw.wmnet
08:05 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: REIMAGE
08:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter2003.codfw.wmnet
07:59 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1086 from dbctl T278229', diff saved to https://phabricator.wikimedia.org/P15482 and previous config saved to /var/cache/conftool/dbconfig/20210420-075949-marostegui.json
07:38 XioNoX: BGP: prioritize directly connected peers - T280054
07:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15480 and previous config saved to /var/cache/conftool/dbconfig/20210420-073808-root.json
07:35 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2003.codfw.wmnet with reason: REIMAGE
07:33 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2003.codfw.wmnet with reason: REIMAGE
07:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15479 and previous config saved to /var/cache/conftool/dbconfig/20210420-072305-root.json
07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15478 and previous config saved to /var/cache/conftool/dbconfig/20210420-070801-root.json
07:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2074.codfw.wmnet with reason: REIMAGE
07:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2074.codfw.wmnet with reason: REIMAGE
06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15477 and previous config saved to /var/cache/conftool/dbconfig/20210420-065257-root.json
06:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2127.codfw.wmnet with reason: REIMAGE
06:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2127.codfw.wmnet with reason: REIMAGE
06:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2073.codfw.wmnet with reason: REIMAGE
06:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2074.codfw.wmnet with reason: REIMAGE
06:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2073.codfw.wmnet with reason: REIMAGE
06:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2105.codfw.wmnet with reason: REIMAGE
06:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2074.codfw.wmnet with reason: REIMAGE
06:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2105.codfw.wmnet with reason: REIMAGE

2021-04-19

22:56 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
22:53 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
22:37 Trey314159: reindexing wikidata on cloudelastic finished/failed (T274200)
22:37 Trey314159: reindexing commons and wikidata on elastic@eqiad finished/failed (T274200)
21:46 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1018.wikimedia.org with reason: REIMAGE
21:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1018.wikimedia.org with reason: REIMAGE
21:03 sbassett: Deployed security patch for T280226
19:56 dcausse: repool wdqs1005
19:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2004.codfw.wmnet
19:04 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2004.codfw.wmnet
18:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2003.codfw.wmnet
18:56 ppchelko@deploy1002: Synchronized php-1.37.0-wmf.1/tests: Factor out rollback logic from WikiPage - /tests (duration: 00m 59s)
18:55 ppchelko@deploy1002: Synchronized php-1.37.0-wmf.1/maintenance: Factor out rollback logic from WikiPage - /maintenance (duration: 00m 57s)
18:51 ppchelko@deploy1002: Synchronized php-1.37.0-wmf.1/includes/: Factor out rollback logic from WikiPage - /includes (duration: 01m 01s)
18:47 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2003.codfw.wmnet
18:47 jiji@cumin1001: conftool action : set/pooled=yes; selector: cluster=thumbor,name=thumbor2001.codfw.wmnet
18:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2002.codfw.wmnet
18:39 ppchelko@deploy1002: Synchronized wmf-config/CommonSettings.php: T274436 Math: Enable RESTBase-less Wikidata math validation (duration: 00m 56s)
18:34 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2002.codfw.wmnet
18:21 ppchelko@deploy1002: Synchronized wmf-config/CommonSettings.php: T249745 [EventBus] Make eventage-main timeout consistent with envoy (duration: 00m 56s)
18:13 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/DiscussionTools/: 66d137b: Remove <header> tags around headings for compat with MobileFrontend (T280433) (duration: 00m 59s)
18:03 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2001.codfw.wmnet
18:02 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/GrowthExperiments/includes/Mentorship/Store/DatabaseMentorStore.php: 0233507: DatabaseMentorStore: Fix deprecation warning in upsert query (T280525) (duration: 00m 57s)
17:47 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2001.codfw.wmnet
17:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1004.eqiad.wmnet
17:23 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1004.eqiad.wmnet
17:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1003.eqiad.wmnet
17:11 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1003.eqiad.wmnet
17:08 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1002.eqiad.wmnet
16:57 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1002.eqiad.wmnet
16:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1001.eqiad.wmnet
16:48 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1001.eqiad.wmnet
16:25 hoo: Updated the Wikidata property suggester with data from the 2021-04-12 JSON dump (with pre-applied T132839 workarounds)
16:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: Slowly pool db1182 for the first time in s2 T275633', diff saved to https://phabricator.wikimedia.org/P15474 and previous config saved to /var/cache/conftool/dbconfig/20210419-161134-root.json
15:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 90%: Slowly pool db1182 for the first time in s2 T275633', diff saved to https://phabricator.wikimedia.org/P15473 and previous config saved to /var/cache/conftool/dbconfig/20210419-155631-root.json
15:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 80%: Slowly pool db1182 for the first time in s2 T275633', diff saved to https://phabricator.wikimedia.org/P15472 and previous config saved to /var/cache/conftool/dbconfig/20210419-154127-root.json
15:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 70%: Slowly pool db1182 for the first time in s2 T275633', diff saved to https://phabricator.wikimedia.org/P15471 and previous config saved to /var/cache/conftool/dbconfig/20210419-152623-root.json
15:24 volans: reverted debmonitor-client to 0.2.0-1 on apt.w.o for jessie-wikimedia
15:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 60%: Slowly pool db1182 for the first time in s2 T275633', diff saved to https://phabricator.wikimedia.org/P15470 and previous config saved to /var/cache/conftool/dbconfig/20210419-151119-root.json
14:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: Slowly pool db1182 for the first time in s2 T275633', diff saved to https://phabricator.wikimedia.org/P15469 and previous config saved to /var/cache/conftool/dbconfig/20210419-145616-root.json
14:53 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Rename RelatedArticles wmg variables to wg (duration: 00m 56s)
14:53 jbond42: update debmonitor-client - T280484
14:52 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: Remove RelatedArticles extension function and wmg to wg mapping (duration: 00m 56s)
14:48 reedy@deploy1002: Synchronized wmf-config/PoolCounterSettings.php: Use namespaced PoolCounter Client (duration: 00m 57s)
14:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1086 T278229', diff saved to https://phabricator.wikimedia.org/P15468 and previous config saved to /var/cache/conftool/dbconfig/20210419-144422-marostegui.json
14:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 40%: Slowly pool db1182 for the first time in s2 T275633', diff saved to https://phabricator.wikimedia.org/P15467 and previous config saved to /var/cache/conftool/dbconfig/20210419-144112-root.json
14:41 volans: uploaded debmonitor-client 0.2.8 to apt.w.o for jessie, stretch, buster, bullseye
14:29 hnowlan: imported envoyproxy_1.16.3-1 debs to envoy-future component
14:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 30%: Slowly pool db1182 for the first time in s2 T275633', diff saved to https://phabricator.wikimedia.org/P15466 and previous config saved to /var/cache/conftool/dbconfig/20210419-142608-root.json
14:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 20%: Slowly pool db1182 for the first time in s2 T275633', diff saved to https://phabricator.wikimedia.org/P15465 and previous config saved to /var/cache/conftool/dbconfig/20210419-141105-root.json
13:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 15%: Slowly pool db1182 for the first time in s2 T275633', diff saved to https://phabricator.wikimedia.org/P15464 and previous config saved to /var/cache/conftool/dbconfig/20210419-135601-root.json
13:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 10%: Slowly pool db1182 for the first time in s2 T275633', diff saved to https://phabricator.wikimedia.org/P15463 and previous config saved to /var/cache/conftool/dbconfig/20210419-134057-root.json
13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 5%: Slowly pool db1182 for the first time in s2 T275633', diff saved to https://phabricator.wikimedia.org/P15462 and previous config saved to /var/cache/conftool/dbconfig/20210419-132554-root.json
13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1182 in s2 for the first time with minimal weight T275633', diff saved to https://phabricator.wikimedia.org/P15461 and previous config saved to /var/cache/conftool/dbconfig/20210419-131936-marostegui.json
13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1182 in s2 for the first time with minimal weight T275633', diff saved to https://phabricator.wikimedia.org/P15460 and previous config saved to /var/cache/conftool/dbconfig/20210419-131501-marostegui.json
12:58 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: bd07630: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_OLD everywhere (T279853) (duration: 00m 57s)
12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1182', diff saved to https://phabricator.wikimedia.org/P15459 and previous config saved to /var/cache/conftool/dbconfig/20210419-125600-marostegui.json
12:54 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1182 in s2 for the first time with minimal weight T275633', diff saved to https://phabricator.wikimedia.org/P15458 and previous config saved to /var/cache/conftool/dbconfig/20210419-125407-marostegui.json
12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1182 to dbctl T275633', diff saved to https://phabricator.wikimedia.org/P15457 and previous config saved to /var/cache/conftool/dbconfig/20210419-125301-marostegui.json
12:51 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ef0f68e: testwiki: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_NEW (T279853) (duration: 00m 57s)
12:38 Urbanecm: mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=cswiki # T279853
12:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2126.codfw.wmnet with reason: REIMAGE
12:34 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3e3cce1: cswiki: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_OLD (T279853) (duration: 00m 58s)
12:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2126.codfw.wmnet with reason: REIMAGE
12:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2072.codfw.wmnet with reason: REIMAGE
12:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2072.codfw.wmnet with reason: REIMAGE
11:39 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1002.eqiad.wmnet
11:37 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
11:33 moritzm: imported debdeploy 0.0.99.13-1+deb11u1 to bullseye-wikimedia T275873
11:27 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=testwiki --force # T279853
11:11 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=testwiki # T279853
11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 03f8ed8: testwiki: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_OLD (T279853) (duration: 00m 57s)
11:05 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable legacy javascript variable for the rest of wikis (T72470) (duration: 00m 57s)
11:02 moritzm: import prometheus-rsyslog-exporter for bullseye-wikimedia/main
11:01 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
11:01 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
10:46 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 57s)
10:45 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
10:34 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on theemin.codfw.wmnet with reason: REIMAGE
10:32 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on theemin.codfw.wmnet with reason: REIMAGE
10:24 hnowlan: imported 1.16.3 into envoy-future
10:22 moritzm: reimaging theemin to bullseye
10:15 dcausse: depooling wdqs1005
10:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: REIMAGE
10:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: REIMAGE
10:05 arturo: aborrero@apt1001:~ $ sudo -i reprepro --component thirdparty/kubeadm-k8s-1-18 update buster-wikimedia
10:04 arturo: aborrero@apt1001:~ $ sudo -i reprepro --delete clearvanished (remove old buster-wikimedia|thirdparty/kubeadm-k8s-1-15,16 repos and packages)
09:56 ema: cp3051: varnish-frontend-restart to apply exp policy settings changes starting from empty cache T275809
09:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2002.codfw.wmnet with reason: REIMAGE
09:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2002.codfw.wmnet with reason: REIMAGE
09:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1161.eqiad.wmnet with reason: REIMAGE
09:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1161.eqiad.wmnet with reason: REIMAGE
09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Slowly pool db1179 for the first time in s3 T275633', diff saved to https://phabricator.wikimedia.org/P15454 and previous config saved to /var/cache/conftool/dbconfig/20210419-092251-root.json
09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 T280492', diff saved to https://phabricator.wikimedia.org/P15453 and previous config saved to /var/cache/conftool/dbconfig/20210419-092234-marostegui.json
09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15452 and previous config saved to /var/cache/conftool/dbconfig/20210419-091535-root.json
09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 90%: Slowly pool db1179 for the first time in s3 T275633', diff saved to https://phabricator.wikimedia.org/P15451 and previous config saved to /var/cache/conftool/dbconfig/20210419-090747-root.json
09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15450 and previous config saved to /var/cache/conftool/dbconfig/20210419-090031-root.json
08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 80%: Slowly pool db1179 for the first time in s3 T275633', diff saved to https://phabricator.wikimedia.org/P15449 and previous config saved to /var/cache/conftool/dbconfig/20210419-085243-root.json
08:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 T272008', diff saved to https://phabricator.wikimedia.org/P15448 and previous config saved to /var/cache/conftool/dbconfig/20210419-084834-marostegui.json
08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15447 and previous config saved to /var/cache/conftool/dbconfig/20210419-084528-root.json
08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 T272008', diff saved to https://phabricator.wikimedia.org/P15446 and previous config saved to /var/cache/conftool/dbconfig/20210419-084523-marostegui.json
08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 70%: Slowly pool db1179 for the first time in s3 T275633', diff saved to https://phabricator.wikimedia.org/P15445 and previous config saved to /var/cache/conftool/dbconfig/20210419-083740-root.json
08:35 ema: restart debmonitor-client.service on cp4030, dns5002, an-worker1106 T280484
08:34 marostegui: Testing log
08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: Repool db1085', diff saved to https://phabricator.wikimedia.org/P15444 and previous config saved to /var/cache/conftool/dbconfig/20210419-083021-root.json
08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15443 and previous config saved to /var/cache/conftool/dbconfig/20210419-083018-root.json
08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 T272008', diff saved to https://phabricator.wikimedia.org/P15442 and previous config saved to /var/cache/conftool/dbconfig/20210419-082559-marostegui.json
08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 60%: Slowly pool db1179 for the first time in s3 T275633', diff saved to https://phabricator.wikimedia.org/P15441 and previous config saved to /var/cache/conftool/dbconfig/20210419-082236-root.json
08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 100%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15440 and previous config saved to /var/cache/conftool/dbconfig/20210419-082000-root.json
08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: Repool db1085', diff saved to https://phabricator.wikimedia.org/P15439 and previous config saved to /var/cache/conftool/dbconfig/20210419-081517-root.json
08:07 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on labstore1004.eqiad.wmnet with reason: Restarting mysql
08:07 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on labstore1004.eqiad.wmnet with reason: Restarting mysql
08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Slowly pool db1179 for the first time in s3 T275633', diff saved to https://phabricator.wikimedia.org/P15438 and previous config saved to /var/cache/conftool/dbconfig/20210419-080732-root.json
08:07 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - T272836
08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 75%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15437 and previous config saved to /var/cache/conftool/dbconfig/20210419-080456-root.json
08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15436 and previous config saved to /var/cache/conftool/dbconfig/20210419-080454-root.json
08:03 moritzm: installing python-bleach security updates
08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: Repool db1085', diff saved to https://phabricator.wikimedia.org/P15435 and previous config saved to /var/cache/conftool/dbconfig/20210419-080013-root.json
07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 40%: Slowly pool db1179 for the first time in s3 T275633', diff saved to https://phabricator.wikimedia.org/P15434 and previous config saved to /var/cache/conftool/dbconfig/20210419-075229-root.json
07:51 moritzm: upgrade mwdebug2002 to PHP 7.2.34
07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 50%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15433 and previous config saved to /var/cache/conftool/dbconfig/20210419-074953-root.json
07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15432 and previous config saved to /var/cache/conftool/dbconfig/20210419-074950-root.json
07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: Repool db1085', diff saved to https://phabricator.wikimedia.org/P15431 and previous config saved to /var/cache/conftool/dbconfig/20210419-074510-root.json
07:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 T272008', diff saved to https://phabricator.wikimedia.org/P15430 and previous config saved to /var/cache/conftool/dbconfig/20210419-074155-marostegui.json
07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 30%: Slowly pool db1179 for the first time in s3 T275633', diff saved to https://phabricator.wikimedia.org/P15429 and previous config saved to /var/cache/conftool/dbconfig/20210419-073725-root.json
07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 25%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15428 and previous config saved to /var/cache/conftool/dbconfig/20210419-073449-root.json
07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15427 and previous config saved to /var/cache/conftool/dbconfig/20210419-073446-root.json
07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15426 and previous config saved to /var/cache/conftool/dbconfig/20210419-073425-root.json
07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 20%: Slowly pool db1179 for the first time in s3 T275633', diff saved to https://phabricator.wikimedia.org/P15425 and previous config saved to /var/cache/conftool/dbconfig/20210419-072221-root.json
07:21 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
07:19 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15424 and previous config saved to /var/cache/conftool/dbconfig/20210419-071943-root.json
07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15423 and previous config saved to /var/cache/conftool/dbconfig/20210419-071921-root.json
07:17 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 T272008', diff saved to https://phabricator.wikimedia.org/P15422 and previous config saved to /var/cache/conftool/dbconfig/20210419-071701-marostegui.json
07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 15%: Slowly pool db1179 for the first time in s3 T275633', diff saved to https://phabricator.wikimedia.org/P15421 and previous config saved to /var/cache/conftool/dbconfig/20210419-070718-root.json
07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15420 and previous config saved to /var/cache/conftool/dbconfig/20210419-070439-root.json
07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15419 and previous config saved to /var/cache/conftool/dbconfig/20210419-070418-root.json
07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 T272008', diff saved to https://phabricator.wikimedia.org/P15418 and previous config saved to /var/cache/conftool/dbconfig/20210419-070035-marostegui.json
06:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15417 and previous config saved to /var/cache/conftool/dbconfig/20210419-065627-root.json
06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: Slowly pool db1179 for the first time in s3 T275633', diff saved to https://phabricator.wikimedia.org/P15416 and previous config saved to /var/cache/conftool/dbconfig/20210419-065213-root.json
06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15415 and previous config saved to /var/cache/conftool/dbconfig/20210419-064914-root.json
06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 T272008', diff saved to https://phabricator.wikimedia.org/P15414 and previous config saved to /var/cache/conftool/dbconfig/20210419-064600-marostegui.json
06:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15413 and previous config saved to /var/cache/conftool/dbconfig/20210419-064123-root.json
06:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15412 and previous config saved to /var/cache/conftool/dbconfig/20210419-062620-root.json
06:17 _joe_: upgrading envoy everywhere in eqiad T280317
06:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15411 and previous config saved to /var/cache/conftool/dbconfig/20210419-061116-root.json
06:10 _joe_: upgrading envoy everywhere in codfw T280317
06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1179 in s3 for the first time with minimal weight T275633', diff saved to https://phabricator.wikimedia.org/P15410 and previous config saved to /var/cache/conftool/dbconfig/20210419-060321-marostegui.json
06:01 _joe_: rolling out further envoy upgrades T280317
05:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 10%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15409 and previous config saved to /var/cache/conftool/dbconfig/20210419-055613-root.json
05:53 marostegui: Stop sanitarium master on s2 (lag will show up on clouddb* labsdb* hosts) T272008
05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 T272008', diff saved to https://phabricator.wikimedia.org/P15408 and previous config saved to /var/cache/conftool/dbconfig/20210419-055240-marostegui.json
05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106', diff saved to https://phabricator.wikimedia.org/P15407 and previous config saved to /var/cache/conftool/dbconfig/20210419-054831-marostegui.json
05:42 marostegui: Stop sanitarium master on s1 (lag will show up on clouddb* labsdb* hosts) T272008
05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 T272008', diff saved to https://phabricator.wikimedia.org/P15406 and previous config saved to /var/cache/conftool/dbconfig/20210419-054158-marostegui.json
05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1179 in s3 for the first time with minimal weight T275633', diff saved to https://phabricator.wikimedia.org/P15405 and previous config saved to /var/cache/conftool/dbconfig/20210419-053730-marostegui.json
05:31 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1179 in s3 for the first time with minimal weight T275633', diff saved to https://phabricator.wikimedia.org/P15404 and previous config saved to /var/cache/conftool/dbconfig/20210419-053127-marostegui.json
05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1179 to dbctl T275633', diff saved to https://phabricator.wikimedia.org/P15403 and previous config saved to /var/cache/conftool/dbconfig/20210419-053050-marostegui.json
05:05 marostegui: Restart m2 database master T280251

2021-04-18

06:40 Amir1: cleaning watchlist of User:Mr._Ibrahem in wikidatawiki (in main ns only)

2021-04-17

16:16 Amir1: cleaning SuccuBot's watchlist in wikidatawiki
00:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1307.eqiad.wmnet
00:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1307.eqiad.wmnet
00:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1402.eqiad.wmnet
00:22 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1403.eqiad.wmnet
00:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1403.eqiad.wmnet
00:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1402.eqiad.wmnet
00:14 ryankemper: T267927 `sudo run-puppet-agent` and `sudo pool` on `wdqs2003`
00:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1307.eqiad.wmnet with reason: REIMAGE
00:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1307.eqiad.wmnet with reason: REIMAGE
00:08 ryankemper: T267927 Reload of `wdqs2003` complete
00:07 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
00:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1403.eqiad.wmnet with reason: REIMAGE

2021-04-16

23:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mwdebug1003.eqiad.wmnet
23:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1402.eqiad.wmnet with reason: REIMAGE
23:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1403.eqiad.wmnet with reason: REIMAGE
23:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1402.eqiad.wmnet with reason: REIMAGE
23:48 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mwdebug1003.eqiad.wmnet
23:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mwdebug1003.eqiad.wmnet
23:47 mutante: decom'ing mwdebug1003, stretch VM created in T267248
23:39 mutante: reimaging last 3 remaining stretch appservers with buster, mw1307, mw1402, mw1403
23:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1402-1403].eqiad.wmnet with reason: reimage
23:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1402-1403].eqiad.wmnet with reason: reimage
21:08 ejegg: updated fundraising python tools from ef54260b0d to 3d950fffbd
20:40 Trey314159: reindexing wikidata on cloudelastic... AGAIN (T274200)
17:48 ryankemper: T267927 Transferring from `wdqs2008`->`wdqs2003` to resolve the data corruption on `wdqs2003`
17:47 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
17:41 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1020.wikimedia.org with reason: REIMAGE
17:39 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1020.wikimedia.org with reason: REIMAGE
17:39 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1019.wikimedia.org with reason: REIMAGE
17:37 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1019.wikimedia.org with reason: REIMAGE
17:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1017.wikimedia.org with reason: REIMAGE
17:35 mutante: depooling mwdebug1003 (stretch VM, will be removed), mwdebug1001/1002 (buster) are unchanged
17:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mwdebug1003.eqiad.wmnet
17:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1016.wikimedia.org with reason: REIMAGE
17:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1017.wikimedia.org with reason: REIMAGE
17:31 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1016.wikimedia.org with reason: REIMAGE
17:03 ryankemper: T267927 Pooled `wdqs1007`, `wdqs2003`, `wdqs1008`, `wdqs2004`
17:00 ryankemper: T267927 Following data transfers complete: `wdqs1004`->`wdqs1007`, `wdqs2001`->`wdqs2003`, `wdqs1003`->`wdqs1008`, `wdqs2008`->`wdqs2004`
17:00 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
17:00 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
17:00 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
16:59 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
16:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
16:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
16:09 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
15:57 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
15:43 urbanecm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
15:43 urbanecm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
15:31 urbanecm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
15:31 urbanecm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
15:22 urbanecm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
14:59 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2023.codfw.wmnet
14:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on restbase-dev1006.eqiad.wmnet with reason: restarting for kernel update
14:58 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on restbase-dev1006.eqiad.wmnet with reason: restarting for kernel update
14:52 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on restbase-dev[1005-1006].eqiad.wmnet with reason: restarting for kernel update
14:51 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on restbase-dev[1005-1006].eqiad.wmnet with reason: restarting for kernel update
14:50 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2023.codfw.wmnet
14:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2022.codfw.wmnet
14:43 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2022.codfw.wmnet
14:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2021.codfw.wmnet
14:31 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2021.codfw.wmnet
14:24 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2020.codfw.wmnet
14:18 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2020.codfw.wmnet
13:07 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2019.codfw.wmnet
12:59 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2019.codfw.wmnet
12:54 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2018.codfw.wmnet
12:48 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2018.codfw.wmnet
12:47 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
12:47 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2017.codfw.wmnet
12:41 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2017.codfw.wmnet
12:37 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
12:25 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
12:22 jayme: updated envoyproxy to 1.15.4-1 on 'A:mw-canary or A:restbase-canary'
11:08 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2016.codfw.wmnet
11:02 moritzm: imported ferm 2.5.1-1+wmf1 to bullseye-wikimedia/main T275873
11:01 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2016.codfw.wmnet
10:55 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2015.codfw.wmnet
10:49 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2015.codfw.wmnet
10:44 arturo: merging homer change to cr-eqiad (T279342)
10:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2014.codfw.wmnet
10:33 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2014.codfw.wmnet
10:28 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2013.codfw.wmnet
10:20 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2013.codfw.wmnet
10:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2012.codfw.wmnet
10:08 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2012.codfw.wmnet
10:08 jayme: updated envoyproxy to 1.15.4-1 on mw1325.eqiad.wmnet,restbase1026.eqiad.wmnet
10:05 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
10:04 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2011.codfw.wmnet
10:03 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
10:00 jayme: updated envoyproxy to 1.15.4-1 on mwdebug1001.eqiad.wmnet
09:57 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2011.codfw.wmnet
09:55 jayme: imported envoyproxy_1.15.4-1 to stretch-wikimedia - T280317
09:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2010.codfw.wmnet
09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15384 and previous config saved to /var/cache/conftool/dbconfig/20210416-093446-root.json
09:33 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2010.codfw.wmnet
09:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2009.codfw.wmnet
09:21 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2009.codfw.wmnet
09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15383 and previous config saved to /var/cache/conftool/dbconfig/20210416-091942-root.json
09:13 jayme: imported envoyproxy_1.15.4-1 to buster-wikimedia - T280317
09:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15380 and previous config saved to /var/cache/conftool/dbconfig/20210416-090438-root.json
08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15374 and previous config saved to /var/cache/conftool/dbconfig/20210416-084935-root.json
08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15373 and previous config saved to /var/cache/conftool/dbconfig/20210416-083431-root.json
07:53 elukey: run reprepro --delete clearvanished on apt1001 to clear all cloudera packages
07:41 ema: cp-upload_ulsfo: rolling varnish-frontend-restart to apply exp policy settings changes starting from empty caches T275809
07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P15372 and previous config saved to /var/cache/conftool/dbconfig/20210416-071936-marostegui.json
06:58 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1030.eqiad.wmnet
06:52 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1030.eqiad.wmnet
06:48 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1029.eqiad.wmnet
06:39 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1029.eqiad.wmnet
06:27 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1028.eqiad.wmnet
06:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2095.codfw.wmnet with reason: REIMAGE
06:20 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1028.eqiad.wmnet
06:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2095.codfw.wmnet with reason: REIMAGE
05:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics-tool1001.eqiad.wmnet
05:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2094.codfw.wmnet with reason: REIMAGE
05:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2094.codfw.wmnet with reason: REIMAGE
05:42 elukey@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics-tool1001.eqiad.wmnet
03:31 ryankemper: [wdqs] `ryankemper@wdqs1013:~$ sudo systemctl restart wdqs-blazegraph`
03:26 ryankemper: T267927 Pooled `wdqs2001`
03:22 ryankemper: T267927 Pooled `wdqs1006` and `wdqs2002`
03:09 ryankemper: T267927 kicked off next round of `data-transfer`s: `wdqs1004`->`wdqs1007`, `wdqs2001`->`wdqs2003`, `wdqs1003`->`wdqs1008`, `wdqs2008`->`wdqs2004`
03:09 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
03:09 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
03:09 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
03:09 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
03:05 ryankemper: T267927 Last round of `data-transfer`s finished successfully, proceeding to next round
03:04 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
03:04 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
03:04 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
00:30 Krinkle: Delete old data at doc1001:/srv/doc/cover/PasswordBlacklist (ref T254799)
00:09 jforrester@deploy1002: Finished deploy [integration/docroot@63b6fb6]: Sync with CI updates (no-op) (duration: 00m 08s)
00:09 jforrester@deploy1002: Started deploy [integration/docroot@63b6fb6]: Sync with CI updates (no-op)

2021-04-15

23:37 jforrester@deploy1002: Synchronized php-1.37.0-wmf.1/skins/Vector/skin.json: Backport: Adjust floating override (T280260) (duration: 00m 56s)
23:35 jforrester@deploy1002: Synchronized php-1.37.0-wmf.1/skins/Vector/resources/skins.vector.styles.legacy/layouts/screen.less: Backport: Adjust floating override (T280260) (duration: 00m 56s)
23:31 jforrester@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: Backport: searchSatisfaction: Default userEditBucket back to 0 edits (T280294) (duration: 00m 57s)
23:17 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Create Draft namespace on itwiki (T280289) (duration: 00m 56s)
23:09 jforrester@deploy1002: Synchronized wmf-config/logos.php: Config: [wikitech] Update logo to mirror the new MediaWiki logo (T279087) (duration: 00m 56s)
23:08 jforrester@deploy1002: Synchronized static/images/project-logos/wikitech-2x.png: Config: [wikitech] Update logo to mirror the new MediaWiki logo (T279087) (duration: 00m 56s)
23:07 jforrester@deploy1002: Synchronized static/images/project-logos/wikitech-1.5x.png: Config: [wikitech] Update logo to mirror the new MediaWiki logo (T279087) (duration: 00m 57s)
23:06 jforrester@deploy1002: Synchronized static/images/project-logos/wikitech.png: Config: [wikitech] Update logo to mirror the new MediaWiki logo (T279087) (duration: 00m 57s)
22:56 ryankemper: T267927 WDQS kicked off next round of `data-transfer`s: `wdqs1004`->`wdqs1006`, `wdqs2001`->`wdqs2002`, `wdqs2008`->`wdqs1003`
22:56 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
22:56 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
22:55 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
22:48 ryankemper: T267927 pooled `wdqs1005` (all caught up on lag)
22:46 ryankemper: T280108 T267927 Manually re-enabled and ran puppet on `wdqs1005` (had closed the tmux pane which terminated the cookbook without letting it do its final cleanup)
22:33 ryankemper: T280108 T267927 Data transfers completed successfully; small issue with new `wait_for_updater` logic is preventing termination so I ctrl+c'd manually
22:32 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
20:03 herron: migrating kafka-logging broker logstash1012 to kafka-logging1003 T279342
19:56 Trey314159: reindexing wikidata on cloudelastic finished/failed (T274200)
19:43 Trey314159: reindexing wikidata on cloudelastic (T274200)
19:42 Trey314159: reindexing commons and wikidata on elastic@eqiad (T274200)
19:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
19:14 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.1 refs T278345
18:49 andrew@deploy1002: Finished deploy [horizon/deploy@ec37c43]: test deploy of trove dashboard to codfw1dev (duration: 01m 58s)
18:47 andrew@deploy1002: Started deploy [horizon/deploy@ec37c43]: test deploy of trove dashboard to codfw1dev
18:39 jdrewniak@deploy1002: Synchronized private/readme.php: Config: Add $wgWMEVectorPrefDiffSalt to private/readme (T261842) (duration: 01m 08s)
18:32 jdrewniak@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add mediawiki.pref_diff stream to wgEventLoggingStreamNames/wgEventStreams (T261842) (duration: 01m 18s)
17:18 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:09 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
16:42 crusnov@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:34 crusnov@cumin1001: START - Cookbook sre.dns.netbox
16:27 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1027.eqiad.wmnet
16:21 ryankemper: T280108 T267927 Current wdqs transfers in progress: `wdqs1004`->`wdqs1005`, `wdqs2008`->`wdqs2001`
16:21 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1027.eqiad.wmnet
16:17 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
16:17 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1026.eqiad.wmnet
16:17 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
16:17 ryankemper: T280108 T267927 Merged https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/679702 and ran puppet-agent on `cumin2001` before next round of wdqs `data-transfer`s
16:12 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1026.eqiad.wmnet
16:08 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1025.eqiad.wmnet
16:02 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1025.eqiad.wmnet
15:26 otto@deploy1002: Finished deploy [analytics/refinery@497f6a5] (hadoop-test): (no justification provided) (duration: 04m 44s)
15:21 otto@deploy1002: Started deploy [analytics/refinery@497f6a5] (hadoop-test): (no justification provided)
15:09 elukey@deploy1002: Finished deploy [analytics/refinery@497f6a5]: Regular analytics weekly train (duration: 13m 12s)
15:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns1002.wikimedia.org
15:03 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns1002.wikimedia.org
14:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns1001.wikimedia.org
14:56 elukey@deploy1002: Started deploy [analytics/refinery@497f6a5]: Regular analytics weekly train
14:53 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns1001.wikimedia.org
14:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns5002.wikimedia.org
14:47 jayme: imported etcd-mirror_0.0.5-1 to buster-wikimedia
14:43 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns5002.wikimedia.org
14:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns5001.wikimedia.org
14:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1048.eqiad.wmnet with reason: REIMAGE
14:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1047.eqiad.wmnet with reason: REIMAGE
14:35 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1048.eqiad.wmnet with reason: REIMAGE
14:34 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns5001.wikimedia.org
14:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1046.eqiad.wmnet with reason: REIMAGE
14:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1047.eqiad.wmnet with reason: REIMAGE
14:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns2002.wikimedia.org
14:31 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1046.eqiad.wmnet with reason: REIMAGE
14:27 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns2002.wikimedia.org
14:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns2001.wikimedia.org
14:19 ppchelko@deploy1002: Finished deploy [restbase/deploy@4755f50]: T271983, try again (duration: 07m 45s)
14:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns2001.wikimedia.org
14:17 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1024.eqiad.wmnet
14:12 ppchelko@deploy1002: Started deploy [restbase/deploy@4755f50]: T271983, try again
14:11 ppchelko@deploy1002: Finished deploy [restbase/deploy@4755f50]: T271983 (duration: 11m 15s)
14:09 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1024.eqiad.wmnet
14:00 ppchelko@deploy1002: Started deploy [restbase/deploy@4755f50]: T271983
13:56 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=wtp104[5-7].eqiad.wmnet
13:55 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1023.eqiad.wmnet
13:54 andrewbogott: upgrading packages and mediawiki on wikitech-static
13:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns4002.wikimedia.org
13:48 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1023.eqiad.wmnet
13:46 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns4002.wikimedia.org
13:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1022.eqiad.wmnet
13:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns3001.wikimedia.org
13:32 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1022.eqiad.wmnet
13:25 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1021.eqiad.wmnet
13:19 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns3001.wikimedia.org
13:18 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1021.eqiad.wmnet
13:13 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1020.eqiad.wmnet
13:13 XioNoX: redirect ns2 to dns3002
13:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns3002.wikimedia.org
13:07 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1020.eqiad.wmnet
13:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns3002.wikimedia.org
13:02 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1019.eqiad.wmnet
12:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1045.eqiad.wmnet with reason: REIMAGE
12:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1044.eqiad.wmnet with reason: REIMAGE
12:57 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1045.eqiad.wmnet with reason: REIMAGE
12:56 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1019.eqiad.wmnet
12:55 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1043.eqiad.wmnet with reason: REIMAGE
12:55 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1044.eqiad.wmnet with reason: REIMAGE
12:54 XioNoX: redirect ns2 to dns3001
12:53 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1043.eqiad.wmnet with reason: REIMAGE
12:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host authdns1001.wikimedia.org
12:44 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host authdns1001.wikimedia.org
12:37 XioNoX: redirect ns0 to authdns2001
12:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host authdns2001.wikimedia.org
12:28 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1018.eqiad.wmnet
12:27 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host authdns2001.wikimedia.org
12:23 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=wtp104[0-2].eqiad.wmnet
12:22 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1018.eqiad.wmnet
12:17 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1017.eqiad.wmnet
12:12 XioNoX: redirect ns1 to authdns1001
12:09 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1017.eqiad.wmnet
11:51 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1016.eqiad.wmnet
11:45 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
11:45 hnowlan: restarting restbase1016 for kernel update
11:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1042.eqiad.wmnet with reason: REIMAGE
11:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1041.eqiad.wmnet with reason: REIMAGE
11:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1042.eqiad.wmnet with reason: REIMAGE
11:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1040.eqiad.wmnet with reason: REIMAGE
11:30 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1041.eqiad.wmnet with reason: REIMAGE
11:29 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1040.eqiad.wmnet with reason: REIMAGE
11:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on restbase-dev1004.eqiad.wmnet with reason: restarting for kernel update
11:25 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on restbase-dev1004.eqiad.wmnet with reason: restarting for kernel update
11:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 6748a7f: Add *.jfklibrary.org to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T279506) (duration: 01m 51s)
11:14 arturo: merging homer changes for cr-codfw (T280225)
11:14 arturo: merging homer changes for cr-eqiad (T280225)
10:59 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=wtp103[7-9].eqiad.wmnet
10:54 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
10:53 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
10:53 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
10:24 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
10:24 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
10:24 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
10:21 elukey: Add kafka-logging100{2,3} to the kafka term in the analytics filters on cr1/cr2 eqiad - ref: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/679740
10:08 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
10:08 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
10:08 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
09:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15368 and previous config saved to /var/cache/conftool/dbconfig/20210415-095031-root.json
09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 100%: Repool db1146:3314 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15367 and previous config saved to /var/cache/conftool/dbconfig/20210415-093633-root.json
09:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15366 and previous config saved to /var/cache/conftool/dbconfig/20210415-093527-root.json
09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 75%: Repool db1146:3314 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15365 and previous config saved to /var/cache/conftool/dbconfig/20210415-092129-root.json
09:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15364 and previous config saved to /var/cache/conftool/dbconfig/20210415-092024-root.json
09:16 ema: cp-upload: varnishadm -n frontend param.set nuke_limit 1000 T275809
09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 50%: Repool db1146:3314 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15363 and previous config saved to /var/cache/conftool/dbconfig/20210415-090625-root.json
09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15362 and previous config saved to /var/cache/conftool/dbconfig/20210415-090520-root.json
09:04 moritzm: installing tomcat security updates
08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 25%: Repool db1146:3314 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15361 and previous config saved to /var/cache/conftool/dbconfig/20210415-085122-root.json
08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 10%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15360 and previous config saved to /var/cache/conftool/dbconfig/20210415-085017-root.json
08:48 godog: free space and bounce thanos-compact
08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 10%: Repool db1146:3314 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15359 and previous config saved to /var/cache/conftool/dbconfig/20210415-083618-root.json
08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 5%: Repool db1146:3314 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15358 and previous config saved to /var/cache/conftool/dbconfig/20210415-082115-root.json
08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3314', diff saved to https://phabricator.wikimedia.org/P15357 and previous config saved to /var/cache/conftool/dbconfig/20210415-081127-marostegui.json
08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 100%: Repool db1146:3314 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15356 and previous config saved to /var/cache/conftool/dbconfig/20210415-080947-root.json
07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool db1166 after cloning db1179', diff saved to https://phabricator.wikimedia.org/P15355 and previous config saved to /var/cache/conftool/dbconfig/20210415-075718-root.json
07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 75%: Repool db1146:3314 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15354 and previous config saved to /var/cache/conftool/dbconfig/20210415-075444-root.json
07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repool db1166 after cloning db1179', diff saved to https://phabricator.wikimedia.org/P15353 and previous config saved to /var/cache/conftool/dbconfig/20210415-074214-root.json
07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 50%: Repool db1146:3314 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15352 and previous config saved to /var/cache/conftool/dbconfig/20210415-073940-root.json
07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool db1166 after cloning db1179', diff saved to https://phabricator.wikimedia.org/P15351 and previous config saved to /var/cache/conftool/dbconfig/20210415-072711-root.json
07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 25%: Repool db1146:3314 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15350 and previous config saved to /var/cache/conftool/dbconfig/20210415-072436-root.json
07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146 (s2,s4) to upgrade kernel', diff saved to https://phabricator.wikimedia.org/P15348 and previous config saved to /var/cache/conftool/dbconfig/20210415-071600-marostegui.json
07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool db1166 after cloning db1179', diff saved to https://phabricator.wikimedia.org/P15347 and previous config saved to /var/cache/conftool/dbconfig/20210415-071207-root.json
06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: Repool db1166 after cloning db1179', diff saved to https://phabricator.wikimedia.org/P15346 and previous config saved to /var/cache/conftool/dbconfig/20210415-065704-root.json
06:33 ryankemper: T280108 T267927 `data-transfer` to `wdqs1004` was successful; cookbook failed due to a newly introduced minor type error that didn't affect the transfer itself
06:32 elukey: move hue.wikimedia.org to an-tool1009 (from analytics-tool1001)
06:00 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
05:54 Amir1: end of cleaning archive of pywikibot-bugs and wikidata-bugs T262773
05:44 Amir1: start deleting archive of wikidata-bugs T262773
05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 to clone db1179 T275633', diff saved to https://phabricator.wikimedia.org/P15344 and previous config saved to /var/cache/conftool/dbconfig/20210415-050239-marostegui.json
04:14 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
04:14 ryankemper: T280108 T267927 `wdqs2008` (source) caught up on lag, xfering to `wdqs1004`: `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs1004.eqiad.wmnet --reason "transferring wikidata journal following reload from dumps" --blazegraph_instance blazegraph --task-id T267927`
04:06 ryankemper: T280108 T267927 Merged https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/679320, will verify correct behavior of `data-transfer` cookbook
01:19 Amir1: mwscript extensions/Wikibase/repo/maintenance/changePropertyDataType.php wikidatawiki --property-id P8671 --new-data-type external-id (T278427)
00:50 ejegg: updated fundraising CiviCRM from c3342aa4ea to 35a8dd33ba

2021-04-14

23:27 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable legacy javascript global variables in ruwiki (T72470) (duration: 01m 16s)
21:44 legoktm: manually started debmonitor-client.service on ml-serve2004 after 502 Bad gateway error
20:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wtp[1037-1039].eqiad.wmnet with reason: reimage
20:54 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wtp[1037-1039].eqiad.wmnet with reason: reimage
20:38 mutante: wtp1037, wtp1038, wtp1039 - scap pull
19:52 dzahn@cumin1001: conftool action : set/weight=20; selector: name=mw2395.codfw.wmnet,cluster=jobrunner
19:52 dzahn@cumin1001: conftool action : set/weight=20; selector: name=mw2394.codfw.wmnet,cluster=jobrunner
19:51 dzahn@cumin1001: conftool action : set/weight=20; selector: name=mw2410.codfw.wmnet,cluster=videoscaler
19:51 dzahn@cumin1001: conftool action : set/weight=20; selector: name=mw2411.codfw.wmnet,cluster=videoscaler
19:50 cstone: civicrm revision changed from ec2a3bcff6 to c3342aa4ea
19:50 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw2411.codfw.wmnet,cluster=videoscaler
19:50 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw2410.codfw.wmnet,cluster=videoscaler
19:49 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2395.codfw.wmnet,cluster=videoscaler
19:48 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2394.codfw.wmnet,cluster=videoscaler
19:47 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2411.codfw.wmnet,cluster=jobrunner
19:42 herron: migrating kafka-logging broker logstash1011 to kafka-logging1002 T279342
19:06 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.1 refs T278345 (duration: 02m 03s)
19:04 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.1 refs T278345
18:58 mutante: urldownloader1002 - icinga alerted about disk space, ran 'apt-get clean', which is my usual go-to in that case. It reduced usage from 97% to 89%
17:56 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/GrowthExperiments/: ce44792: 84107c5: GrowthExperiments backports related to DatabaseMentorStore (T279957; T279959) (duration: 01m 55s)
15:00 shdubsh: run new curator actions on codfw - T274394
14:48 shdubsh: O:logstash::elasticsearch7 update elasticsearch-curator to 5.8.1
14:13 rzl: mcrouter cert renewal complete, puppet re-enabled T276029
14:11 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@8ae53e3]: T273847 export queries to relforge dag deployment - start date update (duration: 02m 14s)
14:11 moritzm: installing intel-microcode updates on Buster
14:09 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@8ae53e3]: T273847 export queries to relforge dag deployment - start date update
13:48 rzl: disabling puppet on C:mcrouter for cert renewal T276029
13:43 marostegui@cumin1001: dbctl commit (dc=all): 'Remove weight from es5 master', diff saved to https://phabricator.wikimedia.org/P15342 and previous config saved to /var/cache/conftool/dbconfig/20210414-134331-marostegui.json
13:34 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 100%: Repool es1025 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15341 and previous config saved to /var/cache/conftool/dbconfig/20210414-133411-root.json
13:29 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@825c60a]: T273847 export queries to relforge dag deployment - schedule change (duration: 02m 08s)
13:27 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@825c60a]: T273847 export queries to relforge dag deployment - schedule change
13:19 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 75%: Repool es1025 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15340 and previous config saved to /var/cache/conftool/dbconfig/20210414-131908-root.json
13:12 moritzm: installing OpenSSL updates on buster
13:12 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
13:04 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 50%: Repool es1025 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15339 and previous config saved to /var/cache/conftool/dbconfig/20210414-130404-root.json
13:02 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
13:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
13:01 godog: extend prometheus global @ codfw by 100G
12:49 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 25%: Repool es1025 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15338 and previous config saved to /var/cache/conftool/dbconfig/20210414-124901-root.json
12:39 elukey: update kafka term for analytics-in{4,6} on cr{1,2}-eqiad to include kafka-logging1001 - ref: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/679296
12:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1039.eqiad.wmnet with reason: REIMAGE
12:34 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1039.eqiad.wmnet with reason: REIMAGE
12:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1038.eqiad.wmnet with reason: REIMAGE
12:33 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 10%: Repool es1025 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15337 and previous config saved to /var/cache/conftool/dbconfig/20210414-123357-root.json
12:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1037.eqiad.wmnet with reason: REIMAGE
12:31 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1038.eqiad.wmnet with reason: REIMAGE
12:30 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1037.eqiad.wmnet with reason: REIMAGE
12:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15336 and previous config saved to /var/cache/conftool/dbconfig/20210414-122727-root.json
12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15335 and previous config saved to /var/cache/conftool/dbconfig/20210414-122108-root.json
12:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15334 and previous config saved to /var/cache/conftool/dbconfig/20210414-121223-root.json
12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1025 for kernel and mysql upgrade T279281', diff saved to https://phabricator.wikimedia.org/P15333 and previous config saved to /var/cache/conftool/dbconfig/20210414-120724-marostegui.json
12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repool db1096:3315 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15332 and previous config saved to /var/cache/conftool/dbconfig/20210414-120604-root.json
12:03 marostegui: Upgrade mysql on db1080 T279281
11:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15331 and previous config saved to /var/cache/conftool/dbconfig/20210414-115720-root.json
11:53 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=(wtp1034|wtp1035|wtp1036).eqiad.wmnet
11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repool db1096:3315 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15330 and previous config saved to /var/cache/conftool/dbconfig/20210414-115101-root.json
11:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15329 and previous config saved to /var/cache/conftool/dbconfig/20210414-114216-root.json
11:41 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
11:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
11:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15328 and previous config saved to /var/cache/conftool/dbconfig/20210414-113714-root.json
11:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Repool db1096:3315 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15327 and previous config saved to /var/cache/conftool/dbconfig/20210414-113557-root.json
11:31 marostegui: Upgrade kernel on db1096 (s5, s6)
11:29 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
11:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096 (s5,s6) kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15326 and previous config saved to /var/cache/conftool/dbconfig/20210414-112619-marostegui.json
11:25 hnowlan: regenerated certificates for restbase1019/restbase102[0-7]
11:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 90%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15325 and previous config saved to /var/cache/conftool/dbconfig/20210414-112211-root.json
11:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 80%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15323 and previous config saved to /var/cache/conftool/dbconfig/20210414-110706-root.json
11:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1036.eqiad.wmnet with reason: REIMAGE
11:06 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
11:06 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
11:06 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
11:06 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
11:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1035.eqiad.wmnet with reason: REIMAGE
11:04 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
11:04 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
11:04 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
11:03 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
11:03 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1036.eqiad.wmnet with reason: REIMAGE
11:03 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
11:03 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
11:02 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
11:02 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
11:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1034.eqiad.wmnet with reason: REIMAGE
11:01 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1035.eqiad.wmnet with reason: REIMAGE
10:59 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1034.eqiad.wmnet with reason: REIMAGE
10:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 70%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15322 and previous config saved to /var/cache/conftool/dbconfig/20210414-105202-root.json
10:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 60%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15321 and previous config saved to /var/cache/conftool/dbconfig/20210414-103659-root.json
10:30 marostegui: Failover m1 from db1080 to db1159 - T276448
10:25 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: Upgrading ceph to octopus
10:25 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: Upgrading ceph to octopus
10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15320 and previous config saved to /var/cache/conftool/dbconfig/20210414-102153-root.json
10:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 40%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15319 and previous config saved to /var/cache/conftool/dbconfig/20210414-100649-root.json
09:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 30%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15318 and previous config saved to /var/cache/conftool/dbconfig/20210414-095146-root.json
09:37 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 20%: Slowly pool db1177 for the first time in s8 T275633', diff saved to https://phabricator.wikimedia.org/P15317 and previous config saved to /var/cache/conftool/dbconfig/20210414-093642-root.json
09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1177 with minimal weight on s8 for the first time T275633', diff saved to https://phabricator.wikimedia.org/P15316 and previous config saved to /var/cache/conftool/dbconfig/20210414-093305-marostegui.json
09:29 gehel: depooling wdqs1004 - corrupted data after data reload
09:27 effie: disable puppet on all mediawiki servers to merge 676580
09:24 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/DiscussionTools/includes/Hooks/HookUtils.php: e4b2d93: Dont allow query and cookie hacks to enable topic subscriptions (T280082) (duration: 01m 24s)
09:23 gehel: repooling wdqs1013, caught up on lag
09:22 gehel: depooling wdqs1003 - corrupted data after data reload
09:19 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kraz.wikimedia.org
09:16 gehel: restarting blazegraph on wdqs1003
09:12 ryankemper: T267927 depooled `wdqs1004` following data transfer (catching up on lag), current round of data transfers is done so there shouldn't be any left to depool
09:10 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
09:09 jmm@cumin1001: START - Cookbook sre.hosts.decommission for hosts kraz.wikimedia.org
09:06 ryankemper: T267927 depool `wdqs2001` following data transfer (catching up on lag)
09:03 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast1002.wikimedia.org
09:03 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
08:53 jmm@cumin1001: START - Cookbook sre.hosts.decommission for hosts bast1002.wikimedia.org
08:44 Urbanecm: Run scap pull on mwdebug1002
08:40 Urbanecm: Staging on mwdebug1002
08:20 akosiaris@cumin1001: conftool action : set/weight=10; selector: cluster=videoscaler,service=apache2,name=mw2394.codfw.wmnet
08:20 akosiaris@cumin1001: conftool action : set/weight=10; selector: cluster=videoscaler,service=apache2,name=mw2395.codfw.wmnet
08:16 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=(wtp1033.eqiad.wmnet|wtp1032.eqiad.wmnet)
08:07 jayme: updated chartmuseum to 0.13.1 on chartmuseum1001, chartmuseum2001
08:06 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=eqiad
08:05 gehel: depooling wdqs2004 - catching up on lag
08:01 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
07:59 gehel: depooling wdqs2001 - catching up on lag
07:57 gehel: depooling wdqs1013 - catching up on lag
07:56 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
07:55 gehel: restarting blazegraph + updater on wdqs1013
07:51 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=eqiad
07:51 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=codfw
07:42 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=codfw
07:42 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
07:42 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
07:42 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
07:42 jayme: imported chartmuseum_0.13.1-1 to buster-wikimedia
07:41 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
07:41 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
07:41 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
07:41 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - T272836
07:41 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
07:40 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
07:40 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
07:22 XioNoX: push pfw policy - T280059
06:47 eileen: civicrm revision changed from 649e415c07 to ec2a3bcff6, config revision is c5fc1b91e0
06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1177 with minimal weight on s8 for the first time T275633', diff saved to https://phabricator.wikimedia.org/P15314 and previous config saved to /var/cache/conftool/dbconfig/20210414-062549-marostegui.json
05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1177 with minimal weight on s8 for the first time T275633', diff saved to https://phabricator.wikimedia.org/P15313 and previous config saved to /var/cache/conftool/dbconfig/20210414-052959-marostegui.json
05:14 root@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1076.eqiad.wmnet
05:08 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
05:08 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
05:07 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
05:07 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
05:04 root@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1076.eqiad.wmnet
04:50 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
04:42 eileen: civicrm revision changed from a4c1a7b842 to 649e415c07, config revision is c5fc1b91e0
02:54 andrew@deploy1002: Finished deploy [horizon/deploy@ef844a1]: fix for T276963 (duration: 04m 10s)
02:49 andrew@deploy1002: Started deploy [horizon/deploy@ef844a1]: fix for T276963
00:11 legoktm@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw2411.codfw.wmnet
00:10 legoktm@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw2411.codfw.wmnet
00:08 legoktm@cumin1001: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw2411.codfw.wmnet
00:08 legoktm@cumin1001: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw2410.codfw.wmnet

2021-04-13

23:27 legoktm@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: Broadcast IRC events to irc1001 instead of kraz (T224579) (duration: 01m 06s)
23:15 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Unset $wmgUseWikimediaShopLink for ptwiki (T279877) (duration: 01m 06s)
23:10 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: ExtensionDistributor: Add REL1_36 (duration: 02m 03s)
22:41 mutante: welcome new deployer Silvan Heintze (sihe) (T279764)
22:40 cstone: civicrm revision changed from 76bd8ff009 to a4c1a7b842
22:08 ejegg: updated payments-wiki from 70f5163816 to 9a4eef1375
22:06 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw2395.codfw.wmnet,cluster=jobrunner
22:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2395.codfw.wmnet,cluster=jobrunner
22:04 Urbanecm: [urbanecm@mwmaint1002 /srv/mediawiki]$ foreachwikiindblist growthexperiments sql.php php-1.37.0-wmf.1/extensions/GrowthExperiments/maintenance/schemas/mysql/growthexperiments_mentor_mentee.sql # T278573
21:50 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw2394.codfw.wmnet,cluster=jobrunner
21:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2394.codfw.wmnet,cluster=jobrunner
21:48 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2394.codfw.wmnet,service=jobrunner
21:45 mutante: mw2394, mw2395 - scap pull
21:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2395.codfw.wmnet
21:35 mutante: mw2394 - rebooting
21:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2394.codfw.wmnet
21:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2395.codfw.wmnet with reason: REIMAGE
21:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2395.codfw.wmnet with reason: REIMAGE
21:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2394.codfw.wmnet with reason: REIMAGE
21:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2394.codfw.wmnet with reason: REIMAGE
20:58 mutante: mw2394, mw2395 - reimaging as jobrunners (T279100)
20:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[2394-2395].codfw.wmnet with reason: reimage
20:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[2394-2395].codfw.wmnet with reason: reimage
20:47 mutante: [kubemaster1001:~] $ sudo kubectl delete pod linkrecommendation-production-load-datasets-1618311600-hn6k8 -n linkrecommendation (T280076)
19:35 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
19:35 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
19:32 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
19:32 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
19:29 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1033.eqiad.wmnet with reason: REIMAGE
19:29 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
19:27 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1032.eqiad.wmnet with reason: REIMAGE
19:27 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1033.eqiad.wmnet with reason: REIMAGE
19:25 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1032.eqiad.wmnet with reason: REIMAGE
19:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2020.codfw.wmnet with reason: REIMAGE
19:15 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2020.codfw.wmnet with reason: REIMAGE
19:11 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.1
18:45 jhuneidi@deploy1002: Pruned MediaWiki: 1.36.0-wmf.37 (duration: 03m 16s)
18:11 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.1 (duration: 30m 36s)
18:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1031.eqiad.wmnet with reason: REIMAGE
17:59 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1031.eqiad.wmnet with reason: REIMAGE
17:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1030.eqiad.wmnet with reason: REIMAGE
17:56 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1030.eqiad.wmnet with reason: REIMAGE
17:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2019.codfw.wmnet with reason: REIMAGE
17:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
17:54 ayounsi@cumin1001: START - Cookbook sre.network.cf
17:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2018.codfw.wmnet with reason: REIMAGE
17:52 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2019.codfw.wmnet with reason: REIMAGE
17:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2017.codfw.wmnet with reason: REIMAGE
17:50 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2018.codfw.wmnet with reason: REIMAGE
17:48 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2017.codfw.wmnet with reason: REIMAGE
17:41 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.1
17:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
17:28 ayounsi@cumin1001: START - Cookbook sre.network.cf
17:21 mutante: gerrit1001 - remove /var/lib/gerrit2/review_site/static/gerrit-theme.html after https://gerrit.wikimedia.org/r/c/operations/puppet/+/678646
16:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: Slowly pool db1184 for the first time in s1 T275633', diff saved to https://phabricator.wikimedia.org/P15311 and previous config saved to /var/cache/conftool/dbconfig/20210413-163851-root.json
16:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 90%: Slowly pool db1184 for the first time in s1 T275633', diff saved to https://phabricator.wikimedia.org/P15310 and previous config saved to /var/cache/conftool/dbconfig/20210413-162347-root.json
16:08 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1029.eqiad.wmnet with reason: REIMAGE
16:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 80%: Slowly pool db1184 for the first time in s1 T275633', diff saved to https://phabricator.wikimedia.org/P15309 and previous config saved to /var/cache/conftool/dbconfig/20210413-160844-root.json
16:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1028.eqiad.wmnet with reason: REIMAGE
16:05 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1029.eqiad.wmnet with reason: REIMAGE
16:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2016.codfw.wmnet with reason: REIMAGE
16:03 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1028.eqiad.wmnet with reason: REIMAGE
16:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2015.codfw.wmnet with reason: REIMAGE
16:02 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2016.codfw.wmnet with reason: REIMAGE
16:00 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2014.codfw.wmnet with reason: REIMAGE
16:00 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2015.codfw.wmnet with reason: REIMAGE
15:58 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2014.codfw.wmnet with reason: REIMAGE
15:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 70%: Slowly pool db1184 for the first time in s1 T275633', diff saved to https://phabricator.wikimedia.org/P15308 and previous config saved to /var/cache/conftool/dbconfig/20210413-155340-root.json
15:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 60%: Slowly pool db1184 for the first time in s1 T275633', diff saved to https://phabricator.wikimedia.org/P15307 and previous config saved to /var/cache/conftool/dbconfig/20210413-153836-root.json
15:26 herron: migrating kafka-logging broker logstash1010 to kafka-logging1001 T279342
15:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 50%: Slowly pool db1184 for the first time in s1 T275633', diff saved to https://phabricator.wikimedia.org/P15306 and previous config saved to /var/cache/conftool/dbconfig/20210413-152333-root.json
15:21 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
15:12 Trey314159: reindexing English wikis on elastic@eqiad, elastic@codfw, and cloudelastic complete (with some failures) (T274200)
15:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 40%: Slowly pool db1184 for the first time in s1 T275633', diff saved to https://phabricator.wikimedia.org/P15305 and previous config saved to /var/cache/conftool/dbconfig/20210413-150829-root.json
14:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 30%: Slowly pool db1184 for the first time in s1 T275633', diff saved to https://phabricator.wikimedia.org/P15304 and previous config saved to /var/cache/conftool/dbconfig/20210413-145325-root.json
14:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 20%: Slowly pool db1184 for the first time in s1 T275633', diff saved to https://phabricator.wikimedia.org/P15303 and previous config saved to /var/cache/conftool/dbconfig/20210413-143821-root.json
14:34 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1184 with minimal weight on s1 for the first time T275633', diff saved to https://phabricator.wikimedia.org/P15302 and previous config saved to /var/cache/conftool/dbconfig/20210413-143419-marostegui.json
14:10 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1027.eqiad.wmnet with reason: REIMAGE
14:09 moritzm: updated bullseye d-i image to 2021-04-12 daily build T275873
14:08 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1027.eqiad.wmnet with reason: REIMAGE
14:08 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1026.eqiad.wmnet with reason: REIMAGE
14:06 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1026.eqiad.wmnet with reason: REIMAGE
14:04 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1184 with minimal weight on s1 for the first time T275633', diff saved to https://phabricator.wikimedia.org/P15301 and previous config saved to /var/cache/conftool/dbconfig/20210413-140431-marostegui.json
14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 20%: Slowly pool db1184 for the first time in s1 T275633', diff saved to https://phabricator.wikimedia.org/P15300 and previous config saved to /var/cache/conftool/dbconfig/20210413-140353-root.json
14:03 _joe_: uploading new versions of the mcrouter, php7.2-fpm and php7.3-fpm images to the registry
14:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2013.codfw.wmnet with reason: REIMAGE
13:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2012.codfw.wmnet with reason: REIMAGE
13:59 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2013.codfw.wmnet with reason: REIMAGE
13:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2011.codfw.wmnet with reason: REIMAGE
13:57 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2012.codfw.wmnet with reason: REIMAGE
13:55 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2011.codfw.wmnet with reason: REIMAGE
13:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Slowly pool db1180 for the first time in s6 T275633', diff saved to https://phabricator.wikimedia.org/P15299 and previous config saved to /var/cache/conftool/dbconfig/20210413-133644-root.json
13:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 90%: Slowly pool db1180 for the first time in s6 T275633', diff saved to https://phabricator.wikimedia.org/P15298 and previous config saved to /var/cache/conftool/dbconfig/20210413-132140-root.json
13:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 80%: Slowly pool db1180 for the first time in s6 T275633', diff saved to https://phabricator.wikimedia.org/P15297 and previous config saved to /var/cache/conftool/dbconfig/20210413-130637-root.json
12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1184 with minimal weight on s1 for the first time T275633', diff saved to https://phabricator.wikimedia.org/P15296 and previous config saved to /var/cache/conftool/dbconfig/20210413-125652-marostegui.json
12:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 70%: Slowly pool db1180 for the first time in s6 T275633', diff saved to https://phabricator.wikimedia.org/P15295 and previous config saved to /var/cache/conftool/dbconfig/20210413-125133-root.json
12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 60%: Slowly pool db1180 for the first time in s6 T275633', diff saved to https://phabricator.wikimedia.org/P15294 and previous config saved to /var/cache/conftool/dbconfig/20210413-123629-root.json
12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1184 with minimal weight on s1 for the first time T275633', diff saved to https://phabricator.wikimedia.org/P15293 and previous config saved to /var/cache/conftool/dbconfig/20210413-122248-marostegui.json
12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Slowly pool db1180 for the first time in s6 T275633', diff saved to https://phabricator.wikimedia.org/P15292 and previous config saved to /var/cache/conftool/dbconfig/20210413-122126-root.json
12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1076 from dbctl T274752', diff saved to https://phabricator.wikimedia.org/P15291 and previous config saved to /var/cache/conftool/dbconfig/20210413-122119-marostegui.json
12:13 dcausse: deleting stale wikidata indices on cloudelastic (T231517)
11:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2010.codfw.wmnet with reason: REIMAGE
11:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2009.codfw.wmnet with reason: REIMAGE
11:31 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2010.codfw.wmnet with reason: REIMAGE
11:30 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2008.codfw.wmnet with reason: REIMAGE
11:29 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2009.codfw.wmnet with reason: REIMAGE
11:27 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2008.codfw.wmnet with reason: REIMAGE
11:17 jbond42: switch debmonitor internal service to apache
11:14 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable legacy javascript variables in zhwiki (T72470) (duration: 00m 57s)
10:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 40%: Slowly pool db1180 for the first time in s6 T275633', diff saved to https://phabricator.wikimedia.org/P15289 and previous config saved to /var/cache/conftool/dbconfig/20210413-105625-root.json
10:55 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
10:55 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
10:43 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
10:43 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
10:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 30%: Slowly pool db1180 for the first time in s6 T275633', diff saved to https://phabricator.wikimedia.org/P15288 and previous config saved to /var/cache/conftool/dbconfig/20210413-104121-root.json
10:39 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
10:35 jbond42: switch debmonitor internal interface to use apache
10:33 moritzm: restarting FPM on mw canaries to pick up OpenSSL updates
10:31 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1004.eqiad.wmnet
10:28 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@3227eea]: (no justification provided) (duration: 03m 08s)
10:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 20%: Slowly pool db1180 for the first time in s6 T275633', diff saved to https://phabricator.wikimedia.org/P15287 and previous config saved to /var/cache/conftool/dbconfig/20210413-102617-root.json
10:25 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@3227eea]: (no justification provided)
10:22 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1004.eqiad.wmnet
09:57 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1180 with minimal weight on s6 for the first time T275633', diff saved to https://phabricator.wikimedia.org/P15286 and previous config saved to /var/cache/conftool/dbconfig/20210413-095717-marostegui.json
09:41 ema: cp[5002-5006]: rolling varnish-frontend-restart to apply exp policy settings changes starting from empty caches T275809
09:34 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dbmonitor1001.wikimedia.org
09:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112 after schema change', diff saved to https://phabricator.wikimedia.org/P15285 and previous config saved to /var/cache/conftool/dbconfig/20210413-093208-root.json
09:22 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
09:21 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112 after schema change', diff saved to https://phabricator.wikimedia.org/P15284 and previous config saved to /var/cache/conftool/dbconfig/20210413-091704-root.json
09:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2007.codfw.wmnet with reason: REIMAGE
09:16 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
09:14 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2006.codfw.wmnet with reason: REIMAGE
09:14 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2007.codfw.wmnet with reason: REIMAGE
09:14 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1180 with minimal weight on s6 for the first time T275633', diff saved to https://phabricator.wikimedia.org/P15283 and previous config saved to /var/cache/conftool/dbconfig/20210413-091414-marostegui.json
09:13 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
09:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2005.codfw.wmnet with reason: REIMAGE
09:12 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2006.codfw.wmnet with reason: REIMAGE
09:10 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2005.codfw.wmnet with reason: REIMAGE
09:06 jmm@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbmonitor1001.wikimedia.org
09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112 after schema change', diff saved to https://phabricator.wikimedia.org/P15282 and previous config saved to /var/cache/conftool/dbconfig/20210413-090201-root.json
08:59 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
08:59 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1180 with minimal weight on s6 for the first time T275633', diff saved to https://phabricator.wikimedia.org/P15281 and previous config saved to /var/cache/conftool/dbconfig/20210413-085057-marostegui.json
08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112 after schema change', diff saved to https://phabricator.wikimedia.org/P15280 and previous config saved to /var/cache/conftool/dbconfig/20210413-084657-root.json
08:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7ca7673: mswiki: Fix help panel links (T277562) (duration: 00m 58s)
08:18 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
08:18 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
08:16 marostegui: Restart sanitarium hosts db1124, db1125, db1154, db1155, db2094, db2095 T279587
08:09 akosiaris: Remove system maintenance message from OTRS. Migration to Znuny 6.0.33 done. T279303
08:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2004.codfw.wmnet with reason: REIMAGE
08:00 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2003.codfw.wmnet with reason: REIMAGE
08:00 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2004.codfw.wmnet with reason: REIMAGE
07:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2002.codfw.wmnet with reason: REIMAGE
07:58 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2003.codfw.wmnet with reason: REIMAGE
07:56 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2002.codfw.wmnet with reason: REIMAGE
07:49 jiji@cumin1001: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=(mw1311.eqiad.wmnet|mw1318.eqiad.wmnet|mw1334.eqiad.wmnet)
07:46 akosiaris: Start up all components on otrs1001. T279303
07:38 jiji@cumin1001: conftool action : set/weight=10; selector: cluster=jobrunner,name=mw1318.eqiad.wmnet
07:38 jiji@cumin1001: conftool action : set/weight=10; selector: cluster=jobrunner,name=mw1334.eqiad.wmnet
07:30 akosiaris: migrating to Znuny-6.0.33, release 2021-03-10. T279303
07:26 akosiaris: shutdown all OTRS components on otrs1001, prep for OTRS -> Znuny migration. T279303
05:56 _joe_: restarting blazegraph on wdqs1013
05:44 eileen: civicrm revision changed from ecc32d2a35 to 76bd8ff009, config revision is c5fc1b91e0
04:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change', diff saved to https://phabricator.wikimedia.org/P15278 and previous config saved to /var/cache/conftool/dbconfig/20210413-045708-marostegui.json

2021-04-12

23:25 krinkle@deploy1002: Synchronized wmf-config/mc.php: I390b47 (duration: 00m 58s)
23:06 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: wgAbuseFilterAflFilterMigrationStage: Make COMPAT_NEW in production (T269712) (duration: 00m 58s)
18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 117743f: Enable assignment of importupload on enwikibooks (T278683) (duration: 00m 57s)
18:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: a1949fd: Add extendedconfirmed on svwiki (T279836) (duration: 00m 59s)
18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5d275ec: Add abusefilter-maintainer to wmgPrivilegedGlobalGroups (T279835) (duration: 00m 58s)
18:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 13b10d3: Enable <mapframe> on bswiki (T279635) (duration: 00m 57s)
18:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ae05f7c: Replace ombudsman with ombuds in wmgPrivilegedGlobalGroups (T256299) (duration: 00m 57s)
18:03 urbanecm@deploy1002: sync-file aborted: ae05f7c: Replace ombudsman with ombuds in wmgPrivilegedGlobalGroups (T256299) (duration: 00m 00s)
11:29 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable legacy javascript in jawiki (T72470) (duration: 00m 56s)
11:26 ladsgroup@deploy1002: Synchronized php-1.36.0-wmf.38/extensions/FlaggedRevs/frontend/FlaggedRevsXML.php: Backport: Don't do strict equal condition check (T279750) (duration: 00m 57s)
11:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: NO-OP: 6c03d6a: Explicitly set wgGEMentorshipMigrationStage: WRITE_OLD/READ_OLD (T279853) (duration: 00m 58s)
11:13 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: wikidata: post edit constraint jobs on 60% of edits (T204031) (duration: 01m 13s)
11:11 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove all remains of idGeneratorLogging (T274156) (2/2, Beta-only) (duration: 00m 56s)
11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove all remains of idGeneratorLogging (T274156) (1/2) (duration: 00m 57s)
11:07 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Remove idGeneratorLogging (T274156) (duration: 00m 58s)
11:00 ladsgroup@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T279398 T279419) (duration: 00m 58s)
10:59 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T279398 T279419) (duration: 00m 58s)
09:55 urbanecm@deploy1002: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 00m 57s)
09:44 Urbanecm: Start server-side upload for 4 video files #2 (T279878, T279839, T279818)
08:43 Urbanecm: Start server-side upload for 4 video files (T279878, T279839, T279818)
08:08 jiji@cumin1001: conftool action : set/weight=20; selector: cluster=jobrunner,name=mw1318.eqiad.wmnet
08:07 jiji@cumin1001: conftool action : set/weight=20; selector: cluster=jobrunner,name=mw1334.eqiad.wmnet
08:07 jiji@cumin1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1311.eqiad.wmnet
08:06 jiji@cumin1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1318.eqiad.wmnet
08:06 jiji@cumin1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1334.eqiad.wmnet
08:05 vgutierrez: restart acme-chief

2021-04-10

14:21 andrew@deploy1002: Finished deploy [horizon/deploy@ee1be56]: fix for T279699 (duration: 04m 12s)
14:17 andrew@deploy1002: Started deploy [horizon/deploy@ee1be56]: fix for T279699
14:11 andrew@deploy1002: Finished deploy [horizon/deploy@ee1be56]: cloudweb2001-dev deploy partial fix for T279699 (duration: 02m 21s)
14:08 andrew@deploy1002: Started deploy [horizon/deploy@ee1be56]: cloudweb2001-dev deploy partial fix for T279699
14:08 andrew@deploy1002: Finished deploy [horizon/deploy@ee1be56]: cloudweb2001-dev deploy partial fix for T279699 (duration: 00m 11s)
14:08 andrew@deploy1002: Started deploy [horizon/deploy@ee1be56]: cloudweb2001-dev deploy partial fix for T279699

2021-04-09

14:07 jynus: retry es4 backup dump on eqiad (backup1002)
01:25 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on moss-be2002.codfw.wmnet with reason: REIMAGE
01:23 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be2002.codfw.wmnet with reason: REIMAGE
00:56 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on moss-be2001.codfw.wmnet with reason: REIMAGE
00:54 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be2001.codfw.wmnet with reason: REIMAGE
00:49 legoktm: imported mailman3 backports on apt.wm.o (T278905)

2021-04-08

23:48 brennen@deploy1002: Synchronized php-1.36.0-wmf.38/extensions/WikibaseMediaInfo/resources/mediasearch-vue/store/actions.js: Backport: Do not show "invalid search" message when request is aborted by user (T277714) (duration: 00m 57s)
22:12 tgr@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
22:12 tgr@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
22:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: REIMAGE
21:57 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: REIMAGE
21:56 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: REIMAGE
21:55 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: REIMAGE
21:54 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: REIMAGE
21:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: REIMAGE
21:52 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: REIMAGE
21:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: REIMAGE
21:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: REIMAGE
21:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: REIMAGE
21:48 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: REIMAGE
21:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: REIMAGE
21:46 tgr@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
21:46 tgr@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
21:38 andrew@deploy1002: Finished deploy [horizon/deploy@3abe9d0]: Fix for T279667 (duration: 03m 52s)
21:34 andrew@deploy1002: Started deploy [horizon/deploy@3abe9d0]: Fix for T279667
21:33 tgr@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
20:33 mutante: mw2403 through mw2411 pooled and set to active state in netbox (T279599)
20:32 mutante: mw2403 through mw2411 - pooled and set to active state in netbox (T279599)
20:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw240[3-9].codfw.wmnet
20:28 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw241[0-1].codfw.wmnet
20:27 legoktm: legoktm@deploy1002:~$ cat deb-parsoid-urls.txt | mwscript purgeList.php --wiki=aawiki # to clear releases.wm.o/debian/ cache
20:02 legoktm: imported parsoid_0.11.1all_all.deb to releases.wikimedia.org apt repo
19:58 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw241[0-1].codfw.wmnet
19:58 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw241[0-1].codfw.wmnet
19:57 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw238[0-2].codfw.wmnet
19:56 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2379.codfw.wmnet
19:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw240[3-9].codfw.wmnet
19:54 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw240[3-9].codfw.wmnet
19:50 mutante: mw2403 through mw2411 - scap pull - new hardware
19:35 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.38
18:52 phuedx: phuedx@deploy1002 Synchronized private/PrivateSettings.php: PrivateSettings: Add value for $wgWMEVectorPrefDiffSalt (T261842)
18:51 phuedx@deploy1002: Synchronized private/PrivateSettings.php: PrivateSettings: Add value for $wgWMEVectorPrefDiffSalt (T261842) (duration: 01m 06s)
18:37 mutante: mw2403 through mw2411 - serial rebooting
18:31 tgr@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
18:31 tgr@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
18:29 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.38/extensions/VisualEditor/modules/ve-mw/ui/tools/ve.ui.MWBackTool.js: e0f3735: Revert incorrect changes to ve.ui.MWBackCommand that made it stop working (T279613) (duration: 01m 07s)
18:25 tgr@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
18:25 tgr@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
18:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[2410-2411].codfw.wmnet with reason: new_install
18:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[2410-2411].codfw.wmnet with reason: new_install
18:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 7 hosts with reason: new_install
18:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 7 hosts with reason: new_install
18:03 mutante: mw2403 through mw2411 - new hardware moving into production, not pooled yet, initial puppet run, being added to icinga etc, creating mcrouter certs for them (T279599)
18:02 mutante: mw2403 through mw2411 - new hardware moving into production, not pooled yet, initial puppet run, being added to icinga etc, creating mcrouter certs for them (T279599)
17:59 tgr@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
17:52 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
17:29 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
17:23 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
17:18 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
17:16 dancy: Scap 3.17.0 deployed to beta cluster
16:51 dancy: testing Scap 3.17.0 release on deployment-deploy01
16:33 elukey: reboot an-worker1100 again to check if all the disks come up correctly
16:16 cmjohnson1: update bios cp1087, already deposed for h/w issues T278729
16:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wtp1025.eqiad.wmnet with reason: REIMAGE
16:13 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wtp1025.eqiad.wmnet with reason: REIMAGE
16:10 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:05 pt1979@cumin2001: START - Cookbook sre.dns.netbox
15:51 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:44 pt1979@cumin2001: START - Cookbook sre.dns.netbox
15:36 elukey: reboot an-worker1100 to see if it helps with the strange BBU behavior
13:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephmon2001-dev.codfw.wmnet
13:44 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephmon2001-dev.codfw.wmnet
13:41 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2001.codfw.wmnet with reason: REIMAGE
13:39 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2001.codfw.wmnet with reason: REIMAGE
13:24 moritzm: installing groff bugfix updates from Buster point release
12:49 ema: cp5001: varnish-frontend-restart to test exp policy settings starting from an empty cache T275809
12:44 moritzm: installing libbsd security updates for Buster
12:39 moritzm: installing xcftools security updates
12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15264 and previous config saved to /var/cache/conftool/dbconfig/20210408-123137-root.json
12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15263 and previous config saved to /var/cache/conftool/dbconfig/20210408-121633-root.json
12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15262 and previous config saved to /var/cache/conftool/dbconfig/20210408-120128-root.json
11:58 XioNoX: tighten all routers loopback firewall filter - T207799
11:57 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@25dad72]: T273847 export queries to relforge dag deployment - elastic index name fix (duration: 00m 09s)
11:57 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@25dad72]: T273847 export queries to relforge dag deployment - elastic index name fix
11:50 XioNoX: tighten cr3-ulsfo loopback firewall filter - T207799
11:49 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@25dad72]: T273847 export queries to relforge dag deployment - elastic index name fix (duration: 01m 39s)
11:47 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@25dad72]: T273847 export queries to relforge dag deployment - elastic index name fix
11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15261 and previous config saved to /var/cache/conftool/dbconfig/20210408-114625-root.json
11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: Repool db1118 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15259 and previous config saved to /var/cache/conftool/dbconfig/20210408-112332-root.json
11:09 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be2028.codfw.wmnet
11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: Repool db1118 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15258 and previous config saved to /var/cache/conftool/dbconfig/20210408-110828-root.json
11:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: de1670c: Enable Growth for newcomers on simplewiki, mswiki, tawiki (T278369; T277562; T277550) (duration: 01m 07s)
10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: Repool db1118 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15257 and previous config saved to /var/cache/conftool/dbconfig/20210408-105324-root.json
10:47 effie: disable puppet on parsoid* servers
10:41 XioNoX: enable sampling on all routers FPCs
10:40 marostegui: Upgrade db2085's kernel
10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: Repool db1118 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15256 and previous config saved to /var/cache/conftool/dbconfig/20210408-103821-root.json
10:37 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
10:32 XioNoX: enable sampling on cr1-codfw:fpc0
10:30 marostegui: Upgrade kernel on db1118
10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15255 and previous config saved to /var/cache/conftool/dbconfig/20210408-102855-marostegui.json
10:27 effie: enable puppet on all mw* servers
10:27 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
10:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: Repool db1111', diff saved to https://phabricator.wikimedia.org/P15254 and previous config saved to /var/cache/conftool/dbconfig/20210408-101702-root.json
10:17 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
10:16 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2028.codfw.wmnet
10:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1157 for schema change', diff saved to https://phabricator.wikimedia.org/P15253 and previous config saved to /var/cache/conftool/dbconfig/20210408-101303-marostegui.json
10:11 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@ff0137d]: T273847 export queries to relforge dag deployment - start date update (duration: 01m 37s)
10:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15252 and previous config saved to /var/cache/conftool/dbconfig/20210408-101119-root.json
10:10 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
10:09 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@ff0137d]: T273847 export queries to relforge dag deployment - start date update
10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1180 to dbctl T275633', diff saved to https://phabricator.wikimedia.org/P15251 and previous config saved to /var/cache/conftool/dbconfig/20210408-100829-marostegui.json
10:07 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
10:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: Repool db1111', diff saved to https://phabricator.wikimedia.org/P15250 and previous config saved to /var/cache/conftool/dbconfig/20210408-100159-root.json
09:58 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
09:56 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15249 and previous config saved to /var/cache/conftool/dbconfig/20210408-095615-root.json
09:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: Repool db1111', diff saved to https://phabricator.wikimedia.org/P15248 and previous config saved to /var/cache/conftool/dbconfig/20210408-094655-root.json
09:44 Urbanecm: [urbanecm@mwmaint1002 ~/uploads]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --sleep=3600 --user=Lusccasdeutsch . # T278856
09:42 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1177 to dbctl T275633', diff saved to https://phabricator.wikimedia.org/P15247 and previous config saved to /var/cache/conftool/dbconfig/20210408-094218-marostegui.json
09:42 effie: disable puppet on mw* servers for 677114
09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15246 and previous config saved to /var/cache/conftool/dbconfig/20210408-094112-root.json
09:36 Urbanecm: Retry server-side upload for T279192
09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: Repool db1111', diff saved to https://phabricator.wikimedia.org/P15244 and previous config saved to /var/cache/conftool/dbconfig/20210408-093151-root.json
09:30 moritzm: installing openssl updates for buster
09:29 akosiaris@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
09:29 akosiaris@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
09:27 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@d098717]: T273847 export queries to relforge dag deployment - sensor name fix (duration: 01m 48s)
09:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15243 and previous config saved to /var/cache/conftool/dbconfig/20210408-092608-root.json
09:25 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@d098717]: T273847 export queries to relforge dag deployment - sensor name fix
09:24 moritzm: installing libzstd security updates on buster
09:20 ema: cp5001: varnish-frontend-restart to test exp policy settings starting from an empty cache T275809
09:14 akosiaris@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
09:14 akosiaris@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
09:09 moritzm: installing underscore security updates on stretch
08:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 for schema change', diff saved to https://phabricator.wikimedia.org/P15242 and previous config saved to /var/cache/conftool/dbconfig/20210408-085630-marostegui.json
08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15241 and previous config saved to /var/cache/conftool/dbconfig/20210408-085610-root.json
08:44 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
08:43 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
08:41 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15240 and previous config saved to /var/cache/conftool/dbconfig/20210408-084107-root.json
08:40 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
08:39 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
08:38 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
08:38 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
08:37 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
08:33 moritzm: installing remaining curl security updates for buster
08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15239 and previous config saved to /var/cache/conftool/dbconfig/20210408-082603-root.json
08:24 marostegui: Stop MySQL on all db1117 sections to upgrade kernel
08:17 moritzm: imported postgis 3.1.1+dfsg-1~wmf1 to component/postgis for buster-wikimedia T277064
08:15 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2028.codfw.wmnet
08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15238 and previous config saved to /var/cache/conftool/dbconfig/20210408-081059-root.json
08:03 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2028.codfw.wmnet
07:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 100%: Repool es1023', diff saved to https://phabricator.wikimedia.org/P15237 and previous config saved to /var/cache/conftool/dbconfig/20210408-075457-root.json
07:50 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1020.eqiad.wmnet
07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Remove weight from es5 master', diff saved to https://phabricator.wikimedia.org/P15236 and previous config saved to /var/cache/conftool/dbconfig/20210408-074911-marostegui.json
07:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175 for schema change', diff saved to https://phabricator.wikimedia.org/P15235 and previous config saved to /var/cache/conftool/dbconfig/20210408-074524-marostegui.json
07:42 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1020.eqiad.wmnet
07:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 75%: Repool es1023', diff saved to https://phabricator.wikimedia.org/P15234 and previous config saved to /var/cache/conftool/dbconfig/20210408-073953-root.json
07:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 50%: Repool es1023', diff saved to https://phabricator.wikimedia.org/P15233 and previous config saved to /var/cache/conftool/dbconfig/20210408-072450-root.json
07:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 25%: Repool es1023', diff saved to https://phabricator.wikimedia.org/P15232 and previous config saved to /var/cache/conftool/dbconfig/20210408-070946-root.json
06:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1023 to upgrade kernel and mysql, remove weight from es1021, to leave it as it was yesterday T279281', diff saved to https://phabricator.wikimedia.org/P15231 and previous config saved to /var/cache/conftool/dbconfig/20210408-065627-marostegui.json
06:44 elukey@deploy1002: Finished deploy [analytics/refinery@1dbbd3d] (hadoop-test): (no justification provided) (duration: 02m 20s)
06:41 elukey@deploy1002: Started deploy [analytics/refinery@1dbbd3d] (hadoop-test): (no justification provided)
06:33 marostegui: Stop MySQL on db1111 to clone db1177 T275633
06:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 to clone db1177 T275633', diff saved to https://phabricator.wikimedia.org/P15229 and previous config saved to /var/cache/conftool/dbconfig/20210408-063331-marostegui.json
06:01 kart_: Updated cxserver to 2021-04-07-062518-production (T278141, T263139, T271711, T201491, T240525, T207662)
05:58 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
05:54 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
05:43 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
02:50 AaronSchulz: Restarted importMissingLocalNames.php (mwmaint1002, wiki=metawiki, batch-size=1000)

2021-04-07

23:38 ejegg: updated payments-wiki from b06009c099 to 70f5163816
23:35 cstone: civicrm revision changed from eb9379daa3 to fdb4f90c74
23:10 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: 321bf91: Wikibase: sample function call counters at 1:100 (T277817) (duration: 01m 08s)
22:49 mforns@deploy1002: Finished deploy [analytics/refinery@1dbbd3d] (hadoop-test): Regular analytics weekly train TEST retry1 [analytics/refinery@1dbbd3dfa996d2e970eb1cbc0a63d53040d4e3a3] (duration: 01m 47s)
22:48 mforns@deploy1002: Started deploy [analytics/refinery@1dbbd3d] (hadoop-test): Regular analytics weekly train TEST retry1 [analytics/refinery@1dbbd3dfa996d2e970eb1cbc0a63d53040d4e3a3]
22:33 mforns@deploy1002: Finished deploy [analytics/refinery@1dbbd3d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@1dbbd3dfa996d2e970eb1cbc0a63d53040d4e3a3] (duration: 04m 15s)
22:29 mforns@deploy1002: Started deploy [analytics/refinery@1dbbd3d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@1dbbd3dfa996d2e970eb1cbc0a63d53040d4e3a3]
22:29 mforns@deploy1002: Finished deploy [analytics/refinery@1dbbd3d] (thin): Regular analytics weekly train THIN [analytics/refinery@1dbbd3dfa996d2e970eb1cbc0a63d53040d4e3a3] (duration: 00m 07s)
22:29 mforns@deploy1002: Started deploy [analytics/refinery@1dbbd3d] (thin): Regular analytics weekly train THIN [analytics/refinery@1dbbd3dfa996d2e970eb1cbc0a63d53040d4e3a3]
22:28 mforns@deploy1002: Finished deploy [analytics/refinery@1dbbd3d]: Regular analytics weekly train [analytics/refinery@1dbbd3dfa996d2e970eb1cbc0a63d53040d4e3a3] (duration: 42m 54s)
22:03 Amir1: clearing watchlist of bots in wikidatawiki (https://www.wikidata.org/w/index.php?title=Wikidata:Project_chat&oldid=1397670734#Clean_up_watchlist_of_bots)
22:01 legoktm: deployed patch for T279451 (part 2)
21:45 mforns@deploy1002: Started deploy [analytics/refinery@1dbbd3d]: Regular analytics weekly train [analytics/refinery@1dbbd3dfa996d2e970eb1cbc0a63d53040d4e3a3]
21:31 legoktm: deployed patch for T279451
21:22 mutante: mw2397 through mw2402 - pooled as new API appservers after scap pull and all monitoring green (T278396)
21:21 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw240[0-2].codfw.wmnet
21:21 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw239[7-9].codfw.wmnet
21:05 mutante: mw2397 - mw2402 - scap pull
21:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw240[0-2].codfw.wmnet
21:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw239[7-9].codfw.wmnet
21:04 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw240[0-2].codfw.wmnet
21:03 Amir1: clearing watchlist of bots in enwiki (https://en.wikipedia.org/w/index.php?title=Wikipedia:Bots/Noticeboard&oldid=1016563560#Clearing_bot_watchlists)
21:02 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw239[7-9].codfw.wmnet
20:58 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw[2400-2401].codfw.wmnet with reason: new_install
20:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw[2400-2401].codfw.wmnet with reason: new_install
20:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw[2397-2399].codfw.wmnet with reason: new_install
20:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw[2397-2399].codfw.wmnet with reason: new_install
20:54 pt1979@cumin2001: START - Cookbook sre.dns.netbox
20:54 mutante: mw2397 - mw2402 - rebooting
20:51 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
20:44 pt1979@cumin2001: START - Cookbook sre.dns.netbox
20:30 mutante: mw2397 through mw2402 - new hardware moving into production, initial puppet runs as appservers, added to monitoring etc (T278396)
19:47 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on conf2006.codfw.wmnet with reason: REIMAGE
19:45 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on conf2006.codfw.wmnet with reason: REIMAGE
19:35 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: no-op for Beta (disable LocalisationUpdate extension) (duration: 01m 06s)
19:24 dduvall@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.38 (duration: 01m 06s)
19:23 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.38
19:21 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on conf2005.codfw.wmnet with reason: REIMAGE
19:19 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on conf2005.codfw.wmnet with reason: REIMAGE
18:54 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on conf2004.codfw.wmnet with reason: REIMAGE
18:52 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on conf2004.codfw.wmnet with reason: REIMAGE
17:40 tgr@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
17:40 tgr@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
17:29 tgr@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
17:29 tgr@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
17:20 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on moss-fe2002.codfw.wmnet with reason: REIMAGE
17:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2002.codfw.wmnet with reason: REIMAGE
16:45 tgr@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
15:47 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on moss-fe2001.codfw.wmnet with reason: REIMAGE
15:45 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2001.codfw.wmnet with reason: REIMAGE
15:39 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
15:30 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
15:13 Amir1: setting enwiki and enwikibooks to wmf.38 on mwdebug1002 to test flagged revs
15:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: Repool db1173 after cloning db1180', diff saved to https://phabricator.wikimedia.org/P15228 and previous config saved to /var/cache/conftool/dbconfig/20210407-150436-root.json
14:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: Repool db1173 after cloning db1180', diff saved to https://phabricator.wikimedia.org/P15227 and previous config saved to /var/cache/conftool/dbconfig/20210407-144933-root.json
14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: Repool db1173 after cloning db1180', diff saved to https://phabricator.wikimedia.org/P15226 and previous config saved to /var/cache/conftool/dbconfig/20210407-143429-root.json
14:33 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2411.codfw.wmnet with reason: REIMAGE
14:31 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2411.codfw.wmnet with reason: REIMAGE
14:19 effie: restarting pybal on lvs2009, lvs1015
14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: Repool db1173 after cloning db1180', diff saved to https://phabricator.wikimedia.org/P15225 and previous config saved to /var/cache/conftool/dbconfig/20210407-141925-root.json
14:16 effie: restarting pybal on lvs2010, lvs1016
14:05 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2410.codfw.wmnet with reason: REIMAGE
14:03 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2410.codfw.wmnet with reason: REIMAGE
13:54 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2409.codfw.wmnet with reason: REIMAGE
13:52 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2409.codfw.wmnet with reason: REIMAGE
13:43 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
13:43 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
13:42 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
13:42 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
13:41 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
13:41 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
13:39 moritzm: imported jenkins 2.277.2 to apt.wikimedia.org (thirdparty/ci) T279033
13:37 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
13:36 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
13:35 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
13:35 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
13:33 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
13:33 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
12:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2028.codfw.wmnet
12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: Repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P15224 and previous config saved to /var/cache/conftool/dbconfig/20210407-122304-root.json
12:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
12:18 marostegui: Upgrade db1173's kernel
12:18 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2028.codfw.wmnet
12:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1173', diff saved to https://phabricator.wikimedia.org/P15222 and previous config saved to /var/cache/conftool/dbconfig/20210407-121659-marostegui.json
12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: Repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P15221 and previous config saved to /var/cache/conftool/dbconfig/20210407-120800-root.json
12:05 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
11:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: Repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P15220 and previous config saved to /var/cache/conftool/dbconfig/20210407-115257-root.json
11:39 marostegui: Deploy schema change on s3 codfw, lag will appear T276150 T276156
11:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: Repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P15219 and previous config saved to /var/cache/conftool/dbconfig/20210407-113753-root.json
11:17 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1184 to s1 depooled T275633', diff saved to https://phabricator.wikimedia.org/P15218 and previous config saved to /var/cache/conftool/dbconfig/20210407-111708-marostegui.json
11:15 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: flaggedrevs: Disable quality and pristine tier in all wikis (T277883) (duration: 02m 15s)
10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118', diff saved to https://phabricator.wikimedia.org/P15217 and previous config saved to /var/cache/conftool/dbconfig/20210407-105617-marostegui.json
10:51 marostegui: Stop apache on dbmonitor1001 T224589
10:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106', diff saved to https://phabricator.wikimedia.org/P15216 and previous config saved to /var/cache/conftool/dbconfig/20210407-103404-marostegui.json
10:01 kormat@cumin1001: dbctl commit (dc=all): 'Repool db2106 and db2147 T279406', diff saved to https://phabricator.wikimedia.org/P15215 and previous config saved to /var/cache/conftool/dbconfig/20210407-100147-kormat.json
09:58 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kraz.wikimedia.org
09:58 moritzm: reboot kraz to nudge reconnects to irc2001.w.o for remaining connected clients
09:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host kraz.wikimedia.org
09:40 moritzm: imported git-lfs for bullseye/main (part of standard packages) T275873
09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 100%: Repool db1105:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P15214 and previous config saved to /var/cache/conftool/dbconfig/20210407-092320-root.json
09:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106', diff saved to https://phabricator.wikimedia.org/P15213 and previous config saved to /var/cache/conftool/dbconfig/20210407-091610-marostegui.json
09:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 75%: Repool db1105:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P15212 and previous config saved to /var/cache/conftool/dbconfig/20210407-090817-root.json
08:58 moritzm: imported quickstack for bullseye/main (part of standard packages) T275873
08:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 50%: Repool db1105:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P15211 and previous config saved to /var/cache/conftool/dbconfig/20210407-085313-root.json
08:52 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
08:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 25%: Repool db1105:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P15210 and previous config saved to /var/cache/conftool/dbconfig/20210407-083809-root.json
08:29 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2028.codfw.wmnet
08:22 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2028.codfw.wmnet
08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: Repool db1134', diff saved to https://phabricator.wikimedia.org/P15209 and previous config saved to /var/cache/conftool/dbconfig/20210407-081508-root.json
08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: Repool db1163 after upgrade', diff saved to https://phabricator.wikimedia.org/P15207 and previous config saved to /var/cache/conftool/dbconfig/20210407-080537-root.json
08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311', diff saved to https://phabricator.wikimedia.org/P15206 and previous config saved to /var/cache/conftool/dbconfig/20210407-080410-marostegui.json
08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: Repool db1134', diff saved to https://phabricator.wikimedia.org/P15205 and previous config saved to /var/cache/conftool/dbconfig/20210407-080005-root.json
07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: Repool db1163 after upgrade', diff saved to https://phabricator.wikimedia.org/P15204 and previous config saved to /var/cache/conftool/dbconfig/20210407-075034-root.json
07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: Repool db1134', diff saved to https://phabricator.wikimedia.org/P15203 and previous config saved to /var/cache/conftool/dbconfig/20210407-074501-root.json
07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 50%: Repool db1163 after upgrade', diff saved to https://phabricator.wikimedia.org/P15201 and previous config saved to /var/cache/conftool/dbconfig/20210407-073530-root.json
07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: Repool db1134', diff saved to https://phabricator.wikimedia.org/P15200 and previous config saved to /var/cache/conftool/dbconfig/20210407-072957-root.json
07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: Repool db1163 after upgrade', diff saved to https://phabricator.wikimedia.org/P15199 and previous config saved to /var/cache/conftool/dbconfig/20210407-072027-root.json
07:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1163 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15198 and previous config saved to /var/cache/conftool/dbconfig/20210407-071219-marostegui.json
07:03 gehel: repooling wdqs1005, caught up on lag
06:59 gehel: depooling wdqs1005, restarting blazegraph and waiting for it to catch up on lag
06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 for schema change', diff saved to https://phabricator.wikimedia.org/P15197 and previous config saved to /var/cache/conftool/dbconfig/20210407-065450-marostegui.json
06:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 100%: Repool es1020', diff saved to https://phabricator.wikimedia.org/P15196 and previous config saved to /var/cache/conftool/dbconfig/20210407-063033-root.json
06:28 moritzm: restarting apache/FPM on mw canaries to pick up curl updates
06:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Repool db1135 after schema change', diff saved to https://phabricator.wikimedia.org/P15195 and previous config saved to /var/cache/conftool/dbconfig/20210407-062451-root.json
06:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 75%: Repool es1020', diff saved to https://phabricator.wikimedia.org/P15194 and previous config saved to /var/cache/conftool/dbconfig/20210407-061529-root.json
06:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Repool db1135 after schema change', diff saved to https://phabricator.wikimedia.org/P15193 and previous config saved to /var/cache/conftool/dbconfig/20210407-060948-root.json
06:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 50%: Repool es1020', diff saved to https://phabricator.wikimedia.org/P15192 and previous config saved to /var/cache/conftool/dbconfig/20210407-060026-root.json
05:54 moritzm: installing curl security updates
05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: Repool db1135 after schema change', diff saved to https://phabricator.wikimedia.org/P15191 and previous config saved to /var/cache/conftool/dbconfig/20210407-055444-root.json
05:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 25%: Repool es1020', diff saved to https://phabricator.wikimedia.org/P15190 and previous config saved to /var/cache/conftool/dbconfig/20210407-054522-root.json
05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1020 for upgrade', diff saved to https://phabricator.wikimedia.org/P15189 and previous config saved to /var/cache/conftool/dbconfig/20210407-054127-marostegui.json
05:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Repool db1135 after schema change', diff saved to https://phabricator.wikimedia.org/P15188 and previous config saved to /var/cache/conftool/dbconfig/20210407-053940-root.json
05:29 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 25%: Repool es1020', diff saved to https://phabricator.wikimedia.org/P15187 and previous config saved to /var/cache/conftool/dbconfig/20210407-052901-root.json
05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1020 for upgrade', diff saved to https://phabricator.wikimedia.org/P15186 and previous config saved to /var/cache/conftool/dbconfig/20210407-050758-marostegui.json
05:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 for schema change', diff saved to https://phabricator.wikimedia.org/P15185 and previous config saved to /var/cache/conftool/dbconfig/20210407-050530-marostegui.json
03:28 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2408.codfw.wmnet with reason: REIMAGE
03:26 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2408.codfw.wmnet with reason: REIMAGE
03:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2407.codfw.wmnet with reason: REIMAGE
03:17 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2407.codfw.wmnet with reason: REIMAGE
02:20 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2406.codfw.wmnet with reason: REIMAGE
02:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2406.codfw.wmnet with reason: REIMAGE
02:05 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2405.codfw.wmnet with reason: REIMAGE
02:03 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2405.codfw.wmnet with reason: REIMAGE
01:53 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2404.codfw.wmnet with reason: REIMAGE
01:51 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2404.codfw.wmnet with reason: REIMAGE
01:46 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: REIMAGE
01:44 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: REIMAGE
01:41 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2403.codfw.wmnet with reason: REIMAGE
01:39 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2403.codfw.wmnet with reason: REIMAGE
01:11 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2402.codfw.wmnet with reason: REIMAGE
01:09 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2402.codfw.wmnet with reason: REIMAGE
01:03 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2401.codfw.wmnet with reason: REIMAGE
01:01 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2401.codfw.wmnet with reason: REIMAGE
00:45 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2400.codfw.wmnet with reason: REIMAGE
00:43 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2400.codfw.wmnet with reason: REIMAGE
00:38 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2399.codfw.wmnet with reason: REIMAGE
00:36 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2399.codfw.wmnet with reason: REIMAGE
00:12 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2398.codfw.wmnet with reason: REIMAGE
00:10 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2398.codfw.wmnet with reason: REIMAGE
00:06 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2397.codfw.wmnet with reason: REIMAGE
00:04 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2397.codfw.wmnet with reason: REIMAGE

2021-04-06

23:36 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.37/resources/src/: b8a0dab: Fix missing styles on diff (T279099) (duration: 01m 08s)
23:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 997b6f3: thwikisource: Enable transwiki import (T275281) (duration: 01m 08s)
23:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4d12a86: Disable upcoming DiscussionTools features for now (duration: 01m 08s)
19:49 dduvall@deploy1002: Pruned MediaWiki: 1.36.0-wmf.36 (duration: 01m 50s)
19:47 dduvall@deploy1002: Pruned MediaWiki: 1.36.0-wmf.35 (duration: 02m 02s)
19:45 dduvall@deploy1002: Pruned MediaWiki: 1.36.0-wmf.34 (duration: 03m 37s)
19:40 marxarelli: 1.36.0-wmf.38 rolled to group0. error rates steady and no new errors spotted (T278344)
19:26 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.38
18:41 dduvall@deploy1002: Finished scap: testwikis wikis to 1.36.0-wmf.38 (duration: 33m 31s)
18:20 Urbanecm: [urbanecm@mwmaint1002 ~/uploads]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --sleep=3600 --user=Sturm . # T278856
18:18 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp2036.codfw.wmnet
18:18 bblack: cp2036 - re-pooling via confctl
18:14 bblack: dns2001 - re-enabling and running puppet agent to restore service
18:10 dduvall@deploy1002: Started scap: testwikis wikis to 1.36.0-wmf.38
18:07 andrew@deploy1002: Finished deploy [horizon/deploy@392708e]: Updating Horizon to 'main' to see if that works around T279465 (duration: 04m 10s)
18:03 andrew@deploy1002: Started deploy [horizon/deploy@392708e]: Updating Horizon to 'main' to see if that works around T279465
17:51 bblack: dns2001 - manually disabled puppet and stopped pdns-recursor.service (and thus implicitly BIRD) to manual-depool due to switch port issues
17:49 bblack: cp2036 - explicitly confctl-depooled due to switch issues
17:48 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp2036.codfw.wmnet
17:05 andrew@deploy1002: Finished deploy [horizon/deploy@df2b0b4]: Upgrade to Horizon/Wallaby (take two) (duration: 03m 30s)
17:02 andrew@deploy1002: Started deploy [horizon/deploy@df2b0b4]: Upgrade to Horizon/Wallaby (take two)
16:28 andrew@deploy1002: Finished deploy [horizon/deploy@df2b0b4]: Upgrade to Horizon/Wallaby (duration: 04m 32s)
16:23 andrew@deploy1002: Started deploy [horizon/deploy@df2b0b4]: Upgrade to Horizon/Wallaby
16:16 krinkle@deploy1002: Synchronized php-1.36.0-wmf.37/skins/Vector/: I3234e7712b8c1 (duration: 01m 01s)
15:49 Urbanecm: Start server-side upload for 3 video files (T279189, T279188, T279183)
15:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: Repool db1163', diff saved to https://phabricator.wikimedia.org/P15182 and previous config saved to /var/cache/conftool/dbconfig/20210406-153123-root.json
15:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: Repool db1163', diff saved to https://phabricator.wikimedia.org/P15180 and previous config saved to /var/cache/conftool/dbconfig/20210406-151619-root.json
15:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on sretest1002.eqiad.wmnet with reason: bullseye tests
15:03 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on sretest1002.eqiad.wmnet with reason: bullseye tests
15:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 50%: Repool db1163', diff saved to https://phabricator.wikimedia.org/P15179 and previous config saved to /var/cache/conftool/dbconfig/20210406-150115-root.json
14:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: Repool db1163', diff saved to https://phabricator.wikimedia.org/P15178 and previous config saved to /var/cache/conftool/dbconfig/20210406-144612-root.json
14:31 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 97 hosts with reason: upgrading openstack
14:31 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 97 hosts with reason: upgrading openstack
14:30 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 9 hosts with reason: upgrading openstack
14:30 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 9 hosts with reason: upgrading openstack
14:29 dcaro: populated thirdparty/ceph-octopus buster repo with reprepro (T274566)
14:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
14:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
13:57 moritzm: upgrading sretest1002 to bullseye
13:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1163 for schema change', diff saved to https://phabricator.wikimedia.org/P15177 and previous config saved to /var/cache/conftool/dbconfig/20210406-134418-marostegui.json
13:37 Urbanecm: Retrying server-side upload for 1 file (T279192)
13:20 Urbanecm: Start server-side upload for 4 video files (T279191, T279192, T279193, T279190)
12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 100%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P15176 and previous config saved to /var/cache/conftool/dbconfig/20210406-124614-root.json
12:43 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
12:43 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
12:42 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
12:42 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 75%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P15175 and previous config saved to /var/cache/conftool/dbconfig/20210406-123111-root.json
12:28 moritzm: installing netty security updates
12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 50%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P15174 and previous config saved to /var/cache/conftool/dbconfig/20210406-121607-root.json
12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 25%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P15173 and previous config saved to /var/cache/conftool/dbconfig/20210406-120104-root.json
11:57 moritzm: installing openjpeg2 security updates on buster
11:43 moritzm: removed mw2247 from debmonitor T277780
11:37 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
11:37 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
11:37 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
11:37 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1164 for schema change', diff saved to https://phabricator.wikimedia.org/P15172 and previous config saved to /var/cache/conftool/dbconfig/20210406-112839-marostegui.json
11:07 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable legacy javascript globals on all wikis except some big ones (T72470) (duration: 01m 01s)
10:57 moritzm: upload wmf-laptop 0.5.1 to buster-wikimedia component/wmf-sre-laptop
10:55 moritzm: remove wmf-laptop 0.5.0 from buster-wikimedia (incorrect import to main, next upload will land in component/wmf-sre-laptop)
10:33 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
10:31 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15171 and previous config saved to /var/cache/conftool/dbconfig/20210406-100329-root.json
09:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-coord1002.eqiad.wmnet with reason: REIMAGE
09:50 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-coord1002.eqiad.wmnet with reason: REIMAGE
09:49 Urbanecm: Start server-side upload for 1 video file (T279418)
09:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15170 and previous config saved to /var/cache/conftool/dbconfig/20210406-094825-root.json
09:41 Urbanecm: Start server-side upload for 4 video files (T279197, T279196, T279195, T279194)
09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15169 and previous config saved to /var/cache/conftool/dbconfig/20210406-093322-root.json
09:32 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for conf2003.codfw.wmnet: Renew puppet certificate - jbond@cumin1001
09:31 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert for conf2003.codfw.wmnet: Renew puppet certificate - jbond@cumin1001
09:30 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for conf2002.codfw.wmnet: Renew puppet certificate - jbond@cumin1001
09:29 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert for conf2002.codfw.wmnet: Renew puppet certificate - jbond@cumin1001
09:29 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for kraz.wikimedia.org: Renew puppet certificate - jbond@cumin1001
09:28 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert for kraz.wikimedia.org: Renew puppet certificate - jbond@cumin1001
09:28 jbond42: renew puppet cert for kraz T279410
09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15168 and previous config saved to /var/cache/conftool/dbconfig/20210406-091818-root.json
08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Remove weight from es4 master', diff saved to https://phabricator.wikimedia.org/P15167 and previous config saved to /var/cache/conftool/dbconfig/20210406-083248-marostegui.json
08:07 moritzm: installing underscore security updates on buster
07:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: Repool es1022', diff saved to https://phabricator.wikimedia.org/P15166 and previous config saved to /var/cache/conftool/dbconfig/20210406-075957-root.json
07:44 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: Repool es1022', diff saved to https://phabricator.wikimedia.org/P15165 and previous config saved to /var/cache/conftool/dbconfig/20210406-074453-root.json
07:29 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: Repool es1022', diff saved to https://phabricator.wikimedia.org/P15164 and previous config saved to /var/cache/conftool/dbconfig/20210406-072950-root.json
07:20 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - T272836 T268435
07:14 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: Repool es1022', diff saved to https://phabricator.wikimedia.org/P15162 and previous config saved to /var/cache/conftool/dbconfig/20210406-071446-root.json
06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 100%: Repool db1149', diff saved to https://phabricator.wikimedia.org/P15161 and previous config saved to /var/cache/conftool/dbconfig/20210406-065539-root.json
06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169 for schema change', diff saved to https://phabricator.wikimedia.org/P15160 and previous config saved to /var/cache/conftool/dbconfig/20210406-065131-marostegui.json
06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 75%: Repool db1149', diff saved to https://phabricator.wikimedia.org/P15159 and previous config saved to /var/cache/conftool/dbconfig/20210406-064036-root.json
06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1022 for upgrade', diff saved to https://phabricator.wikimedia.org/P15158 and previous config saved to /var/cache/conftool/dbconfig/20210406-063938-marostegui.json
06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repool es1020', diff saved to https://phabricator.wikimedia.org/P15157 and previous config saved to /var/cache/conftool/dbconfig/20210406-063858-marostegui.json
06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1020 for upgrade', diff saved to https://phabricator.wikimedia.org/P15156 and previous config saved to /var/cache/conftool/dbconfig/20210406-063759-marostegui.json
06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 50%: Repool db1149', diff saved to https://phabricator.wikimedia.org/P15155 and previous config saved to /var/cache/conftool/dbconfig/20210406-062532-root.json
06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076 for decommission T274752', diff saved to https://phabricator.wikimedia.org/P15154 and previous config saved to /var/cache/conftool/dbconfig/20210406-061500-marostegui.json
06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 25%: Repool db1149', diff saved to https://phabricator.wikimedia.org/P15153 and previous config saved to /var/cache/conftool/dbconfig/20210406-061028-root.json
05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1149 for upgrade', diff saved to https://phabricator.wikimedia.org/P15152 and previous config saved to /var/cache/conftool/dbconfig/20210406-055324-marostegui.json
05:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2106 and db2147 after a crash', diff saved to https://phabricator.wikimedia.org/P15151 and previous config saved to /var/cache/conftool/dbconfig/20210406-053427-marostegui.json
02:18 eileen: civicrm revision changed from 740e49d868 to eb9379daa3, config revision is 6779e3829a
01:55 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:47 pt1979@cumin2001: START - Cookbook sre.dns.netbox

2021-04-05

23:17 AaronSchulz: Running importMissingLocalNames.php on mwmaint1002 in a screen
20:58 sbassett: re-deploy security patch for T270453 to wmf.37
20:50 sbassett: re-deploy security patch for T270988 to wmf.37
20:43 mholloway-shell@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add event stream config for android.image_recommendation_interaction (duration: 00m 59s)
19:31 andrew@deploy1002: Finished deploy [horizon/deploy@df2b0b4]: Returning cloudweb2001-dev to Horizon/Wallaby (duration: 01m 41s)
19:30 andrew@deploy1002: Started deploy [horizon/deploy@df2b0b4]: Returning cloudweb2001-dev to Horizon/Wallaby
19:08 andrew@deploy1002: Finished deploy [horizon/deploy@392708e]: Experimental main deploy of Horizon (duration: 02m 04s)
19:06 andrew@deploy1002: Started deploy [horizon/deploy@392708e]: Experimental main deploy of Horizon
18:28 tgr_: Morning deploys done
18:28 tgr@deploy1002: Synchronized dblists/growthexperiments.dblist: Config: Fix growthexperiments.dblist (T275171) (duration: 00m 58s)
18:27 tgr@deploy1002: Synchronized wmf-config/config/frwiki.yaml: Config: Fix growthexperiments.dblist (T275171) (duration: 00m 59s)
17:39 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:36 pt1979@cumin2001: START - Cookbook sre.dns.netbox
17:05 dpifke@deploy1002: Finished deploy [performance/navtiming@bc5af87]: Deploy https://gerrit.wikimedia.org/r/c/performance/navtiming/+/676006 (duration: 00m 05s)
17:05 dpifke@deploy1002: Started deploy [performance/navtiming@bc5af87]: Deploy https://gerrit.wikimedia.org/r/c/performance/navtiming/+/676006
16:45 Urbanecm: Start server-side upload of 4 video files (T279204, T279201, T279200, T279198)
14:43 XioNoX: push pfw policies - T278970
14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P15149 and previous config saved to /var/cache/conftool/dbconfig/20210405-140825-root.json
14:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P15148 and previous config saved to /var/cache/conftool/dbconfig/20210405-140751-root.json
13:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P15147 and previous config saved to /var/cache/conftool/dbconfig/20210405-135321-root.json
13:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P15146 and previous config saved to /var/cache/conftool/dbconfig/20210405-135248-root.json
13:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P15145 and previous config saved to /var/cache/conftool/dbconfig/20210405-133818-root.json
13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P15144 and previous config saved to /var/cache/conftool/dbconfig/20210405-133744-root.json
13:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P15143 and previous config saved to /var/cache/conftool/dbconfig/20210405-132314-root.json
13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P15142 and previous config saved to /var/cache/conftool/dbconfig/20210405-132240-root.json
13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141 for upgrade', diff saved to https://phabricator.wikimedia.org/P15141 and previous config saved to /var/cache/conftool/dbconfig/20210405-131221-marostegui.json
12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P15140 and previous config saved to /var/cache/conftool/dbconfig/20210405-124118-marostegui.json
12:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 100%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15139 and previous config saved to /var/cache/conftool/dbconfig/20210405-123751-root.json
12:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 75%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15138 and previous config saved to /var/cache/conftool/dbconfig/20210405-122247-root.json
12:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 50%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15137 and previous config saved to /var/cache/conftool/dbconfig/20210405-120744-root.json
12:03 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts scb[1001-1004].eqiad.wmnet
12:03 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts scb[2001-2006].codfw.wmnet
11:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 25%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15136 and previous config saved to /var/cache/conftool/dbconfig/20210405-115240-root.json
11:11 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts scb[1001-1004].eqiad.wmnet
11:09 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts scb[2001-2006].codfw.wmnet
11:06 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts scb[2001-2006].codfw.wmnet
11:06 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts scb[2001-2006].codfw.wmnet
11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 for schema change', diff saved to https://phabricator.wikimedia.org/P15135 and previous config saved to /var/cache/conftool/dbconfig/20210405-110506-marostegui.json
10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P15134 and previous config saved to /var/cache/conftool/dbconfig/20210405-105731-root.json
10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P15133 and previous config saved to /var/cache/conftool/dbconfig/20210405-105715-root.json
10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P15132 and previous config saved to /var/cache/conftool/dbconfig/20210405-104227-root.json
10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P15131 and previous config saved to /var/cache/conftool/dbconfig/20210405-104211-root.json
10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113 (s5,s6) after upgrade', diff saved to https://phabricator.wikimedia.org/P15130 and previous config saved to /var/cache/conftool/dbconfig/20210405-104010-marostegui.json
10:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113 (s5,s6) for upgrade', diff saved to https://phabricator.wikimedia.org/P15129 and previous config saved to /var/cache/conftool/dbconfig/20210405-103318-marostegui.json
10:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: Pool in s7', diff saved to https://phabricator.wikimedia.org/P15128 and previous config saved to /var/cache/conftool/dbconfig/20210405-103301-root.json
10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P15127 and previous config saved to /var/cache/conftool/dbconfig/20210405-102724-root.json
10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P15126 and previous config saved to /var/cache/conftool/dbconfig/20210405-102708-root.json
10:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: Pool in s7', diff saved to https://phabricator.wikimedia.org/P15125 and previous config saved to /var/cache/conftool/dbconfig/20210405-101757-root.json
10:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P15124 and previous config saved to /var/cache/conftool/dbconfig/20210405-101213-root.json
10:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P15123 and previous config saved to /var/cache/conftool/dbconfig/20210405-101204-root.json
10:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: Pool in s7', diff saved to https://phabricator.wikimedia.org/P15122 and previous config saved to /var/cache/conftool/dbconfig/20210405-100253-root.json
10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099 (s1,s8) for upgrade', diff saved to https://phabricator.wikimedia.org/P15121 and previous config saved to /var/cache/conftool/dbconfig/20210405-100246-marostegui.json
09:50 marostegui: Deploy schema change on s1 codfw, lag will appear in codfw - T276150 T276156
09:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: Pool in s7', diff saved to https://phabricator.wikimedia.org/P15120 and previous config saved to /var/cache/conftool/dbconfig/20210405-094744-root.json
09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1181 T275633', diff saved to https://phabricator.wikimedia.org/P15119 and previous config saved to /var/cache/conftool/dbconfig/20210405-091043-marostegui.json
08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1181 T275633', diff saved to https://phabricator.wikimedia.org/P15118 and previous config saved to /var/cache/conftool/dbconfig/20210405-082521-marostegui.json
08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15117 and previous config saved to /var/cache/conftool/dbconfig/20210405-080523-root.json
07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15116 and previous config saved to /var/cache/conftool/dbconfig/20210405-075019-root.json
07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15115 and previous config saved to /var/cache/conftool/dbconfig/20210405-073515-root.json
07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool after schema change', diff saved to https://phabricator.wikimedia.org/P15114 and previous config saved to /var/cache/conftool/dbconfig/20210405-072012-root.json
06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1181 in s7 with minimal weight T275633', diff saved to https://phabricator.wikimedia.org/P15113 and previous config saved to /var/cache/conftool/dbconfig/20210405-064727-marostegui.json
05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1181 in s7 with minimal weight T275633', diff saved to https://phabricator.wikimedia.org/P15112 and previous config saved to /var/cache/conftool/dbconfig/20210405-054951-marostegui.json
05:30 marostegui: Deploy schema change on db1121, lag will appear on s4 on wikireplicas
05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 for schema change', diff saved to https://phabricator.wikimedia.org/P15111 and previous config saved to /var/cache/conftool/dbconfig/20210405-053000-marostegui.json
05:12 marostegui: Restart all sanitarium hosts to pick up new filters T278573

2021-04-04

14:47 andrew@deploy1002: Finished deploy [horizon/deploy@df2b0b4]: upgrade labtesthorizon to the Wallaby branch (duration: 01m 36s)
14:45 andrew@deploy1002: Started deploy [horizon/deploy@df2b0b4]: upgrade labtesthorizon to the Wallaby branch

2021-04-03

19:20 andrew@deploy1002: Finished deploy [horizon/deploy@df2b0b4]: upgrade labtesthorizon to the Wallaby branch (duration: 02m 11s)
19:18 andrew@deploy1002: Started deploy [horizon/deploy@df2b0b4]: upgrade labtesthorizon to the Wallaby branch
17:30 andrew@deploy1002: Finished deploy [horizon/deploy@3a84c77]: upgrade labtesthorizon to the Wallaby branch (duration: 03m 35s)
17:26 andrew@deploy1002: Started deploy [horizon/deploy@3a84c77]: upgrade labtesthorizon to the Wallaby branch
16:44 elukey: power reset for ms-be2028 - not reachable via ssh, no tty available via mgmt console, NMI unrecoverable errors logged in iLo's system logs
15:35 andrew@deploy1002: Finished deploy [horizon/deploy@3a84c77]: upgrade labtesthorizon to the Wallaby branch (duration: 02m 18s)
15:33 andrew@deploy1002: Started deploy [horizon/deploy@3a84c77]: upgrade labtesthorizon to the Wallaby branch
15:12 andrew@deploy1002: Finished deploy [horizon/deploy@8833f80]: upgrade labtesthorizon to the Wallaby branch (duration: 11m 51s)
15:00 andrew@deploy1002: Started deploy [horizon/deploy@8833f80]: upgrade labtesthorizon to the Wallaby branch
05:38 andrew@deploy1002: Finished deploy [horizon/deploy@35199a3]: upgrade labtesthorizon to the Wallaby branch (duration: 03m 05s)
05:35 andrew@deploy1002: Started deploy [horizon/deploy@35199a3]: upgrade labtesthorizon to the Wallaby branch

2021-04-02

22:31 bstorm@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
22:31 bstorm@cumin1001: Added views for new wiki: trvwiki T276246
22:08 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
22:08 mutante: pooled mw2395,mw2396 as API appservers running on new hardware
22:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw239[5-6].codfw.wmnet
21:58 legoktm: legoktm@lists1002:~$ time sudo mailman-web rebuild_index
21:56 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw239[5-6].codfw.wmnet
21:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw239[5-6].codfw.wmnet
21:48 mutante: mw2395, mw2396 - reboot - becoming API servers
21:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw239[0-4].codfw.wmnet
21:42 mutante: pooled 12 brand-new codfw appservers running on new hardware generation
21:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw238[5-9].codfw.wmnet
21:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2384.codfw.wmnet
21:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2383.codfw.wmnet
21:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[2395-2396].codfw.wmnet with reason: new_install
21:38 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[2395-2396].codfw.wmnet with reason: new_install
21:37 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe1002.eqiad.wmnet with reason: REIMAGE
21:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: REIMAGE
21:35 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe1002.eqiad.wmnet with reason: REIMAGE
21:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw239[0-4].codfw.wmnet
21:34 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw238[3-9].codfw.wmnet
21:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: REIMAGE
21:28 legoktm: imported python-xapian-haystack 2.1.0-6~wmf1 on apt1001 (T278717)
21:24 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2394.codfw.wmnet
21:24 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2393.codfw.wmnet
21:24 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2392.codfw.wmnet
21:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2391.codfw.wmnet
21:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2390.codfw.wmnet
21:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2389.codfw.wmnet
21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2388.codfw.wmnet
21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2387.codfw.wmnet
21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2386.codfw.wmnet
21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2385.codfw.wmnet
21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2384.codfw.wmnet
21:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2383.codfw.wmnet
21:19 mutante: generating mcrouter certs for mw2395 through mw2404 (T278396)
21:07 mutante: mw2383 through mw2394 - 'uptime && scap pull' via ssh -C (not cumin because it needs to run as non-root)
20:58 mutante: mw238* - scap pull via cumin not possible because it doesn't work as root
20:50 andrew@deploy1002: Finished deploy [horizon/deploy@86c7cdc]: tweak to affinity group options (duration: 03m 39s)
20:46 andrew@deploy1002: Started deploy [horizon/deploy@86c7cdc]: tweak to affinity group options
20:44 mutante: mw2385 through mw2394 - serial rebooting
20:43 mutante: mw2384 reboot
20:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[2390-2394].codfw.wmnet with reason: new_install
20:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[2390-2394].codfw.wmnet with reason: new_install
20:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 10 hosts with reason: new_install
20:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 10 hosts with reason: new_install
20:40 andrew@deploy1002: Finished deploy [horizon/deploy@86c7cdc]: update horizon for codfw1dev (duration: 01m 47s)
20:39 andrew@deploy1002: Started deploy [horizon/deploy@86c7cdc]: update horizon for codfw1dev
20:09 bstorm@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
20:09 bstorm@cumin1001: Added views for new wiki: taywiki T275836
19:47 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
19:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2383.codfw.wmnet with reason: new_install
19:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2383.codfw.wmnet with reason: new_install
19:07 bstorm@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
19:07 bstorm@cumin1001: Added views for new wiki: mnwwiktionary T276126
18:44 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
18:44 mutante: [puppetmaster1001:~] $ sudo puppet node deactivate mw2247.codfw.wmnet
18:28 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2247.codfw.wmnet
18:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2247.codfw.wmnet
17:57 legoktm: upgraded mailman3 python3-django-postorius on lists1002
15:48 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
15:48 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
15:45 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
15:45 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
15:41 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
14:35 jiji@cumin1001: conftool action : set/weight=20; selector: cluster=jobrunner,name=mw133[7-8].eqiad.wmnet
14:34 jiji@cumin1001: conftool action : set/weight=20; selector: cluster=videoscaler,name=mw133[5-6].eqiad.wmnet
14:32 jiji@cumin1001: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw133[5-6].eqiad.wmnet
14:31 jiji@cumin1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw133[7-8].eqiad.wmnet
14:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-coord1001.eqiad.wmnet with reason: REIMAGE
14:29 jiji@cumin1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1111.eqiad.wmnet
14:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-coord1001.eqiad.wmnet with reason: REIMAGE
14:20 Urbanecm: Start server-side upload for 3 video files (T279060, T279061, T279062)
14:09 Urbanecm: Start server-side upload for 3 video files (T279138, T279137, T279136)
13:42 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.37
13:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1001.eqiad.wmnet with reason: REIMAGE
13:12 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1001.eqiad.wmnet with reason: REIMAGE
13:11 reedy@deploy1002: Synchronized php-1.36.0-wmf.37/load.php: T278579 (duration: 00m 58s)
13:10 reedy@deploy1002: Synchronized php-1.36.0-wmf.37/includes/OutputHandler.php: T278579 (duration: 00m 57s)
13:08 reedy@deploy1002: Synchronized php-1.36.0-wmf.37/includes/MediaWiki.php: T278579 (duration: 00m 58s)
11:46 Urbanecm: correction: Start server-side upload for 3 video files (T279079, T279080, T279104)
11:45 Urbanecm: Start server-side upload for 3 images (T279079, T279080, T279104)
10:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1002.eqiad.wmnet with reason: REIMAGE
10:52 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1002.eqiad.wmnet with reason: REIMAGE
10:14 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
10:14 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
10:12 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
10:12 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
10:11 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
10:11 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
10:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: Rollback group0 wikis to 1.36.0-wmf.36 - T278343
09:45 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group1 and group2 wikis to 1.36.0-wmf.36 - T278343
09:44 hashar@deploy1002: sync-wikiversions aborted: Revert group1 and group2 wikis to 1.36.0-wmf.36 (duration: 00m 01s)
09:06 dcausse: remove dumps from wdqs1009 to free disk space
07:33 effie: powercycle an-worker1080
07:28 elukey: manual fix for an-worker1080's interface in netbox (xe-4/0/11), moved by mistake to public-1b
03:54 dwisehaupt: replication user on fundraising db set to require ssl for connections at the mysql user level. db updated on frdb1004 and verified on a set of hosts
03:16 dwisehaupt: replication user on payments db set to require ssl for connections at the mysql user level. db updated on payments1001 and verified on a set of hosts

2021-04-01

23:32 thcipriani@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: Backport: Revert "Turn on glent m1 AB test" T262612 (duration: 00m 58s)
23:28 thcipriani: reset /srv/mediawiki-staging/php-1.36.0-wmf.37/extensions/TimedMediaHandler to 1be781d (HEAD of wmf/1.36.0-wmf.37 -- from HEAD of master 49f417)
23:12 thcipriani@deploy1002: Synchronized wmf-config/logos.php: Backport: Part III Add hi-res version of mediawiki.org logos T268230 (duration: 00m 57s)
23:10 thcipriani@deploy1002: Synchronized logos: Backport: Part II Add hi-res version of mediawiki.org logos T268230 (duration: 00m 57s)
23:08 thcipriani@deploy1002: Synchronized static: Backport: Part I Add hi-res version of mediawiki.org logos T268230 (duration: 00m 59s)
22:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2248.codfw.wmnet
22:50 twentyafterfour@deploy1002: Finished deploy [releng/phatality@27ddd0b]: deploy phatality (duration: 00m 13s)
22:50 twentyafterfour@deploy1002: Started deploy [releng/phatality@27ddd0b]: deploy phatality
22:49 twentyafterfour: deploying phatality
22:34 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2248.codfw.wmnet
22:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2247.codfw.wmnet
22:17 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2247.codfw.wmnet
22:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2246.codfw.wmnet
22:00 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2246.codfw.wmnet
21:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2243.codfw.wmnet
21:36 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2243.codfw.wmnet
20:42 mutante: mw2243, mw2246, mw2247, mw2248 - depooled - replaced by mw2379, mw2380, mw2381, mw2382 (T277780)
20:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2248.codfw.wmnet
20:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2247.codfw.wmnet
20:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2246.codfw.wmnet
20:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2243.codfw.wmnet
20:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2382.codfw.wmnet
20:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2381.codfw.wmnet
20:21 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2380.codfw.wmnet
20:20 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2379.codfw.wmnet
20:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2379.codfw.wmnet with reason: new_install
20:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2379.codfw.wmnet with reason: new_install
20:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2380.codfw.wmnet with reason: new_install
20:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2380.codfw.wmnet with reason: new_install
20:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2382.codfw.wmnet with reason: new_install
20:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2382.codfw.wmnet with reason: new_install
20:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2381.codfw.wmnet with reason: new_install
20:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2381.codfw.wmnet with reason: new_install
20:01 razzi@deploy1002: Finished deploy [analytics/superset/deploy@5b8de4c]: Deployment of superset fd7c9eb71e193, released after 1.0.1 (duration: 00m 04s)
20:01 razzi@deploy1002: Started deploy [analytics/superset/deploy@5b8de4c]: Deployment of superset fd7c9eb71e193, released after 1.0.1
20:01 razzi@deploy1002: deploy aborted: Deployment of superset fd7c9eb71e193, released after 1.0.1hv (duration: 00m 00s)
20:01 mutante: mw2379, mw2380, mw2381, mw2382 - scap pull
19:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2382.codfw.wmnet
19:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2381.codfw.wmnet
19:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2380.codfw.wmnet
19:59 razzi@deploy1002: Finished deploy [analytics/superset/deploy@5b8de4c]: Deployment of superset fd7c9eb71e193, released after 1.0.1 (duration: 00m 21s)
19:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2379.codfw.wmnet
19:58 razzi@deploy1002: Started deploy [analytics/superset/deploy@5b8de4c]: Deployment of superset fd7c9eb71e193, released after 1.0.1
19:57 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
19:57 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
19:56 razzi@deploy1002: Finished deploy [analytics/superset/deploy@5b8de4c]: Deployment of superset fd7c9eb71e193, released after 1.0.1 (duration: 00m 12s)
19:56 razzi@deploy1002: Started deploy [analytics/superset/deploy@5b8de4c]: Deployment of superset fd7c9eb71e193, released after 1.0.1
19:54 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
19:54 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
19:51 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
19:37 mutante: pooled parse2001 again after twentyafterfour rebuilt the l10n cache for wmf.37, which fixed it and made the Apache alert recover (T268524)
19:34 mutante: mw2379, mw2380, mw2381, mw2382 - rebooting
19:34 twentyafterfour@deploy1002: scap sync-l10n completed (1.36.0-wmf.37) (duration: 02m 38s)
19:30 mutante: depooled parse2001 because on train deployment it caused "MWException: No localisation cache found for English" and then "HTTP CRITICAL: HTTP/1.1 500 Internal Server Error" (T268524)
19:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=parse2001.codfw.wmnet
19:28 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=parse2001.codfw.wmnet
19:27 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=parse2001.codfw.wmnet
19:21 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.36.0-wmf.37 refs T278343
18:59 mutante: creating mcrouter certs for mw2379 through mw2382
18:35 Urbanecm: Morning B&C window done
18:33 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/WikibaseMediaInfo/resources/mediasearch-vue/components/base/Dialog.vue: e77f2b9: Use appendChild() instead of append() (T278448) (duration: 01m 09s)
18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b485d1c: Enable SandboxLink extension in ptwikinews (T278634) (duration: 01m 12s)
17:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast1003.wikimedia.org with reason: REIMAGE
17:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on bast1003.wikimedia.org with reason: REIMAGE
17:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:21 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
16:59 Urbanecm: Start server-side upload of two files (T279082, T279081)
16:44 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1007.eqiad.wmnet
16:39 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: a7acf33: hrwiki: Fix help panel links (T275684) (duration: 01m 10s)
16:25 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2396.codfw.wmnet with reason: REIMAGE
16:23 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2396.codfw.wmnet with reason: REIMAGE
16:02 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2395.codfw.wmnet with reason: REIMAGE
16:00 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2395.codfw.wmnet with reason: REIMAGE
15:58 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2394.codfw.wmnet with reason: REIMAGE
15:56 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2394.codfw.wmnet with reason: REIMAGE
15:39 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2393.codfw.wmnet with reason: REIMAGE
15:37 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2393.codfw.wmnet with reason: REIMAGE
15:32 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2391.codfw.wmnet with reason: REIMAGE
15:30 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2391.codfw.wmnet with reason: REIMAGE
15:05 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2392.codfw.wmnet with reason: REIMAGE
15:03 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2392.codfw.wmnet with reason: REIMAGE
14:52 volans: uploaded python3-wmflib_0.0.7 to bullseye-wikimedia
14:41 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2390.codfw.wmnet with reason: REIMAGE
14:39 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2390.codfw.wmnet with reason: REIMAGE
14:22 effie: disable puppet on mw* canaries, rolling depool and pooling of canaries
14:06 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1001.eqiad.wmnet with reason: REIMAGE
14:04 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1001.eqiad.wmnet with reason: REIMAGE
14:02 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2389.codfw.wmnet with reason: REIMAGE
13:59 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2389.codfw.wmnet with reason: REIMAGE
13:53 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2388.codfw.wmnet with reason: REIMAGE
13:51 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2388.codfw.wmnet with reason: REIMAGE
13:24 ema: cp3054: reboot with Linux 4.19.181+1 -- the kernel was not upgraded earlier during T273278 reboots due to broken dpkg status
13:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1022.eqiad.wmnet
13:07 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1022.eqiad.wmnet
12:59 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
12:53 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
12:51 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
12:47 moritzm: drain ganeti1022
12:46 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
12:45 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
12:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1021.eqiad.wmnet
12:40 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
12:38 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd2003-dev.codfw.wmnet
12:38 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1021.eqiad.wmnet
12:34 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephosd2003-dev.codfw.wmnet
12:23 moritzm: drain ganeti1021
12:21 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd2003-dev.codfw.wmnet
12:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1020.eqiad.wmnet
12:15 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephosd2003-dev.codfw.wmnet
12:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1020.eqiad.wmnet
11:59 Urbanecm: Start server-side upload of two video files (~4 GB in total) # T278856
11:55 moritzm: drain ganeti1020
11:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1019.eqiad.wmnet
11:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1019.eqiad.wmnet
11:43 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable RelatedArticles on Timeless skin on German Wikipedia (T278611) (duration: 01m 08s)
11:41 moritzm: drain ganeti1019
11:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1018.eqiad.wmnet
11:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1018.eqiad.wmnet
11:23 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable MediaSearch by default for anonymous users (gerrit:674820) (duration: 01m 10s)
11:20 moritzm: drain ganeti1018
11:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1017.eqiad.wmnet
11:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1017.eqiad.wmnet
11:00 moritzm: drain ganeti1017
10:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1011.eqiad.wmnet
10:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1011.eqiad.wmnet
10:39 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd2002-dev.codfw.wmnet
10:33 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephosd2002-dev.codfw.wmnet
10:33 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd2001-dev.codfw.wmnet
10:26 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephosd2001-dev.codfw.wmnet
09:07 hashar: contint2001: compressing files with 4 parallel executions: sudo -u jenkins find /srv/jenkins/builds/mediawiki-fresnel-patch-docker -name "*trace.json" -print0|xargs -0 -P4 gzip
09:01 hashar: contint2001: compressing all fresnel trace--trace.json files: sudo -u jenkins find /srv/jenkins/builds/mediawiki-fresnel-patch-docker -name "*trace.json" -exec gzip {} \+ # T249268
08:52 moritzm: drain ganeti1011
08:35 moritzm: failover Ganeti master in eqiad to ganeti1009
08:25 moritzm: installing ldb security updates
08:12 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
08:12 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
08:10 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
08:10 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
08:09 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
07:58 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
07:58 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
07:55 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
06:37 elukey: powercycle cp1087 (no ssh, no tty via serial console) - T278729
06:35 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1087.eqiad.wmnet
02:50 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2386.codfw.wmnet with reason: REIMAGE
02:48 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2386.codfw.wmnet with reason: REIMAGE
02:21 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2387.codfw.wmnet with reason: REIMAGE
02:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2387.codfw.wmnet with reason: REIMAGE
02:16 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2384.codfw.wmnet with reason: REIMAGE
02:13 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2384.codfw.wmnet with reason: REIMAGE
01:53 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2385.codfw.wmnet with reason: REIMAGE
01:52 Reedy: `echo "https://www.mediawiki.org/static/images/footer/poweredby_mediawiki_176x62.png" | mwscript purgeList.php --wiki=enwiki` T268230
01:51 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2385.codfw.wmnet with reason: REIMAGE
01:51 Reedy: `echo "https://www.mediawiki.org/favicon.ico" | mwscript purgeList.php --wiki=enwiki` T268230
01:37 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2384.codfw.wmnet with reason: REIMAGE
01:35 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2384.codfw.wmnet with reason: REIMAGE
01:24 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2383.codfw.wmnet with reason: REIMAGE
01:22 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2383.codfw.wmnet with reason: REIMAGE
01:12 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2382.codfw.wmnet with reason: REIMAGE
01:10 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2382.codfw.wmnet with reason: REIMAGE
00:56 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2381.codfw.wmnet with reason: REIMAGE
00:54 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2381.codfw.wmnet with reason: REIMAGE
00:46 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2380.codfw.wmnet with reason: REIMAGE
00:44 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2380.codfw.wmnet with reason: REIMAGE
00:32 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2379.codfw.wmnet with reason: REIMAGE
00:30 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2379.codfw.wmnet with reason: REIMAGE
00:12 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:08 legoktm: uploaded mailman3 3.2.1-1+wmf1, postorius 1.2.4-1+wmf1 to apt.wikimedia.org
00:07 pt1979@cumin2001: START - Cookbook sre.dns.netbox

2021-03-31

23:34 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/Wikibase/client/includes/DataAccess/Scribunto/: bfc8f55: Eliminate another php.getSetting() from Lua code (duration: 01m 09s)
23:32 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/Wikibase/client/includes/DataAccess/Scribunto/: ad564a0: Eliminate another php.getSetting() from Lua code (duration: 01m 10s)
23:12 jhuneidi@deploy1002: Synchronized .pipeline/config.yaml: Config: Include private folder in restricted image (T276145) (duration: 01m 08s)
23:05 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Use the new mediawiki logos, part II (T268230) (duration: 01m 11s)
23:03 ladsgroup@deploy1002: Synchronized static: Use the new mediawiki logos, part I (T268230) (duration: 01m 09s)
22:58 Urbanecm: Start server side upload for 3 files
22:01 Urbanecm: Server side upload of three video files (T279011, T278956, T278955)
22:01 eileen: civicrm revision changed from 2fcea570bd to 740e49d868, config revision is 6779e3829a
20:16 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:00 dwisehaupt: shifted payments2003 to use gtid for mysql replication.
19:55 robh@cumin1001: START - Cookbook sre.dns.netbox
19:21 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.37 refs T278343 (duration: 01m 08s)
19:20 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.37 refs T278343
19:18 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:13 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.37 refs T278343
19:06 robh@cumin1001: START - Cookbook sre.dns.netbox
19:03 twentyafterfour@deploy1002: Synchronized php-1.36.0-wmf.37/includes/Revision/RevisionRecord.php: sync https://gerrit.wikimedia.org/r/c/mediawiki/core/+/675875 to unblock train refs T278376 T278343 (duration: 00m 58s)
17:56 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.36 refs T278343
17:49 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.37 refs T278343
17:41 twentyafterfour: The train is now unblocked, promoting to group0 refs T278343
17:01 Urbanecm: Server side upload of three video files (T278959, T278958, T278957)
15:09 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
15:09 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
14:57 papaul: disconnecting ps1-d8-codfw for replacement
14:17 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1007.eqiad.wmnet
14:02 Urbanecm: Server side upload of two video files (T278961, T278960)
13:48 jynus: retrying s3 snapshot on codfw
13:39 akosiaris: revert mw1412, mw1413, wtp1032, mw2305 to the previous state for T278220
13:34 akosiaris: disabling puppet on role::mediawiki::appserver, role::mediawiki::appserver::api, role::mediawiki::maintenance, role::mediawiki::jobrunner, role::parsoid, role::parsoid::testing T278220
13:00 akosiaris: repool all jobrunners/videoscalers in the respective conftool clusters. The video transcoding backlog has been served, so we can return to "normal"
12:59 akosiaris: repool all jobrunners/videoscalers in the respective conftool clusters
12:59 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=videoscaler
12:59 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=jobrunner
11:38 awight: EU deployment complete
11:38 awight@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/WikibaseMediaInfo: Backport: Style change to mediasearch logged-in notice close (T274927) Suppress user notice on mobile (T274927) Reset namespace filter on cancel (T276261) (duration: 01m 08s)
11:26 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: vector: Disable WVUI search widget treatment A/B test (T276917) (duration: 01m 08s)
10:48 effie: enable puppet on all mw* servers
10:10 effie: disable puppet on all mw* hosts
09:03 hashar: contint2001: enable puppet again
08:38 hashar: contint2001: stopping Puppet for an Apache config live hack
04:35 eileen: civicrm revision changed from 7040b68c11 to 2fcea570bd, config revision is 6779e3829a
02:37 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
02:33 pt1979@cumin2001: START - Cookbook sre.dns.netbox
02:22 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
02:17 pt1979@cumin2001: START - Cookbook sre.dns.netbox
02:06 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wcqs2003.codfw.wmnet with reason: REIMAGE
02:05 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
02:04 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs2003.codfw.wmnet with reason: REIMAGE
02:00 pt1979@cumin2001: START - Cookbook sre.dns.netbox
01:37 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wcqs2002.codfw.wmnet with reason: REIMAGE
01:35 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs2002.codfw.wmnet with reason: REIMAGE
01:15 urbanecm@deploy1002: Synchronized wmf-config/config/gawiki.yaml: 3283ae5: Enable local uploads on Irish Wikipedia (T277723) (duration: 01m 08s)
01:13 urbanecm@deploy1002: Synchronized dblists/commonsuploads.dblist: 3283ae5: Enable local uploads on Irish Wikipedia (T277723) (duration: 01m 08s)
01:07 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wcqs2001.codfw.wmnet with reason: REIMAGE
01:05 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs2001.codfw.wmnet with reason: REIMAGE

2021-03-30

23:59 Trey314159: reindexing English wikis on elastic@eqiad, elastic@codfw, and cloudelastic (T274200)
23:56 legoktm@deploy1002: Synchronized php-1.36.0-wmf.37/extensions/TimedMediaHandler/extension.json: Allow autoconfirmed users to see Special:TranscodeStatistics by default (T278867) (duration: 01m 08s)
23:53 legoktm@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/TimedMediaHandler/extension.json: Allow autoconfirmed users to see Special:TranscodeStatistics by default (T278867) (duration: 01m 08s)
23:29 Amir1: sudo django-admin hyperkitty_import -l discovery-alerts@lists-next.wikimedia.org discovery-alerts.mbox/discovery-alerts.mbox --pythonpath /usr/share/mailman3-web --settings settings (T278609)
23:27 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:23 pt1979@cumin2001: START - Cookbook sre.dns.netbox
23:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ef306a3: Growth features: bnwiki: Enable impact module (T274793) (duration: 01m 07s)
22:52 cstone: civicrm revision changed from ad430721f6 to 7040b68c11
21:11 twentyafterfour@deploy1002: Finished deploy [releng/phatality@fbca60c]: rollback (duration: 00m 12s)
21:11 twentyafterfour@deploy1002: Started deploy [releng/phatality@fbca60c]: rollback
21:05 twentyafterfour@deploy1002: Finished deploy [releng/phatality@fbca60c]: trying again with newly built zip (duration: 00m 12s)
21:05 twentyafterfour@deploy1002: Started deploy [releng/phatality@fbca60c]: trying again with newly built zip
21:02 legoktm: scap pulling on mw1298
20:59 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 15s)
20:58 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
20:58 legoktm: killed remaining ffmpeg on mw1298
20:56 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 12s)
20:56 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
20:53 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
20:52 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
20:41 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 20s)
20:41 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
20:41 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
20:40 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
20:38 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
20:37 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
20:37 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 31s)
20:36 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
20:35 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 05s)
20:35 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
20:34 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
20:34 twentyafterfour@deploy1002: Started restart [releng/phatality@715d809]: (no justification provided)
20:33 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.36.0-wmf.37 refs T278343 (duration: 80m 32s)
20:29 twentyafterfour@deploy1002: Finished deploy [releng/phatality@715d809]: (no justification provided) (duration: 00m 49s)
20:29 twentyafterfour@deploy1002: Started deploy [releng/phatality@715d809]: (no justification provided)
20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1307.eqiad.wmnet
20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1306.eqiad.wmnet
20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1305.eqiad.wmnet
20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1304.eqiad.wmnet
20:28 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1303.eqiad.wmnet
20:28 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1307.eqiad.wmnet
20:28 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1306.eqiad.wmnet
20:27 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1305.eqiad.wmnet
20:27 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1304.eqiad.wmnet
20:27 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1303.eqiad.wmnet
20:26 twentyafterfour: preparing to deploy phatality upgrade to kibana cluster
20:25 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1296.eqiad.wmnet
20:25 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1298.eqiad.wmnet
20:25 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1299.eqiad.wmnet
20:21 joal@deploy1002: Finished deploy [analytics/refinery@1a53e9a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@1a53e9a] (duration: 04m 29s)
20:20 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1299.eqiad.wmnet
20:20 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1298.eqiad.wmnet
20:20 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1296.eqiad.wmnet
20:16 joal@deploy1002: Started deploy [analytics/refinery@1a53e9a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@1a53e9a]
20:16 joal@deploy1002: Finished deploy [analytics/refinery@1a53e9a] (thin): Regular analytics weekly train THIN [analytics/refinery@1a53e9a] (duration: 00m 07s)
20:16 joal@deploy1002: Started deploy [analytics/refinery@1a53e9a] (thin): Regular analytics weekly train THIN [analytics/refinery@1a53e9a]
20:15 joal@deploy1002: Finished deploy [analytics/refinery@1a53e9a]: Regular analytics weekly train [analytics/refinery@1a53e9a] (duration: 17m 11s)
20:02 twentyafterfour: when syncing the 1.36.0-wmf.37 promotion to testwikis, one server failed (mw1298.eqiad.wmnet) and two more appear to be hung: scap has been stuck at 99% with 2 left, making no progress for a long time now. refs T278343
19:58 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp1087.eqiad.wmnet
19:58 joal@deploy1002: Started deploy [analytics/refinery@1a53e9a]: Regular analytics weekly train [analytics/refinery@1a53e9a]
19:58 bblack: repool cp1087 - T278729
19:13 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.36.0-wmf.37 refs T278343
18:15 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:09 pt1979@cumin2001: START - Cookbook sre.dns.netbox
17:22 legoktm: moved mw[1293-1295] to jobrunners and mw[1300-1302] to videoscalers
17:22 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1302.eqiad.wmnet
17:22 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1301.eqiad.wmnet
17:21 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=videoscaler,name=mw1300.eqiad.wmnet
17:21 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1302.eqiad.wmnet
17:21 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1301.eqiad.wmnet
17:21 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1300.eqiad.wmnet
17:21 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1295.eqiad.wmnet
17:21 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1294.eqiad.wmnet
17:21 legoktm@deploy1002: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1293.eqiad.wmnet
17:19 legoktm: killed all ffmpeg on mw1294
17:17 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1295.eqiad.wmnet
17:17 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1293.eqiad.wmnet
17:17 legoktm@deploy1002: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1294.eqiad.wmnet
17:13 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
17:12 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
17:10 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
17:08 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
17:05 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
17:02 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
16:40 effie: enable puppet on mw* hosts
16:10 mutante: mw1296 - started ferm
16:10 mutante: mw1308 - started ferm
16:07 akosiaris: split jobrunners/videoscalers clusters in conftool. mw12* become videoscalers, mw13* become jobrunners, killing ffmpeg on mw13*
16:07 mutante: mw1309 - systemctl start ferm
16:07 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=jobrunner,name=mw12.*
16:06 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=videoscaler,name=mw13.*
16:06 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=videoscaler,name=mw12.*
15:59 akosiaris: depool a number of hosts from videoscalers
15:59 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=videoscaler,name=mw12.*
15:55 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=mw1308.eqiad.wmnet,service=jobrunner
15:55 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=mw1307.eqiad.wmnet,service=jobrunner
15:42 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1004.eqiad.wmnet
15:29 hnowlan: moving all test tables out of cassandra directories on aqs hosts
14:59 effie: disable puppet on mediawiki servers to deploy 663565
14:58 Urbanecm: Move Help talk:Help talk:Getting started --> Help talk:Getting started via moveBatch.php on enwiki (T278350)
14:32 arturo: manually start update-openstack-mirror.service on sodium (T278505)
13:02 jbond42: roll out lxml update T278822
12:55 jbond42: update spamassassin on lists, otrs and mx T278820
12:39 Amir1: ssh -p 29418 gerrit.wikimedia.org replication start wikidata/query-builder --wait (T277060)
12:38 jbond42: update python(3)-pygments
12:36 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1004.eqiad.wmnet
12:14 Urbanecm: mwmaint1002: Downloading multiple big files (total filesize estimated 150 GB, downloaded and processed in batches) for server-side uploads
11:21 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable legacy javascript global variables in group1; some increase in client errors is expected (T72470) (duration: 01m 11s)
09:58 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet1003.eqiad.wmnet
09:52 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1003.eqiad.wmnet
09:42 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
09:41 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
09:35 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
09:35 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
09:05 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
09:04 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
08:36 jynus: mariadb upgrade of all buster source backup hosts to 10.4.18 T250666
08:05 dcausse: refreshing wdqs entities (T278693)
07:37 elukey: restart-php7.2-fpm on mw1304, jobrunner completely overwhelmed by ffmpeg/transcode jobs (not publishing metrics, erroring out for memcached timeouts) - T278734
07:28 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.36 - T274940
06:06 elukey: powercycle cp1087 (no ssh, no mgmt console tty)
06:04 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1087.eqiad.wmnet

2021-03-29

19:06 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1004.eqiad.wmnet
17:47 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:37 volans@cumin1001: START - Cookbook sre.dns.netbox
16:15 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1004.eqiad.wmnet
16:11 hnowlan: depooled aqs1004 for transfer of large tables to aqs1010
15:54 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:47 jbond@cumin1001: START - Cookbook sre.dns.netbox
15:45 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:39 jbond@cumin1001: START - Cookbook sre.dns.netbox
13:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2001.codfw.wmnet with reason: REIMAGE
13:24 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2001.codfw.wmnet with reason: REIMAGE
13:03 ema: cp4027: rollback luajit experiment https://github.com/apache/trafficserver/issues/7423#issuecomment-809354214
12:36 ema: cp4027: re-enable JIT compilation in all ats-be lua scripts -- https://github.com/apache/trafficserver/issues/7423
11:57 ema: cp4027: re-enable JIT compilation in normalize-path.lua -- https://github.com/apache/trafficserver/issues/7423
11:32 ema: cp4027: install libluajit 2.1.0~beta3+dfsg-6wm1 with P15083 applied -- https://github.com/apache/trafficserver/issues/7423
09:59 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE
09:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE
09:16 ryankemper: T267927 `sudo -i cookbook sre.wdqs.data-reload wdqs2008.codfw.wmnet --task-id T267927 --reload-data wikidata --reason 'T267927: Reload wikidata jnl from fresh dumps' --reuse-downloaded-dump --depool`
09:15 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
08:47 filippo@deploy1002: Finished deploy [librenms/librenms@df69efe]: deploy I156f32925f693 (duration: 00m 08s)
08:47 filippo@deploy1002: Started deploy [librenms/librenms@df69efe]: deploy I156f32925f693
07:59 hashar@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.36 (duration: 01m 06s)
07:58 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.36
07:54 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/FlaggedRevs: Wrap most of functionalities depending on protect mode in a condition - T278478 (duration: 01m 08s)
07:49 ladsgroup@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/FlaggedRevs: Wrap most of functionalities depending on protect mode in a condition (T278478) (duration: 01m 08s)
07:42 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - T272836 T268435

2021-03-27

19:25 elukey: powercycle elastic1060 - T278630
06:10 ryankemper: T267927 `sudo https_proxy=webproxy.codfw.wmnet:8080 wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.bz2 -O /srv/wdqs/latest-all.ttl.bz2 && sudo https_proxy=webproxy.codfw.wmnet:8080 wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-lexemes.ttl.bz2 -O /srv/wdqs/latest-lexemes.ttl.bz2` on `ryankemper@wdqs2008` tmux session `download_dumps_2020-03-26`
05:44 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
05:44 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
05:42 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
05:42 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
05:40 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
05:40 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
05:40 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
05:40 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
05:38 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
05:38 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload

2021-03-26

22:27 tzatziki: reset password for Philroc
20:10 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
20:08 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
17:44 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/includes/changes/RecentChange.php: RecentChange: directly build the user identity if we have the data - T277795 (duration: 01m 06s)
17:42 hashar@deploy1002: Finished scap: Revert "Add change tags for media additions/removals" - T266067 T278429 (duration: 31m 43s)
17:10 hashar@deploy1002: Started scap: Revert "Add change tags for media additions/removals" - T266067 T278429
15:40 Urbanecm: Delete `commonswiki:ip-autoblock:whitelist` cache key from memcached (wmf.36 moves the autoblock whitelist source, and it was deployed on commonswiki for a while, resulting in the cache key being empty)
15:37 hnowlan: importing imposm3_0.11.0+git20201104.4758cf4-1_amd64.changes on apt1001
14:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1016.eqiad.wmnet
14:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1016.eqiad.wmnet
14:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1015.eqiad.wmnet
13:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1015.eqiad.wmnet
13:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1014.eqiad.wmnet
13:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1014.eqiad.wmnet
13:02 moritzm: reimaging theemin T275873
12:56 moritzm: drain ganeti1014
12:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1013.eqiad.wmnet
12:42 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1013.eqiad.wmnet
12:37 moritzm: drain ganeti1013
12:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1012.eqiad.wmnet
12:27 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1012.eqiad.wmnet
10:55 Urbanecm: Move `Help talk:Getting Started` to `Help talk:Getting started` on enwiki with `[urbanecm@mwmaint1002 ~]$ mwscript moveBatch.php --wiki=enwiki -r 'sysadmin action: fixing phab:T278350' -u 'Martin Urbanec' batch.txt` (T278350)
10:49 Urbanecm: Move `User talk:TheAafi/Help talk` to `Help talk:Getting Started` via `[urbanecm@mwmaint1002 ~]$ mwscript moveBatch.php --wiki=enwiki -r 'sysadmin action: fixing phab:T278350' -u 'Martin Urbanec' batch.txt` to fix an UBN task (T278350)
10:10 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts chlorine.eqiad.wmnet
10:02 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts chlorine.eqiad.wmnet
10:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts argon.eqiad.wmnet
09:49 filippo@deploy1002: Finished deploy [librenms/librenms@63e862a]: deploy I955cbfc244 (duration: 00m 08s)
09:49 filippo@deploy1002: Started deploy [librenms/librenms@63e862a]: deploy I955cbfc244
09:46 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts argon.eqiad.wmnet
09:45 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts acrab.codfw.wmnet
09:43 moritzm: delete fermium in Ganeti (was still around, but powered down) T224586
09:38 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts acrux.codfw.wmnet
09:36 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts acrab.codfw.wmnet
09:32 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts acrux.codfw.wmnet
09:31 filippo@deploy1002: Finished deploy [librenms/librenms@e7727e3]: deploy I12ac21d877c (duration: 00m 12s)
09:31 filippo@deploy1002: Started deploy [librenms/librenms@e7727e3]: deploy I12ac21d877c
09:28 moritzm: drain ganeti1012
09:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1010.eqiad.wmnet
09:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1010.eqiad.wmnet
08:38 moritzm: drain ganeti1010
08:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1009.eqiad.wmnet
08:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1009.eqiad.wmnet
06:11 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
06:09 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
06:09 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
06:09 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
05:06 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@bb5a072]: 0.3.68 (duration: 07m 31s)
05:00 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.68` on canary `wdqs1003`; proceeding to rest of fleet
04:58 ryankemper@deploy1002: Started deploy [wdqs/wdqs@bb5a072]: 0.3.68
04:58 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.68`. Pre-deploy tests passing on canary `wdqs1003`

2021-03-25

23:47 thcipriani@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/3D/package.json: No-op demo sync (duration: 01m 07s)
23:37 stran@deploy1002: Synchronized README: (no justification provided) (duration: 01m 06s)
23:20 jhuneidi@deploy1002: Synchronized README: DEMO: README (duration: 01m 07s)
22:59 brennen: no patches for the upcoming deploy window, but we'll be conducting a deployment training using DEMO patches to READMEs.
22:16 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript deleteEqualMessages.php --wiki=hrwiki --delete
21:35 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
21:35 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
21:31 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
21:31 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
21:27 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
19:48 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group 1 and 2 wikis to 1.36.0-wmf.35 - T274940
19:37 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.36.0-wmf.35 - T274940
19:36 hashar@deploy1002: sync-wikiversions aborted: (no justification provided) (duration: 00m 03s)
19:11 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.36
19:04 urbanecm@deploy1002: Synchronized wmf-config/flaggedrevs.php: ce7d2d7: ruwiki: flaggedrevs: Delete autoeditor group (T275337) (duration: 01m 08s)
19:01 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ce7d2d7: ruwiki: flaggedrevs: Delete autoeditor group (T275337) (duration: 01m 06s)
18:59 Urbanecm: `mwscript migrateUserGroup.php --wiki=ruwiki 'autoeditor' 'autoreview'` finished (T275337)
18:53 Urbanecm: [urbanecm@mwmaint1002 ~/uploads]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Sturm . # T278391
18:50 Urbanecm: mwscript migrateUserGroup.php --wiki=ruwiki 'autoeditor' 'autoreview' # T275337
18:49 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 39cd4f1: ruwiki: flaggedrevs: Do not allow sysops to modify users in autoeditor group (T275337) (duration: 01m 09s)
18:45 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: dcfb7fe: ruwiki: flaggedrevs: Do not remove autoreview group (T275337) (duration: 01m 14s)
18:39 urbanecm@deploy1002: Synchronized wmf-config/flaggedrevs.php: 3fb6646: ruwiki: flaggedrevs: Revoke review from sysop group (T275811) (duration: 01m 06s)
18:29 urbanecm@deploy1002: Synchronized logos/config.yaml: 29660f9: Update altwiki logo (3/3; T275819) (duration: 01m 06s)
18:28 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 29660f9: Update altwiki logo (2/3; T275819) (duration: 01m 06s)
18:26 urbanecm@deploy1002: Synchronized static/images/project-logos/: 29660f9: Update altwiki logo (1/3; T275819) (duration: 01m 10s)
18:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 62be4e7: Disable magic links on enwiki (T275951) (duration: 01m 20s)
18:14 mutante: alert1001 - sudo systemctl restart tcpircbot-logmsgbot
18:09 marxarelli: scap sync-file .pipeline Config: Include patches in restricted image (T271274)
18:06 hnowlan: draining and restarting aqs1004-b cassandra
17:45 hnowlan: draining and restarting aqs1004-a cassandra
17:16 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
17:14 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
17:08 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
16:39 hashar: Restarted Apache 2 on contint2001 / contint1001
16:35 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
16:32 moritzm: restarting apache on an-tool1007/turnilo
16:27 moritzm: restarting dnsdist/rdns-recursor on malmok
16:24 jbond42: restart slapd on ldap-replica
16:22 jbond42: restart slapd on ldap-corp
16:20 jbond42: restart apache on lists1002
16:18 jbond42: restart apache on netbox
16:13 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/ProofreadPage: Disallow negative or decimal values in pages tag - T278400 (duration: 01m 32s)
16:12 jbond42: restart routinator on rpki*
16:12 moritzm: restarting nginx on apt*
16:10 moritzm: restarting apache on debmonitor
16:08 moritzm: restart Apache on matomo/piwik
16:03 jbond42: restart apache service on gerrit
16:02 jbond42: restart idp service
16:01 ema: A:cp rolling ats-{tls,backend}-restart for openssl upgrades -- https://www.openssl.org/news/secadv/20210325.txt
15:45 moritzm: installing openssl updates on buster
14:48 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:45 herron@cumin1001: START - Cookbook sre.dns.netbox
14:13 twentyafterfour: update phabricator again (last night's update undid a hotfix that is now fixed properly)
13:45 moritzm: drain ganeti1009
13:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on webperf1001.eqiad.wmnet with reason: adapt RAM
13:27 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 1:00:00 on webperf1001.eqiad.wmnet with reason: adapt RAM
13:27 moritzm: reduce webperf1001/webperf2001 to 4G RAM (xhgui has been split off to separate VMs)
13:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1008.eqiad.wmnet
13:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1008.eqiad.wmnet
12:52 hnowlan: aqs1004 nodetool-a cleanup finished
12:14 moritzm: drain ganeti1008
12:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1007.eqiad.wmnet
12:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1007.eqiad.wmnet
11:52 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable Legacy javascript in fawikiquote (T72470) (duration: 01m 07s)
11:46 moritzm: drain ganeti1007
11:44 ladsgroup@deploy1002: Synchronized php-1.36.0-wmf.36/skins/Vector/resources: Inform anonymous A/B test by tracking time from navigationStart (T275807) (duration: 01m 09s)
11:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1006.eqiad.wmnet
11:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1006.eqiad.wmnet
11:33 ladsgroup@deploy1002: Synchronized dblists/: tawiki: Enable Growth features in dark mode, Part II (T278369) (duration: 01m 07s)
11:32 ladsgroup@deploy1002: Synchronized wmf-config: tawiki: Enable Growth features in dark mode (T278369) (duration: 01m 30s)
11:29 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE
11:27 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE
11:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns4001.wikimedia.org
11:19 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki1001.eqiad.wmnet with reason: REIMAGE
11:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host dns4001.wikimedia.org
11:17 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki1001.eqiad.wmnet with reason: REIMAGE
11:10 moritzm: drain ganeti1006
11:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1005.eqiad.wmnet
10:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti1005.eqiad.wmnet
10:54 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
10:54 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
10:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flerovium.eqiad.wmnet
10:48 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
10:48 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
10:45 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host flerovium.eqiad.wmnet
10:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host furud.codfw.wmnet
10:42 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
10:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host furud.codfw.wmnet
10:36 hnowlan: running general nodetool cleanup on aqs1004-a
10:35 hnowlan: running cleanup on aqs1004-a: nodetool-a cleanup "local_group_default_T_pageviews_per_project_v2" data
10:34 moritzm: drain ganeti1005
10:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2024.codfw.wmnet
10:28 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
10:24 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
10:23 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
10:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2024.codfw.wmnet
10:18 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
10:17 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
10:13 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
10:13 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=99)
10:13 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
09:26 moritzm: drain ganeti2024
09:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2023.codfw.wmnet
09:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2023.codfw.wmnet
08:45 moritzm: drain ganeti2023
08:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2022.codfw.wmnet
08:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2022.codfw.wmnet
08:12 elukey: upgrade hive packages in thirdparty/bigtop15 to 2.3.6-2 for buster-wikimedia
08:11 elukey: upgrade hive packages in thirdparty/bigtop15 to 2.3.6-2
07:41 legoktm: upgraded lists1002 to hyperkitty 1.2.2-1+wmf1 (T276687)
07:36 legoktm: uploaded hyperkitty 1.2.2-1+wmf1 to buster-wikimedia (T276687)
07:35 jynus: restart db2135 T278408 T273281
07:05 effie: enable puppet on all mediawiki servers
06:57 XioNoX: Option 82: use-vlan-id
06:53 effie: enable puppet on jobrunners
06:47 effie: enable puppet on parsoid
06:40 effie: disable puppet on all mediawiki servers to merge 673061 (service proxy to listen on ::1)
06:23 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
05:19 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
04:44 legoktm: restarted exim4 on lists1002 so it listens on 0.0.0.0 instead of 127.0.0.1
04:16 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
03:10 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
01:33 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
01:10 legoktm: mailman3: added lists-next.wikimedia.org domain
01:08 legoktm: mailman3: renamed default site from "example.com" to "lists-next.wikimedia.org"
00:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2378.codfw.wmnet
00:35 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2377.codfw.wmnet
00:35 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2777.codfw.wmnet
00:34 mutante: mw2377, mw2378 - first scap pull
00:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2378.codfw.wmnet
00:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2377.codfw.wmnet
00:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2378.codfw.wmnet
00:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2377.codfw.wmnet
00:29 legoktm: syncing facts for puppet-compiler
00:23 mutante: mw2377, mw2378 - reboot
00:14 twentyafterfour: phabricator update complete
00:10 twentyafterfour: deploying phabricator
00:05 ryankemper: T274204 `sudo -i cookbook sre.elasticsearch.rolling-upgrade search_eqiad "eqiad cluster reboot" --task-id T274204 --nodes-per-run 3 --start-datetime 2021-03-24T23:55:35` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots`

2021-03-24

23:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2378.codfw.wmnet with reason: new_install
23:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2378.codfw.wmnet with reason: new_install
23:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2377.codfw.wmnet with reason: new_install
23:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2377.codfw.wmnet with reason: new_install
23:56 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
23:48 mutante: generating new mcrouter certs for mw2377, mw2378
22:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
22:07 legoktm: disabled puppet on lists1002 while mailman3-web is broken
21:49 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
21:19 mutante: webperf2001 - restarted apache
21:11 hashar@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.36 (duration: 01m 07s)
21:10 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.36
21:08 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
21:08 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
21:07 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/GrowthExperiments: LinkRecommendation: Modify path args for calls to API - T277865 (duration: 01m 07s)
21:05 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/ProofreadPage: Revert "Add default TemplateStyles for an Index" - T278379 (duration: 01m 07s)
21:04 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
21:04 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
21:02 hashar@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/GlobalUsage: Fix hook registration after class was namespaced - T278375 (duration: 01m 07s)
20:59 hashar@deploy1002: Synchronized wmf-config/env.php: multiversion: Move '@' operator in env.php closer to relevant statement (duration: 01m 07s)
20:56 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
20:30 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
20:26 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
20:13 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
20:13 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
20:10 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
20:09 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
20:07 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cumin2002.codfw.wmnet with reason: REIMAGE
20:05 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cumin2002.codfw.wmnet with reason: REIMAGE
19:59 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
19:59 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
19:57 ryankemper: T267927 Host key is missing for `wdqs2008`, causing the `data-transfer` cookbook to fail; looking into resolving it
19:55 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
19:55 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
19:50 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
19:50 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
19:49 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
19:49 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
19:45 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
19:45 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
19:42 ryankemper: T267927 Re-enabled puppet on `wdqs2008` and ran puppet agent
19:21 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
19:14 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group 1 to 1.36.0-wmf.35
19:07 hashar@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.36 (duration: 01m 21s)
19:05 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.36
19:03 urbanecm@deploy1002: Synchronized wmf-config/config/shwiki.yaml: 0f3aa72: shwiki: Enable Growth features in dark mode (T278240; 3/3) (duration: 01m 08s)
19:02 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 0f3aa72: shwiki: Enable Growth features in dark mode (T278240; 2/3) (duration: 01m 06s)
19:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0f3aa72: shwiki: Enable Growth features in dark mode (T278240; 1/3) (duration: 01m 07s)
18:54 urbanecm@deploy1002: Synchronized wmf-config/config/eswiki.yaml: ced0920: Enable Growth features on eswiki in dark mode (T278235; 3/3) (duration: 01m 06s)
18:53 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: ced0920: Enable Growth features on eswiki in dark mode (T278235; 2/3) (duration: 01m 07s)
18:52 urbanecm@deploy1002: sync-file aborted: ced0920: Enable Growth features on eswiki in dark mode (2/3) (duration: 00m 01s)
18:51 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ced0920: Enable Growth features on eswiki in dark mode (T278235; 1/3) (duration: 01m 08s)
18:49 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:45 legoktm@cumin1001: START - Cookbook sre.dns.netbox
18:42 legoktm@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
18:40 legoktm@cumin1001: START - Cookbook sre.dns.netbox
18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5aa0506: Promote several Growth target wikis out of dark mode (T277491; T276830; T276123; T276816; T275550; T276450) (duration: 01m 08s)
18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 333393d: Add autopatrol to autoreviewers in en.wikibooks (T278300) (duration: 01m 09s)
18:08 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:02 pt1979@cumin2001: START - Cookbook sre.dns.netbox
17:25 effie: upgrade memcached on mc-gp* hosts
15:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on irc2001.wikimedia.org with reason: adapt RAM
15:45 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 1:00:00 on irc2001.wikimedia.org with reason: adapt RAM
15:42 moritzm: reduce RAM for irc2001 to 2G (was originally created with 8G) T224579
15:35 effie: enable puppet on all mediawiki + memcached hosts
15:20 moritzm: drain ganeti2022
15:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2021.codfw.wmnet
15:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2021.codfw.wmnet
14:35 moritzm: drain ganeti2021
14:31 effie: disable puppet on all mediawiki servers + memcached for 674290
14:05 moritzm: failover Ganeti master in codfw to ganeti2019
13:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2020.codfw.wmnet
13:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2020.codfw.wmnet
13:29 moritzm: installing irc1001
13:15 moritzm: drain ganeti2020
12:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2019.codfw.wmnet
12:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2019.codfw.wmnet
12:28 effie: enabling puppet on mediawiki and memcached servers
12:10 jynus: restart dbprov200[12] T271913
11:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Slowly repool db1160 after schema change', diff saved to https://phabricator.wikimedia.org/P15076 and previous config saved to /var/cache/conftool/dbconfig/20210324-115940-root.json
11:57 Andrew-WMDE_: EU deploys done
11:53 jynus: restart dbprov100[12] T271913
11:51 andrew-wmde@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/MassMessage/: Backport: MassMessage: Unbreak remote content fetching (T276936) (duration: 01m 08s)
11:49 effie: disable puppet on all hosts running mediawiki+memcached to merge 674282
11:45 andrew-wmde@deploy1002: Synchronized php-1.36.0-wmf.36/extensions/MassMessage/: Backport: MassMessage: Unbreak remote content fetching (T276936) (duration: 01m 07s)
11:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Slowly repool db1160 after schema change', diff saved to https://phabricator.wikimedia.org/P15075 and previous config saved to /var/cache/conftool/dbconfig/20210324-114436-root.json
11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Slowly repool db1160 after schema change', diff saved to https://phabricator.wikimedia.org/P15074 and previous config saved to /var/cache/conftool/dbconfig/20210324-112932-root.json
11:22 andrew-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable CodeMirror accessibility colors on initial wikis (T276346) (duration: 01m 08s)
11:15 jynus: restart serially db2097 db2098 db2099 db2100 T271913
11:14 andrew-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable bracket matching on group0 and wikitech (T273591) (duration: 01m 25s)
11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Slowly repool db1160 after schema change', diff saved to https://phabricator.wikimedia.org/P15073 and previous config saved to /var/cache/conftool/dbconfig/20210324-111429-root.json
10:50 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc1001.wikimedia.org
10:48 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
10:45 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
10:44 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
10:36 jmm@cumin1001: START - Cookbook sre.ganeti.makevm for new host irc1001.wikimedia.org
10:31 jynus: restart db1171 T271913
10:15 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
10:14 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
10:14 jynus: restart db1145 T271913
10:06 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
10:06 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
10:03 jynus: restart db1139 T271913
09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160 for schema change', diff saved to https://phabricator.wikimedia.org/P15072 and previous config saved to /var/cache/conftool/dbconfig/20210324-095655-marostegui.json
09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 100%: Slowly repool db1149 after schema change', diff saved to https://phabricator.wikimedia.org/P15071 and previous config saved to /var/cache/conftool/dbconfig/20210324-095606-root.json
09:51 jynus: restart db1116 T271913
09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 75%: Slowly repool db1149 after schema change', diff saved to https://phabricator.wikimedia.org/P15070 and previous config saved to /var/cache/conftool/dbconfig/20210324-094102-root.json
09:28 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
09:28 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 50%: Slowly repool db1149 after schema change', diff saved to https://phabricator.wikimedia.org/P15069 and previous config saved to /var/cache/conftool/dbconfig/20210324-092558-root.json
09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 25%: Slowly repool db1149 after schema change', diff saved to https://phabricator.wikimedia.org/P15068 and previous config saved to /var/cache/conftool/dbconfig/20210324-091055-root.json
08:29 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=sessionstore
08:16 gehel: restarting wdqs updater on all nodes for config change
08:14 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics
08:14 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics-external
08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 75%: Slowly repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P15066 and previous config saved to /var/cache/conftool/dbconfig/20210324-081057-root.json
08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15065 and previous config saved to /var/cache/conftool/dbconfig/20210324-080725-root.json
08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1149 for schema change', diff saved to https://phabricator.wikimedia.org/P15064 and previous config saved to /var/cache/conftool/dbconfig/20210324-080223-marostegui.json
08:01 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-main
08:01 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-logging-external
08:01 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=zotero
07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 50%: Slowly repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P15063 and previous config saved to /var/cache/conftool/dbconfig/20210324-075553-root.json
07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15062 and previous config saved to /var/cache/conftool/dbconfig/20210324-075221-root.json
07:50 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=dnsdisc=eventgate-main
07:50 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=dnsdisc=eventgate-logging-external
07:50 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=dnsdisc=zotero
07:41 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-etcd2002.codfw.wmnet
07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 25%: Slowly repool db1086 after schema change', diff saved to https://phabricator.wikimedia.org/P15061 and previous config saved to /var/cache/conftool/dbconfig/20210324-074050-root.json
07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15060 and previous config saved to /var/cache/conftool/dbconfig/20210324-073718-root.json
07:27 elukey@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-etcd2002.codfw.wmnet
07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1086 for schema change', diff saved to https://phabricator.wikimedia.org/P15059 and previous config saved to /var/cache/conftool/dbconfig/20210324-072319-marostegui.json
07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P15058 and previous config saved to /var/cache/conftool/dbconfig/20210324-072214-root.json
07:20 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ml-etcd2002.codfw.wmnet
07:10 elukey@cumin1001: START - Cookbook sre.hosts.decommission for hosts ml-etcd2002.codfw.wmnet
07:09 moritzm: installing squid security updates
06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1181 to dbctl, depooled T275633', diff saved to https://phabricator.wikimedia.org/P15057 and previous config saved to /var/cache/conftool/dbconfig/20210324-063459-marostegui.json
06:24 root@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1084.eqiad.wmnet
06:14 root@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1084.eqiad.wmnet
05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P15056 and previous config saved to /var/cache/conftool/dbconfig/20210324-055246-marostegui.json
04:44 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
03:41 ryankemper: T274204 `sudo -i cookbook sre.elasticsearch.rolling-upgrade search_codfw "codfw cluster reboot" --task-id T274204 --nodes-per-run 3 --start-datetime 2021-03-24T02:29:39` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots`
03:41 ryankemper: T274204 Restarting the `codfw` rolling restart; the `--start-datetime` argument should prevent it from wasting time on nodes that have already been rebooted
03:40 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
03:39 ryankemper: T274204 Timed out waiting for write queues to empty: `[59/60, retrying in 60.00s] Attempt to run 'spicerack.elasticsearch_cluster.ElasticsearchClusters.wait_for_all_write_queues_empty' raised: Write queue not empty (had value of 241631) for partition 0 of topic codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite.`
03:38 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
02:38 ryankemper: T274204 `sudo -i cookbook sre.elasticsearch.rolling-upgrade search_codfw "codfw cluster reboot" --task-id T274204 --nodes-per-run 3 --start-datetime 2021-03-24T02:29:39` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots`
02:31 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
01:59 ryankemper: T274204 For now I'll proceed to the reboots of `codfw`
01:59 ryankemper: T274204 `ctrl+c`'d out of run; relforge is relying on outdated config that is trying to talk to `relforge1002` which no longer exists. Need to refactor so that config no longer lives in spicerack
01:58 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade-reboot (exit_code=97)
01:49 ryankemper: T274204 `sudo -i cookbook sre.elasticsearch.rolling-upgrade-reboot relforge "relforge cluster restarts" --task-id T274204 --nodes-per-run 3 --start-datetime 2021-03-24T01:45:59+00:00` on `ryankemper@cumin1001` tmux session `elasticsearch_rolling_upgrade_reboots`
01:48 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade-reboot
01:36 eileen: civicrm revision changed from f36a0b08f0 to ad430721f6, config revision is 26b02db7ba
00:22 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2378.codfw.wmnet with reason: REIMAGE
00:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2378.codfw.wmnet with reason: REIMAGE
00:18 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2377.codfw.wmnet with reason: REIMAGE
00:16 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2377.codfw.wmnet with reason: REIMAGE

2021-03-23

22:59 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1001.eqiad.wmnet with reason: REIMAGE
22:57 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1001.eqiad.wmnet with reason: REIMAGE
22:33 dwisehaupt: pushing 60f9baaf50b to fundraising hosts which will enable ssl by default for mysql client connections that use the host my.cnf file - T170321
22:19 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@3fd7d7b]: partition ores dumps by namespace (duration: 02m 07s)
22:17 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@3fd7d7b]: partition ores dumps by namespace
22:09 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:05 dzahn@cumin1001: START - Cookbook sre.dns.netbox
21:27 ppchelko@deploy1002: Finished deploy [restbase/deploy@531c474]: Add pageviews top-per-country endpoint (duration: 17m 58s)
21:09 ppchelko@deploy1002: Started deploy [restbase/deploy@531c474]: Add pageviews top-per-country endpoint
21:04 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:00 robh@cumin1001: START - Cookbook sre.dns.netbox
21:00 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:41 eileen: civicrm revision changed from 39d24e8b0a to f36a0b08f0, config revision is 26b02db7ba
20:24 robh@cumin1001: START - Cookbook sre.dns.netbox
20:24 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
20:21 robh@cumin1001: START - Cookbook sre.dns.netbox
20:13 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts auth1002.eqiad.wmnet
20:03 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts auth1002.eqiad.wmnet
20:02 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts auth1002.eqiad.wmnet
20:01 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts auth1002.eqiad.wmnet
19:51 jforrester@deploy1002: Finished deploy [integration/docroot@9de8c9d]: Add homer-public listing, added by volans (duration: 00m 08s)
19:51 jforrester@deploy1002: Started deploy [integration/docroot@9de8c9d]: Add homer-public listing, added by volans
18:45 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Remove schema overrides for 6 finished EL migrations - T267347 T271164 T267351 T267348 T267343 T267353 (duration: 01m 07s)
18:40 legoktm@deploy1002: Synchronized php-1.36.0-wmf.36/vendor/: Bump wikimedia/parsoid to 0.13.0-a29 (duration: 01m 16s)
18:20 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
18:18 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
18:16 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
18:10 legoktm@deploy1002: Synchronized wmf-config/ProductionServices.php: Add irc2001.wikimedia.org (running buster) as second irc server (T224579) (duration: 01m 08s)
15:39 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
15:39 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
15:38 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
15:38 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
15:36 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
15:36 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
15:32 moritzm: installing libsdl2 security updates
15:31 akosiaris: pool echostore for eqiad (the first of the larger services, traffic-wise)
15:31 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=echostore
15:25 Trey314159: reindexing Italian wikis on elastic@eqiad, elastic@codfw, and cloudelastic complete (T274200)
15:10 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
15:10 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
15:10 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
14:53 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
14:53 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
14:53 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
14:46 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
14:46 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
14:46 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
14:43 akosiaris: pool more services in eqiad k8s. T277741. Only the very large ones (traffic-wise) are still on codfw
14:43 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=recommendation-api
14:43 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=push-notifications
14:43 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=proton
14:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mobileapps
14:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mathoid
14:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=linkrecommendation
14:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventstreams-internal
14:42 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventstreams
14:20 akosiaris: pool a few more services in eqiad k8s. T277741
14:19 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=wikifeeds
14:19 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=termbox
14:19 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=similar-users
14:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.36
14:06 akosiaris: pool a few services in eqiad k8s. T277741
14:05 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=cxserver
14:05 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=citoid
14:05 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=blubberoid
14:05 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=api-gateway
14:05 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=apertium
14:05 moritzm: installing pygments security updates on stretch
14:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2008.codfw.wmnet
13:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2008.codfw.wmnet
13:55 hashar@deploy1002: Finished scap: Promote testwikis from 1.36.0-wmf.35 to 1.36.0-wmf.36 - T274940 (duration: 31m 57s)
13:54 elukey: sudo systemctl reload apache2 on prometheus[12]00[34] to pick up new k8s-mlserve instance settings
13:28 moritzm: drain ganeti2008
13:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2018.codfw.wmnet
13:23 hashar@deploy1002: Started scap: Promote testwikis from 1.36.0-wmf.35 to 1.36.0-wmf.36 - T274940
13:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2018.codfw.wmnet
13:15 ema: cp3054: install varnishkafka built explicitly against varnish 6.0.1-1wm2 to fix broken dpkg status T264398
13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 100%: Slowly repool db1086 after cloning db1181', diff saved to https://phabricator.wikimedia.org/P15054 and previous config saved to /var/cache/conftool/dbconfig/20210323-130543-root.json
13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 100%: Slowly repool db1148 after schema change', diff saved to https://phabricator.wikimedia.org/P15053 and previous config saved to /var/cache/conftool/dbconfig/20210323-130153-root.json
12:58 moritzm: drain ganeti2018
12:58 akosiaris: remove and decommission argon, chlorine, acrab, acrux T277741, T277191
12:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Slowly pool db1165 into s6 T258361', diff saved to https://phabricator.wikimedia.org/P15052 and previous config saved to /var/cache/conftool/dbconfig/20210323-125155-root.json
12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 75%: Slowly repool db1086 after cloning db1181', diff saved to https://phabricator.wikimedia.org/P15051 and previous config saved to /var/cache/conftool/dbconfig/20210323-125039-root.json
12:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2017.codfw.wmnet
12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 75%: Slowly repool db1148 after schema change', diff saved to https://phabricator.wikimedia.org/P15050 and previous config saved to /var/cache/conftool/dbconfig/20210323-124650-root.json
12:42 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2017.codfw.wmnet
12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 85%: Slowly pool db1165 into s6 T258361', diff saved to https://phabricator.wikimedia.org/P15049 and previous config saved to /var/cache/conftool/dbconfig/20210323-123651-root.json
12:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 50%: Slowly repool db1086 after cloning db1181', diff saved to https://phabricator.wikimedia.org/P15048 and previous config saved to /var/cache/conftool/dbconfig/20210323-123535-root.json
12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 50%: Slowly repool db1148 after schema change', diff saved to https://phabricator.wikimedia.org/P15047 and previous config saved to /var/cache/conftool/dbconfig/20210323-123146-root.json
12:27 moritzm: drain ganeti2017
12:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2016.codfw.wmnet
12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Slowly pool db1165 into s6 T258361', diff saved to https://phabricator.wikimedia.org/P15046 and previous config saved to /var/cache/conftool/dbconfig/20210323-122148-root.json
12:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 25%: Slowly repool db1086 after cloning db1181', diff saved to https://phabricator.wikimedia.org/P15045 and previous config saved to /var/cache/conftool/dbconfig/20210323-122032-root.json
12:17 akosiaris: remove all scheduled downtimes for k8s cluster. T277741
12:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2016.codfw.wmnet
12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 25%: Slowly repool db1148 after schema change', diff saved to https://phabricator.wikimedia.org/P15044 and previous config saved to /var/cache/conftool/dbconfig/20210323-121642-root.json
12:09 moritzm: drain ganeti2016
12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 60%: Slowly pool db1165 into s6 T258361', diff saved to https://phabricator.wikimedia.org/P15043 and previous config saved to /var/cache/conftool/dbconfig/20210323-120644-root.json
11:55 moritzm: installing libcaca security updates
11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Slowly pool db1165 into s6 T258361', diff saved to https://phabricator.wikimedia.org/P15042 and previous config saved to /var/cache/conftool/dbconfig/20210323-115141-root.json
11:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on aqs[1012-1015].eqiad.wmnet with reason: New buster hosts, not in use
11:38 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on aqs[1012-1015].eqiad.wmnet with reason: New buster hosts, not in use
11:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 35%: Slowly pool db1165 into s6 T258361', diff saved to https://phabricator.wikimedia.org/P15041 and previous config saved to /var/cache/conftool/dbconfig/20210323-113637-root.json
11:31 Lucas_WMDE: EU backport&config window done
11:30 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable DiscussionTools' beta features on dewiki (T276494) (duration: 00m 58s)
11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Slowly pool db1165 into s6 T258361', diff saved to https://phabricator.wikimedia.org/P15040 and previous config saved to /var/cache/conftool/dbconfig/20210323-112133-root.json
11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 20%: Slowly pool db1165 into s6 T258361', diff saved to https://phabricator.wikimedia.org/P15039 and previous config saved to /var/cache/conftool/dbconfig/20210323-110630-root.json
11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148', diff saved to https://phabricator.wikimedia.org/P15038 and previous config saved to /var/cache/conftool/dbconfig/20210323-110553-marostegui.json
11:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 100%: Slowly repool db1147 after schema change', diff saved to https://phabricator.wikimedia.org/P15037 and previous config saved to /var/cache/conftool/dbconfig/20210323-110347-root.json
11:01 moritzm: installing tomcat8 security updates
10:56 jayme: all services re-deployed to k8s eqiad - T277741
10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 15%: Slowly pool db1165 into s6 T258361', diff saved to https://phabricator.wikimedia.org/P15036 and previous config saved to /var/cache/conftool/dbconfig/20210323-105126-root.json
10:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 75%: Slowly repool db1147 after schema change', diff saved to https://phabricator.wikimedia.org/P15035 and previous config saved to /var/cache/conftool/dbconfig/20210323-104843-root.json
10:46 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'staging' .
10:46 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
10:45 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
10:45 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
10:45 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'test' .
10:45 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
10:45 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'staging' .
10:44 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'similar-users' for release 'main' .
10:44 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
10:44 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
10:43 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
10:42 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'canary' .
10:42 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
10:41 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
10:39 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
10:39 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
10:37 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
10:37 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 10%: Slowly pool db1165 into s6 T258361', diff saved to https://phabricator.wikimedia.org/P15034 and previous config saved to /var/cache/conftool/dbconfig/20210323-103623-root.json
10:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 50%: Slowly repool db1147 after schema change', diff saved to https://phabricator.wikimedia.org/P15033 and previous config saved to /var/cache/conftool/dbconfig/20210323-103340-root.json
10:32 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
10:32 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
10:32 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
10:32 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams-internal' for release 'canary' .
10:32 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
10:31 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
10:31 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
10:29 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
10:29 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
10:29 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
10:28 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
10:28 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
10:28 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
10:27 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
10:27 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
10:26 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'staging' .
10:26 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'production' .
10:25 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
10:25 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
10:24 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
10:23 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'staging' .
10:23 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
10:23 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
10:22 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
10:22 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
10:22 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=kubesvc
10:22 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
10:22 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
10:21 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
10:21 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 5%: Slowly pool db1165 into s6 T258361', diff saved to https://phabricator.wikimedia.org/P15031 and previous config saved to /var/cache/conftool/dbconfig/20210323-102119-root.json
10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'staging' .
10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'similar-users' for release 'main' .
10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
10:20 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
10:19 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
10:19 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
10:19 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
10:19 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
10:19 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
10:19 hashar@deploy1002: Pruned MediaWiki: 1.36.0-wmf.33 (duration: 01m 48s)
10:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 25%: Slowly repool db1147 after schema change', diff saved to https://phabricator.wikimedia.org/P15030 and previous config saved to /var/cache/conftool/dbconfig/20210323-101836-root.json
10:16 hashar@deploy1002: Pruned MediaWiki: 1.36.0-wmf.32 (duration: 14m 47s)
10:10 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1005.eqiad.wmnet
10:02 hashar: scap clean --delete 1.36.0-wmf.32 # T274940
10:01 hashar: Applied security patches for 1.36.0-wmf.36 # T274940
09:57 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1006.eqiad.wmnet
09:56 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1015.eqiad.wmnet
09:54 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1006.eqiad.wmnet
09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1165 into s6 with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P15029 and previous config saved to /var/cache/conftool/dbconfig/20210323-095437-marostegui.json
09:54 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
09:54 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1016.eqiad.wmnet
09:53 akosiaris: deploy helmfile.d/admin_ng for eqiad T277741
09:53 hashar: scap prep 1.36.0-wmf.36 # T274940
09:53 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
09:53 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=kubesvc,name=kubernetes2017.codfw.wmnet
09:53 jayme@cumin1001: conftool action : set/weight=10; selector: dc=codfw,service=kubesvc,name=kubernetes2017.codfw.wmnet
09:51 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
09:50 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=kubesvc,name=kubernetes1017.eqiad.wmnet
09:50 jayme@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,service=kubesvc,name=kubernetes1017.eqiad.wmnet
09:49 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
09:46 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
09:46 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
09:45 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
09:45 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
09:45 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1014.eqiad.wmnet with reason: REIMAGE
09:44 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
09:44 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
09:43 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1014.eqiad.wmnet with reason: REIMAGE
09:43 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1013.eqiad.wmnet with reason: REIMAGE
09:42 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1165 into s6 with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P15028 and previous config saved to /var/cache/conftool/dbconfig/20210323-094257-marostegui.json
09:41 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1012.eqiad.wmnet with reason: REIMAGE
09:41 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1016.eqiad.wmnet
09:41 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1013.eqiad.wmnet with reason: REIMAGE
09:40 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1015.eqiad.wmnet
09:40 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1005.eqiad.wmnet
09:39 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1011.eqiad.wmnet with reason: REIMAGE
09:38 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1012.eqiad.wmnet with reason: REIMAGE
09:37 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1010.eqiad.wmnet with reason: REIMAGE
09:36 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1011.eqiad.wmnet with reason: REIMAGE
09:36 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1004.eqiad.wmnet with reason: REIMAGE
09:35 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1009.eqiad.wmnet with reason: REIMAGE
09:34 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1010.eqiad.wmnet with reason: REIMAGE
09:33 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1017.eqiad.wmnet
09:33 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1008.eqiad.wmnet with reason: REIMAGE
09:32 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1009.eqiad.wmnet with reason: REIMAGE
09:32 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1165 to dbctl, depooled - T258361', diff saved to https://phabricator.wikimedia.org/P15027 and previous config saved to /var/cache/conftool/dbconfig/20210323-093246-marostegui.json
09:31 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1007.eqiad.wmnet with reason: REIMAGE
09:31 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1004.eqiad.wmnet with reason: REIMAGE
09:30 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1008.eqiad.wmnet with reason: REIMAGE
09:29 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1003.eqiad.wmnet with reason: REIMAGE
09:29 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1002.eqiad.wmnet with reason: REIMAGE
09:28 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1007.eqiad.wmnet with reason: REIMAGE
09:28 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1003.eqiad.wmnet with reason: REIMAGE
09:27 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1002.eqiad.wmnet with reason: REIMAGE
09:26 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1001.eqiad.wmnet with reason: REIMAGE
09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1086 to clone db1181 T275633', diff saved to https://phabricator.wikimedia.org/P15025 and previous config saved to /var/cache/conftool/dbconfig/20210323-092600-marostegui.json
09:24 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1001.eqiad.wmnet with reason: REIMAGE
09:18 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dc=eqiad,cluster=kubernetes,name=kubernetes1017.eqiad.wmnet
09:17 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=kubemaster,cluster=kubernetes
09:17 akosiaris@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,service=kubemaster,cluster=kubernetes
09:16 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1017.eqiad.wmnet
09:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1147', diff saved to https://phabricator.wikimedia.org/P15024 and previous config saved to /var/cache/conftool/dbconfig/20210323-091432-marostegui.json
09:05 akosiaris: reboot kubetcd100[456] for kernel upgrades. T277741 T273278
09:04 akosiaris: empty etcd T277741
08:43 akosiaris: poweroff argon and chlorine T277741
08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: Slowly repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15023 and previous config saved to /var/cache/conftool/dbconfig/20210323-083957-root.json
08:41 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=zotero
08:41 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=wikifeeds
08:41 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=termbox
08:41 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=similar-users
08:41 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=sessionstore
08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=recommendation-api
08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=push-notifications
08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=proton
08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mobileapps
08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mathoid
08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=linkrecommendation
08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventstreams-internal
08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventstreams
08:40 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-main
08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-logging-external
08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics-external
08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics
08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=echostore
08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=cxserver
08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=citoid
08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=blubberoid
08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=api-gateway
08:39 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=apertium
08:33 akosiaris: eqiad services in k8s depooled. T277741
08:33 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=wikifeeds
08:33 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=termbox
08:33 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=similar-users
08:33 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=sessionstore
08:33 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=recommendation-api
08:33 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=push-notifications
08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=proton
08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mobileapps
08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mathoid
08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=linkrecommendation
08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventstreams-internal
08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventstreams
08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-main
08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-logging-external
08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics-external
08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics
08:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=echostore
08:31 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=cxserver
08:31 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=citoid
08:31 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=blubberoid
08:31 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=api-gateway
08:31 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=apertium
08:28 akosiaris: downtime all services in T277741 for 24H
08:25 akosiaris: beginning the k8s upgrade/reinit process. T277741
08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: Slowly repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15022 and previous config saved to /var/cache/conftool/dbconfig/20210323-082454-root.json
08:24 moritzm: installing mariadb-10.3 updates on buster (just client-side libs/tools, unrelated to the main wmf-mariadb packages)
08:24 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 18 hosts with reason: Reinitialize eqiad k8s cluster with new etcd
08:24 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 18 hosts with reason: Reinitialize eqiad k8s cluster with new etcd
08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 100%: Slowly repool db1146:3314 after schema change', diff saved to https://phabricator.wikimedia.org/P15021 and previous config saved to /var/cache/conftool/dbconfig/20210323-082213-root.json
08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: Slowly repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15020 and previous config saved to /var/cache/conftool/dbconfig/20210323-080949-root.json
08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 75%: Slowly repool db1146:3314 after schema change', diff saved to https://phabricator.wikimedia.org/P15019 and previous config saved to /var/cache/conftool/dbconfig/20210323-080709-root.json
07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: Slowly repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15017 and previous config saved to /var/cache/conftool/dbconfig/20210323-075445-root.json
07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317 to enable report_host T266483', diff saved to https://phabricator.wikimedia.org/P15016 and previous config saved to /var/cache/conftool/dbconfig/20210323-075253-marostegui.json
07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 100%: Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P15015 and previous config saved to /var/cache/conftool/dbconfig/20210323-075230-root.json
07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 100%: Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15014 and previous config saved to /var/cache/conftool/dbconfig/20210323-075216-root.json
07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 50%: Slowly repool db1146:3314 after schema change', diff saved to https://phabricator.wikimedia.org/P15013 and previous config saved to /var/cache/conftool/dbconfig/20210323-075206-root.json
07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 75%: Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P15012 and previous config saved to /var/cache/conftool/dbconfig/20210323-073726-root.json
07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 75%: Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15011 and previous config saved to /var/cache/conftool/dbconfig/20210323-073713-root.json
07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 25%: Slowly repool db1146:3314 after schema change', diff saved to https://phabricator.wikimedia.org/P15010 and previous config saved to /var/cache/conftool/dbconfig/20210323-073702-root.json
07:36 elukey: create a 50g lvm volume on prometheus[12]00[34] for the k8s-mlserve cluster - T272918
07:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1181.eqiad.wmnet with reason: REIMAGE
07:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1181.eqiad.wmnet with reason: REIMAGE
07:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 100%: Slowly repool db1086 after removing it from master', diff saved to https://phabricator.wikimedia.org/P15009 and previous config saved to /var/cache/conftool/dbconfig/20210323-072352-root.json
07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 50%: Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P15008 and previous config saved to /var/cache/conftool/dbconfig/20210323-072223-root.json
07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 50%: Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15007 and previous config saved to /var/cache/conftool/dbconfig/20210323-072209-root.json
07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 75%: Slowly repool db1086 after removing it from master', diff saved to https://phabricator.wikimedia.org/P15006 and previous config saved to /var/cache/conftool/dbconfig/20210323-070849-root.json
07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 25%: Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P15005 and previous config saved to /var/cache/conftool/dbconfig/20210323-070719-root.json
07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 25%: Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15004 and previous config saved to /var/cache/conftool/dbconfig/20210323-070705-root.json
07:02 marostegui: Upgrade kernel on db1101
06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318 to enable report_host T266483', diff saved to https://phabricator.wikimedia.org/P15003 and previous config saved to /var/cache/conftool/dbconfig/20210323-065947-marostegui.json
06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 to enable report_host T266483', diff saved to https://phabricator.wikimedia.org/P15002 and previous config saved to /var/cache/conftool/dbconfig/20210323-065836-marostegui.json
06:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 50%: Slowly repool db1086 after removing it from master', diff saved to https://phabricator.wikimedia.org/P15001 and previous config saved to /var/cache/conftool/dbconfig/20210323-065345-root.json
06:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 25%: Slowly repool db1086 after removing it from master', diff saved to https://phabricator.wikimedia.org/P15000 and previous config saved to /var/cache/conftool/dbconfig/20210323-063842-root.json
06:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3314', diff saved to https://phabricator.wikimedia.org/P14999 and previous config saved to /var/cache/conftool/dbconfig/20210323-062942-marostegui.json
06:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 10%: Slowly repool db1086 after removing it from master', diff saved to https://phabricator.wikimedia.org/P14998 and previous config saved to /var/cache/conftool/dbconfig/20210323-062338-root.json
06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1086', diff saved to https://phabricator.wikimedia.org/P14997 and previous config saved to /var/cache/conftool/dbconfig/20210323-062059-marostegui.json
06:20 marostegui: Upgrade kernel on db1086
06:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1086 (re)pooling @ 25%: Slowly repool db1086 after removing it from master', diff saved to https://phabricator.wikimedia.org/P14996 and previous config saved to /var/cache/conftool/dbconfig/20210323-060701-root.json
06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1136 to s7 master and remove read-only from s7 T274336', diff saved to https://phabricator.wikimedia.org/P14995 and previous config saved to /var/cache/conftool/dbconfig/20210323-060216-marostegui.json
06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set s7 as read-only for maintenance T274336', diff saved to https://phabricator.wikimedia.org/P14994 and previous config saved to /var/cache/conftool/dbconfig/20210323-060104-marostegui.json
06:00 marostegui: Starting s7 eqiad failover from db1086 to db1136 - T274336
05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1174 to api T274336', diff saved to https://phabricator.wikimedia.org/P14993 and previous config saved to /var/cache/conftool/dbconfig/20210323-051346-marostegui.json
05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Set weight 0 to db1136 before failover T274336', diff saved to https://phabricator.wikimedia.org/P14992 and previous config saved to /var/cache/conftool/dbconfig/20210323-051210-marostegui.json
00:07 tstarling@deploy1002: Synchronized wmf-config: use RequestTimeout library step 3: clean up (duration: 00m 58s)
00:06 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: use RequestTimeout library step 2: enable new system (duration: 00m 57s)
00:04 tstarling@deploy1002: Synchronized wmf-config/PhpAutoPrepend.php: use RequestTimeout library step 1: disable old request timeout system (duration: 00m 58s)

2021-03-22

23:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
23:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
23:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2250.codfw.wmnet
23:21 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:18 ebernhardson@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: T262612: Start glent m1 ab test (duration: 01m 53s)
23:18 pt1979@cumin2001: START - Cookbook sre.dns.netbox
23:08 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2250.codfw.wmnet
23:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2249.codfw.wmnet
22:52 mutante: decom mw2249
22:44 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2249.codfw.wmnet
21:08 sbassett: Deployed security patch for T272244
20:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2279.codfw.wmnet,service=canary
20:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2278.codfw.wmnet,service=canary
20:02 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2279.codfw.wmnet,service=canary
20:02 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2278.codfw.wmnet,service=canary
19:50 mutante: gerrit2001 - restarted apache2 as well for consistency
19:47 mutante: gerrit - restarting apache2 after we dropped MaxClients config line. This should make us fall back to Debian default MaxRequestWorkers. (since we use event MPM we should not be using MaxClients in the first place, says #httpd) (T277127)
18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 25247c9: hrwiki: Configure mentorship for Growth team features (T275684) (duration: 01m 00s)
18:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 951601f: Grant enwiki pagemovers the delete-redirect right (T278131) (duration: 00m 59s)
17:30 Trey314159: reindexing Italian wikis on elastic@eqiad, elastic@codfw, and cloudelastic (T274200)
16:49 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
16:48 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
16:47 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
16:46 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
16:37 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
16:37 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
16:12 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:07 pt1979@cumin2001: START - Cookbook sre.dns.netbox
15:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 100%: Slowly repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P14990 and previous config saved to /var/cache/conftool/dbconfig/20210322-155808-root.json
15:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 75%: Slowly repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P14989 and previous config saved to /var/cache/conftool/dbconfig/20210322-154304-root.json
15:38 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:33 pt1979@cumin2001: START - Cookbook sre.dns.netbox
15:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 50%: Slowly repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P14988 and previous config saved to /var/cache/conftool/dbconfig/20210322-152800-root.json
15:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 25%: Slowly repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P14987 and previous config saved to /var/cache/conftool/dbconfig/20210322-151257-root.json
14:26 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
14:25 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
14:23 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
14:22 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
14:14 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
14:14 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314 for schema change', diff saved to https://phabricator.wikimedia.org/P14986 and previous config saved to /var/cache/conftool/dbconfig/20210322-141146-marostegui.json
14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P14985 and previous config saved to /var/cache/conftool/dbconfig/20210322-140800-root.json
14:07 XioNoX: rename cloud-hosts1-b-eqiad to cloud-hosts1-eqiad - T277771
13:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P14984 and previous config saved to /var/cache/conftool/dbconfig/20210322-135256-root.json
13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P14983 and previous config saved to /var/cache/conftool/dbconfig/20210322-133753-root.json
13:26 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
13:26 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P14982 and previous config saved to /var/cache/conftool/dbconfig/20210322-132249-root.json
13:20 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
13:20 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
13:16 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
12:28 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
12:27 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
12:20 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
12:19 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for schema change', diff saved to https://phabricator.wikimedia.org/P14981 and previous config saved to /var/cache/conftool/dbconfig/20210322-121924-marostegui.json
11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P14980 and previous config saved to /var/cache/conftool/dbconfig/20210322-112954-root.json
11:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 100%: Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P14979 and previous config saved to /var/cache/conftool/dbconfig/20210322-112707-root.json
11:15 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
11:15 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
11:15 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P14978 and previous config saved to /var/cache/conftool/dbconfig/20210322-111451-root.json
11:14 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
11:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 75%: Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P14977 and previous config saved to /var/cache/conftool/dbconfig/20210322-111203-root.json
11:09 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
11:09 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
10:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P14976 and previous config saved to /var/cache/conftool/dbconfig/20210322-105947-root.json
10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 50%: Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P14975 and previous config saved to /var/cache/conftool/dbconfig/20210322-105700-root.json
10:53 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
10:53 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
10:51 moritzm: installing libdbi-perl security updates
10:48 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
10:48 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
10:48 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
10:48 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
10:47 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
10:47 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
10:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P14974 and previous config saved to /var/cache/conftool/dbconfig/20210322-104443-root.json
10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 25%: Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P14973 and previous config saved to /var/cache/conftool/dbconfig/20210322-104156-root.json
10:42 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
10:41 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
10:41 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
10:40 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
10:34 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
10:33 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
10:32 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
10:32 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
10:27 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
10:26 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
10:26 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
10:25 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
10:21 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
10:21 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
10:17 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
10:17 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
10:15 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
10:15 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
10:12 elukey: run homer for cr1/cr2 eqiad and codfw to add new iBGP session for the k8s ML clusters - https://gerrit.wikimedia.org/r/c/operations/homer/public/+/661055
09:50 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config cleanup (duration: 00m 57s)
09:49 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config cleanup (duration: 00m 59s)
09:48 reedy@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config cleanup (duration: 01m 20s)
09:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142 for schema change', diff saved to https://phabricator.wikimedia.org/P14971 and previous config saved to /var/cache/conftool/dbconfig/20210322-093558-marostegui.json
09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P14970 and previous config saved to /var/cache/conftool/dbconfig/20210322-091534-root.json
09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P14969 and previous config saved to /var/cache/conftool/dbconfig/20210322-090030-root.json
08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P14968 and previous config saved to /var/cache/conftool/dbconfig/20210322-084527-root.json
08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P14967 and previous config saved to /var/cache/conftool/dbconfig/20210322-083023-root.json
08:13 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - T272836 T268435
08:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
08:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
08:02 jayme: build and release docker-registry.discovery.wmnet/eventrouter:0.3.0-6, docker-registry.discovery.wmnet/fluent-bit:1.5.3-3, docker-registry.discovery.wmnet/ratelimit:1.5.1-s3
08:00 marostegui: Stop MySQL on db1085 to clone db1165 (lag will appear on s6 on wiki replicas) T258361
08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 to clone db1165', diff saved to https://phabricator.wikimedia.org/P14965 and previous config saved to /var/cache/conftool/dbconfig/20210322-080020-marostegui.json
07:51 elukey: stop/start mariadb instances on dbstore1004 to reduce buffer pool memory settings - T273865
07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Slowly repool db1161', diff saved to https://phabricator.wikimedia.org/P14964 and previous config saved to /var/cache/conftool/dbconfig/20210322-073747-root.json
07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Slowly repool db1161', diff saved to https://phabricator.wikimedia.org/P14963 and previous config saved to /var/cache/conftool/dbconfig/20210322-072243-root.json
07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141 for schema change', diff saved to https://phabricator.wikimedia.org/P14962 and previous config saved to /var/cache/conftool/dbconfig/20210322-071430-marostegui.json
07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Slowly repool db1161', diff saved to https://phabricator.wikimedia.org/P14961 and previous config saved to /var/cache/conftool/dbconfig/20210322-070740-root.json
06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Slowly repool db1161', diff saved to https://phabricator.wikimedia.org/P14960 and previous config saved to /var/cache/conftool/dbconfig/20210322-065236-root.json
06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1084 from dbctl T276302', diff saved to https://phabricator.wikimedia.org/P14959 and previous config saved to /var/cache/conftool/dbconfig/20210322-063732-marostegui.json
06:11 marostegui: Sanitize db1124 db2094 db1154: taywiki trvwiki mnwwiktionary
04:28 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .

2021-03-21

10:25 _joe_: restarting gerrit on gerrit1001, using 45G of reserved memory
09:22 elukey: install apache2-bin-dbgsym on gerrit1001 - T277127
08:50 qchris: Restarting apache on gerrit1001 again (all apache workers busy again) see T277127
08:18 qchris: Restarting apache on gerrit1001 (all apache workers busy)

2021-03-20

00:22 tzatziki: altering emails for STei (WMF) and SGrabarczuk (WMF)

2021-03-19

21:11 mutante: scandium - stop apache and rerun puppet which fails after reimaging because it tries to run an nginx on port 80 which is already used by apache T268248
20:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on scandium.eqiad.wmnet with reason: REIMAGE
20:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on scandium.eqiad.wmnet with reason: REIMAGE
20:15 mutante: scandium - reimaging with buster
20:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on scandium.eqiad.wmnet with reason: reimage
20:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on scandium.eqiad.wmnet with reason: reimage
20:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2245.codfw.wmnet
19:55 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2245.codfw.wmnet
19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2244.codfw.wmnet
19:53 legoktm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host lists1002.wikimedia.org
19:50 mutante: testreduce1001 - confirmed MariaDB @@datadir is /srv/data/mysql and deleting /var/lib/mysql (T277580)
19:40 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2244.codfw.wmnet
19:39 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2245.codfw.wmnet
19:39 legoktm@cumin1001: START - Cookbook sre.ganeti.makevm for new host lists1002.wikimedia.org
19:39 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2244.codfw.wmnet
19:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2252.codfw.wmnet,service=canary
19:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2251.codfw.wmnet,service=canary
19:33 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2252.codfw.wmnet,service=canary
19:33 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2251.codfw.wmnet,service=canary
19:24 mutante: deploy2002 - re-enabled puppet, reverted patch of scap-master-sync
18:46 mutante: deploy2002 - disable puppet, copy modified version of scap-master-sync over it that does not --exclude="**/cache/l10n/*.cdb" (for T275826)
16:01 effie: upgrade memcached on mc-gp200*
12:36 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE
12:34 klausman@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE
12:10 effie: upgrade memcached on mc1026,mc2026
11:37 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
11:37 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
11:36 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
11:36 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
11:30 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
11:29 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
11:29 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
11:29 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
11:29 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
11:29 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
11:27 akosiaris@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
11:27 akosiaris@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
11:20 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE
11:18 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE
10:45 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
10:45 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
10:42 moritzm: installing dbmonitor1002 T224589
10:42 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
10:42 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
10:41 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
10:41 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
10:11 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
10:10 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
10:05 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
10:04 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
09:40 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
09:36 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
08:22 elukey: upload alluxio 2.4.1 to thirdparty/bigtop15 on stretch/buster-wikimedia
07:16 ryankemper: T275885 `ryankemper@cumin1001:~$ sudo cumin 'P{relforge*}' 'sudo run-puppet-agent'` (change hadn't been merged when I ran the agent earlier)
04:04 eileen: civicrm revision changed from 99bf1c9210 to 39d24e8b0a, config revision is 26b02db7ba
03:27 ryankemper: [wdqs] `ryankemper@wdqs1013:~$ sudo systemctl restart wdqs-blazegraph`
03:26 ryankemper: T275885 `ryankemper@cumin1001:~$ sudo cumin 'P{relforge*}' 'sudo run-puppet-agent'`
02:43 ryankemper: T275885 Revoking current `relforge` TLS cert in advance of generation of new cert: `ryankemper@puppetmaster1001:/srv/private$ sudo puppet cert clean relforge.svc.eqiad.wmnet`
00:51 dancy@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/LiquidThreads/classes/Thread.php: T277772 (duration: 00m 58s)
00:45 mutante: testreduce1001 - stop mysql; rsyncing /var/lib/mysql to /srv/data/mysql (T277580)

2021-03-18

23:56 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Don't define a default icon (T274199) (duration: 00m 57s)
23:38 brennen@deploy1002: Synchronized php-1.36.0-wmf.35/includes/user/ActorStore.php: Backport: ActorStore::getActorById - fall back to master. (T277795) (duration: 00m 57s)
23:35 brennen@deploy1002: Synchronized php-1.36.0-wmf.35/includes/user/ActorStore.php: Backport: ActorStore::getActorById - fall back to master. (T277795) (duration: 00m 58s)
23:25 dduvall@deploy1002: Synchronized .pipeline: config: Use build environment HTTP proxy for APT sources (T277109) (duration: 01m 02s)
23:06 brennen: train status: 1.36.0-wmf.35 (T274939) stable on all wikis after deploy of hotfix for T277795
22:53 brennen@deploy1002: Synchronized php-1.36.0-wmf.35/includes/specials/SpecialContributions.php: Backport: ActorStore::getActorById - fall back to master. (T277795) (duration: 01m 07s)
22:30 dduvall@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
22:29 dduvall@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
22:25 dduvall@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
20:37 dancy@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/LiquidThreads/classes/Thread.php: (no justification provided) (duration: 01m 05s)
19:04 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.36.0-wmf.35
18:28 legoktm: re-enabled puppet on registry*
18:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 44eddcc: hrwiki: Deploy Growth features to newcomers (T275684) (duration: 01m 08s)
18:12 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 179d9e5: mswiki: Enable Growth features in stealth mode (T277562; 2/2) (duration: 01m 08s)
18:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 179d9e5: mswiki: Enable Growth features in stealth mode (T277562; 1/2) (duration: 01m 11s)
17:58 legoktm: disabled puppet on registry* for rolling out https://gerrit.wikimedia.org/r/672537
17:50 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 55aa6cb: tewiki: Enable Growth features in stealth mode (T277491; 2/2) (duration: 01m 08s)
17:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2242.codfw.wmnet
17:48 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 55aa6cb: tewiki: Enable Growth features in stealth mode (T277491; 1/2) (duration: 01m 10s)
17:45 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 04342e9: simplewiki: Enable Growth team features in stealth mode (T277550) (duration: 01m 09s)
17:42 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 04342e9: simplewiki: Enable Growth team features in stealth mode (T277550) (duration: 01m 10s)
17:40 dduvall@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
17:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2242.codfw.wmnet
17:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2241.codfw.wmnet
17:09 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2241.codfw.wmnet
17:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2240.codfw.wmnet
16:54 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2240.codfw.wmnet
16:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2239.codfw.wmnet
16:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2239.codfw.wmnet
16:37 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2242.codfw.wmnet
16:37 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2241.codfw.wmnet
16:37 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2240.codfw.wmnet
16:37 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2239.codfw.wmnet
15:33 shdubsh: clean up dead letter queue and restart all logstashes
14:50 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:43 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
14:37 dcausse: repooling wdqs1005
14:29 hashar: Restarting CI Jenkins for plugin upgrade
13:49 elukey: reboot analytics1066
13:23 ladsgroup@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/Wikibase/repo: languageLabelDescriptionAliases: use getLanguageNameByCode (T275611 T277722) (duration: 01m 14s)
12:58 jbond42: upload cas_6.3.2 to apt buster-wikimedia
11:37 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
11:34 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
11:25 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
11:24 urbanecm@deploy1002: Synchronized wmf-config/flaggedrevs.php: 896c9f0: flaggedrevs: Disable multiple dimensions in hewikisource (duration: 01m 09s)
11:20 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/GrowthExperiments/includes/HomepageHooks.php: 3b2aa1a: Remove variant C from list of valid variants (T277727) (duration: 01m 09s)
11:16 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
11:14 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
11:11 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0005676: GrowthExperiments: set $wgGEHomepageNewAccountVariants to D only (T277727) (duration: 01m 10s)
11:08 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: NOOP: e7f5eac: Enable CentralAuth IRC feed in beta cluster (T277432) (duration: 01m 12s)
09:13 _joe_: hard reboot of snapshot1005
09:04 _joe_: attempted reboot of snapshot1005, read-only filesystem and probably disks are broken beyond repair
08:27 godog: swift eqiad-prod: less weight for ms-be[1019-1026] - T272836
08:18 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: REIMAGE
08:16 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: REIMAGE
08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P14946 and previous config saved to /var/cache/conftool/dbconfig/20210318-080258-root.json
08:02 akosiaris: reimage ml-serve1004 to debug a docker volume_group issue
07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P14945 and previous config saved to /var/cache/conftool/dbconfig/20210318-074754-root.json
07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P14944 and previous config saved to /var/cache/conftool/dbconfig/20210318-073250-root.json
07:20 dcausse: depooling & restarting blazegraph on wdqs1005
07:19 marostegui: Deploy schema change on s4 codfw master, lag will appear - T276150 T276156
07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P14943 and previous config saved to /var/cache/conftool/dbconfig/20210318-071747-root.json
07:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: REIMAGE
07:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: REIMAGE
06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1161 to dbctl, depooled T258361', diff saved to https://phabricator.wikimedia.org/P14942 and previous config saved to /var/cache/conftool/dbconfig/20210318-063241-marostegui.json
06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2120', diff saved to https://phabricator.wikimedia.org/P14941 and previous config saved to /var/cache/conftool/dbconfig/20210318-062201-marostegui.json
06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 for schema change', diff saved to https://phabricator.wikimedia.org/P14940 and previous config saved to /var/cache/conftool/dbconfig/20210318-060445-marostegui.json
03:46 andrewbogott: restarting slapd on seaborgium, serpens, and r-o ldap replicas (we're getting irregular connection failures)
00:05 eileen: tools revision changed from b7b4060c30 to ef54260b0d

2021-03-17

23:42 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: c730dd5: idwiki: Deploy Growth features to newcomers (T259024) (duration: 01m 08s)
23:40 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 5c14e7d: Define confirmed group in MediaWikiServices hook (T275334, T277704, T275310, T275333) (duration: 01m 08s)
23:30 ebernhardson@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/CirrusSearch/profiles/FallbackProfiles.config.php: Add fallback profile including glent m1 (duration: 01m 42s)
22:27 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1038.eqiad.wmnet with reason: REIMAGE
22:25 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1037.eqiad.wmnet with reason: REIMAGE
22:25 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1038.eqiad.wmnet with reason: REIMAGE
22:23 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1037.eqiad.wmnet with reason: REIMAGE
20:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1184.eqiad.wmnet with reason: REIMAGE
20:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1183.eqiad.wmnet with reason: REIMAGE
20:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1184.eqiad.wmnet with reason: REIMAGE
20:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: REIMAGE
20:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1183.eqiad.wmnet with reason: REIMAGE
20:46 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1181.eqiad.wmnet with reason: REIMAGE
20:45 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: REIMAGE
20:44 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1180.eqiad.wmnet with reason: REIMAGE
20:43 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1181.eqiad.wmnet with reason: REIMAGE
20:42 andrew@deploy1002: Finished deploy [horizon/deploy@17ea780]: display volume usage summaries (duration: 03m 34s)
20:42 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1179.eqiad.wmnet with reason: REIMAGE
20:41 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1180.eqiad.wmnet with reason: REIMAGE
20:40 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1178.eqiad.wmnet with reason: REIMAGE
20:39 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1179.eqiad.wmnet with reason: REIMAGE
20:39 andrew@deploy1002: Started deploy [horizon/deploy@17ea780]: display volume usage summaries
20:38 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1177.eqiad.wmnet with reason: REIMAGE
20:37 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1178.eqiad.wmnet with reason: REIMAGE
20:35 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1177.eqiad.wmnet with reason: REIMAGE
20:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2238.codfw.wmnet
20:08 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2238.codfw.wmnet
20:07 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1176.eqiad.wmnet with reason: REIMAGE
20:05 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1176.eqiad.wmnet with reason: REIMAGE
20:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2237.codfw.wmnet
19:54 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2237.codfw.wmnet
19:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2236.codfw.wmnet
19:48 andrew@deploy1002: Finished deploy [horizon/deploy@3c2d1ee]: support VM resizing (duration: 03m 42s)
19:44 andrew@deploy1002: Started deploy [horizon/deploy@3c2d1ee]: support VM resizing
19:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2236.codfw.wmnet
19:42 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2238.codfw.wmnet
19:42 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2237.codfw.wmnet
19:42 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2236.codfw.wmnet
19:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2235.codfw.wmnet
19:29 mutante: testreduce1001 - rebooted, fdisk /dev/sdb, create partition table, create primary partition, mkfs.ext4 /dev/vdb1
19:23 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2235.codfw.wmnet
19:18 andrew@deploy1002: Finished deploy [horizon/deploy@8967660]: clean up a reverted hack (duration: 03m 25s)
19:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2234.codfw.wmnet
19:14 andrew@deploy1002: Started deploy [horizon/deploy@8967660]: clean up a reverted hack
19:06 dancy@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.35 (duration: 01m 26s)
19:05 mutante: ganeti1011 - rebooting VM testreduce1001 on ganeti level for T277580
19:04 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.35
19:02 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2234.codfw.wmnet
19:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2233.codfw.wmnet
18:58 catrope@deploy1002: Synchronized php-1.36.0-wmf.35/extensions/WikimediaEvents/: sessionTick: Tick right away on sessionReset (T277515) (duration: 01m 10s)
18:52 catrope@deploy1002: Synchronized php-1.36.0-wmf.35/vendor/: Bump wikimedia/parsoid to 0.13.0-a28 (T276649) (duration: 01m 18s)
18:43 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2233.codfw.wmnet
18:43 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2235.codfw.wmnet
18:43 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2234.codfw.wmnet
18:43 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2233.codfw.wmnet
18:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2232.codfw.wmnet
18:31 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Define Portal and Portal talk namespace for niawiki (T277671) (duration: 01m 11s)
18:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:10 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2232.codfw.wmnet
18:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2231.codfw.wmnet
18:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
17:56 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2231.codfw.wmnet
17:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2230.codfw.wmnet
17:50 razzi: update firewall rules to allow mysql-sqoop in analytics-in4 to access clouddb1021 - https://gerrit.wikimedia.org/r/c/operations/homer/public/+/672797
17:47 ejegg: updated payments-wiki from 0405ea1723 to b06009c099
17:41 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2230.codfw.wmnet
17:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:28 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
17:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
16:50 andrew@deploy1002: Finished deploy [horizon/deploy@8c50f27]: more support for disabled flavors (duration: 02m 32s)
16:48 andrew@deploy1002: Started deploy [horizon/deploy@8c50f27]: more support for disabled flavors
16:45 andrew@deploy1002: Finished deploy [horizon/deploy@8c50f27]: more support for disabled flavors (duration: 00m 07s)
16:45 andrew@deploy1002: Started deploy [horizon/deploy@8c50f27]: more support for disabled flavors
16:44 andrew@deploy1002: Finished deploy [horizon/deploy@e4fd934]: more support for disabled flavors (duration: 00m 07s)
16:44 andrew@deploy1002: Started deploy [horizon/deploy@e4fd934]: more support for disabled flavors
16:38 effie: upgrade memcached on mc1025, mc2025
16:06 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.35
16:04 dancy@deploy1002: Synchronized php-1.36.0-wmf.35/includes/Revision/RevisionRecord.php: (no justification provided) (duration: 00m 58s)
15:54 ejegg: updated standalone SmashPig deployment from 58b070db1a to 250a8570d1
15:23 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dbmonitor1002.wikimedia.org
14:56 jmm@cumin1001: START - Cookbook sre.ganeti.makevm for new host dbmonitor1002.wikimedia.org
14:45 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testreduce1001.eqiad.wmnet
14:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 100%: Slowly repool db1087', diff saved to https://phabricator.wikimedia.org/P14935 and previous config saved to /var/cache/conftool/dbconfig/20210317-142532-root.json
14:18 jayme: rebooting testreduce1001 for T277580
14:17 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host testreduce1001.eqiad.wmnet
14:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 75%: Slowly repool db1087', diff saved to https://phabricator.wikimedia.org/P14934 and previous config saved to /var/cache/conftool/dbconfig/20210317-141028-root.json
14:02 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=sessionstore
14:02 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=eventgate-analytics
14:01 otto@deploy1002: Finished deploy [analytics/refinery@d2f1b28] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d2f1b28] (duration: 04m 19s)
13:59 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1001.eqiad.wmnet
13:58 moritzm: added bullseye tftpboot environment T275873
13:56 otto@deploy1002: Started deploy [analytics/refinery@d2f1b28] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d2f1b28]
13:56 otto@deploy1002: Finished deploy [analytics/refinery@d2f1b28] (thin): Regular analytics weekly train THIN [analytics/refinery@d2f1b28] (duration: 00m 06s)
13:56 otto@deploy1002: Started deploy [analytics/refinery@d2f1b28] (thin): Regular analytics weekly train THIN [analytics/refinery@d2f1b28]
13:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 50%: Slowly repool db1087', diff saved to https://phabricator.wikimedia.org/P14933 and previous config saved to /var/cache/conftool/dbconfig/20210317-135522-root.json
13:54 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1001.eqiad.wmnet
13:52 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1003.eqiad.wmnet
13:52 otto@deploy1002: Finished deploy [analytics/refinery@d2f1b28]: Regular analytics weekly train [analytics/refinery@d2f1b28] (duration: 11m 36s)
13:47 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=eventgate-analytics-external
13:47 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=eventgate-logging-external
13:47 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=api-gateway
13:47 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=echostore
13:47 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1003.eqiad.wmnet
13:46 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1002.eqiad.wmnet
13:41 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1002.eqiad.wmnet
13:40 otto@deploy1002: Started deploy [analytics/refinery@d2f1b28]: Regular analytics weekly train [analytics/refinery@d2f1b28]
13:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 25%: Slowly repool db1087', diff saved to https://phabricator.wikimedia.org/P14932 and previous config saved to /var/cache/conftool/dbconfig/20210317-134018-root.json
13:38 kormat: stopping db2137:s5 T277632
13:33 kormat: stopping db2089:s5 T277632
13:31 otto@deploy1002: Finished deploy [analytics/aqs/deploy@3e92346]: deploy aqs as part of train - T207171, T263697 (duration: 03m 24s)
13:27 otto@deploy1002: Started deploy [analytics/aqs/deploy@3e92346]: deploy aqs as part of train - T207171, T263697
13:23 jynus: stopping s5 instance on db2099 and restoring from backup T277632
13:17 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=eventstreams
13:14 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=eventstreams-internal
13:13 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mobileapps
13:13 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=wikifeeds
13:13 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=termbox
13:12 moritzm: installing tiff security updates
12:45 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=similar-users
12:45 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=push-notifications
12:45 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=proton
12:45 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=linkrecommendation
12:44 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=blubberoid
12:44 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=apertium
12:11 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mathoid
12:10 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=eventgate-main
11:49 marostegui: Deploy schema change on s8, lag will appear on wiki replicas T276150 T276156
11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 for schema change', diff saved to https://phabricator.wikimedia.org/P14931 and previous config saved to /var/cache/conftool/dbconfig/20210317-114746-marostegui.json
11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 100%: Slowly repool db1109', diff saved to https://phabricator.wikimedia.org/P14930 and previous config saved to /var/cache/conftool/dbconfig/20210317-114601-root.json
11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 75%: Slowly repool db1109', diff saved to https://phabricator.wikimedia.org/P14929 and previous config saved to /var/cache/conftool/dbconfig/20210317-113057-root.json
11:20 jayme: switch restbase-async back to codfw (the newly initialized cluster)
11:17 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async,name=codfw
11:17 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=eqiad
11:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 50%: Slowly repool db1109', diff saved to https://phabricator.wikimedia.org/P14928 and previous config saved to /var/cache/conftool/dbconfig/20210317-111553-root.json
11:09 moritzm: restarting tomcat on idp.wikimedia.org
11:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 25%: Slowly repool db1109', diff saved to https://phabricator.wikimedia.org/P14927 and previous config saved to /var/cache/conftool/dbconfig/20210317-110050-root.json
09:59 moritzm: imported PHP 5.6.40 to thirdparty/php56 T224589
09:47 vgutierrez: restart varnish-fe on cp5011
09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1109 for schema change', diff saved to https://phabricator.wikimedia.org/P14926 and previous config saved to /var/cache/conftool/dbconfig/20210317-092443-marostegui.json
09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: Slowly repool db1114', diff saved to https://phabricator.wikimedia.org/P14925 and previous config saved to /var/cache/conftool/dbconfig/20210317-092357-root.json
09:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: Slowly repool db1114', diff saved to https://phabricator.wikimedia.org/P14924 and previous config saved to /var/cache/conftool/dbconfig/20210317-090853-root.json
09:04 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=recommendation-api
09:04 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=cxserver
09:04 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=citoid
09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 100%: Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P14923 and previous config saved to /var/cache/conftool/dbconfig/20210317-090108-root.json
08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084 T276302', diff saved to https://phabricator.wikimedia.org/P14922 and previous config saved to /var/cache/conftool/dbconfig/20210317-085852-marostegui.json
08:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: Slowly repool db1114', diff saved to https://phabricator.wikimedia.org/P14921 and previous config saved to /var/cache/conftool/dbconfig/20210317-085350-root.json
08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 75%: Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P14920 and previous config saved to /var/cache/conftool/dbconfig/20210317-084605-root.json
08:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: Slowly repool db1114', diff saved to https://phabricator.wikimedia.org/P14919 and previous config saved to /var/cache/conftool/dbconfig/20210317-083846-root.json
08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 50%: Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P14918 and previous config saved to /var/cache/conftool/dbconfig/20210317-083101-root.json
08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 25%: Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P14917 and previous config saved to /var/cache/conftool/dbconfig/20210317-081557-root.json
07:50 godog: swift eqiad-prod: less weight for ms-be[1019-1026] - T272836
07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114 for schema change', diff saved to https://phabricator.wikimedia.org/P14916 and previous config saved to /var/cache/conftool/dbconfig/20210317-073403-marostegui.json
07:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: Slowly repool db1111', diff saved to https://phabricator.wikimedia.org/P14915 and previous config saved to /var/cache/conftool/dbconfig/20210317-073024-root.json
07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: Slowly repool db1111', diff saved to https://phabricator.wikimedia.org/P14914 and previous config saved to /var/cache/conftool/dbconfig/20210317-071520-root.json
07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: Slowly repool db1111', diff saved to https://phabricator.wikimedia.org/P14913 and previous config saved to /var/cache/conftool/dbconfig/20210317-070017-root.json
06:52 marostegui: Stop MySQL on db1082 to clone db1161 (lag will appear on s5 on wikireplicas) - T258361
06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 to clone db1161 T258361', diff saved to https://phabricator.wikimedia.org/P14911 and previous config saved to /var/cache/conftool/dbconfig/20210317-065146-marostegui.json
06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2150 into s7 T275633', diff saved to https://phabricator.wikimedia.org/P14910 and previous config saved to /var/cache/conftool/dbconfig/20210317-064606-marostegui.json
06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: Slowly repool db1111', diff saved to https://phabricator.wikimedia.org/P14909 and previous config saved to /var/cache/conftool/dbconfig/20210317-064513-root.json
06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2150 to s7, depooled T275633', diff saved to https://phabricator.wikimedia.org/P14908 and previous config saved to /var/cache/conftool/dbconfig/20210317-060358-marostegui.json
05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 for schema change', diff saved to https://phabricator.wikimedia.org/P14907 and previous config saved to /var/cache/conftool/dbconfig/20210317-054206-marostegui.json
02:25 eileen: civicrm revision changed from 8c137b94f0 to 99bf1c9210, config revision is ef2767ab91
01:55 eileen: civicrm revision changed from 550be50105 to 8c137b94f0, config revision is ef2767ab91

2021-03-16

23:56 krinkle@deploy1002: Synchronized php-1.36.0-wmf.35/includes/Revision/: I8619ab9e92b, T277362, T275531 (duration: 00m 58s)
23:51 krinkle@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/Scribunto/: I84e8732d8d - tmp logging (duration: 00m 58s)
23:47 Krinkle: There is an uncommitted dirty diff in /srv/mediawiki-staging/php-1.36.0-wmf.34/extensions/WikimediaMaintenance/createExtensionTables.php
23:31 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I1ca4f30c2, T262612 (duration: 00m 57s)
23:22 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Icd6635cb302cc, T277332 (duration: 00m 58s)
23:07 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I8d8c94d95c6 (duration: 00m 59s)
23:03 twentyafterfour: applied hotfix to phabricator/src/infrastructure/customfield/storage/PhabricatorCustomFieldStorage.php and restarted php-fpm
23:02 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: I4097cbcb1d5 (duration: 00m 59s)
22:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Ie24eb2077 (duration: 00m 58s)
20:59 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2232.codfw.wmnet
20:59 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2231.codfw.wmnet
20:59 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2230.codfw.wmnet
20:49 andrew@deploy1002: Finished deploy [horizon/deploy@e4fd934]: tiny horizon patch to support flavor deprecation (duration: 03m 44s)
20:45 andrew@deploy1002: Started deploy [horizon/deploy@e4fd934]: tiny horizon patch to support flavor deprecation
20:15 XioNoX: remove DMZ zone from pfw3-eqiad - T174203
20:00 brennen: 1.36.0-wmf.35 train status (T274939): blocked at group0 on T277362
19:52 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.34
19:52 XioNoX: commit changes to pfw3-eqiad - T274422
19:44 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.35
19:31 dancy@deploy1002: Finished scap: testwikis wikis to 1.36.0-wmf.35 (duration: 33m 41s)
19:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2229.codfw.wmnet
19:11 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2229.codfw.wmnet
19:10 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2229.codfw.wmnet
19:08 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2228.codfw.wmnet
19:07 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2228.codfw.wmnet
19:06 XioNoX: commit changes to pfw3-codfw - T274422
18:58 dancy@deploy1002: Started scap: testwikis wikis to 1.36.0-wmf.35
18:55 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2228.codfw.wmnet
18:48 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
18:43 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
18:41 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
18:03 ppchelko@deploy1002: Finished deploy [restbase/deploy@f99ddaa]: Add new wikis T275837 T271983 T273466 T276127 T273460 T276249 (duration: 31m 31s)
17:44 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on aqs1011.eqiad.wmnet with reason: New buster hosts, not in use
17:44 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on aqs1011.eqiad.wmnet with reason: New buster hosts, not in use
17:37 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2227.codfw.wmnet
17:32 ppchelko@deploy1002: Started deploy [restbase/deploy@f99ddaa]: Add new wikis T275837 T271983 T273466 T276127 T273460 T276249
17:09 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2227.codfw.wmnet
17:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2226.codfw.wmnet
16:47 eevans@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
16:44 eevans@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
16:25 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2242.codfw.wmnet
16:25 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2241.codfw.wmnet
16:24 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2240.codfw.wmnet
16:21 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2226.codfw.wmnet
16:20 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2227.codfw.wmnet
16:20 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2226.codfw.wmnet
16:17 mutante: testreduce1001 - gzip /var/log/daemon.log.1 ; apt-get clean .. free some disk space
15:47 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16 days, 16:00:00 on acrux.codfw.wmnet with reason: Extend downtime for like a month until we remove the VMs
15:47 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 16 days, 16:00:00 on acrux.codfw.wmnet with reason: Extend downtime for like a month until we remove the VMs
15:47 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16 days, 16:00:00 on acrab.codfw.wmnet with reason: Extend downtime for like a month until we remove the VMs
15:46 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 16 days, 16:00:00 on acrab.codfw.wmnet with reason: Extend downtime for like a month until we remove the VMs
15:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 100%: Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P14905 and previous config saved to /var/cache/conftool/dbconfig/20210316-153446-root.json
15:32 ayounsi@deploy1002: Finished deploy [homer/deploy@759f82c]: T277006 (duration: 04m 56s)
15:27 ayounsi@deploy1002: Started deploy [homer/deploy@759f82c]: T277006
15:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 75%: Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P14904 and previous config saved to /var/cache/conftool/dbconfig/20210316-151943-root.json
15:07 hashar@deploy1002: Finished deploy [integration/docroot@cf787a5]: (no justification provided) (duration: 00m 30s)
15:06 hashar@deploy1002: Started deploy [integration/docroot@cf787a5]: (no justification provided)
15:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 50%: Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P14903 and previous config saved to /var/cache/conftool/dbconfig/20210316-150439-root.json
15:03 hashar@deploy1002: Finished deploy [integration/docroot@44d5685]: Verify check can restart php-fpm # T275468 (duration: 00m 07s)
15:03 hashar@deploy1002: Started deploy [integration/docroot@44d5685]: Verify check can restart php-fpm # T275468
14:58 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T276251 T276129 T275839)
14:53 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2015.codfw.wmnet
14:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 25%: Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P14902 and previous config saved to /var/cache/conftool/dbconfig/20210316-144935-root.json
14:37 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T276251 T276129 T275839)
13:45 moritzm: powercycling ganeti2015, stuck on reboot
13:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2015.codfw.wmnet
13:35 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
13:35 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'staging' .
13:34 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
13:33 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
13:32 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'test' .
13:32 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
13:32 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'staging' .
13:32 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'similar-users' for release 'main' .
13:31 moritzm: drain ganeti2015
13:31 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
13:31 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
13:30 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
13:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2014.codfw.wmnet
13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318 for schema change', diff saved to https://phabricator.wikimedia.org/P14901 and previous config saved to /var/cache/conftool/dbconfig/20210316-132844-marostegui.json
13:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 100%: Slowly repool db1172', diff saved to https://phabricator.wikimedia.org/P14900 and previous config saved to /var/cache/conftool/dbconfig/20210316-132814-root.json
13:28 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
13:27 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'canary' .
13:26 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
13:24 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 7fb50c3: trvwiki: set logo to File:Wikipedia-logo-v2-trv.svg (T276246; 2/2) (duration: 00m 57s)
13:24 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
13:24 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
13:23 urbanecm@deploy1002: Synchronized static/images/project-logos/: 7fb50c3: trvwiki: set logo to File:Wikipedia-logo-v2-trv.svg (T276246; 1/2) (duration: 01m 01s)
13:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2014.codfw.wmnet
13:22 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
13:22 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
13:22 urbanecm@deploy1002: sync-file aborted: 7fb50c3: trvwiki: set logo to File:Wikipedia-logo-v2-trv.svg (T276246) (duration: 00m 00s)
13:20 moritzm: drain ganeti2014
13:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2013.codfw.wmnet
13:19 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
13:19 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
13:19 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
13:18 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'canary' .
13:18 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
13:17 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
13:16 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
13:15 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
13:15 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
13:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 75%: Slowly repool db1172', diff saved to https://phabricator.wikimedia.org/P14899 and previous config saved to /var/cache/conftool/dbconfig/20210316-131310-root.json
13:13 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
13:13 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
13:12 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
13:12 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
13:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2013.codfw.wmnet
13:10 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
13:10 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
13:09 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'echostore' for release 'production' .
13:09 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'echostore' for release 'staging' .
13:08 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
13:08 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
13:07 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'staging' .
13:07 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
13:07 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
13:07 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
13:05 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
13:05 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
13:05 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
13:05 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
13:04 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
13:04 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
13:03 akosiaris: sync all services on the new codfw kubernetes cluster T277191
13:02 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'production' .
13:02 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'staging' .
12:59 moritzm: drain ganeti2013
12:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 50%: Slowly repool db1172', diff saved to https://phabricator.wikimedia.org/P14898 and previous config saved to /var/cache/conftool/dbconfig/20210316-125807-root.json
12:55 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:53 Urbanecm: New wiki creation is done
12:51 volans@cumin1001: START - Cookbook sre.dns.netbox
12:50 urbanecm@deploy1002: Synchronized wmf-config/flaggedrevs.php: 1426d04: flaggedrevs: Simplify the config a bit (duration: 00m 58s)
12:46 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 06s)
12:43 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating mnwwiktionary (T276125) (duration: 00m 57s)
12:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 25%: Slowly repool db1172', diff saved to https://phabricator.wikimedia.org/P14897 and previous config saved to /var/cache/conftool/dbconfig/20210316-124303-root.json
12:42 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating mnwwiktionary (T276125) (duration: 01m 00s)
12:41 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating mnwwiktionary (T276125) (duration: 01m 01s)
12:40 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating mnwwiktionary (T276125)
12:39 urbanecm@deploy1002: Synchronized dblists: Creating mnwwiktionary (T276125) (duration: 00m 57s)
12:39 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
12:37 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating mnwwiktionary (T276125) (duration: 00m 58s)
12:36 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating mnwwiktionary (T276125) (duration: 00m 58s)
12:34 urbanecm@deploy1002: Synchronized langlist: Creating trvwiki (T276246) (duration: 00m 58s)
12:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating trvwiki (T276246) (duration: 00m 57s)
12:32 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating trvwiki (T276246) (duration: 00m 58s)
12:31 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating trvwiki (T276246)
12:29 urbanecm@deploy1002: Synchronized dblists: Creating trvwiki (T276246) (duration: 00m 57s)
12:28 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
12:28 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating trvwiki (T276246) (duration: 01m 02s)
12:27 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating trvwiki (T276246) (duration: 00m 57s)
12:20 urbanecm@deploy1002: Synchronized langlist: Creating taywiki (T275803) (duration: 00m 57s)
12:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating taywiki (T275803) (duration: 00m 58s)
12:17 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating taywiki (T275803) (duration: 00m 57s)
12:17 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
12:16 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating taywiki (T275803) (duration: 00m 58s)
12:14 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating taywiki (T275803)
12:12 urbanecm@deploy1002: Synchronized dblists: Creating taywiki (T275803) (duration: 00m 58s)
12:11 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating taywiki (T275803) (duration: 01m 02s)
12:10 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating taywiki (T275803) (duration: 00m 59s)
12:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on aqs1011.eqiad.wmnet with reason: New buster host
12:10 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on aqs1011.eqiad.wmnet with reason: New buster host
12:07 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
11:54 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes,service=kubesvc
11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1172 for schema change', diff saved to https://phabricator.wikimedia.org/P14896 and previous config saved to /var/cache/conftool/dbconfig/20210316-114310-marostegui.json
11:33 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2015.codfw.wmnet
11:32 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2016.codfw.wmnet
11:32 effie: upgrade memcached in mc1023, mc2023
11:31 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2006.codfw.wmnet
11:30 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2016.codfw.wmnet
11:29 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2015.codfw.wmnet
11:29 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2006.codfw.wmnet
11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P14895 and previous config saved to /var/cache/conftool/dbconfig/20210316-112931-root.json
11:28 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kubernetes2006.codfw.wmnet
11:28 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2006.codfw.wmnet
11:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: c444517: 4e66529: dff200b: Enable DiscussionTools features on several projects (T276493; T276498; T277103) (duration: 00m 57s)
11:17 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2005.codfw.wmnet
11:17 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2017.codfw.wmnet
11:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: f0d5465: Enable DiscussionTools beta features on enwiki (T273146) (duration: 00m 58s)
11:15 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2005.codfw.wmnet
11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P14893 and previous config saved to /var/cache/conftool/dbconfig/20210316-111427-root.json
11:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 835f9ab: Enable ContentTranslation as a default tool in Amharic, Maltese and Uzbek Wikipedias (T276765) (duration: 01m 00s)
11:10 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2014.codfw.wmnet with reason: REIMAGE
11:08 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=kubemaster,name=.*,cluster=kubernetes
11:08 akosiaris@cumin1001: conftool action : set/weight=10; selector: dc=codfw,service=kubemaster,name=.*,cluster=kubernetes
11:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2013.codfw.wmnet with reason: REIMAGE
11:06 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2014.codfw.wmnet with reason: REIMAGE
11:05 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2012.codfw.wmnet with reason: REIMAGE
11:04 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2013.codfw.wmnet with reason: REIMAGE
11:03 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2010.codfw.wmnet with reason: REIMAGE
11:02 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2012.codfw.wmnet with reason: REIMAGE
11:01 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2011.codfw.wmnet with reason: REIMAGE
11:00 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2004.codfw.wmnet with reason: REIMAGE
10:59 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2017.codfw.wmnet
10:59 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2009.codfw.wmnet with reason: REIMAGE
10:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P14892 and previous config saved to /var/cache/conftool/dbconfig/20210316-105924-root.json
10:59 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2011.codfw.wmnet with reason: REIMAGE
10:58 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2010.codfw.wmnet with reason: REIMAGE
10:57 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2008.codfw.wmnet with reason: REIMAGE
10:55 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2009.codfw.wmnet with reason: REIMAGE
10:55 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2007.codfw.wmnet with reason: REIMAGE
10:55 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2004.codfw.wmnet with reason: REIMAGE
10:54 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2008.codfw.wmnet with reason: REIMAGE
10:53 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2003.codfw.wmnet with reason: REIMAGE
10:52 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2007.codfw.wmnet with reason: REIMAGE
10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14891 and previous config saved to /var/cache/conftool/dbconfig/20210316-105128-root.json
10:51 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2002.codfw.wmnet with reason: REIMAGE
10:51 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2003.codfw.wmnet with reason: REIMAGE
10:49 jayme@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,service=kubesvc,name=kubernetes2006.codfw.wmnet
10:49 jayme@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,service=kubesvc,name=kubernetes2005.codfw.wmnet
10:49 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2001.codfw.wmnet with reason: REIMAGE
10:49 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2002.codfw.wmnet with reason: REIMAGE
10:47 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2001.codfw.wmnet with reason: REIMAGE
10:47 jayme@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,service=kubesvc,name=kubernetes2015.codfw.wmnet
10:46 jayme@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,service=kubesvc,name=kubernetes2016.codfw.wmnet
10:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P14890 and previous config saved to /var/cache/conftool/dbconfig/20210316-104420-root.json
10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 75%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14889 and previous config saved to /var/cache/conftool/dbconfig/20210316-103625-root.json
10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 60%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14887 and previous config saved to /var/cache/conftool/dbconfig/20210316-102121-root.json
10:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2012.codfw.wmnet
10:07 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2012.codfw.wmnet
10:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1161.eqiad.wmnet with reason: REIMAGE
10:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 50%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14886 and previous config saved to /var/cache/conftool/dbconfig/20210316-100617-root.json
10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1161.eqiad.wmnet with reason: REIMAGE
10:03 moritzm: drain ganeti2012
10:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2011.codfw.wmnet
09:59 akosiaris: Push new certs for kubemaster.svc.codfw.wmnet - T277191
09:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2011.codfw.wmnet
09:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 49%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14885 and previous config saved to /var/cache/conftool/dbconfig/20210316-095113-root.json
09:50 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubetcd2006.codfw.wmnet
09:48 moritzm: drain ganeti2011
09:46 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubetcd2005.codfw.wmnet
09:46 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubetcd2006.codfw.wmnet
09:44 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubetcd2005.codfw.wmnet
09:44 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubetcd2004.codfw.wmnet
09:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 for schema change', diff saved to https://phabricator.wikimedia.org/P14884 and previous config saved to /var/cache/conftool/dbconfig/20210316-094117-marostegui.json
09:40 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubetcd2004.codfw.wmnet
09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14883 and previous config saved to /var/cache/conftool/dbconfig/20210316-093609-root.json
09:34 akosiaris: poweroff acrux and acrab T277191
09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 100%: Slowly repool db1076', diff saved to https://phabricator.wikimedia.org/P14881 and previous config saved to /var/cache/conftool/dbconfig/20210316-092204-root.json
09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 20%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14880 and previous config saved to /var/cache/conftool/dbconfig/20210316-092106-root.json
09:18 akosiaris: switch restbase-async to eqiad since the kubernetes codfw cluster is being reinitialized and it makes little sense to have it there while the callers will run in eqiad only
09:15 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=restbase-async
09:12 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=restbase-async
09:12 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=wikifeeds
09:12 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=termbox
09:12 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=similar-users
09:12 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=sessionstore
09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=recommendation-api
09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=push-notifications
09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=proton
09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=mobileapps
09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=mathoid
09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=linkrecommendation
09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=eventstreams-internal
09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=eventstreams
09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=eventgate-main
09:11 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=eventgate-logging-external
09:10 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=eventgate-analytics-external
09:10 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=eventgate-analytics
09:10 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=echostore
09:10 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=cxserver
09:10 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=citoid
09:10 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=blubberoid
09:10 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=api-gateway
09:10 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=apertium
09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 75%: Slowly repool db1076', diff saved to https://phabricator.wikimedia.org/P14879 and previous config saved to /var/cache/conftool/dbconfig/20210316-090701-root.json
09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 15%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14878 and previous config saved to /var/cache/conftool/dbconfig/20210316-090602-root.json
09:05 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:05 jayme@cumin1001: START - Cookbook sre.discovery.service-route
09:05 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:05 jayme@cumin1001: START - Cookbook sre.discovery.service-route
09:05 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:04 jayme@cumin1001: START - Cookbook sre.discovery.service-route
09:04 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:04 jayme@cumin1001: START - Cookbook sre.discovery.service-route
09:04 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:04 jayme@cumin1001: START - Cookbook sre.discovery.service-route
09:04 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:04 jayme@cumin1001: START - Cookbook sre.discovery.service-route
09:04 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:04 jayme@cumin1001: START - Cookbook sre.discovery.service-route
09:04 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:04 jayme@cumin1001: START - Cookbook sre.discovery.service-route
09:03 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:03 jayme@cumin1001: START - Cookbook sre.discovery.service-route
09:03 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:03 jayme@cumin1001: START - Cookbook sre.discovery.service-route
09:03 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:03 jayme@cumin1001: START - Cookbook sre.discovery.service-route
09:03 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:02 jayme@cumin1001: START - Cookbook sre.discovery.service-route
09:02 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:02 jayme@cumin1001: START - Cookbook sre.discovery.service-route
09:02 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:02 jayme@cumin1001: START - Cookbook sre.discovery.service-route
09:02 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:02 jayme@cumin1001: START - Cookbook sre.discovery.service-route
09:02 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:02 jayme@cumin1001: START - Cookbook sre.discovery.service-route
09:02 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:02 jayme@cumin1001: START - Cookbook sre.discovery.service-route
09:01 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:01 jayme@cumin1001: START - Cookbook sre.discovery.service-route
09:01 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:01 jayme@cumin1001: START - Cookbook sre.discovery.service-route
09:01 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:01 jayme@cumin1001: START - Cookbook sre.discovery.service-route
09:01 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:01 jayme@cumin1001: START - Cookbook sre.discovery.service-route
09:01 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:01 jayme@cumin1001: START - Cookbook sre.discovery.service-route
09:00 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:00 jayme@cumin1001: START - Cookbook sre.discovery.service-route
09:00 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:00 jayme@cumin1001: START - Cookbook sre.discovery.service-route
09:00 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:00 jayme@cumin1001: START - Cookbook sre.discovery.service-route
09:00 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:00 jayme@cumin1001: START - Cookbook sre.discovery.service-route
08:59 akosiaris: starting the k8s codfw cluster reinitialization process
08:59 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 18 hosts with reason: Reinitialize codfw k8s cluster with new etcd
08:59 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 18 hosts with reason: Reinitialize codfw k8s cluster with new etcd
08:57 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
08:56 jayme@cumin1001: START - Cookbook sre.discovery.service-route
08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 50%: Slowly repool db1076', diff saved to https://phabricator.wikimedia.org/P14877 and previous config saved to /var/cache/conftool/dbconfig/20210316-085157-root.json
08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 10%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14876 and previous config saved to /var/cache/conftool/dbconfig/20210316-085058-root.json
08:47 marostegui: Check tables on db2150 db2120 T276742
08:42 moritzm: remove Java 8 from contint/releases T269354
08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 25%: Slowly repool db1076', diff saved to https://phabricator.wikimedia.org/P14875 and previous config saved to /var/cache/conftool/dbconfig/20210316-083653-root.json
08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 5%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14874 and previous config saved to /var/cache/conftool/dbconfig/20210316-083555-root.json
08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 2%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14873 and previous config saved to /var/cache/conftool/dbconfig/20210316-082051-root.json
08:18 godog: enable nick enforcing for logmsgbot - T276303
08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 1%: Slowly repool db1162', diff saved to https://phabricator.wikimedia.org/P14872 and previous config saved to /var/cache/conftool/dbconfig/20210316-080547-root.json
07:51 godog: swift eqiad-prod: less weight for ms-be[1019-1026] - T272836
07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 100%: Repool db1136', diff saved to https://phabricator.wikimedia.org/P14871 and previous config saved to /var/cache/conftool/dbconfig/20210316-072910-root.json
07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 75%: Repool db1136', diff saved to https://phabricator.wikimedia.org/P14870 and previous config saved to /var/cache/conftool/dbconfig/20210316-071407-root.json
06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 50%: Repool db1136', diff saved to https://phabricator.wikimedia.org/P14869 and previous config saved to /var/cache/conftool/dbconfig/20210316-065903-root.json
06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2148', diff saved to https://phabricator.wikimedia.org/P14868 and previous config saved to /var/cache/conftool/dbconfig/20210316-065840-marostegui.json
06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2108', diff saved to https://phabricator.wikimedia.org/P14867 and previous config saved to /var/cache/conftool/dbconfig/20210316-065814-marostegui.json
06:52 marostegui: Stop MySQL on db2120 to clone db2150 - T275633
06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2120 T275633', diff saved to https://phabricator.wikimedia.org/P14865 and previous config saved to /var/cache/conftool/dbconfig/20210316-065148-marostegui.json
06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 25%: Repool db1136', diff saved to https://phabricator.wikimedia.org/P14864 and previous config saved to /var/cache/conftool/dbconfig/20210316-064358-root.json
05:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1136.eqiad.wmnet with reason: REIMAGE
05:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1136.eqiad.wmnet with reason: REIMAGE
05:35 marostegui: Stop MySQL on db1162 to clone db1162 T258361
05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076', diff saved to https://phabricator.wikimedia.org/P14862 and previous config saved to /var/cache/conftool/dbconfig/20210316-053516-marostegui.json
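
The marostegui entries above repool db1162, db1136 and db1076 in stages. Each stage is one `dbctl commit`, and the stages are spaced about 15 minutes apart. For example, db1136 goes to 25%, 50%, 75% and 100% at 06:44, 06:59, 07:14 and 07:29. The Python sketch below only rebuilds such a timetable from a start time, an interval and a list of percentages. It does not call dbctl or conftool. The function name `repool_schedule` and the fixed 15-minute default are assumptions for illustration, not a description of how dbctl paces repools.

```python
from datetime import datetime, timedelta


def repool_schedule(host, start, percentages, interval_minutes=15):
    """Return (HH:MM, message) pairs for a hypothetical staged repool.

    Each step is placed `interval_minutes` after the previous one, mirroring
    the cadence visible in the log. This is not a dbctl/conftool API.
    """
    step = timedelta(minutes=interval_minutes)
    plan = []
    for i, pct in enumerate(percentages):
        when = start + i * step
        plan.append((when.strftime("%H:%M"), f"{host} (re)pooling @ {pct}%"))
    return plan


if __name__ == "__main__":
    # Reproduce the db1136 repool from 2021-03-16.
    start = datetime(2021, 3, 16, 6, 44)
    for when, msg in repool_schedule("db1136", start, [25, 50, 75, 100]):
        print(when, msg)
```

db1162's twelve steps, running from 08:05 to 10:51, follow the same cadence. Their timestamps drift by about a minute from 09:06 onward.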

2021-03-15

23:31 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Remove back-compat from when IRC feed servers was a string (T224579) (duration: 00m 59s)
23:24 legoktm@deploy1002: Synchronized wmf-config/: Define IRC feed servers as an array in {Production,Labs}Services.php (T224579) (duration: 00m 59s)
23:23 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Support having multiple IRC feed servers (T224579) (duration: 00m 58s)
23:13 legoktm@deploy1002: conftool action : set/pooled=inactive; selector: name=mw2225.codfw.wmnet
23:11 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: GlobalWatchlist: allow watching up to 50 sites (T276195) (duration: 01m 04s)
21:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2239.codfw.wmnet
21:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2238.codfw.wmnet
21:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2237.codfw.wmnet
21:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2236.codfw.wmnet
21:02 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@4300929]: convert_to_esbulk: Accept partial hour timestamps (duration: 03m 02s)
20:59 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@4300929]: convert_to_esbulk: Accept partial hour timestamps
20:55 legoktm: re-enabled puppet on kubestage2001, uncordoned kubestage2002
20:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2225.codfw.wmnet
19:57 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@82e0654]: prepare_mw_rev_score: Correct scores_export to bulk_ingest (duration: 01m 49s)
19:55 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@82e0654]: prepare_mw_rev_score: Correct scores_export to bulk_ingest
19:53 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2225.codfw.wmnet
19:53 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts mw2224.codfw.wmnet
19:53 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2224.codfw.wmnet
19:43 eevans@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'production' .
19:37 eevans@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'echostore' for release 'production' .
19:27 eevans@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'staging' .
18:56 dduvall@deploy1002: Synchronized .pipeline: config: Initial multiversion pipeline configuration pipeline: add building the webserver image (T274182) (duration: 00m 59s)
18:55 dduvall@deploy1002: Synchronized multiversion/: config: Initial multiversion pipeline configuration pipeline: add building the webserver image (T274182) (duration: 00m 59s)
18:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e5a7284: Enable DiscussionsTools for enwikibooks (T276851) (duration: 00m 59s)
18:41 legoktm: puppet disabled on kubestage1001 for debugging docker-registry credentials
18:38 urbanecm@deploy1002: Synchronized wmf-config/config/enwikibooks.yaml: b6a8df0: Enable visualeditor on enwikibooks by default (T276851; 2/2) (duration: 01m 00s)
18:37 foks: removing 1 file from eowiki, for legal compliance
18:35 urbanecm@deploy1002: Synchronized dblists/visualeditor-nondefault.dblist: b6a8df0: Enable visualeditor on enwikibooks by default (T276851; 1/2) (duration: 00m 58s)
18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b70a75c: Configure default search namespaces for thwikisource (T275280) (duration: 00m 59s)
18:18 hoo: Updated the Wikidata property suggester with data from the 2021-03-08 JSON dump (with pre-applied T132839 workarounds)
18:17 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/WikimediaEvents/modules/ext.wikimediaEvents/clientError.js: a7eb550: Use master version of clientError.js (duration: 00m 58s)
18:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: a8234a9: Add deleterevision right to botadmin group on fawiki (T277358) (duration: 00m 59s)
18:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2223.codfw.wmnet
18:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2235.codfw.wmnet
18:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2234.codfw.wmnet
17:56 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2223.codfw.wmnet
17:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2222.codfw.wmnet
17:30 hnowlan: disabling puppet on aqs100[4-9].eqiad.wmnet to test change to password logic in puppet
17:30 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2222.codfw.wmnet
17:29 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2223.codfw.wmnet
17:29 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2222.codfw.wmnet
17:29 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2221.codfw.wmnet
17:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2221.codfw.wmnet
17:03 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on aqs1010.eqiad.wmnet with reason: New buster host
17:03 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on aqs1010.eqiad.wmnet with reason: New buster host
16:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2221.codfw.wmnet
16:58 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2224.codfw.wmnet
16:58 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2220.codfw.wmnet
16:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2220.codfw.wmnet
16:55 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2224.codfw.wmnet
16:48 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2224.codfw.wmnet
16:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2220.codfw.wmnet
16:38 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2233.codfw.wmnet
16:38 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2232.codfw.wmnet
16:38 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2231.codfw.wmnet
16:29 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt-wdqs1002.eqiad.wmnet
16:28 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt-wdqs1003.eqiad.wmnet
16:27 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt-wdqs1001.eqiad.wmnet
16:23 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1003.eqiad.wmnet
16:23 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1002.eqiad.wmnet
16:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2010.codfw.wmnet
16:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2010.codfw.wmnet
16:06 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1001.eqiad.wmnet
16:05 moritzm: draining ganeti2010
16:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2009.codfw.wmnet
15:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2009.codfw.wmnet
15:48 moritzm: draining ganeti2009
15:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2007.codfw.wmnet
15:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti2007.codfw.wmnet
15:33 moritzm: draining ganeti2007
15:27 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw2001-dev.codfw.wmnet with reason: REIMAGE
15:24 aborrero@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2001-dev.codfw.wmnet with reason: REIMAGE
15:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P14858 and previous config saved to /var/cache/conftool/dbconfig/20210315-151648-root.json
15:16 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: REIMAGE
15:14 aborrero@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: REIMAGE
15:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
15:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
15:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P14857 and previous config saved to /var/cache/conftool/dbconfig/20210315-150144-root.json
14:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P14856 and previous config saved to /var/cache/conftool/dbconfig/20210315-144641-root.json
14:36 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
14:36 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
14:32 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
14:32 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
14:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P14855 and previous config saved to /var/cache/conftool/dbconfig/20210315-143137-root.json
14:28 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
14:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074', diff saved to https://phabricator.wikimedia.org/P14854 and previous config saved to /var/cache/conftool/dbconfig/20210315-140809-marostegui.json
14:04 dcausse: re-pooling wdqs1005
13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P14853 and previous config saved to /var/cache/conftool/dbconfig/20210315-135426-root.json
13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P14852 and previous config saved to /var/cache/conftool/dbconfig/20210315-133921-root.json
13:25 Urbanecm: Deploy security patch for T152394
13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P14851 and previous config saved to /var/cache/conftool/dbconfig/20210315-132418-root.json
13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P14849 and previous config saved to /var/cache/conftool/dbconfig/20210315-130914-root.json
12:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312', diff saved to https://phabricator.wikimedia.org/P14848 and previous config saved to /var/cache/conftool/dbconfig/20210315-123930-marostegui.json
12:32 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/MobileFrontend/: 41a2aaa: Revert "Rewite MoveLeadParagraphTransform based on mobile apps approach" (T277302) (duration: 00m 58s)
12:31 Lucas_WMDE: maintenance scripts for T270249 completed successfully, no more terms for deleted items found on stat1007
12:30 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/GrowthExperiments/: fa2abfa: Manual submodule update of GrowthExperiments repository (T276966) (duration: 00m 59s)
12:29 Lucas_WMDE: RemoveDeletedItemsFromTermStore.php finished in 5m39s
12:23 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/RemoveDeletedItemsFromTermStore.php wikidatawiki --itemIds "$(sed -n 5555,9593p T270249.ids | tr '\n' ',' | sed 's/,$//')" # T270249, remaining 4039 items
12:22 Lucas_WMDE: RemoveDeletedItemsFromTermStore.php finished in 8min
12:19 _joe_: depooled mw1347 for testing
12:13 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/RemoveDeletedItemsFromTermStore.php wikidatawiki --itemIds "$(sed -n 555,5554p T270249.ids | tr '\n' ',' | sed 's/,$//')" # T270249, 5000 items
12:12 Lucas_WMDE: finished in 43s
12:11 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/RemoveDeletedItemsFromTermStore.php wikidatawiki --itemIds "$(sed -n 55,554p T270249.ids | tr '\n' ',' | sed 's/,$//')" # T270249, 500 items
12:10 Lucas_WMDE: finished in 5.1s
12:10 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/RemoveDeletedItemsFromTermStore.php wikidatawiki --itemIds "$(sed -n 5,54p T270249.ids | tr '\n' ',' | sed 's/,$//')" # T270249, 50 items
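
The Lucas_WMDE entries above pass successive line ranges of `T270249.ids` to `RemoveDeletedItemsFromTermStore.php`. The ranges are lines 5-54, 55-554, 555-5554 and 5555-9593, which hold 50, 500, 5000 and 4039 items. Each range becomes one comma-separated `--itemIds` value via `sed -n A,Bp | tr '\n' ',' | sed 's/,$//'`. The Python sketch below mimics that pipeline for illustration only. The helper names `item_ids_arg` and `batch_sizes` and the sample IDs are hypothetical, and the sketch does not run the maintenance script.

```python
def item_ids_arg(lines, first, last):
    """Join lines first..last (1-based, inclusive) with commas.

    Intended to match: sed -n FIRST,LASTp FILE | tr '\\n' ',' | sed 's/,$//'
    """
    return ",".join(line.rstrip("\n") for line in lines[first - 1:last])


def batch_sizes(ranges):
    """Number of IDs covered by each inclusive (first, last) line range."""
    return [last - first + 1 for first, last in ranges]


if __name__ == "__main__":
    # Line ranges used in the log entries for T270249.
    ranges = [(5, 54), (55, 554), (555, 5554), (5555, 9593)]
    print(batch_sizes(ranges))
    # Hypothetical sample IDs, one per line as they would be read from a file.
    sample = ["Q1\n", "Q2\n", "Q3\n", "Q4\n"]
    print(item_ids_arg(sample, 2, 3))
```

The log records how long each batch took: 5.1s, 43s, 8min and 5m39s respectively.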
11:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P14847 and previous config saved to /var/cache/conftool/dbconfig/20210315-115826-root.json
11:51 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
11:50 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 59s)
11:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P14846 and previous config saved to /var/cache/conftool/dbconfig/20210315-114323-root.json
11:34 moritzm: restarting FPM on mw canaries to pick up new libtiff
11:30 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: REIMAGE
11:28 aborrero@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: REIMAGE
11:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P14844 and previous config saved to /var/cache/conftool/dbconfig/20210315-112819-root.json
11:22 moritzm: installing tiff security updates
11:17 moritzm: installing golang-1.7 security updates
11:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P14843 and previous config saved to /var/cache/conftool/dbconfig/20210315-111315-root.json
11:00 volans: upgraded spicerack on cumin1001 to 0.0.49-1+deb10u1
10:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129', diff saved to https://phabricator.wikimedia.org/P14842 and previous config saved to /var/cache/conftool/dbconfig/20210315-105855-marostegui.json
10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 100%: Repool db1076', diff saved to https://phabricator.wikimedia.org/P14841 and previous config saved to /var/cache/conftool/dbconfig/20210315-105820-root.json
10:56 volans@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin2001.codfw.wmnet with reason: test
10:55 volans@cumin2001: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin2001.codfw.wmnet with reason: test
10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 75%: Repool db1076', diff saved to https://phabricator.wikimedia.org/P14840 and previous config saved to /var/cache/conftool/dbconfig/20210315-104316-root.json
10:42 moritzm: installing pygments security updates on buster
10:33 volans: upgraded spicerack on cumin2001 to 0.0.49-1+deb10u1
10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 50%: Repool db1076', diff saved to https://phabricator.wikimedia.org/P14839 and previous config saved to /var/cache/conftool/dbconfig/20210315-102813-root.json
10:26 kormat@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: schema change T267767', diff saved to https://phabricator.wikimedia.org/P14838 and previous config saved to /var/cache/conftool/dbconfig/20210315-102648-kormat.json
10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 25%: Repool db1076', diff saved to https://phabricator.wikimedia.org/P14837 and previous config saved to /var/cache/conftool/dbconfig/20210315-101309-root.json
10:11 kormat@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: schema change T267767', diff saved to https://phabricator.wikimedia.org/P14836 and previous config saved to /var/cache/conftool/dbconfig/20210315-101143-kormat.json
10:03 kormat@cumin1001: dbctl commit (dc=all): 'db1114 depooling: schema change T267767', diff saved to https://phabricator.wikimedia.org/P14835 and previous config saved to /var/cache/conftool/dbconfig/20210315-100337-kormat.json
10:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1114.eqiad.wmnet with reason: schema change T267767
10:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1114.eqiad.wmnet with reason: schema change T267767
09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076', diff saved to https://phabricator.wikimedia.org/P14834 and previous config saved to /var/cache/conftool/dbconfig/20210315-095607-marostegui.json
09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P14833 and previous config saved to /var/cache/conftool/dbconfig/20210315-094920-root.json
09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P14832 and previous config saved to /var/cache/conftool/dbconfig/20210315-093416-root.json
09:23 vgutierrez: rolling restart of LVS cluster to bump depool_threshold to 0.8 on text & upload clusters - T274888
09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P14831 and previous config saved to /var/cache/conftool/dbconfig/20210315-091912-root.json
09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P14830 and previous config saved to /var/cache/conftool/dbconfig/20210315-090409-root.json
08:54 marostegui: Stop MySQL on db1136 T277007
08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136 T277007', diff saved to https://phabricator.wikimedia.org/P14829 and previous config saved to /var/cache/conftool/dbconfig/20210315-085409-marostegui.json
08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3312', diff saved to https://phabricator.wikimedia.org/P14828 and previous config saved to /var/cache/conftool/dbconfig/20210315-083555-marostegui.json
08:33 godog: swift eqiad-prod remove decom hosts from account/container rings - T272836 T276193
08:33 marostegui: Repool labsdb1009 T276980
07:22 elukey: powercycle ms-be1038 - no ssh, no tty available in mgmt serial console, irrecoverable error saved in ilo's system logs

2021-03-14

17:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 100%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P14827 and previous config saved to /var/cache/conftool/dbconfig/20210314-175751-root.json
17:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 75%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P14826 and previous config saved to /var/cache/conftool/dbconfig/20210314-174248-root.json
17:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 50%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P14825 and previous config saved to /var/cache/conftool/dbconfig/20210314-172744-root.json
17:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 25%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P14824 and previous config saved to /var/cache/conftool/dbconfig/20210314-171240-root.json
14:43 gehel: depool wdqs1005 and restart blazegraph - will keep depooled until this server has caught up on lag

2021-03-13

19:02 Amir1: change default charset of all core tables in labstestwiki to binary (T269348)
18:53 Amir1: run schema changes for varbinary on wikitech (T269348)
17:38 twentyafterfour: restarted apache on gerrit1001 to resolve apache worker exhaustion see T277127
16:57 Reedy: gerrit web interface is slow/timing out
01:18 ryankemper: T266470 Re-enabled icinga service notifications for `Check no envoy runtime configuration is left persistent` on `wdqs100[9,10]`
01:04 ryankemper: T266470 merged https://gerrit.wikimedia.org/r/c/operations/dns/+/668255 && `ryankemper@authdns1001:~$ sudo authdns-update`
00:55 mutante: [wdqs1009:/etc/envoy] $ sudo /usr/local/sbin/build-envoy-config -c /etc/envoy/

2021-03-12

22:53 ryankemper: T266470 Manually disabled service notifications for `Check no envoy runtime configuration is left persistent`, will need to circle back on Monday to restore notifications
22:10 legoktm: imported mailman-puppetmaster.mailman.eqiad1.wikimedia.cloud facts to puppet-compiler
21:52 mutante: puppetmaster1001 sudo puppet cert clean testreduce.discovery.wmnet (T266509)
21:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2219.codfw.wmnet
20:49 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2219.codfw.wmnet
20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2218.codfw.wmnet
20:32 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2218.codfw.wmnet
20:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2217.codfw.wmnet
20:22 eevans@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
20:15 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2217.codfw.wmnet
20:14 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2219.codfw.wmnet
20:14 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2218.codfw.wmnet
20:14 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2217.codfw.wmnet
19:47 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2376.codfw.wmnet,service=canary
19:47 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw2374.codfw.wmnet,service=canary
19:47 ebernhardson: start in-place reindex testwiki in eqiad, codfw, cloudelastic cirrus clusters for T269493
19:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2374.codfw.wmnet
19:41 mutante: mw2374, mw2376 - depooling to turn them into canaries
19:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2376.codfw.wmnet
19:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2374.codfw.wmnet
19:09 cstone: tools revision changed from 532f8ecb33 to b7b4060c30
18:28 bblack: authdns1001.wikimedia.org,dns2001.wikimedia.org - upgrade gdnsd to 3.6.0 (half the servers have been on this for a couple weeks now, just finishing up the rollout)
18:24 bblack: dns[15]001.wikimedia.org - upgrade gdnsd to 3.6.0 (half the servers have been on this for a couple weeks now, just finishing up the rollout)
18:21 bblack: dns[34]001.wikimedia.org - upgrade gdnsd to 3.6.0 (half the servers have been on this for a couple weeks now, just finishing up the rollout)
18:03 mutante: depooling mw2244,mw2245 (API on old hardware), mw2229,mw2230 (app on old hardware) - T277119
18:02 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2245.codfw.wmnet
18:01 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2244.codfw.wmnet
18:00 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2230.codfw.wmnet
18:00 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2229.codfw.wmnet
17:00 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on aqs1010.eqiad.wmnet with reason: New buster host
17:00 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on aqs1010.eqiad.wmnet with reason: New buster host
16:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:01 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
14:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:50 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 100%: Repool db1170:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P14818 and previous config saved to /var/cache/conftool/dbconfig/20210312-143450-root.json
14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 75%: Repool db1170:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P14817 and previous config saved to /var/cache/conftool/dbconfig/20210312-141947-root.json
14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 50%: Repool db1170:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P14816 and previous config saved to /var/cache/conftool/dbconfig/20210312-140443-root.json
13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 25%: Repool db1170:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P14815 and previous config saved to /var/cache/conftool/dbconfig/20210312-134940-root.json
13:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1088.eqiad.wmnet
13:14 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1088.eqiad.wmnet
13:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170:3312', diff saved to https://phabricator.wikimedia.org/P14814 and previous config saved to /var/cache/conftool/dbconfig/20210312-131033-marostegui.json
12:12 vgutierrez: restart ats-tls on cp3051
11:55 effie: upgrade memcached on mc1022, mc2022
11:22 hnowlan: corrected git_server for logstash-logback-encoder, cassandra/twcs and cassandra/metrics-collector on deploy1002
09:45 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
09:45 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
09:44 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
09:43 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
09:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx1001.wikimedia.org
09:25 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host mx1001.wikimedia.org
09:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx2001.wikimedia.org
09:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host mx2001.wikimedia.org
09:07 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@9a408b2]: T273847 export queries to relforge dag deployment - elastic-template handling (duration: 01m 35s)
09:05 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@9a408b2]: T273847 export queries to relforge dag deployment - elastic-template handling
09:00 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@9a408b2]: T273847 export queries to relforge dag deployment - elastic-template handling (duration: 00m 09s)
09:00 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@9a408b2]: T273847 export queries to relforge dag deployment - elastic-template handling
08:59 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@9a408b2]: T273847 export queries to relforge dag deployment - elastic-template handling (duration: 00m 10s)
08:59 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@9a408b2]: T273847 export queries to relforge dag deployment - elastic-template handling
08:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2003.codfw.wmnet
08:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host pybal-test2003.codfw.wmnet
08:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2002.codfw.wmnet
08:44 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host pybal-test2002.codfw.wmnet
08:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host scandium.eqiad.wmnet
08:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host scandium.eqiad.wmnet
08:01 moritzm: installing openjpeg2 security updates
07:16 marostegui: Stop mysql on db2108 to clone db2148
07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2108 T276742', diff saved to https://phabricator.wikimedia.org/P14811 and previous config saved to /var/cache/conftool/dbconfig/20210312-071628-marostegui.json
07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 100%: Repool db1082 after schema change', diff saved to https://phabricator.wikimedia.org/P14810 and previous config saved to /var/cache/conftool/dbconfig/20210312-071400-root.json
07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2148 T276742', diff saved to https://phabricator.wikimedia.org/P14809 and previous config saved to /var/cache/conftool/dbconfig/20210312-070219-marostegui.json
06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 60%: Repool db1082 after schema change', diff saved to https://phabricator.wikimedia.org/P14808 and previous config saved to /var/cache/conftool/dbconfig/20210312-065857-root.json
06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3314 for table checking T276742', diff saved to https://phabricator.wikimedia.org/P14807 and previous config saved to /var/cache/conftool/dbconfig/20210312-065008-marostegui.json
06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 30%: Repool db1082 after schema change', diff saved to https://phabricator.wikimedia.org/P14806 and previous config saved to /var/cache/conftool/dbconfig/20210312-064353-root.json
06:30 marostegui: Deploy schema change on s2 codfw master, lag will appear - T276150 T276156
06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 10%: Repool db1082 after schema change', diff saved to https://phabricator.wikimedia.org/P14805 and previous config saved to /var/cache/conftool/dbconfig/20210312-062850-root.json
06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 for schema change', diff saved to https://phabricator.wikimedia.org/P14804 and previous config saved to /var/cache/conftool/dbconfig/20210312-061306-marostegui.json
06:11 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1088 from dbctl T276025', diff saved to https://phabricator.wikimedia.org/P14803 and previous config saved to /var/cache/conftool/dbconfig/20210312-061118-marostegui.json
04:14 eileen: tools revision changed from d64b2f8cee to 532f8ecb33
01:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw2215.codfw.wmnet
00:58 mutante: shutting down mw2215
00:57 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2215.codfw.wmnet

2021-03-11

22:55 mutante: depooled mw2224 through mw2228 but not removing from DSH groups yet (T277119)
22:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2228.codfw.wmnet
22:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2227.codfw.wmnet
22:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2226.codfw.wmnet
22:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2225.codfw.wmnet
22:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2224.codfw.wmnet
22:50 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:48 dzahn@cumin1001: START - Cookbook sre.dns.netbox
22:47 mutante: running DNS cookbook in an attempt to remove mw2216
22:47 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2216.codfw.wmnet
22:41 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.34
22:36 brennen: train status: 1.36.0-wmf.34 (T274938): T277229 and T266517 related issues hopefully resolved, rolling forward to all wikis
22:34 brennen@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/WikimediaEvents/modules/ext.wikimediaEvents/clientError.js: Backport: Do not log script errors without file uri (T266517) (duration: 01m 07s)
22:33 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:30 brennen@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/MobileFrontend/includes/: Backport: Revert "Fix: Save user options only once when Advanced Mode is toggled" (T277229) (duration: 01m 09s)
22:28 dzahn@cumin1001: START - Cookbook sre.dns.netbox
21:57 Amir1: run populate pages in cognate (T259360)
21:28 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2222.codfw.wmnet
21:27 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2223.codfw.wmnet
21:27 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2221.codfw.wmnet
21:27 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2220.codfw.wmnet
21:21 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert "all wikis to 1.36.0-wmf.34"
21:20 brennen: train status: 1.36.0-wmf.34 (T274938): rolling back to group1 and marking T277229 a train blocker
21:17 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1003.eqiad.wmnet with reason: REIMAGE
21:15 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1003.eqiad.wmnet with reason: REIMAGE
21:14 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable GrowthExperiments link recommendations on testwiki (gerrit:670858) (T277173) (duration: 00m 59s)
21:13 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@3810277]: T273847 export queries to relforge dag deployment - correct start date (duration: 01m 53s)
21:12 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@3810277]: T273847 export queries to relforge dag deployment - correct start date
21:05 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2216.codfw.wmnet
21:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts mw2215.codfw.wmnet
21:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
21:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
21:03 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2215.codfw.wmnet
21:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on mw2216.codfw.wmnet with reason: decom
21:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on mw2216.codfw.wmnet with reason: decom
21:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on mw2215.codfw.wmnet with reason: decom
21:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on mw2215.codfw.wmnet with reason: decom
21:00 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
21:00 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
20:58 mutante: deactivating codfw API canaries on old hardware (T277119)
20:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2216.codfw.wmnet
20:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2215.codfw.wmnet
20:50 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
20:46 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@cc478d4]: T273847 export queries to relforge dag deployment (duration: 02m 09s)
20:44 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@cc478d4]: T273847 export queries to relforge dag deployment
20:35 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
20:33 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
20:28 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
20:20 mutante: phab1001 - systemctl start phabricator_clean_tmp_files - now Succeeded
20:17 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host matomo1002.eqiad.wmnet
20:13 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host matomo1002.eqiad.wmnet
20:04 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.34
19:59 mutante: phab1001 - sudo systemctl start phabricator_clean_tmp_files (manually run after conversion from cron to timer, and it fails with permission issues)
19:55 tgr_: T277173 running mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=testwiki GrowthExperiments
19:54 tgr@deploy1002: Synchronized wmf-config/: Config: Configure GrowthExperiments Add Link settings, step 2 (T277173) (duration: 01m 08s)
19:43 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:30 tgr@deploy1002: Synchronized wmf-config/: Config: Configure GrowthExperiments Add Link settings, step 1 (T277173) (duration: 01m 08s)
19:18 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: wikitech: enable BetaFeatures (T125941) (duration: 01m 08s)
19:13 hnowlan@deploy1002: Finished deploy [restbase/deploy@6f0fe23]: Remove internal ratelimits that were causing service proxy issues (duration: 16m 25s)
18:56 hnowlan@deploy1002: Started deploy [restbase/deploy@6f0fe23]: Remove internal ratelimits that were causing service proxy issues
18:47 tgr_: running mwscript extensions/GrowthExperiments/maintenance/importOresTopics.php testwiki --count 1000 --verbose --wikiId enwiki --apiUrl 'https://en.wikipedia.org/w/api.php'
17:31 effie: install memcached 1.6.6-1 on mwdebug1001
16:26 effie: upgrade memcached on mc1021, mc2021
16:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:14 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
16:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P14802 and previous config saved to /var/cache/conftool/dbconfig/20210311-161138-root.json
15:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 60%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P14801 and previous config saved to /var/cache/conftool/dbconfig/20210311-155635-root.json
15:53 cmjohnson1: updating firmware wdqs1009 T274751
15:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 30%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P14800 and previous config saved to /var/cache/conftool/dbconfig/20210311-154131-root.json
15:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
15:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 10%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P14799 and previous config saved to /var/cache/conftool/dbconfig/20210311-152627-root.json
15:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for schema change', diff saved to https://phabricator.wikimedia.org/P14798 and previous config saved to /var/cache/conftool/dbconfig/20210311-151435-marostegui.json
15:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 100%: Repool db1113:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14797 and previous config saved to /var/cache/conftool/dbconfig/20210311-150707-root.json
14:55 klausman: restarting pybal on lvs2009 T272918
14:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 60%: Repool db1113:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14796 and previous config saved to /var/cache/conftool/dbconfig/20210311-145204-root.json
14:50 klausman: restarting pybal on lvs1016 T272918
14:49 klausman: restarting pybal on lvs2010 T272918
14:46 moritzm: installing openssl (1.1) security updates for stretch
14:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 30%: Repool db1113:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14795 and previous config saved to /var/cache/conftool/dbconfig/20210311-143700-root.json
14:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 10%: Repool db1113:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14794 and previous config saved to /var/cache/conftool/dbconfig/20210311-142157-root.json
14:07 godog: swift eqiad-prod: decrease weight for SSDs on ms-be[1019-1026] - T272836
14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P14793 and previous config saved to /var/cache/conftool/dbconfig/20210311-140526-marostegui.json
14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P14792 and previous config saved to /var/cache/conftool/dbconfig/20210311-140328-root.json
14:01 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2149 into s3', diff saved to https://phabricator.wikimedia.org/P14791 and previous config saved to /var/cache/conftool/dbconfig/20210311-140119-marostegui.json
13:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1063.eqiad.wmnet
13:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1063.eqiad.wmnet
13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 60%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P14790 and previous config saved to /var/cache/conftool/dbconfig/20210311-134825-root.json
13:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1062.eqiad.wmnet
13:39 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
13:39 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
13:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1062.eqiad.wmnet
13:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1061.eqiad.wmnet
13:33 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
13:33 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 30%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P14789 and previous config saved to /var/cache/conftool/dbconfig/20210311-133321-root.json
13:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 10%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P14788 and previous config saved to /var/cache/conftool/dbconfig/20210311-131818-root.json
13:04 moritzm: installing openssl1.0 security updates on stretch
13:03 arturo: copy python-mwclient 0.8.4-1 from stretch-wikimedia to buster-wikimedia for T275865
13:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 for schema change', diff saved to https://phabricator.wikimedia.org/P14787 and previous config saved to /var/cache/conftool/dbconfig/20210311-130208-marostegui.json
13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14786 and previous config saved to /var/cache/conftool/dbconfig/20210311-130103-root.json
13:00 hnowlan: imported cassandra_2.2.6-wmf5 to buster-wikimedia
12:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1061.eqiad.wmnet
12:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1060.eqiad.wmnet
12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 60%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14785 and previous config saved to /var/cache/conftool/dbconfig/20210311-124559-root.json
12:39 hnowlan: imported cassandra_2.2.6-wmf1 to buster-wikimedia
12:37 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1060.eqiad.wmnet
12:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1059.eqiad.wmnet
12:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 30%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14783 and previous config saved to /var/cache/conftool/dbconfig/20210311-123056-root.json
12:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1059.eqiad.wmnet
12:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1058.eqiad.wmnet
12:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1058.eqiad.wmnet
12:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1057.eqiad.wmnet
12:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1057.eqiad.wmnet
12:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1056.eqiad.wmnet
12:16 Lucas_WMDE: EU backport&config window done
12:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 10%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14782 and previous config saved to /var/cache/conftool/dbconfig/20210311-121552-root.json
12:13 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/RemoveDeletedItemsFromTermStore.php wikidatawiki --itemIds 581768,739279,774383,852302 # T270249, finished in 1.124s
12:12 Lucas_WMDE: finished in 1.124s real time
12:12 Lucas_WMDE: start of lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/RemoveDeletedItemsFromTermStore.php wikidatawiki --itemIds 581768,739279,774383,852302
12:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/LabsServices.php: Config: Update comment for irc.beta.wmflabs.org (T277081) (comment-only beta-only change) (duration: 01m 13s)
12:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1056.eqiad.wmnet
12:07 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Fix obsolete comments on wgCheckUserLogLogins (T253802) (duration: 01m 08s)
12:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1055.eqiad.wmnet
12:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P14781 and previous config saved to /var/cache/conftool/dbconfig/20210311-120554-marostegui.json
12:00 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1055.eqiad.wmnet
11:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1054.eqiad.wmnet
11:53 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1054.eqiad.wmnet
11:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1053.eqiad.wmnet
11:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1053.eqiad.wmnet
11:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1052.eqiad.wmnet
11:43 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1052.eqiad.wmnet
11:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1051.eqiad.wmnet
11:37 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1051.eqiad.wmnet
11:37 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1050.eqiad.wmnet
11:35 klausman@cumin1001: START - Cookbook sre.dns.netbox
11:34 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1050.eqiad.wmnet
11:31 klausman@cumin1001: START - Cookbook sre.dns.netbox
11:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1049.eqiad.wmnet
11:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14778 and previous config saved to /var/cache/conftool/dbconfig/20210311-112747-root.json
11:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1049.eqiad.wmnet
11:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1048.eqiad.wmnet
11:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1048.eqiad.wmnet
11:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1047.eqiad.wmnet
11:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 60%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14777 and previous config saved to /var/cache/conftool/dbconfig/20210311-111243-root.json
11:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1047.eqiad.wmnet
11:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1046.eqiad.wmnet
10:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1046.eqiad.wmnet
10:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1045.eqiad.wmnet
10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 30%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14776 and previous config saved to /var/cache/conftool/dbconfig/20210311-105740-root.json
10:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1045.eqiad.wmnet
10:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1044.eqiad.wmnet
10:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1044.eqiad.wmnet
10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P14775 and previous config saved to /var/cache/conftool/dbconfig/20210311-104236-root.json
10:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1043.eqiad.wmnet
10:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1043.eqiad.wmnet
10:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1042.eqiad.wmnet
10:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1042.eqiad.wmnet
10:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1041.eqiad.wmnet
10:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1041.eqiad.wmnet
10:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1040.eqiad.wmnet
10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P14774 and previous config saved to /var/cache/conftool/dbconfig/20210311-101714-marostegui.json
10:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1040.eqiad.wmnet
10:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1039.eqiad.wmnet
10:16 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2149 to dbctl, depooled, T275633', diff saved to https://phabricator.wikimedia.org/P14773 and previous config saved to /var/cache/conftool/dbconfig/20210311-101604-marostegui.json
10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P14772 and previous config saved to /var/cache/conftool/dbconfig/20210311-101008-root.json
10:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1039.eqiad.wmnet
10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2109', diff saved to https://phabricator.wikimedia.org/P14771 and previous config saved to /var/cache/conftool/dbconfig/20210311-100705-marostegui.json
10:00 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 60%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P14770 and previous config saved to /var/cache/conftool/dbconfig/20210311-095504-root.json
09:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1038.eqiad.wmnet
09:45 marostegui: Deploy schema change on s5 codfw master, lag will appear - T276150 T276156
09:43 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1038.eqiad.wmnet
09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 30%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P14769 and previous config saved to /var/cache/conftool/dbconfig/20210311-094000-root.json
09:35 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
09:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1037.eqiad.wmnet
09:31 hashar: Restarting CI Jenkins
09:29 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
09:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P14768 and previous config saved to /var/cache/conftool/dbconfig/20210311-092457-root.json
09:24 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1037.eqiad.wmnet
09:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1036.eqiad.wmnet
09:19 effie: upgrade memcached on mc1020, mc2020
09:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1036.eqiad.wmnet
09:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1035.eqiad.wmnet
09:08 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
09:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1035.eqiad.wmnet
09:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P14767 and previous config saved to /var/cache/conftool/dbconfig/20210311-090342-marostegui.json
09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: Repool db1079 after schema change', diff saved to https://phabricator.wikimedia.org/P14766 and previous config saved to /var/cache/conftool/dbconfig/20210311-090312-root.json
09:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1033.eqiad.wmnet
08:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1033.eqiad.wmnet
08:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1032.eqiad.wmnet
08:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1032.eqiad.wmnet
08:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1031.eqiad.wmnet
08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 60%: Repool db1079 after schema change', diff saved to https://phabricator.wikimedia.org/P14765 and previous config saved to /var/cache/conftool/dbconfig/20210311-084809-root.json
08:43 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1031.eqiad.wmnet
08:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1030.eqiad.wmnet
08:34 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1030.eqiad.wmnet
08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 30%: Repool db1079 after schema change', diff saved to https://phabricator.wikimedia.org/P14764 and previous config saved to /var/cache/conftool/dbconfig/20210311-083305-root.json
08:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1029.eqiad.wmnet
08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2109', diff saved to https://phabricator.wikimedia.org/P14762 and previous config saved to /var/cache/conftool/dbconfig/20210311-082546-marostegui.json
08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2074', diff saved to https://phabricator.wikimedia.org/P14761 and previous config saved to /var/cache/conftool/dbconfig/20210311-082528-marostegui.json
08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2074', diff saved to https://phabricator.wikimedia.org/P14760 and previous config saved to /var/cache/conftool/dbconfig/20210311-082445-marostegui.json
08:24 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1029.eqiad.wmnet
08:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1028.eqiad.wmnet
08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 10%: Repool db1079 after schema change', diff saved to https://phabricator.wikimedia.org/P14759 and previous config saved to /var/cache/conftool/dbconfig/20210311-081801-root.json
08:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1028.eqiad.wmnet
08:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2108 T275633', diff saved to https://phabricator.wikimedia.org/P14758 and previous config saved to /var/cache/conftool/dbconfig/20210311-081010-marostegui.json
08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2148 to s2 T275633', diff saved to https://phabricator.wikimedia.org/P14757 and previous config saved to /var/cache/conftool/dbconfig/20210311-080944-marostegui.json
07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P14756 and previous config saved to /var/cache/conftool/dbconfig/20210311-074352-marostegui.json
07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 100%: Repool db1136 after schema change', diff saved to https://phabricator.wikimedia.org/P14755 and previous config saved to /var/cache/conftool/dbconfig/20210311-073741-root.json
07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 60%: Repool db1136 after schema change', diff saved to https://phabricator.wikimedia.org/P14754 and previous config saved to /var/cache/conftool/dbconfig/20210311-072237-root.json
07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 30%: Repool db1136 after schema change', diff saved to https://phabricator.wikimedia.org/P14753 and previous config saved to /var/cache/conftool/dbconfig/20210311-070734-root.json
06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 10%: Repool db1136 after schema change', diff saved to https://phabricator.wikimedia.org/P14752 and previous config saved to /var/cache/conftool/dbconfig/20210311-065230-root.json
06:48 marostegui: Stop mysql on db2108 to clone db2148 T275633
06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2108 T275633', diff saved to https://phabricator.wikimedia.org/P14750 and previous config saved to /var/cache/conftool/dbconfig/20210311-064821-marostegui.json
06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136', diff saved to https://phabricator.wikimedia.org/P14749 and previous config saved to /var/cache/conftool/dbconfig/20210311-063814-marostegui.json
06:36 marostegui: Drop testreduce from m5 - T276787
05:34 thcipriani: restarted apache2 on gerrit1001
00:50 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2219.codfw.wmnet
00:49 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2218.codfw.wmnet
00:49 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2217.codfw.wmnet
00:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2216.codfw.wmnet
00:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2215.codfw.wmnet

2021-03-10

23:49 mholloway-shell@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/EventLogging: EventLogging: Stream always in sample if the user is in debugMode (T276515) (duration: 01m 23s)
23:41 dwisehaupt: disabled silverpop daily run in process-control until utf8mb4 conversion completes on frdev1001
23:12 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: REIMAGE
23:10 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: REIMAGE
23:10 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts registry1002.eqiad.wmnet
23:01 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts registry1002.eqiad.wmnet
22:55 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts registry[2001-2002].codfw.wmnet
22:51 andrewbogott: updating puppet compiler facts to catch up with a new custom fact
22:44 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts registry[2001-2002].codfw.wmnet
22:40 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts registry1001.eqiad.wmnet
22:32 brennen@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.34 (duration: 01m 30s)
22:30 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.34
22:27 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts registry1001.eqiad.wmnet
22:26 brennen: train status: 1.36.0-wmf.34 (T274938): T277094 believed resolved, promoting to group1.
22:25 brennen@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/WikimediaEvents/modules/ext.wikimediaEvents/clientError.js: Backport: Fix client error logging (T277094) (duration: 01m 09s)
21:53 mutante: ferm/iptables docker NAT rules applied by puppet on releases servers after breaking out rules into their own profile class (T276869)
21:51 dwisehaupt: upgraded mariadb and keeping replication stopped on frdb1002 to start the utf8mb4 table alters under a root screen session
21:43 brennen: train status: 1.36.0-wmf.34 (T274938): client errors may still be missing for group0; continuing to hold for T277094 until we know what's broken.
21:40 brennen@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/WikimediaEvents/modules/ext.wikimediaEvents/clientError.js: Backport: Revert "Error in shouldLog logic drops most errors" (T277094) (duration: 01m 08s)
21:38 dwisehaupt: stopping mysql replication on frdev1001 and starting utf8mb4 table alters under a root screen session
21:38 dwisehaupt: stopping mysql replication on frdb1003 and starting utf8mb4 table alters under a root screen session
21:30 brennen: train status: 1.36.0-wmf.34 (T274938): logstash client error board was set up incorrectly; reverting earlier patch for T277094 and will proceed to group1.
21:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: cdc47f3: jawiki: Growth features: Add help panel links (T276830) (duration: 01m 08s)
21:16 eileen: civicrm revision changed from b13e70d968 to 550be50105, config revision is 970b10b0b3
21:13 cdanis@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
21:00 cdanis@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
20:57 cdanis@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
20:56 Urbanecm: Fixing wrong sync message: urbanecm@deploy1002 Synchronized dblists/growthexperiments.dblist f72c3d6: jawiki: Enable Growth features in stealth mode (T276830) (duration: 01m 08s)
20:56 Urbanecm: Fixing wrong sync message: urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: f72c3d6: jawiki: Enable Growth features in stealth mode (T276830) (duration: 01m 07s)
20:54 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 92ae985: thwiki: Make Growth features available to newcomers (T274646) (duration: 01m 08s)
20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 92ae985: thwiki: Make Growth features available to newcomers (T274646) (duration: 01m 07s)
20:50 cdanis@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
20:48 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 92ae985: thwiki: Make Growth features available to newcomers (T274646) (duration: 01m 08s)
20:41 brennen@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/WikimediaEvents/modules/ext.wikimediaEvents/clientError.js: Backport: Error in shouldLog logic drops most errors (T277094) (duration: 01m 14s)
20:36 cdanis@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
19:58 brennen: train status: 1.36.0-wmf.34 (T274938): currently blocked at group0 as client error logging is broken (UBN ticket incoming), will hold for patch.
19:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: a130e9f: Enable Growth features on eowiki in stealth mode (T276123) (duration: 01m 08s)
19:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: REIMAGE
19:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: REIMAGE
19:32 ryankemper: T266470 `ryankemper@cumin1001:~$ sudo -E cumin 'A:wdqs-all' 'sudo enable-puppet "revoking old cert and generating new one with new alt_names - T266470 - root"'` && `ryankemper@cumin1001:~$ sudo -E cumin 'A:wdqs-all' 'sudo run-puppet-agent'`
19:29 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 84271f6: Enable DiscussionTools beta features on frwiktionary (T276189) (duration: 01m 09s)
19:28 ryankemper: T266470 `ryankemper@wdqs1004:~$ sudo enable-puppet "revoking old cert and generating new one with new alt_names - T266470 - root"` && `sudo run-puppet-agent`
19:27 ryankemper: T266470 `/srv/private` commit SHA for this change is `45852086679616bccb5bba3dd6396082b0f25a3d`
19:26 ryankemper: T266470 `sudo chown -Rv gitpuppet:gitpuppet /srv/private/modules/secret/secrets/certificates/wdqs.discovery.wmnet/` && `sudo chown -v gitpuppet:gitpuppet /srv/private/modules/secret/secrets/ssl/wdqs.discovery.wmnet.key`
19:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5093618: Enable DiscussionTools beta feature for newtopictool on most wikis (T275827) (duration: 01m 08s)
19:23 ryankemper: T266470 Deployed https://gerrit.wikimedia.org/r/c/operations/puppet/+/670562 (copies over new pubkey)
19:23 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4824679: Disable DiscussionTools Reply Tool A/B test (T276967) (duration: 01m 07s)
19:22 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/DiscussionTools/includes/Hooks/HookUtils.php: 9cb48f0: Allow users to continue using reply tool after disabling A/B test (T276967) (duration: 01m 07s)
19:20 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.33/extensions/DiscussionTools/includes/Hooks/HookUtils.php: 4193ff7: Allow users to continue using reply tool after disabling A/B test (T276967) (duration: 01m 09s)
19:18 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: e998086: searchSatisfaction: Allow for async initialisation (T274869) (duration: 01m 08s)
19:18 ryankemper: T266470 `sudo cergen -c 'wdqs.*' --generate --base-path /srv/private/modules/secret/secrets/certificates /srv/private/modules/secret/secrets/certificates/certificate.manifests.d`
19:17 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: REIMAGE
19:16 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.33/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: d9bad12: searchSatisfaction: Allow for async initialisation (T274869) (duration: 01m 08s)
19:16 ryankemper: T266470 `sudo rm -fv certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.crt.pem certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.csr.pem certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.keystore.jks certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.keystore.p12 certificates/wdqs.discovery.wmnet/truststore.jks` (full paths not provided to fit the IRC line)
19:15 ryankemper: T266470 `sudo puppet cert clean wdqs.discovery.wmnet`
19:15 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: REIMAGE
19:14 ryankemper: T266470 on `ryankemper@cumin1001`: `sudo -E cumin 'A:wdqs-all' 'sudo disable-puppet "revoking old cert and generating new one with new alt_names - T266470"'`
19:14 ryankemper: T266470 Temporarily disabling puppet on all `wdqs*` hosts in preparation for `wdqs.discovery.wmnet` certificate revocation
19:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: fe99c31: Remove unused config for InukaPageView (T265921) (duration: 01m 26s)
18:56 dwisehaupt: all fundraising servers are now running buster - T254198
18:37 mforns@deploy1002: Finished deploy [analytics/refinery@7fbc3c7] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@7fbc3c700ccb3c598690da9a38990ef7cb187656] (duration: 04m 12s)
18:33 mforns@deploy1002: Started deploy [analytics/refinery@7fbc3c7] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@7fbc3c700ccb3c598690da9a38990ef7cb187656]
18:33 mforns@deploy1002: Finished deploy [analytics/refinery@7fbc3c7] (thin): Regular analytics weekly train THIN [analytics/refinery@7fbc3c700ccb3c598690da9a38990ef7cb187656] (duration: 00m 07s)
18:33 mforns@deploy1002: Started deploy [analytics/refinery@7fbc3c7] (thin): Regular analytics weekly train THIN [analytics/refinery@7fbc3c700ccb3c598690da9a38990ef7cb187656]
18:32 mforns@deploy1002: Finished deploy [analytics/refinery@7fbc3c7]: Regular analytics weekly train [analytics/refinery@7fbc3c700ccb3c598690da9a38990ef7cb187656] (duration: 14m 30s)
18:18 mforns@deploy1002: Started deploy [analytics/refinery@7fbc3c7]: Regular analytics weekly train [analytics/refinery@7fbc3c700ccb3c598690da9a38990ef7cb187656]
17:48 mutante: new Wikimedia project language "trv" added - Seediq is an Atayalic language spoken in the mountains of Northern Taiwan by the Seediq and Taroko people.
17:45 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: REIMAGE
17:42 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: REIMAGE
17:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: REIMAGE
17:17 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: REIMAGE
16:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1030.eqiad.wmnet
16:52 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: REIMAGE
16:50 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1030.eqiad.wmnet
16:50 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: REIMAGE
16:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: REIMAGE
16:45 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: REIMAGE
16:20 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: REIMAGE
16:18 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: REIMAGE
15:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: Repool db1127 after schema change', diff saved to https://phabricator.wikimedia.org/P14744 and previous config saved to /var/cache/conftool/dbconfig/20210310-153324-root.json
15:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sodium.wikimedia.org
15:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 60%: Repool db1127 after schema change', diff saved to https://phabricator.wikimedia.org/P14743 and previous config saved to /var/cache/conftool/dbconfig/20210310-151820-root.json
15:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host sodium.wikimedia.org
15:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 30%: Repool db1127 after schema change', diff saved to https://phabricator.wikimedia.org/P14742 and previous config saved to /var/cache/conftool/dbconfig/20210310-150316-root.json
14:53 klausman@puppetmaster1001: conftool action : set/pooled=yes:weight=1; selector: cluster=ml_serve,service=kubemaster
14:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2061.codfw.wmnet
14:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 10%: Repool db1127 after schema change', diff saved to https://phabricator.wikimedia.org/P14741 and previous config saved to /var/cache/conftool/dbconfig/20210310-144813-root.json
14:44 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2061.codfw.wmnet
14:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2060.codfw.wmnet
14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127', diff saved to https://phabricator.wikimedia.org/P14740 and previous config saved to /var/cache/conftool/dbconfig/20210310-143547-marostegui.json
14:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2060.codfw.wmnet
14:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2059.codfw.wmnet
14:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2059.codfw.wmnet
14:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2058.codfw.wmnet
14:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P14739 and previous config saved to /var/cache/conftool/dbconfig/20210310-142316-root.json
14:19 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
14:19 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
14:19 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
14:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2058.codfw.wmnet
14:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 60%: 10', diff saved to https://phabricator.wikimedia.org/P14738 and previous config saved to /var/cache/conftool/dbconfig/20210310-140812-root.json
14:06 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
14:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2056.codfw.wmnet
14:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2056.codfw.wmnet
14:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2055.codfw.wmnet
13:55 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2055.codfw.wmnet
13:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2054.codfw.wmnet
13:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 30%: 10', diff saved to https://phabricator.wikimedia.org/P14736 and previous config saved to /var/cache/conftool/dbconfig/20210310-135309-root.json
13:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2054.codfw.wmnet
13:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2053.codfw.wmnet
13:37 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2053.codfw.wmnet
13:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2052.codfw.wmnet
13:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2052.codfw.wmnet
13:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2051.codfw.wmnet
13:24 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2051.codfw.wmnet
13:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
13:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
13:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2049.codfw.wmnet
13:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2049.codfw.wmnet
13:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2048.codfw.wmnet
13:07 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1029.eqiad.wmnet
13:03 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2048.codfw.wmnet
12:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2047.codfw.wmnet
12:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2047.codfw.wmnet
12:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2046.codfw.wmnet
12:52 ariel@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
12:47 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2046.codfw.wmnet
12:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2045.codfw.wmnet
12:47 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1029.eqiad.wmnet
12:45 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 623ed48: nowiki: Enable Growth features in stealth mode (T276816) (duration: 01m 07s)
12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170:3317 for schema change', diff saved to https://phabricator.wikimedia.org/P14734 and previous config saved to /var/cache/conftool/dbconfig/20210310-124140-marostegui.json
12:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2045.codfw.wmnet
12:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2044.codfw.wmnet
12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P14733 and previous config saved to /var/cache/conftool/dbconfig/20210310-123654-root.json
12:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet
12:34 ladsgroup@deploy1002: Synchronized php-1.36.0-wmf.34/languages: Add shy name (same as shy-latn) (T259360) (duration: 01m 10s)
12:34 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2044.codfw.wmnet
12:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2043.codfw.wmnet
12:32 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host theemin.codfw.wmnet
12:32 ariel@cumin1001: START - Cookbook sre.cassandra.roll-restart
12:31 ariel@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99)
12:31 ariel@cumin1001: START - Cookbook sre.cassandra.roll-restart
12:28 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2043.codfw.wmnet
12:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2042.codfw.wmnet
12:22 ladsgroup@deploy1002: Synchronized php-1.36.0-wmf.33/languages: Add shy name (same as shy-latn) (T259360) (duration: 01m 10s)
12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 60%: 10', diff saved to https://phabricator.wikimedia.org/P14732 and previous config saved to /var/cache/conftool/dbconfig/20210310-122150-root.json
12:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2042.codfw.wmnet
12:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2041.codfw.wmnet
12:12 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Update several Wikidata-related configs (duration: 01m 32s)
12:09 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2041.codfw.wmnet
12:07 klausman@cumin1001: START - Cookbook sre.dns.netbox
12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 30%: 10', diff saved to https://phabricator.wikimedia.org/P14731 and previous config saved to /var/cache/conftool/dbconfig/20210310-120647-root.json
11:57 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1024.eqiad.wmnet
11:42 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on aqs1010.eqiad.wmnet with reason: New buster host
11:42 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on aqs1010.eqiad.wmnet with reason: New buster host
11:34 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1013.eqiad.wmnet
11:29 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1013.eqiad.wmnet
11:27 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1024.eqiad.wmnet
11:25 kormat@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: schema change T267767', diff saved to https://phabricator.wikimedia.org/P14730 and previous config saved to /var/cache/conftool/dbconfig/20210310-112553-kormat.json
11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 for schema change', diff saved to https://phabricator.wikimedia.org/P14729 and previous config saved to /var/cache/conftool/dbconfig/20210310-112427-marostegui.json
11:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2040.codfw.wmnet
11:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P14728 and previous config saved to /var/cache/conftool/dbconfig/20210310-111903-root.json
11:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2040.codfw.wmnet
11:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2039.codfw.wmnet
11:10 kormat@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: schema change T267767', diff saved to https://phabricator.wikimedia.org/P14727 and previous config saved to /var/cache/conftool/dbconfig/20210310-111049-kormat.json
11:07 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2039.codfw.wmnet
11:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2038.codfw.wmnet
11:05 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1028.eqiad.wmnet
11:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 60%: 10', diff saved to https://phabricator.wikimedia.org/P14726 and previous config saved to /var/cache/conftool/dbconfig/20210310-110359-root.json
11:00 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2038.codfw.wmnet
11:00 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1028.eqiad.wmnet
10:55 kormat@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: schema change T267767', diff saved to https://phabricator.wikimedia.org/P14725 and previous config saved to /var/cache/conftool/dbconfig/20210310-105545-kormat.json
10:54 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1023.eqiad.wmnet
10:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2037.codfw.wmnet
10:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 30%: 10', diff saved to https://phabricator.wikimedia.org/P14724 and previous config saved to /var/cache/conftool/dbconfig/20210310-104856-root.json
10:47 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2037.codfw.wmnet
10:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2036.codfw.wmnet
10:40 kormat@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: schema change T267767', diff saved to https://phabricator.wikimedia.org/P14723 and previous config saved to /var/cache/conftool/dbconfig/20210310-104042-kormat.json
10:40 effie: upgrade memcached on mc2019, mc1019
10:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2036.codfw.wmnet
10:38 kormat@cumin1001: dbctl commit (dc=all): 'db1168 depooling: schema change T267767', diff saved to https://phabricator.wikimedia.org/P14722 and previous config saved to /var/cache/conftool/dbconfig/20210310-103836-kormat.json
10:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1168.eqiad.wmnet with reason: schema change T267767
10:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1168.eqiad.wmnet with reason: schema change T267767
10:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2035.codfw.wmnet
10:32 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2035.codfw.wmnet
10:29 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1023.eqiad.wmnet
10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317 for schema change', diff saved to https://phabricator.wikimedia.org/P14721 and previous config saved to /var/cache/conftool/dbconfig/20210310-101922-marostegui.json
10:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2034.codfw.wmnet
10:12 marostegui: Drop testreduce_vd from m5 master - T276787
10:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2034.codfw.wmnet
10:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2033.codfw.wmnet
09:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2033.codfw.wmnet
09:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2032.codfw.wmnet
09:52 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2032.codfw.wmnet
09:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2031.codfw.wmnet
09:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2031.codfw.wmnet
09:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2030.codfw.wmnet
09:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2030.codfw.wmnet
09:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2029.codfw.wmnet
09:25 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2003-dev.codfw.wmnet with reason: REIMAGE
09:23 aborrero@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2003-dev.codfw.wmnet with reason: REIMAGE
09:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2029.codfw.wmnet
09:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2028.codfw.wmnet
09:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2028.codfw.wmnet
08:39 marostegui: Upgrade mysql and kernel on db2132
08:25 marostegui: Upgrade mysql and kernel on db2078
08:21 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thorium.eqiad.wmnet
08:20 moritzm: pruning obsolete kernels from ganeti hosts in eqiad/codfw
08:17 moritzm: powercycling thorium, stuck on reboot
08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P14719 and previous config saved to /var/cache/conftool/dbconfig/20210310-081627-root.json
08:11 marostegui: Check tables on db1150:3315 - T276742
08:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host thorium.eqiad.wmnet
08:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics-tool1001.eqiad.wmnet
08:03 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host analytics-tool1001.eqiad.wmnet
08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 60%: 10', diff saved to https://phabricator.wikimedia.org/P14718 and previous config saved to /var/cache/conftool/dbconfig/20210310-080123-root.json
07:52 marostegui: Deploy schema change on s7 codfw (lag will appear) T276150 T276156
07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 30%: 10', diff saved to https://phabricator.wikimedia.org/P14717 and previous config saved to /var/cache/conftool/dbconfig/20210310-074618-root.json
07:33 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite1004.eqiad.wmnet
07:29 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite1004.eqiad.wmnet
07:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 for schema change', diff saved to https://phabricator.wikimedia.org/P14716 and previous config saved to /var/cache/conftool/dbconfig/20210310-072642-marostegui.json
07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316', diff saved to https://phabricator.wikimedia.org/P14715 and previous config saved to /var/cache/conftool/dbconfig/20210310-072508-marostegui.json
07:07 elukey: sudo apt-get remove linux-image-4.9.0-9-amd64 on sodium to free space for /boot
07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2145', diff saved to https://phabricator.wikimedia.org/P14714 and previous config saved to /var/cache/conftool/dbconfig/20210310-070642-marostegui.json
07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P14713 and previous config saved to /var/cache/conftool/dbconfig/20210310-070312-marostegui.json
07:01 elukey: remove the oldest kernel on ganeti nodes to free space for /boot
07:00 marostegui: Depool clouddb1016
06:45 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1111.eqiad.wmnet with reason: REIMAGE
06:43 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1111.eqiad.wmnet with reason: REIMAGE
06:17 elukey: reimage an-worker1111 to buster
05:27 ryankemper: T266470 Rollout of updated certificate complete. We're now ready to implement envoy for `wdqs-test` which will allow `wdqs1009` to be reachable via port 443 and thereby allow us to go live with `query-preview.wikidata.org` when the time comes
05:26 ryankemper: T266470 `ryankemper@cumin1001:~$ sudo -E cumin 'A:wdqs-all' 'sudo enable-puppet "revoking old cert and generating new one with new alt_names - T266470 - root"'` and `ryankemper@cumin1001:~$ sudo -E cumin 'A:wdqs-all' 'sudo run-puppet-agent'`
05:24 ryankemper: T266470 Test queries passing on `wdqs1004`, and `https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&refresh=1m&var-cluster_name=wdqs&from=now-1h&to=now` looks as expected. Proceeding to rest of fleet
05:20 ryankemper: T266470 Enabled puppet on single public wdqs host to verify certificate update is without issue: `ryankemper@wdqs1004:~$ sudo enable-puppet "revoking old cert and generating new one with new alt_names - T266470 - root"` followed by `ryankemper@wdqs1004:~$ sudo run-puppet-agent`
05:15 ryankemper: T266470 [`/srv/private`] All changes committed to private git repo, commit SHA `ec1d6cfae8c72e4f807b343cdb9f25c27817d98d`
05:13 ryankemper: T266470 [`/srv/private`] `chown gitpuppet:gitpuppet` on all modified files (were owned by root, probably because I sudo'd - may be that a git commit hook would have caught that but explicitly chowning just to be safe)
05:06 ryankemper: T266470 New `wdqs.discovery.wmnet.crt` added to public `operations/puppet` repo: https://gerrit.wikimedia.org/r/c/operations/puppet/+/670337/
04:58 ryankemper: T266470 The above two actions mean that we're ready to generate the new certificate files. Proceeding: `sudo cergen -c 'wdqs.*' --generate --base-path /srv/private/modules/secret/secrets/certificates /srv/private/modules/secret/secrets/certificates/certificate.manifests.d` on `ryankemper@puppetmaster1001:/srv/private`
04:57 ryankemper: T266470 `sudo rm -fv certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.crt.pem certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.csr.pem certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.keystore.jks certificates/wdqs.discovery.wmnet/wdqs.discovery.wmnet.keystore.p12 certificates/wdqs.discovery.wmnet/truststore.jks` (full paths not provided to fit the IRC line)
04:56 ryankemper: T266470 In the `/srv/private` repo, `/srv/private/modules/secret/secrets/certificates/certificate.manifests.d/wdqs.certs.yaml` has been edited to add the relevant `alt_names`
04:55 ryankemper: T266470 Certificate revoked: `ryankemper@puppetmaster1001:/srv/private$ sudo puppet cert clean wdqs.discovery.wmnet`
04:53 ryankemper: T266470 `ryankemper@cumin1001:~$ sudo -E cumin 'A:wdqs-all' 'sudo disable-puppet "revoking old cert and generating new one with new alt_names - T266470"'`
04:52 ryankemper: T266470 Temporarily disabling puppet on all `wdqs*` hosts in preparation for `wdqs.discovery.wmnet` certificate revocation
01:08 krinkle@deploy1002: Synchronized php-1.36.0-wmf.34/extensions/NavigationTiming/modules/ext.navigationTiming.js: T276826 Ibd9ddf14d64 (duration: 01m 14s)
00:02 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-backup1002.eqiad.wmnet with reason: REIMAGE
00:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-backup1001.eqiad.wmnet with reason: REIMAGE

2021-03-09

23:59 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-backup1002.eqiad.wmnet with reason: REIMAGE
23:58 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-backup1001.eqiad.wmnet with reason: REIMAGE
22:04 mutante: phab1001 - manually running phab public task dump script after making changes to redirect stdout
20:42 elukey: reimaged an-worker1091 to buster
20:41 bstorm: depooled labsdb1009 T276980
20:25 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1091.eqiad.wmnet with reason: REIMAGE
20:25 bstorm: downtimed labsdb1009 so it doesn't keep paging T276980
20:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1091.eqiad.wmnet with reason: REIMAGE
20:09 brennen: train status: 1.36.0-wmf.34 (T274938) on group0 at 20:06:32 UTC; logs initially quiet.
20:06 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.34
19:05 brennen@deploy1002: Pruned MediaWiki: 1.36.0-wmf.31 (duration: 03m 34s)
19:04 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:59 pt1979@cumin2001: START - Cookbook sre.dns.netbox
18:54 brennen@deploy1002: Finished scap: testwikis wikis to 1.36.0-wmf.34 (duration: 47m 25s)
18:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1087.eqiad.wmnet with reason: REIMAGE
18:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1087.eqiad.wmnet with reason: REIMAGE
18:47 dcausse: re-pool wdqs1004
18:37 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
18:35 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
18:34 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:29 pt1979@cumin2001: START - Cookbook sre.dns.netbox
18:26 elukey: reimage an-worker1087 to buster
18:16 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
18:13 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
18:12 brennen@deploy1002: Started scap: testwikis wikis to 1.36.0-wmf.34
18:10 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
18:05 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
18:03 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
18:02 marxarelli: deleting shut down memc* deployment-prep instances to free up quota for replacement db instances (T276968)
18:02 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1085.eqiad.wmnet with reason: REIMAGE
18:00 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1085.eqiad.wmnet with reason: REIMAGE
17:50 papaul: rebooting db2073 for firmware upgrade
17:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1077.eqiad.wmnet with reason: REIMAGE
17:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3119d7a: sqwiki: Fix deployment of Growth features (duration: 01m 00s)
16:59 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1077.eqiad.wmnet with reason: REIMAGE
16:46 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:41 pt1979@cumin2001: START - Cookbook sre.dns.netbox
16:40 elukey: reimage analytics1077 to buster
16:33 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1027.eqiad.wmnet
16:32 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
16:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
16:31 brennen: 1.36.0-wmf.34 was branched at e175899 for T274938
16:27 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1027.eqiad.wmnet
16:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P14708 and previous config saved to /var/cache/conftool/dbconfig/20210309-162116-root.json
16:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 80%: 10', diff saved to https://phabricator.wikimedia.org/P14707 and previous config saved to /var/cache/conftool/dbconfig/20210309-160613-root.json
15:56 moritzm: imported prometheus-ircd-exporter 0.2 to apt.wikimedia.org T224579
15:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 60%: 10', diff saved to https://phabricator.wikimedia.org/P14706 and previous config saved to /var/cache/conftool/dbconfig/20210309-155109-root.json
15:45 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1072.eqiad.wmnet with reason: REIMAGE
15:43 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1072.eqiad.wmnet with reason: REIMAGE
15:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repooling db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P14705 and previous config saved to /var/cache/conftool/dbconfig/20210309-153715-root.json
15:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 40%: 10', diff saved to https://phabricator.wikimedia.org/P14704 and previous config saved to /var/cache/conftool/dbconfig/20210309-153605-root.json
15:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1008.eqiad.wmnet
15:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe1008.eqiad.wmnet
15:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1007.eqiad.wmnet
15:28 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare KaiOS / Inuka event streams - T267344 T267345 T267346 (duration: 00m 58s)
15:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 60%: Repooling db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P14703 and previous config saved to /var/cache/conftool/dbconfig/20210309-152212-root.json
15:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 30%: 10', diff saved to https://phabricator.wikimedia.org/P14702 and previous config saved to /var/cache/conftool/dbconfig/20210309-152102-root.json
15:20 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: WikimediaEvents: Bump session_tick sampling rate to 10% (duration: 00m 58s)
15:18 elukey: reimage analytics1072 (hadoop hdfs journal node) to buster
15:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe1007.eqiad.wmnet
15:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1006.eqiad.wmnet
15:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe1006.eqiad.wmnet
15:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 30%: Repooling db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P14701 and previous config saved to /var/cache/conftool/dbconfig/20210309-150708-root.json
15:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 20%: 10', diff saved to https://phabricator.wikimedia.org/P14700 and previous config saved to /var/cache/conftool/dbconfig/20210309-150558-root.json
15:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1005.eqiad.wmnet
14:56 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe1005.eqiad.wmnet
14:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1089.eqiad.wmnet with reason: REIMAGE
14:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1090.eqiad.wmnet with reason: REIMAGE
14:53 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1089.eqiad.wmnet with reason: REIMAGE
14:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: Repooling db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P14699 and previous config saved to /var/cache/conftool/dbconfig/20210309-145205-root.json
14:52 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1090.eqiad.wmnet with reason: REIMAGE
14:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2008.codfw.wmnet
14:38 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe2008.codfw.wmnet
14:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P14698 and previous config saved to /var/cache/conftool/dbconfig/20210309-143453-marostegui.json
14:32 volker-e@deploy1002: Finished deploy [design/style-guide@deee49c]: Deploy design/style-guide: deee49c index: Add links to our design process and work guides (#446) (duration: 00m 06s)
14:32 volker-e@deploy1002: Started deploy [design/style-guide@deee49c]: Deploy design/style-guide: deee49c index: Add links to our design process and work guides (#446)
14:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1015.eqiad.wmnet with reason: REIMAGE
14:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2007.codfw.wmnet
14:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P14697 and previous config saved to /var/cache/conftool/dbconfig/20210309-143033-root.json
14:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1014.eqiad.wmnet with reason: REIMAGE
14:29 elukey: drain + reimage an-worker1090/89 to Buster
14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1015.eqiad.wmnet with reason: REIMAGE
14:28 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1012.eqiad.wmnet with reason: REIMAGE
14:27 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe2007.codfw.wmnet
14:27 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1014.eqiad.wmnet with reason: REIMAGE
14:26 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1012.eqiad.wmnet with reason: REIMAGE
14:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2006.codfw.wmnet
14:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe2006.codfw.wmnet
14:17 jakob@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
14:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 60%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P14696 and previous config saved to /var/cache/conftool/dbconfig/20210309-141529-root.json
14:14 jakob@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
14:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2005.codfw.wmnet
14:12 jakob@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
14:10 moritzm: installing intel-microcode updates on stretch
14:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe2005.codfw.wmnet
14:08 jakob@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
14:07 jgleeson: updated smashpig from 5a69abd40f to 58b070db1a
14:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 30%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P14694 and previous config saved to /var/cache/conftool/dbconfig/20210309-140025-root.json
13:52 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1004.eqiad.wmnet
13:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1102.eqiad.wmnet with reason: REIMAGE
13:50 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1080.eqiad.wmnet with reason: REIMAGE
13:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1102.eqiad.wmnet with reason: REIMAGE
13:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1080.eqiad.wmnet with reason: REIMAGE
13:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 10%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P14693 and previous config saved to /var/cache/conftool/dbconfig/20210309-134522-root.json
13:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1004.eqiad.wmnet
13:34 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cloudvirt1038.eqiad.wmnet with reason: HW issue
13:34 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on cloudvirt1038.eqiad.wmnet with reason: HW issue
13:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P14692 and previous config saved to /var/cache/conftool/dbconfig/20210309-133124-root.json
13:28 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1003.eqiad.wmnet
13:27 elukey: reimage an-worker1102 and an-worker1080 (hdfs journal node) to Buster
13:21 jgleeson: updated payments-wiki from 65dbf0ed9d to 0e7800027a
13:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P14691 and previous config saved to /var/cache/conftool/dbconfig/20210309-131652-marostegui.json
13:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 60%: 10', diff saved to https://phabricator.wikimedia.org/P14690 and previous config saved to /var/cache/conftool/dbconfig/20210309-131620-root.json
13:10 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1003.eqiad.wmnet
13:08 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1103.eqiad.wmnet with reason: REIMAGE
13:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1103.eqiad.wmnet with reason: REIMAGE
13:03 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1013.eqiad.wmnet with reason: REIMAGE
13:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1013.eqiad.wmnet with reason: REIMAGE
13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 30%: 10', diff saved to https://phabricator.wikimedia.org/P14689 and previous config saved to /var/cache/conftool/dbconfig/20210309-130116-root.json
12:59 elukey: drain + reimage an-worker1103 to Buster
12:59 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1011.eqiad.wmnet with reason: REIMAGE
12:57 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1011.eqiad.wmnet with reason: REIMAGE
12:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw1403.eqiad.wmnet
12:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw1402.eqiad.wmnet
12:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168 for schema change', diff saved to https://phabricator.wikimedia.org/P14688 and previous config saved to /var/cache/conftool/dbconfig/20210309-125007-marostegui.json
12:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P14687 and previous config saved to /var/cache/conftool/dbconfig/20210309-124931-root.json
12:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host mw1403.eqiad.wmnet
12:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host mw1402.eqiad.wmnet
12:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 60%: 10', diff saved to https://phabricator.wikimedia.org/P14686 and previous config saved to /var/cache/conftool/dbconfig/20210309-123427-root.json
12:33 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1038.eqiad.wmnet
12:31 hnowlan@cumin1001: START - Cookbook sre.dns.netbox
12:30 hnowlan: regenerating interfaces and reimaging aqs101[1-5]
12:29 marostegui: Upgrade db2084 kernel
12:26 marostegui: Upgrade db2094 kernel
12:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 30%: 10', diff saved to https://phabricator.wikimedia.org/P14685 and previous config saved to /var/cache/conftool/dbconfig/20210309-121924-root.json
12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1166 entirely', diff saved to https://phabricator.wikimedia.org/P14684 and previous config saved to /var/cache/conftool/dbconfig/20210309-121913-marostegui.json
12:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 30%: 10', diff saved to https://phabricator.wikimedia.org/P14683 and previous config saved to /var/cache/conftool/dbconfig/20210309-121849-root.json
12:16 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.33/extensions/GrowthExperiments/: dbd6f0c: Make help panel fallback to help desk if no mentor is available (T275908; T273782) (duration: 01m 01s)
12:13 marostegui: Upgrade db2080 kernel
12:06 marostegui: Upgrade db2077 kernel
12:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1173 for schema change', diff saved to https://phabricator.wikimedia.org/P14682 and previous config saved to /var/cache/conftool/dbconfig/20210309-120326-marostegui.json
12:00 marostegui: Upgrade db2076 kernel
11:56 effie: restart envoy on mw1276
11:56 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1010.eqiad.wmnet with reason: REIMAGE
11:53 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1010.eqiad.wmnet with reason: REIMAGE
11:52 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
11:52 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
11:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw1307.eqiad.wmnet
11:42 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2004.codfw.wmnet
11:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug1003.eqiad.wmnet
11:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host mw1307.eqiad.wmnet
11:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host mwdebug1003.eqiad.wmnet
11:29 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
11:29 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
11:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host doc1001.eqiad.wmnet
11:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host doc1001.eqiad.wmnet
11:25 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2004.codfw.wmnet
11:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1001.eqiad.wmnet
11:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host webperf1001.eqiad.wmnet
11:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1002.eqiad.wmnet
11:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host webperf1002.eqiad.wmnet
11:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2002.codfw.wmnet
11:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host webperf2002.codfw.wmnet
11:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2001.codfw.wmnet
11:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host webperf2001.codfw.wmnet
11:01 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1037.eqiad.wmnet
10:56 moritzm: installing mariadb-10.1 updates for stretch (distro version with libs/tools only, not wmf-mariadb)
10:54 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1037.eqiad.wmnet
10:53 dcausse: started to import lexemes on wdqs1009 (T276784)
10:52 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2003.codfw.wmnet
10:50 hnowlan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:45 hnowlan@cumin1001: START - Cookbook sre.dns.netbox
10:43 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[2020-2027].codfw.wmnet
10:36 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2003.codfw.wmnet
10:31 moritzm: upgrading perf on stretch hosts
10:30 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite2003.codfw.wmnet
10:23 moritzm: installing gdisk security updates
10:15 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite2003.codfw.wmnet
10:14 moritzm: installing libbsd security updates
10:07 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[2020-2027].codfw.wmnet
10:00 moritzm: installing busybox security updates
09:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1001.eqiad.wmnet
09:52 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host puppetboard1001.eqiad.wmnet
09:50 marostegui: Reboot db2073 for kernel upgrade (stretch)
09:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2001.codfw.wmnet
09:49 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
09:49 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
09:46 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host puppetboard2001.codfw.wmnet
09:44 marostegui: Reboot db2072 for kernel upgrade (stretch)
09:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1112.eqiad.wmnet with reason: REIMAGE
09:40 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1112.eqiad.wmnet with reason: REIMAGE
09:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1076.eqiad.wmnet with reason: REIMAGE
09:36 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1076.eqiad.wmnet with reason: REIMAGE
09:14 elukey: drain + reimage analytics1076 and an-worker1112 to Buster
09:01 moritzm: installing Linux 4.9.258 updates on Stretch hosts
08:59 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[2017-2019].codfw.wmnet
08:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1093.eqiad.wmnet with reason: REIMAGE
08:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1092.eqiad.wmnet with reason: REIMAGE
08:52 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1093.eqiad.wmnet with reason: REIMAGE
08:50 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1092.eqiad.wmnet with reason: REIMAGE
08:46 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[2017-2019].codfw.wmnet
08:46 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
08:46 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
08:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
08:27 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
08:26 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be2016.codfw.wmnet
08:12 marostegui: Stop mysql on clouddb1015:3314, 3316
07:59 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be2016.codfw.wmnet
07:50 dcausse: restarted blazegraph on wdqs1004 and depooled it to let it catch up on lag
07:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1095.eqiad.wmnet with reason: REIMAGE
07:38 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1094.eqiad.wmnet with reason: REIMAGE
07:38 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1095.eqiad.wmnet with reason: REIMAGE
07:36 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1094.eqiad.wmnet with reason: REIMAGE
07:24 godog: swift eqiad-prod: add weight to ms-be106[0-3] - T268435
07:03 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:01 elukey: drain + reimage an-worker109[4,5] to Buster
06:58 elukey@cumin1001: START - Cookbook sre.dns.netbox
06:30 _joe_: restarting gerrit on gerrit1001, using 48 GB of heap
06:19 marostegui: Deploy schema change on s6 codfw (there will be lag on codfw) T276150 T276156
05:37 marostegui: Stop mysql on clouddb1014:3312, 3317 to transfer its data to clouddb1021
05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175 for table check T276742', diff saved to https://phabricator.wikimedia.org/P14675 and previous config saved to /var/cache/conftool/dbconfig/20210309-051646-marostegui.json
00:58 Krinkle: krinkle@mwmaint1002 Ran invalidateUserSessions.php for one user
00:13 urbanecm@deploy1002: Synchronized wmf-config/config/incubatorwiki.yaml: 0d260ed: Enable modern Vector on incubator (T275479; 2/2) (duration: 00m 57s)
00:11 urbanecm@deploy1002: Synchronized dblists/desktop-improvements.dblist: 0d260ed: Enable modern Vector on incubator (T275479; 1/2) (duration: 01m 01s)
00:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ce82e0c: Logo updates (T273085) (duration: 00m 58s)
00:08 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: ce82e0c: Logo updates (T273085) (duration: 00m 58s)

2021-03-08

22:36 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1005.eqiad.wmnet with reason: REIMAGE
22:34 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1005.eqiad.wmnet with reason: REIMAGE
21:42 mholloway-shell@deploy1002: Synchronized wmf-config/CommonSettings.php: WikimediaEvents: Create data QA group/right on testwiki (T276515) (duration: 00m 57s)
21:18 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate Editing schemas to Event Platform on all wikis - T267343, T267353 (duration: 00m 58s)
21:04 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate Editing schemas to Event Platform on testwiki, take 2 - T267343, T267353 (duration: 00m 58s)
20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1227e2a: idwiki: Growth features: Add mentorlist (T259024) (duration: 00m 58s)
20:44 legoktm: legoktm@registry1004:~$ sudo systemctl reset-failed # to fix icinga warning
20:43 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: REIMAGE
20:41 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: REIMAGE
20:38 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5ce7b46: Set wgGEHelpPanelAskMentor to true by default (T275908) (duration: 01m 07s)
20:32 bblack: miscweb[12]002 - re-enabled puppet and deployed new cert
20:23 bblack: miscweb[12]002 - disabling puppet to remake cergen cert...
19:55 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate Editing schemas to Event Platform on testwiki - T267343, T267353 (duration: 00m 57s)
19:47 dduvall@deploy1002: Synchronized php-1.36.0-wmf.33/maintenance/: maintenance: aa6f291: 4893ddb: fa97162: 380c448: DB_NONE offline maintenance improvements (duration: 00m 58s)
19:37 dduvall@deploy1002: Synchronized wmf-config/: wmf-config/env.php,CommonSettings.php: f70049b: e53dc3a: f9b9ea1: WMF_DATACENTER, WMF_MAINTENANCE_OFFLINE handling (duration: 01m 00s)
19:37 bblack: cp-text: banning varnish-fe for req.http.host == ( 7 wikis from T274784 )
19:21 urbanecm@deploy1002: Synchronized wmf-config/config/: 1c46d0b: 1aad60b: vector: Expand Desktop Improvements pilot wiki group (T273090) (duration: 00m 58s)
19:20 urbanecm@deploy1002: Synchronized dblists/desktop-improvements.dblist: 1c46d0b: 1aad60b: vector: Expand Desktop Improvements pilot wiki group (T273090) (duration: 00m 57s)
19:14 bblack: cp-text: disabling puppet ahead of T274784 changes - https://gerrit.wikimedia.org/r/c/operations/puppet/+/669840
19:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e1cb988: Enable flood flag on hrwiki (T276560) (duration: 00m 58s)
18:58 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: a855800: Fix sqwiki help panel links description (T275550) (duration: 00m 58s)
18:47 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: dfd9588: hiwiki: Add missing help panel link descriptions (T276450) (duration: 00m 58s)
18:37 robh@cumin1001: START - Cookbook sre.dns.netbox
18:36 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1116.eqiad.wmnet with reason: REIMAGE
18:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1116.eqiad.wmnet with reason: REIMAGE
18:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1115.eqiad.wmnet with reason: REIMAGE
18:33 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:31 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1115.eqiad.wmnet with reason: REIMAGE
18:29 robh@cumin1001: START - Cookbook sre.dns.netbox
18:11 elukey: drain + reimage an-worker11[15,16] to Buster
17:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1114.eqiad.wmnet with reason: REIMAGE
17:38 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1113.eqiad.wmnet with reason: REIMAGE
17:37 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1114.eqiad.wmnet with reason: REIMAGE
17:36 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1113.eqiad.wmnet with reason: REIMAGE
17:12 elukey: drain + reimage an-worker11[13,14] to Buster
16:45 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1110.eqiad.wmnet with reason: REIMAGE
16:43 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1109.eqiad.wmnet with reason: REIMAGE
16:43 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1110.eqiad.wmnet with reason: REIMAGE
16:41 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1109.eqiad.wmnet with reason: REIMAGE
16:17 elukey: drain + reimage an-worker1109/1110 to Buster
15:55 marostegui: Restart db1115 (tendril host)
15:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P14669 and previous config saved to /var/cache/conftool/dbconfig/20210308-154710-root.json
15:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P14666 and previous config saved to /var/cache/conftool/dbconfig/20210308-153207-root.json
15:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1108.eqiad.wmnet with reason: REIMAGE
15:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P14665 and previous config saved to /var/cache/conftool/dbconfig/20210308-151703-root.json
15:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1108.eqiad.wmnet with reason: REIMAGE
15:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1107.eqiad.wmnet with reason: REIMAGE
15:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1107.eqiad.wmnet with reason: REIMAGE
15:07 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate PrefUpdate to EventGate on all wikis - T267348 (duration: 00m 59s)
15:02 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Remove wgEventLoggingSchemas overrides for Growth and WMDE Tech wishes schemas - T267333, etc. (duration: 00m 59s)
15:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P14664 and previous config saved to /var/cache/conftool/dbconfig/20210308-150159-root.json
14:54 elukey: drain + reimage an-worker110[7,8] to Buster
14:51 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
14:51 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
14:48 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
14:48 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
14:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1106.eqiad.wmnet with reason: REIMAGE
14:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1105.eqiad.wmnet with reason: REIMAGE
14:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1106.eqiad.wmnet with reason: REIMAGE
14:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1105.eqiad.wmnet with reason: REIMAGE
13:51 elukey: drain + reimage an-worker110[4,5] to Buster
13:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P14663 and previous config saved to /var/cache/conftool/dbconfig/20210308-130712-root.json
12:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P14662 and previous config saved to /var/cache/conftool/dbconfig/20210308-125208-root.json
12:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P14661 and previous config saved to /var/cache/conftool/dbconfig/20210308-123704-root.json
12:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P14660 and previous config saved to /var/cache/conftool/dbconfig/20210308-122201-root.json
12:20 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.33/extensions/GrowthExperiments/includes/Mentorship/MentorHooks.php: 48d6c55: MentorHooks: Make mentor assignment follow same rules as HomepageHooks (T276720) (duration: 00m 58s)
11:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1088.eqiad.wmnet with reason: REIMAGE
11:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1088.eqiad.wmnet with reason: REIMAGE
10:41 elukey: drain + reimage an-worker1104/1089 to Debian Buster
10:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1084.eqiad.wmnet with reason: REIMAGE
10:17 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1083.eqiad.wmnet with reason: REIMAGE
10:15 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1084.eqiad.wmnet with reason: REIMAGE
10:15 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1083.eqiad.wmnet with reason: REIMAGE
10:01 marostegui: Repool clouddb1013:3311, clouddb1013:3313
09:55 _joe_: uploading new versions of docker images: php7.{2,3}-{cli,fpm}, httpd, httpd-fcgi, mediawiki-httpd, memcached T276097 T265327
09:34 _joe_: manually removed the old graphoid IP from the scb servers' interfaces (long-standing bug in wikimedia-lvs-realserver when removing the last managed IP)
09:19 elukey: drain + reimage an-worker108[3,4] to Buster
09:17 _joe_: regenerating puppet certs for scb200{1,2}
08:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1082.eqiad.wmnet with reason: REIMAGE
08:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1081.eqiad.wmnet with reason: REIMAGE
08:53 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1082.eqiad.wmnet with reason: REIMAGE
08:52 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1081.eqiad.wmnet with reason: REIMAGE
08:21 godog: swift eqiad-prod: add weight to ms-be106[0-3] - T268435
08:20 elukey: drain + reimage an-worker108[1,2] to Buster
07:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1075.eqiad.wmnet with reason: REIMAGE
07:47 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1075.eqiad.wmnet with reason: REIMAGE
07:46 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1074.eqiad.wmnet with reason: REIMAGE
07:44 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1074.eqiad.wmnet with reason: REIMAGE
07:32 marostegui: Depool clouddb1013:3311, clouddb1013:3313 - T269211
07:23 elukey: drain + reimage analytics107[4,5] to Buster
07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P14657 and previous config saved to /var/cache/conftool/dbconfig/20210308-071443-root.json
06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P14656 and previous config saved to /var/cache/conftool/dbconfig/20210308-065939-root.json
06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2116 T275633', diff saved to https://phabricator.wikimedia.org/P14655 and previous config saved to /var/cache/conftool/dbconfig/20210308-065300-marostegui.json
06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2092 T275633', diff saved to https://phabricator.wikimedia.org/P14654 and previous config saved to /var/cache/conftool/dbconfig/20210308-065220-marostegui.json
06:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2146 T275633', diff saved to https://phabricator.wikimedia.org/P14653 and previous config saved to /var/cache/conftool/dbconfig/20210308-064953-marostegui.json
06:44 marostegui: Set innodb_change_buffering = none on all parsercache hosts T263443
06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P14652 and previous config saved to /var/cache/conftool/dbconfig/20210308-064436-root.json
06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168 T276742', diff saved to https://phabricator.wikimedia.org/P14651 and previous config saved to /var/cache/conftool/dbconfig/20210308-063700-marostegui.json
06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P14650 and previous config saved to /var/cache/conftool/dbconfig/20210308-062932-root.json
06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 T276742', diff saved to https://phabricator.wikimedia.org/P14649 and previous config saved to /var/cache/conftool/dbconfig/20210308-062350-marostegui.json
06:21 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)

2021-03-07

08:01 elukey: "megacli -LDSetProp -ForcedWB -Immediate -Lall -aAll" on analytics1066 - BBU looks fine, but the raid controller was using WriteThrough

2021-03-05

23:16 legoktm: imported pygments 2.8.0+dfsg-1 to apt.wm.o buster-wikimedia component/pygments (T276298)
21:36 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:32 pt1979@cumin2001: START - Cookbook sre.dns.netbox
21:01 legoktm: updated udplog to 1.9 on mwlog1002.eqiad.wmnet and mwlog2002.codfw.wmnet
20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts deploy1001.eqiad.wmnet
20:34 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts deploy1001.eqiad.wmnet
20:15 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=registry2002.codfw.wmnet
20:15 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=registry2001.codfw.wmnet
20:12 legoktm@deploy1002: conftool action : set/pooled=yes; selector: name=registry2004.codfw.wmnet
20:04 legoktm@deploy1002: conftool action : set/weight=10; selector: name=registry2004.codfw.wmnet
20:04 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=registry2004.codfw.wmnet
20:02 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=registry2004.codfw.wmnet
19:30 legoktm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host registry2004.codfw.wmnet
19:14 legoktm@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2004.codfw.wmnet
19:04 mutante: phab1001 - running public_task_dump.py (from cron job) manually
18:50 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts registry2004.eqiad.wmnet
18:45 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts registry2004.eqiad.wmnet
18:45 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1021.eqiad.wmnet with reason: REIMAGE
18:43 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1021.eqiad.wmnet with reason: REIMAGE
18:23 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:18 razzi@cumin1001: START - Cookbook sre.dns.netbox
16:58 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:54 effie: depool mw1276 and pool back
16:53 razzi@cumin1001: START - Cookbook sre.dns.netbox
16:48 razzi: edit https://netbox.wikimedia.org/dcim/devices/2078/ device name from labsdb1012 to clouddb1021
16:36 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1036.eqiad.wmnet
16:30 razzi: delete non-mgmt interfaces for labsdb1012 at https://netbox.wikimedia.org/dcim/devices/2078/interfaces/
16:28 razzi: rename https://netbox.wikimedia.org/ipam/ip-addresses/734/ DNS name from labsdb1012.mgmt.eqiad.wmnet to clouddb1021.mgmt.eqiad.wmnet
16:22 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1036.eqiad.wmnet
16:17 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts labsdb1012.eqiad.wmnet
16:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1086.eqiad.wmnet with reason: REIMAGE
16:09 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1086.eqiad.wmnet with reason: REIMAGE
16:07 razzi@cumin1001: START - Cookbook sre.hosts.decommission for hosts labsdb1012.eqiad.wmnet
15:56 razzi: stop mariadb on labsdb1012 to reimage and rename to clouddb1021: T269211
15:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1073.eqiad.wmnet with reason: REIMAGE
15:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:37 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1073.eqiad.wmnet with reason: REIMAGE
15:29 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
15:07 elukey: drain + reimage analytics1073 and an-worker1086 to Debian Buster
14:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:20 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
13:59 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
13:52 marostegui: Rebuild some indexes on db2102
13:38 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
13:38 marostegui@cumin1001: dbctl commit (dc=all): 'DEpool db1134', diff saved to https://phabricator.wikimedia.org/P14644 and previous config saved to /var/cache/conftool/dbconfig/20210305-133833-marostegui.json
13:24 marostegui: Check tables on db1134
12:31 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1035.eqiad.wmnet
12:24 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1035.eqiad.wmnet
11:28 marostegui: Temporarily set innodb_change_buffering = none on db1134 (s1) - T263443
11:09 marostegui: Run check table on db2092, db2116, db2145, db2146 (there will be lag)
10:54 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1034.eqiad.wmnet
10:47 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1034.eqiad.wmnet
10:43 jakob@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
10:38 jakob@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
10:32 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1033.eqiad.wmnet
10:25 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1033.eqiad.wmnet
09:54 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
09:52 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
09:50 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
09:45 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
09:31 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
09:31 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
09:31 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
09:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1078.eqiad.wmnet with reason: REIMAGE
09:28 jayme: switched the active kubernetes staging cluster back to eqiad
09:28 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' .
09:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1078.eqiad.wmnet with reason: REIMAGE
09:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1079.eqiad.wmnet with reason: REIMAGE
09:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1079.eqiad.wmnet with reason: REIMAGE
09:21 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ms-be1034.eqiad.wmnet
09:19 jakob@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
09:12 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be1034.eqiad.wmnet
08:44 jynus@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2003.codfw.wmnet with reason: REIMAGE
08:42 jynus@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2003.codfw.wmnet with reason: REIMAGE
08:32 elukey: drain + reimage an-worker107[8,9] to Debian Buster
08:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1071.eqiad.wmnet with reason: REIMAGE
07:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1070.eqiad.wmnet with reason: REIMAGE
07:59 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1071.eqiad.wmnet with reason: REIMAGE
07:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1070.eqiad.wmnet with reason: REIMAGE
07:33 elukey: drain + reimage analytics107[0-1] to debian buster
06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2092', diff saved to https://phabricator.wikimedia.org/P14640 and previous config saved to /var/cache/conftool/dbconfig/20210305-065137-marostegui.json
06:17 legoktm: uploaded udplog 1.9 (buster-wikimedia) to apt.wikimedia.org (T276421)
00:59 legoktm: depooled registry1001/registry1002 (old stretch VMs) - T272550
00:59 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=registry1002.eqiad.wmnet
00:58 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=registry1001.eqiad.wmnet
00:58 legoktm@deploy1002: conftool action : set/pooled=yes; selector: name=registry1004.eqiad.wmnet
00:57 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=registry1004.eqiad.wmnet
00:57 legoktm@deploy1002: conftool action : set/pooled=inactive; selector: name=registry1004.eqiad.codfw
00:56 legoktm@deploy1002: conftool action : set/weight=10; selector: name=registry1004.eqiad.wmnet
00:55 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=registry1004.eqiad.codfw
00:50 ryankemper: T266470 [ats] `sudo cumin 'A:cp-ats' 'sudo run-puppet-agent'`
00:47 ryankemper: T266470 [ats] Deploying new mappings for `query-preview.wikidata.org` microsite: https://gerrit.wikimedia.org/r/c/operations/puppet/+/668173/
00:41 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@4cc913e]: correct refinery-drop-older-than checksum (duration: 01m 34s)
00:39 ryankemper: T266470 Ran `sudo run-puppet-agent` on `miscweb1002` without issue; `/var/log/apache2/query*.log` looks as expected
00:39 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@4cc913e]: correct refinery-drop-older-than checksum
00:36 ryankemper: T266470 Deploying new `query-preview` microsite: https://gerrit.wikimedia.org/r/c/operations/puppet/+/668543
00:23 legoktm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host registry2004.eqiad.wmnet
00:06 legoktm@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2004.eqiad.wmnet

2021-03-04

23:55 legoktm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host registry1004.eqiad.wmnet
23:39 legoktm@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry1004.eqiad.wmnet
20:12 urbanecm@deploy1002: Synchronized wmf-config/config/hiwiki.yaml: c6b04cb: Enable Growth features on hiwiki in stealth mode (T276450; 3/3) (duration: 00m 58s)
20:11 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: c6b04cb: Enable Growth features on hiwiki in stealth mode (T276450; 2/3) (duration: 00m 57s)
20:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: c6b04cb: Enable Growth features on hiwiki in stealth mode (T276450; 1/3) (duration: 00m 57s)
20:08 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.33/extensions/GrowthExperiments/includes/HomepageModules/Help.php: 8cc65e3: cleanup: Remove help panel URL from Help homepage module (T276450; T273118) (duration: 00m 58s)
19:33 rzl: restarted apache and php7.0-fpm on doc1001 due to staleness
19:21 urbanecm@deploy1002: Synchronized wmf-config/config/sqwiki.yaml: 377bc4f: Enable Growth features on sqwiki in stealth mode (T275550; 3/3) (duration: 00m 57s)
19:20 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 377bc4f: Enable Growth features on sqwiki in stealth mode (T275550; 2/3) (duration: 00m 57s)
19:19 dwisehaupt: replication restarted on frdb1004 after utf8mb4 conversion completed.
19:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 377bc4f: Enable Growth features on sqwiki in stealth mode (T275550; 1/3) (duration: 00m 57s)
19:11 jforrester@deploy1002: Synchronized php-1.36.0-wmf.33/extensions/FlaggedRevs/frontend/specialpages/reports/ProblemChanges.php: T276386 Fix fatal calls to getConfig (duration: 01m 12s)
19:06 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:59 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
18:26 jynus@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on backup2003.codfw.wmnet with reason: REIMAGE
18:25 jynus@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2003.codfw.wmnet with reason: REIMAGE
17:39 mutante: [deneb:~] $ sudo systemctl start cowbuilder_update_jessie-amd64
17:25 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:20 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
17:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on deploy1001.eqiad.wmnet with reason: decom
17:11 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on deploy1001.eqiad.wmnet with reason: decom
17:05 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1032.eqiad.wmnet
16:59 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1032.eqiad.wmnet
16:56 tarrow@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
16:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1069.eqiad.wmnet with reason: REIMAGE
16:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1068.eqiad.wmnet with reason: REIMAGE
16:54 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:54 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1069.eqiad.wmnet with reason: REIMAGE
16:53 tarrow@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
16:52 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1068.eqiad.wmnet with reason: REIMAGE
16:47 pt1979@cumin2001: START - Cookbook sre.dns.netbox
16:39 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1031.eqiad.wmnet
16:33 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1031.eqiad.wmnet
16:23 jakob@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
16:20 jakob@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
16:13 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1026.eqiad.wmnet
16:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2145', diff saved to https://phabricator.wikimedia.org/P14635 and previous config saved to /var/cache/conftool/dbconfig/20210304-161226-marostegui.json
16:08 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1026.eqiad.wmnet
16:02 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1025.eqiad.wmnet
15:55 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1025.eqiad.wmnet
15:52 jakob@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
15:42 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1024.eqiad.wmnet
15:28 jakob@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
15:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1067.eqiad.wmnet with reason: REIMAGE
15:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1066.eqiad.wmnet with reason: REIMAGE
15:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1067.eqiad.wmnet with reason: REIMAGE
15:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1066.eqiad.wmnet with reason: REIMAGE
15:21 jakob@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
15:12 elukey: drain + reimage analytics106[6,7] to Debian Buster
15:11 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1024.eqiad.wmnet
14:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1065.eqiad.wmnet with reason: REIMAGE
14:38 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1065.eqiad.wmnet with reason: REIMAGE
14:38 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:35 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:34 pt1979@cumin2001: START - Cookbook sre.dns.netbox
14:30 jayme@cumin1001: START - Cookbook sre.dns.netbox
14:23 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts neon.eqiad.wmnet
14:18 jayme@cumin1001: START - Cookbook sre.hosts.decommission for hosts neon.eqiad.wmnet
14:15 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts neon.eqiad.wmnet
14:15 jayme@cumin1001: START - Cookbook sre.hosts.decommission for hosts neon.eqiad.wmnet
14:04 liw@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.33
13:55 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1064.eqiad.wmnet with reason: REIMAGE
13:53 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1063.eqiad.wmnet with reason: REIMAGE
13:52 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1064.eqiad.wmnet with reason: REIMAGE
13:50 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1063.eqiad.wmnet with reason: REIMAGE
13:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2116', diff saved to https://phabricator.wikimedia.org/P14632 and previous config saved to /var/cache/conftool/dbconfig/20210304-134521-marostegui.json
13:44 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
13:44 volans: uploaded spicerack_0.0.49 to apt.wikimedia.org buster-wikimedia
13:35 moritzm: restarting mw canaries for libzstd update
13:32 elukey: drain + reimage analytics10[63,64] to Debian Buster
13:29 moritzm: installing libzstd security updates on Buster
13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2146 to dbctl T275633', diff saved to https://phabricator.wikimedia.org/P14631 and previous config saved to /var/cache/conftool/dbconfig/20210304-131301-marostegui.json
13:10 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1062.eqiad.wmnet with reason: REIMAGE
13:08 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1061.eqiad.wmnet with reason: REIMAGE
13:07 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1062.eqiad.wmnet with reason: REIMAGE
13:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1061.eqiad.wmnet with reason: REIMAGE
12:48 elukey: drain + reimage analytics10[61,62] to Debian Buster
12:45 jakob@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
12:40 mbsantos@deploy1002: Finished deploy [tilerator/deploy@6fcbb9f]: (no justification provided) (duration: 00m 14s)
12:40 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove conflicting gadget configuration for hewiki (T276330) (duration: 01m 12s)
12:40 mbsantos@deploy1002: Started deploy [tilerator/deploy@6fcbb9f]: (no justification provided)
12:34 jakob@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
12:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db1115.eqiad.wmnet,dbmonitor1001.wikimedia.org with reason: Restart db1115 to fix memory leak
12:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on db1115.eqiad.wmnet,dbmonitor1001.wikimedia.org with reason: Restart db1115 to fix memory leak
12:10 jakob@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
12:00 marostegui: Stop mysql on db1117:3321 to clone db1159
11:42 jakob@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2145 to s1 (and repool db2116) - T275633', diff saved to https://phabricator.wikimedia.org/P14625 and previous config saved to /var/cache/conftool/dbconfig/20210304-114052-marostegui.json
11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2145 into dbctl depooled - T275633', diff saved to https://phabricator.wikimedia.org/P14624 and previous config saved to /var/cache/conftool/dbconfig/20210304-112848-marostegui.json
11:27 _joe_: restarted redis on mc2027 to pick up the replication change
11:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1059.eqiad.wmnet with reason: REIMAGE
11:11 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1059.eqiad.wmnet with reason: REIMAGE
11:10 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Needs fixing after T274472
11:10 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Needs fixing after T274472
11:08 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1022.eqiad.wmnet
11:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1060.eqiad.wmnet with reason: REIMAGE
11:02 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1022.eqiad.wmnet
11:02 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1060.eqiad.wmnet with reason: REIMAGE
10:40 elukey: drain + reimage analytics1059/1060 to Debian Buster
10:32 moritzm: uploaded screen 4.2.1-3+deb8u1+wmf1 to jessie-wikimedia
09:32 elukey: install linux 5.10 on an-worker[1097-1101] (GPU workers) and reboot them
09:30 kormat: disabling puppet on all db hosts while deploying a puppet monitoring change T275497
09:19 moritzm: uploaded udplog 1.8.5+deb10u1 to buster-wikimedia
08:45 elukey@deploy1002: Finished deploy [analytics/refinery@605f8b8]: Fix for geoeditors monthly job (duration: 11m 03s)
08:33 elukey@deploy1002: Started deploy [analytics/refinery@605f8b8]: Fix for geoeditors monthly job
07:38 elukey: reboot an-worker1096 to pick up 5.10 kernel
06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1088 T276025', diff saved to https://phabricator.wikimedia.org/P14622 and previous config saved to /var/cache/conftool/dbconfig/20210304-062503-marostegui.json
06:11 marostegui: Stop MySQL on db2116 to clone db2145 T275633
06:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2116 T275633', diff saved to https://phabricator.wikimedia.org/P14621 and previous config saved to /var/cache/conftool/dbconfig/20210304-061134-marostegui.json
05:20 kart_: Updated apertium to 2021-03-03-170806-production (T274262)
05:15 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
05:11 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'production' .
05:10 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' .
01:24 twentyafterfour: phabricator upgrade complete
01:22 twentyafterfour: restarting php7.3-fpm on phab1001 to complete phabricator upgrade
00:02 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@e47f735]: search_satisfaction_daily: make files readable by druid ingestion (duration: 25m 35s)

2021-03-03

23:36 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@e47f735]: search_satisfaction_daily: make files readable by druid ingestion
23:08 legoktm@deploy1002: conftool action : set/pooled=yes; selector: name=registry2003.codfw.wmnet
22:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mwmaint2001.codfw.wmnet
22:51 legoktm@deploy1002: conftool action : set/weight=10; selector: name=registry2003.codfw.wmnet
22:50 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=registry2003.codfw.wmnet
22:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mwmaint2001.codfw.wmnet
22:05 legoktm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host registry2003.codfw.wmnet
21:58 mutante: puppetmaster1001 - signing puppet cert for gitlab1001.wikimedia.org (T274459)
21:53 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@7f37d40]: replace refinery-drop-hive-partitions with refinery-drop-older-than (duration: 01m 37s)
21:51 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@7f37d40]: replace refinery-drop-hive-partitions with refinery-drop-older-than
21:50 legoktm@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2003.codfw.wmnet
21:30 legoktm@deploy1002: conftool action : set/pooled=yes; selector: name=registry1003.eqiad.wmnet
21:25 legoktm@deploy1002: conftool action : set/weight=10; selector: name=registry1003.eqiad.wmnet
21:21 legoktm@deploy1002: conftool action : set/pooled=no; selector: name=registry1003.eqiad.wmnet
21:16 legoktm@deploy1002: conftool action : set/pooled=yes; selector: name=registry1002.eqiad.wmnet
20:35 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host gitlab1001.wikimedia.org
20:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Revert: Enable Growth features on sqwiki in stealth mode (T275550) (duration: 01m 10s)
20:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0120778: Enable Growth features on sqwiki in stealth mode (T275550) (duration: 01m 09s)
20:02 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-backup2002.codfw.wmnet with reason: REIMAGE
20:00 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-backup2002.codfw.wmnet with reason: REIMAGE
19:57 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.33/extensions/GrowthExperiments/: 4cba184: Help panel: Do not require help desk to be configured (T273118) (duration: 01m 10s)
19:53 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.32/extensions/GrowthExperiments/: a036d9f: Help panel: Do not require help desk to be configured (T273118) (duration: 01m 10s)
19:48 legoktm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host registry1003.eqiad.wmnet
19:41 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host gitlab1001.wikimedia.org
19:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7acb37c: dawiki: Deploy Growth features to newcomers (T256126) (duration: 01m 09s)
19:38 urbanecm@deploy1002: sync-file aborted: 7acb37c: dawiki: Deploy Growth features to newcomers (duration: 00m 03s)
19:33 legoktm@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry1003.eqiad.wmnet
19:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:28 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7221371: rowiki: Make Growth features available to ro newcomers (T275130) (duration: 01m 10s)
19:21 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
19:14 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.33/extensions/WikibaseMediaInfo/src/Special/SpecialMediaSearch.php: b741dc3: Also requet timestamp|snippet from non-page results (T271174; T276353) (duration: 01m 09s)
19:08 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
19:00 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:58 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-backup2001.codfw.wmnet with reason: REIMAGE
18:56 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-backup2001.codfw.wmnet with reason: REIMAGE
18:53 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
18:52 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
18:51 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
18:49 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
18:49 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
18:47 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
18:46 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
18:45 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
18:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts deploy2001.codfw.wmnet
18:43 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
18:42 legoktm: uploaded python3-docker-report 0.0.11 to buster-wikimedia
18:40 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
18:39 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
18:37 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
18:37 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
18:36 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
18:36 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts deploy2001.codfw.wmnet
18:32 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
18:30 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
18:30 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
18:29 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
18:27 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
18:27 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
18:26 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'staging' .
18:26 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
18:26 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
18:25 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
18:24 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' .
18:24 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'production' .
18:23 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
18:22 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
18:21 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
18:20 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
18:17 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
18:17 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
18:17 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
18:17 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
18:16 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'production' .
18:16 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' .
18:15 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
18:15 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
18:12 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
18:09 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
17:56 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
17:56 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
17:49 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
17:49 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
17:46 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
17:31 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubestage1002.eqiad.wmnet with reason: REIMAGE
17:31 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1002.eqiad.wmnet with reason: REIMAGE
17:29 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubestage1001.eqiad.wmnet with reason: REIMAGE
17:29 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1001.eqiad.wmnet with reason: REIMAGE
17:16 dwisehaupt: correction for last log with correct host - stopping mysql replication on frdb1004 and starting utf8mb4 table alters under a root screen session
17:15 dwisehaupt: stopping mysql replication on frdb2001 and starting utf8mb4 table alters under a root screen session
17:14 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Set destination_event_serivce: eventgate-main for rdf-streaming-updater streams - T273901 (duration: 01m 08s)
17:13 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
17:09 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Resyncing database from scratch
17:09 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Resyncing database from scratch
17:09 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
16:40 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:36 dzahn@cumin1001: START - Cookbook sre.dns.netbox
16:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts gitlab1001.eqiad.wmnet
16:30 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts gitlab1001.eqiad.wmnet
16:29 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts gitlab1002.eqiad.wmnet
16:28 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: canary_events_enabled: true for rdf-streaming-updater streams - T273901 (duration: 01m 49s)
16:26 mutante: deleting gitlab VMs - we have to start over and decom old VMs, then create new VMs with public IPs (T274459)
16:25 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts gitlab1002.eqiad.wmnet
16:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on gitlab1002.eqiad.wmnet with reason: decom
16:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on gitlab1002.eqiad.wmnet with reason: decom
16:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on gitlab1001.eqiad.wmnet with reason: decom
16:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on gitlab1001.eqiad.wmnet with reason: decom
16:18 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagetcd1006.eqiad.wmnet
16:15 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagetcd1005.eqiad.wmnet
16:14 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagetcd1006.eqiad.wmnet
16:12 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagetcd1005.eqiad.wmnet
16:11 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagetcd1004.eqiad.wmnet
16:09 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagetcd1004.eqiad.wmnet
16:07 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts neon.eqiad.wmnet
16:05 jayme@cumin1001: START - Cookbook sre.hosts.decommission for hosts neon.eqiad.wmnet
15:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1021.eqiad.wmnet
15:34 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1021.eqiad.wmnet
15:27 jayme: staging.svc.eqiad.wmnet now (temporarily) points to the staging-codfw kubernetes cluster (during upgrade in eqiad)
15:27 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
15:26 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
15:25 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
15:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1027.eqiad.wmnet
15:19 liw@deploy1002: Synchronized php: group1 wikis to 1.36.0-wmf.33 (duration: 01m 08s)
15:18 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1027.eqiad.wmnet
15:18 liw@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.33
15:13 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.33/extensions/CentralAuth/: af899b6: Transform the first parameter to string (T276316) (duration: 01m 11s)
14:48 effie: upgrade memcached on mc1027,mc2027
14:41 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1101.eqiad.wmnet with reason: REIMAGE
14:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1100.eqiad.wmnet with reason: REIMAGE
14:39 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1101.eqiad.wmnet with reason: REIMAGE
14:37 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1099.eqiad.wmnet with reason: REIMAGE
14:37 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1100.eqiad.wmnet with reason: REIMAGE
14:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1099.eqiad.wmnet with reason: REIMAGE
14:06 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1018.eqiad.wmnet
13:58 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1018.eqiad.wmnet
13:09 godog: swift eqiad-prod: remove ssd weight for ms-be1034 - T276193
12:54 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1017.eqiad.wmnet
12:48 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1017.eqiad.wmnet
12:42 Urbanecm: Deploy a security patch for T276306
12:29 urbanecm@deploy1002: Synchronized php-1.36.0-wmf.33/extensions/GrowthExperiments/ServiceWiring.php: cf635b4: Do not open DB connections during service initialization (T276307) (duration: 01m 11s)
12:26 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1016.eqiad.wmnet
12:26 Urbanecm: urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 90a205f: Add ReferenceTooltips and other gadget names for ReferencePreviews (T274353) (duration: 01m 10s)
12:20 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1016.eqiad.wmnet
12:04 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1014.eqiad.wmnet
11:58 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1014.eqiad.wmnet
11:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P14616 and previous config saved to /var/cache/conftool/dbconfig/20210303-113349-root.json
11:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P14615 and previous config saved to /var/cache/conftool/dbconfig/20210303-111843-root.json
11:07 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=codfw
11:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P14614 and previous config saved to /var/cache/conftool/dbconfig/20210303-110339-root.json
10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 100%: Slowly repool db1164 in s1 for the first time', diff saved to https://phabricator.wikimedia.org/P14613 and previous config saved to /var/cache/conftool/dbconfig/20210303-105726-root.json
10:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P14612 and previous config saved to /var/cache/conftool/dbconfig/20210303-104836-root.json
10:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175 for schema change', diff saved to https://phabricator.wikimedia.org/P14611 and previous config saved to /var/cache/conftool/dbconfig/20210303-104522-marostegui.json
10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P14610 and previous config saved to /var/cache/conftool/dbconfig/20210303-104302-root.json
10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 90%: Slowly repool db1164 in s1 for the first time', diff saved to https://phabricator.wikimedia.org/P14609 and previous config saved to /var/cache/conftool/dbconfig/20210303-104223-root.json
10:38 jbond42: upload new wmf-laptop 0.5.0 package
10:37 vgutierrez: rolling restart of ats-tls on eqiad
10:34 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
10:34 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
10:34 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P14608 and previous config saved to /var/cache/conftool/dbconfig/20210303-102758-root.json
10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 75%: Slowly repool db1164 in s1 for the first time', diff saved to https://phabricator.wikimedia.org/P14607 and previous config saved to /var/cache/conftool/dbconfig/20210303-102719-root.json
10:25 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
10:25 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
10:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P14605 and previous config saved to /var/cache/conftool/dbconfig/20210303-101255-root.json
10:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 60%: Slowly repool db1164 in s1 for the first time', diff saved to https://phabricator.wikimedia.org/P14604 and previous config saved to /var/cache/conftool/dbconfig/20210303-101215-root.json
10:05 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet1003.eqiad.wmnet
10:00 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1003.eqiad.wmnet
10:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet1003.eqiad.wmnet
09:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P14602 and previous config saved to /var/cache/conftool/dbconfig/20210303-095751-root.json
09:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 50%: Slowly repool db1164 in s1 for the first time', diff saved to https://phabricator.wikimedia.org/P14601 and previous config saved to /var/cache/conftool/dbconfig/20210303-095712-root.json
09:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on cloudnet1003.eqiad.wmnet with reason: HW issue
09:54 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on cloudnet1003.eqiad.wmnet with reason: HW issue
09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 for schema change', diff saved to https://phabricator.wikimedia.org/P14600 and previous config saved to /var/cache/conftool/dbconfig/20210303-095417-marostegui.json
09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P14599 and previous config saved to /var/cache/conftool/dbconfig/20210303-095351-root.json
09:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 30%: Slowly repool db1164 in s1 for the first time', diff saved to https://phabricator.wikimedia.org/P14598 and previous config saved to /var/cache/conftool/dbconfig/20210303-094208-root.json
09:41 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1132,1135-1138].eqiad.wmnet
09:39 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1132,1135-1138].eqiad.wmnet
09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P14597 and previous config saved to /var/cache/conftool/dbconfig/20210303-093847-root.json
09:31 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1003.eqiad.wmnet
09:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1138.eqiad.wmnet with reason: REIMAGE
09:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1138.eqiad.wmnet with reason: REIMAGE
09:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1137.eqiad.wmnet with reason: REIMAGE
09:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 25%: Slowly repool db1164 in s1 for the first time', diff saved to https://phabricator.wikimedia.org/P14596 and previous config saved to /var/cache/conftool/dbconfig/20210303-092705-root.json
09:25 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1137.eqiad.wmnet with reason: REIMAGE
09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P14595 and previous config saved to /var/cache/conftool/dbconfig/20210303-092343-root.json
09:16 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
09:16 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
09:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 15%: Slowly repool db1164 in s1 for the first time', diff saved to https://phabricator.wikimedia.org/P14594 and previous config saved to /var/cache/conftool/dbconfig/20210303-091201-root.json
09:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P14593 and previous config saved to /var/cache/conftool/dbconfig/20210303-090840-root.json
09:02 zpapierski@deploy1002: Finished deploy [wdqs/wdqs@dbfd1f6]: Deploying emergency fix - WDQS 0.3.66 (duration: 08m 17s)
09:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1157 for schema change', diff saved to https://phabricator.wikimedia.org/P14592 and previous config saved to /var/cache/conftool/dbconfig/20210303-090030-marostegui.json
08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 10%: Slowly repool db1164 in s1 for the first time', diff saved to https://phabricator.wikimedia.org/P14591 and previous config saved to /var/cache/conftool/dbconfig/20210303-085658-root.json
08:54 zpapierski@deploy1002: Started deploy [wdqs/wdqs@dbfd1f6]: Deploying emergency fix - WDQS 0.3.66
08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1164 in s1 T258361', diff saved to https://phabricator.wikimedia.org/P14590 and previous config saved to /var/cache/conftool/dbconfig/20210303-085014-marostegui.json
08:48 test: tcpircbot --joe
08:40 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-worker1136.eqiad.wmnet with reason: REIMAGE
08:40 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1136.eqiad.wmnet with reason: REIMAGE
08:32 godog: stop/mask tcpircbot-logmsgbot on pontoon-icinga-01 - T276299
07:30 _joe_: test
07:17 _joe_: test log
06:41 marostegui: Testing log
06:27 ryankemper: T275345 T274555 `sudo confctl select 'name=elastic2054.codfw.wmnet' set/pooled=yes` on `ryankemper@puppetmaster1001`
06:26 ryankemper: T275345 T274555 `sudo confctl select 'name=elastic2045.codfw.wmnet' set/pooled=yes` on `ryankemper@puppetmaster1001`
06:21 ryankemper: T275345 T274555 Re-pooling `elastic2045` and `elastic2054` (commands follow)
06:20 ryankemper: T275345 T274555 `curl -H 'Content-Type: application/json' -XPUT http://localhost:9400/_cluster/settings -d '{"transient":{"cluster.routing.allocation.exclude":{"_name": null,"_ip": null}}}'` => `{"acknowledged":true,"persistent":{},"transient":{}}`
06:18 ryankemper: T275345 T274555 `curl -H 'Content-Type: application/json' -XPUT http://localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.exclude":{"_name": null,"_ip": null}}}'` => `{"acknowledged":true,"persistent":{},"transient":{}}`
06:17 ryankemper: T275345 T274555 Unbanning `elastic2045` and `elastic2054` from our cluster now that both hosts have been re-imaged and are running without errors (commands follow)
06:15 ryankemper: T274555 Removed downtime for `elastic2054`
05:32 ryankemper: T274555 `sudo -i wmf-auto-reimage-host --conftool -p T274555 elastic2054.codfw.wmnet` on `ryankemper@cumin2001` tmux session `elastic_reimage_elastic2054`
05:27 ryankemper: Downtime `wdqs1012` until `2021-03-03 19:25:40` (~14 hours from now). Its `wdqs-updater` is failing; ultimately its blazegraph journal is probably in a bad state, meaning we'd have to copy one over from a healthy node, but not kicking that off right now so that we can investigate a little bit first
05:16 ryankemper: T275345 `ryankemper@elastic2045:~$ sudo apt-get upgrade wmf-elasticsearch-search-plugins`
03:50 ryankemper: Depooled `wdqs1012` until I've got its updater back online
03:24 ryankemper: `ryankemper@wdqs1012:~$ sudo systemctl restart wdqs-blazegraph` ~2 mins ago
02:45 ejegg: updated fundraising CiviCRM from e1dacbe348 to b13e70d968
02:09 ejegg: updated payments-wiki from 365bf54393 to 65dbf0ed9d
00:42 Urbanecm: Finished deployment in Evening B&C window; logmsgbot is currently down, and a simple restart did not bring it back up
00:41 Urbanecm: 00:40:16 Synchronized wmf-config/config/idwiki.yaml: 80edca8: Enable Growth features in idwiki in stealth mode (T259024; 3/3) (duration: 01m 09s)
00:38 Urbanecm: 00:38:12 Synchronized dblists/growthexperiments.dblist: 80edca8: Enable Growth features in idwiki in stealth mode (T259024; 2/3) (duration: 01m 10s)
00:31 Urbanecm: 00:31:26 Synchronized wmf-config/InitialiseSettings.php: 80edca8: Enable Growth features in idwiki in stealth mode (T259024; 1/3) (duration: 01m 11s)
00:21 dwisehaupt: replication restarted on frdb2001 after utf8mb4 conversion completed.
00:21 mutante: alert1001 systemctl restart tcpircbot-logmsgbot
00:08 urbanecm@deploy1002: sync-file aborted: 80edca8: Enable Growth features in idwiki in stealth mode (T259024; 1/3) (duration: 06m 45s)

2021-03-02

23:52 mutante: mwmaint2002 - find /home -nouser -delete
23:42 shdubsh: restart kibana to finalize phatality 7.10 deployment
23:38 twentyafterfour@deploy1002: Finished deploy [releng/phatality@4d0f053]: sudoer rules fixed, trying again: deploy phatality (duration: 00m 06s)
23:38 twentyafterfour@deploy1002: Started deploy [releng/phatality@4d0f053]: sudoer rules fixed, trying again: deploy phatality
23:27 twentyafterfour@deploy1002: Finished deploy [releng/phatality@4d0f053]: trying again: deploy phatality 7.10 (duration: 00m 37s)
23:27 twentyafterfour@deploy1002: Started deploy [releng/phatality@4d0f053]: trying again: deploy phatality 7.10
23:22 twentyafterfour@deploy1002: Finished deploy [releng/phatality@4d0f053]: deploy phatality 7.10 (duration: 00m 05s)
23:22 twentyafterfour@deploy1002: Started deploy [releng/phatality@4d0f053]: deploy phatality 7.10
23:20 twentyafterfour@deploy1002: Finished deploy [releng/phatality@4d0f053]: deploy phatality 7.10 (duration: 01m 01s)
23:19 twentyafterfour@deploy1002: Started deploy [releng/phatality@4d0f053]: deploy phatality 7.10
23:11 mutante: mwmaint2002 - rsyncing home dirs from mwmaint1002 (T275905)
23:09 ebernhardson: restart wedged prometheus-wmf-elasticsearch-exporter-9200 on elastic2042
23:03 mforns@deploy1002: Finished deploy [analytics/refinery@3bd0858] (hadoop-test): Regular analytics weekly train TEST- forgot version bump [analytics/refinery@3bd0858d0c3b524e6d170099d1e2f3d12fad495d] (duration: 04m 56s)
22:58 mforns@deploy1002: Started deploy [analytics/refinery@3bd0858] (hadoop-test): Regular analytics weekly train TEST- forgot version bump [analytics/refinery@3bd0858d0c3b524e6d170099d1e2f3d12fad495d]
22:53 mforns@deploy1002: Finished deploy [analytics/refinery@3bd0858] (thin): Regular analytics weekly train THIN- forgot bump up [analytics/refinery@3bd0858d0c3b524e6d170099d1e2f3d12fad495d] (duration: 00m 06s)
22:53 mforns@deploy1002: Started deploy [analytics/refinery@3bd0858] (thin): Regular analytics weekly train THIN- forgot bump up [analytics/refinery@3bd0858d0c3b524e6d170099d1e2f3d12fad495d]
22:53 mforns@deploy1002: Finished deploy [analytics/refinery@3bd0858]: Regular analytics weekly train- forgot bump up [analytics/refinery@3bd0858d0c3b524e6d170099d1e2f3d12fad495d] (duration: 18m 41s)
22:34 mforns@deploy1002: Started deploy [analytics/refinery@3bd0858]: Regular analytics weekly train- forgot bump up [analytics/refinery@3bd0858d0c3b524e6d170099d1e2f3d12fad495d]
22:23 mforns@deploy1002: Finished deploy [analytics/refinery@af99602] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@af99602101018664670a76d28cd755caf07dcde7] (duration: 07m 30s)
22:16 mforns@deploy1002: Started deploy [analytics/refinery@af99602] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@af99602101018664670a76d28cd755caf07dcde7]
22:14 mforns@deploy1002: Finished deploy [analytics/refinery@af99602] (thin): Regular analytics weekly train THIN [analytics/refinery@af99602101018664670a76d28cd755caf07dcde7] (duration: 00m 07s)
22:14 mforns@deploy1002: Started deploy [analytics/refinery@af99602] (thin): Regular analytics weekly train THIN [analytics/refinery@af99602101018664670a76d28cd755caf07dcde7]
22:12 mforns@deploy1002: Finished deploy [analytics/refinery@af99602]: Regular analytics weekly train [analytics/refinery@af99602101018664670a76d28cd755caf07dcde7] (duration: 13m 09s)
21:59 mforns@deploy1002: Started deploy [analytics/refinery@af99602]: Regular analytics weekly train [analytics/refinery@af99602101018664670a76d28cd755caf07dcde7]
21:58 mforns@deploy1002: deploy aborted: Regular analytics weekly train [analytics/refinery@COMMIT_HASH] (duration: 00m 01s)
21:57 mforns@deploy1002: Started deploy [analytics/refinery@af99602]: Regular analytics weekly train [analytics/refinery@COMMIT_HASH]
21:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mwmaint2001.codfw.wmnet with reason: decom
21:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mwmaint2001.codfw.wmnet with reason: decom
21:51 legoktm: copied docker-registry package from stretch-wikimedia to buster-wikimedia (T272550)
20:47 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I7f387bf19e5f prep wgChronologyProtectorStash ahead of wmf.33 roll out to ensure cross-wiki consistency (duration: 01m 18s)
20:04 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
20:03 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
20:00 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
20:00 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
19:56 papaul: codfw mgmt is going down for 5 minutes for maintenance, thank you
19:53 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
19:43 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: c48e40a: Enable babel categorize on thwikisource (T275283) (duration: 01m 09s)
19:42 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:39 pt1979@cumin2001: START - Cookbook sre.dns.netbox
19:28 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: f6fa5b3: Set local timezone for trwikivoyage to UTC (T275598) (duration: 01m 09s)
19:15 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
19:13 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
18:59 ebernhardson: apply merge.policy.deletes_pct_allowed=20 to production-search-codfw commonswiki_file to encourage merging away deleted docs from T271493
18:53 mholloway-shell@deploy1002: Synchronized php-1.36.0-wmf.32/extensions/EventLogging: Fix timestamp format for migrated events (T276235) (duration: 01m 10s)
18:42 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
18:40 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
18:28 dduvall@deploy1002: Synchronized private/readme.php: Config: Extend wmfSwiftConfig placeholder keys (duration: 01m 09s)
18:21 mholloway-shell@deploy1002: Synchronized php-1.36.0-wmf.33/extensions/EventLogging: Fix timestamp format for migrated events (T276235) (duration: 01m 09s)
18:12 vgutierrez: rolling restart of ats-tls on esams
17:46 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@869a29b]: ores_bulk_ingest: Increase drafttopic error_threshold to 1 per 500 (duration: 02m 55s)
17:43 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@869a29b]: ores_bulk_ingest: Increase drafttopic error_threshold to 1 per 500
17:39 legoktm@deploy1002: Synchronized php-1.36.0-wmf.32/extensions/Graph/: Do not log graph errors to WMF servers (duration: 01m 08s)
17:21 legoktm@deploy1002: Synchronized php-1.36.0-wmf.32/extensions/ContentTranslation/: Re-apply: CX3 Build 0.1.0+20210223 (duration: 01m 10s)
16:37 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=eqiad
16:37 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host chartmuseum1001.eqiad.wmnet
16:33 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host chartmuseum1001.eqiad.wmnet
16:14 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
16:14 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
16:10 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
16:10 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
15:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1084 (re)pooling @ 100%: Repool db1084 after cloning db1164', diff saved to https://phabricator.wikimedia.org/P14563 and previous config saved to /var/cache/conftool/dbconfig/20210302-155932-root.json
15:56 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=eqiad
15:56 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=codfw
15:55 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host chartmuseum2001.codfw.wmnet
15:52 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host chartmuseum2001.codfw.wmnet
15:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1084 (re)pooling @ 85%: Repool db1084 after cloning db1164', diff saved to https://phabricator.wikimedia.org/P14562 and previous config saved to /var/cache/conftool/dbconfig/20210302-154429-root.json
15:35 tgr@deploy1002: Synchronized php-1.36.0-wmf.33/extensions/GrowthExperiments/: Backport: HomepageHooks: Block search data hook if link recommendations are off (T276224) (duration: 01m 13s)
15:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1084 (re)pooling @ 75%: Repool db1084 after cloning db1164', diff saved to https://phabricator.wikimedia.org/P14561 and previous config saved to /var/cache/conftool/dbconfig/20210302-152925-root.json
15:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1084 (re)pooling @ 50%: Repool db1084 after cloning db1164', diff saved to https://phabricator.wikimedia.org/P14560 and previous config saved to /var/cache/conftool/dbconfig/20210302-151422-root.json
15:00 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
15:00 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
14:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1084 (re)pooling @ 25%: Repool db1084 after cloning db1164', diff saved to https://phabricator.wikimedia.org/P14559 and previous config saved to /var/cache/conftool/dbconfig/20210302-145918-root.json
14:57 vgutierrez: rolling restart of ats-tls on codfw
14:53 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
14:53 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
14:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1084 (re)pooling @ 10%: Repool db1084 after cloning db1164', diff saved to https://phabricator.wikimedia.org/P14558 and previous config saved to /var/cache/conftool/dbconfig/20210302-144415-root.json
14:43 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
14:43 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
14:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
14:29 jynus: dropping db grants for bacula from m1 T274809
14:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1084 (re)pooling @ 5%: Repool db1084 after cloning db1164', diff saved to https://phabricator.wikimedia.org/P14557 and previous config saved to /var/cache/conftool/dbconfig/20210302-142911-root.json
14:07 jynus: dropping database bacula from m1 (with replication) T274809
14:04 liw@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.33
13:57 moritzm: installing bind9 security updates on stretch (client-side tools/libs only)
13:56 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host kubestagemaster1001.eqiad.wmnet
13:44 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host kubemaster1002.eqiad.wmnet
13:42 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm for new host kubestagemaster1001.eqiad.wmnet
13:25 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm for new host kubemaster1002.eqiad.wmnet
13:24 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host kubemaster1001.eqiad.wmnet
13:16 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host kubemaster2001.codfw.wmnet
13:13 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
13:10 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
13:08 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm for new host kubemaster1001.eqiad.wmnet
12:53 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host kubemaster2002.codfw.wmnet
12:53 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
12:46 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm for new host kubemaster2001.codfw.wmnet
12:44 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1012.eqiad.wmnet
12:43 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host kubemaster2001.codfw.wmnet
12:39 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1012.eqiad.wmnet
12:32 jayme@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
12:32 jayme@cumin1001: START - Cookbook sre.discovery.service-route
12:28 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=codfw
12:23 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 29952b4: vector: Stage 3 of WVUI search treatment A/B test (T249297) (duration: 01m 08s)
12:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5674d2a: Enable SectionTranslation in testwiki (T275596) (duration: 01m 09s)
12:13 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagetcd2003.codfw.wmnet
12:12 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagetcd2003.codfw.wmnet
12:12 mbsantos@deploy1002: Finished deploy [tilerator/deploy@8d3d81c]: (no justification provided) (duration: 00m 15s)
12:11 mbsantos@deploy1002: Started deploy [tilerator/deploy@8d3d81c]: (no justification provided)
12:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagetcd2002.codfw.wmnet
12:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: af89965: Remove test2wiki from wgContentTranslationAsBetaFeature (duration: 01m 38s)
12:02 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagetcd2002.codfw.wmnet
12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084 to clone db1164 T258361', diff saved to https://phabricator.wikimedia.org/P14554 and previous config saved to /var/cache/conftool/dbconfig/20210302-115959-marostegui.json
11:53 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagetcd2001.codfw.wmnet
11:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@937deb5]: (no justification provided) (duration: 00m 03s)
11:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@937deb5]: (no justification provided)
11:49 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagetcd2001.codfw.wmnet
11:47 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagemaster2001.codfw.wmnet
11:40 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster2001.codfw.wmnet
11:16 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm for new host kubemaster2002.codfw.wmnet
11:16 akosiaris@cumin1001: START - Cookbook sre.ganeti.makevm for new host kubemaster2001.codfw.wmnet
11:12 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
11:11 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
10:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1028.eqiad.wmnet
10:31 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1028.eqiad.wmnet
10:30 effie: upgrade memcached on mc2024, mc1028
10:21 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1119.eqiad.wmnet
10:18 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1119.eqiad.wmnet
10:12 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1119.eqiad.wmnet with reason: REIMAGE
10:09 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1119.eqiad.wmnet with reason: REIMAGE
10:05 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
10:03 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
09:54 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1130-1131].eqiad.wmnet
09:52 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1130-1131].eqiad.wmnet
09:46 liw@deploy1002: Finished scap: testwikis wikis to 1.36.0-wmf.33 (duration: 36m 20s)
09:43 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1124-1128].eqiad.wmnet
09:41 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1124-1128].eqiad.wmnet
09:39 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1120-1123].eqiad.wmnet
09:37 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1120-1123].eqiad.wmnet
09:36 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1119.eqiad.wmnet
09:33 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1119.eqiad.wmnet
09:12 liw@deploy1002: Started scap: testwikis wikis to 1.36.0-wmf.33
08:58 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1131.eqiad.wmnet with reason: REIMAGE
08:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1130.eqiad.wmnet with reason: REIMAGE
08:56 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1131.eqiad.wmnet with reason: REIMAGE
08:54 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1130.eqiad.wmnet with reason: REIMAGE
08:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1128.eqiad.wmnet with reason: REIMAGE
08:54 vgutierrez: rolling restart of ats-tls on ulsfo
08:52 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1128.eqiad.wmnet with reason: REIMAGE
08:39 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
08:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1127.eqiad.wmnet with reason: REIMAGE
08:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1126.eqiad.wmnet with reason: REIMAGE
08:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1127.eqiad.wmnet with reason: REIMAGE
08:25 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1126.eqiad.wmnet with reason: REIMAGE
08:25 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1125.eqiad.wmnet with reason: REIMAGE
08:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1125.eqiad.wmnet with reason: REIMAGE
08:00 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1124.eqiad.wmnet with reason: REIMAGE
07:59 liw: 1.36.0-wmf.33 was branched at 800e1f8 for T274937
07:58 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1123.eqiad.wmnet with reason: REIMAGE
07:58 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1124.eqiad.wmnet with reason: REIMAGE
07:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1122.eqiad.wmnet with reason: REIMAGE
07:56 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1123.eqiad.wmnet with reason: REIMAGE
07:54 godog: swift eqiad-prod: add weight to ms-be106[0-3] - T268435
07:54 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1122.eqiad.wmnet with reason: REIMAGE
07:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1121.eqiad.wmnet with reason: REIMAGE
07:27 ryankemper: Pooled `elastic106[0,4]` (Noticed I never re-pooled these hosts after resolving an incident last week)
07:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1121.eqiad.wmnet with reason: REIMAGE
07:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1120.eqiad.wmnet with reason: REIMAGE
07:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1120.eqiad.wmnet with reason: REIMAGE
05:40 Amir1: apply gerrit:667757 on mwdebug1002 to test T259360
05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2152 into s8 as vslow - T275633', diff saved to https://phabricator.wikimedia.org/P14551 and previous config saved to /var/cache/conftool/dbconfig/20210302-053814-marostegui.json
00:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0f08e8b: Update the Persian Wikipedia logos (T261033; 2/2) (duration: 00m 56s)
00:58 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: 0f08e8b: Update the Persian Wikipedia logos (T261033; 1/2) (duration: 00m 56s)
00:54 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 97ebf75: Separate Wikivoyage wordmark and icon (T261033; T273477) (duration: 00m 56s)
00:53 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: 97ebf75: Separate Wikivoyage wordmark and icon (T261033; T273477) (duration: 00m 56s)
00:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 61647cd: Fixes max-width configuration for new Vector (T260091) (duration: 00m 56s)
00:43 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 6cc8521: Enable og tags on non-wikidata wikis (T157145) (duration: 00m 56s)
00:37 urbanecm@deploy1002: Synchronized wmf-config/config/hrwiki.yaml: REDEPLOY: d53834e: Enable Growth features on hrwiki in stealth mode (3/3; T275684) (duration: 00m 56s)
00:36 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: REDEPLOY: d53834e: Enable Growth features on hrwiki in stealth mode (2/3; T275684) (duration: 00m 56s)
00:35 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REDEPLOY: d53834e: Enable Growth features on hrwiki in stealth mode (1/3; T275684) (duration: 00m 55s)
00:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REDEPLOY: Config: EventLoggingSchemas: Bump HomepageVisit version (T275615) (duration: 00m 56s)
00:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REDEPLOY: 21cb6f5: Revert "Revert "vector: Stage 2 of WVUI search treatment A/B test"" (T249297) (duration: 00m 56s)
00:29 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REDEPLOY: 599b739: Simplify deployment of Growth team features (3/3; T276091) (duration: 00m 56s)
00:27 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: REDEPLOY: de0f741: Simplify deployment of Growth team features (2/3; T276091) (duration: 00m 57s)
00:26 urbanecm@deploy1002: sync-file aborted: REDEPLOY: de0f741: Simplify deployment of Growth team features (2/3; T276091) (duration: 00m 25s)
00:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REDEPLOY: e991806: Simplify deployment of Growth team features (1/3; T276091) (duration: 00m 56s)
00:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REDEPLOY: Revert: vector: Stage 2 of WVUI search treatment A/B test (T249297) (duration: 00m 56s)
00:21 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REDEPLOY: 1edcbb5: vector: Stage 2 of WVUI search treatment A/B test (T249297) (duration: 00m 56s)
00:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REDEPLOYING: 2a8ece1: GrowthExperiments: set GELinkRecommendationsUseEventGate (duration: 00m 57s)
00:18 urbanecm@deploy1002: sync-file aborted: 2a8ece1: GrowthExperiments: set GELinkRecommendationsUseEventGate (duration: 00m 05s)
00:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REDEPLOYING: 92f6597: rowiki: Update help panel links (T275130) (duration: 00m 59s)
00:16 pt1979@cumin2001: START - Cookbook sre.dns.netbox
00:11 mutante: deploy2002 - ran 'git fetch' in /srv/mediawiki-staging

2021-03-01

23:05 eileen: civicrm revision changed from 04a029958c to e1dacbe348, config revision is 643477b35d
23:01 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@61e7533]: ores_bulk_ingest: Handle unexpected api response (duration: 01m 33s)
23:00 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@61e7533]: ores_bulk_ingest: Handle unexpected api response
22:57 mholloway-shell@deploy1002: Synchronized php-1.36.0-wmf.32/extensions/WikimediaEvents: Fix: Restore exporting wgWMESchemaEditAttemptStepSamplingRate to JS (duration: 00m 57s)
22:41 mstyles@deploy1002: Finished deploy [wikimedia/discovery/analytics@ca2c5b5]: import commons ttl dag fix (T270103) (duration: 02m 04s)
22:39 mstyles@deploy1002: Started deploy [wikimedia/discovery/analytics@ca2c5b5]: import commons ttl dag fix (T270103)
22:22 dwisehaupt: ran the following on frdb2001 to allow replication to continue after conversion to utf8mb4 charset: set global slave_type_conversions = ALL_NON_LOSSY;
22:21 dwisehaupt: stopping mysql replication on frdb2001 and starting utf8mb4 table alters under a root screen session
22:16 eileen: civicrm revision changed from f07390ff87 to 04a029958c, config revision is 643477b35d
22:12 twentyafterfour@deploy1002: Finished scap: (no justification provided) (duration: 16m 24s)
21:57 twentyafterfour: running scap sync from the new server deploy1002
21:56 twentyafterfour@deploy1002: Started scap: (no justification provided)
21:54 mstyles@deploy1002: Finished deploy [wikimedia/discovery/analytics@ca2c5b5]: import commons ttl dag fix (T270103) (duration: 02m 34s)
21:52 mstyles@deploy1002: Started deploy [wikimedia/discovery/analytics@ca2c5b5]: import commons ttl dag fix (T270103)
21:49 mutante: deploy1002 - removed scap-global-lock, unlocked scap
21:43 phamhi: rebooted clouddb1013 for maintenance
21:38 mutante: cumin 'mw*' 'grep master_rsync /etc/scap.cfg' showed all mw servers are now using deploy1002 (T265963)
21:30 shdubsh: completed removal of kafka logging inputs to legacy logstash cluster - T234854
21:18 mutante: mw1262 - running puppet to switch to new deployment server, scap pull
21:16 effie: pooling mw1262 back
21:08 mutante: [mwdebug1001:~] $ /usr/local/lib/nagios/plugins/check_mw_versions --deployhost deploy1002.eqiad.wmnet - OKAY: wikiversions in sync (T265963)
21:05 mutante: re-enabling puppet on deploy1001 - running puppet on deploy*, switching eqiad scap master and deployment_server globally (T265963)
20:37 mutante: deploy1001 - disable puppet and manually create scap-global-lock - NO DEPLOYMENTS
20:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1029.eqiad.wmnet
20:28 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1029.eqiad.wmnet
20:28 effie: upgrade mc1029, mc2029 to memcached 1.6
19:55 urbanecm@deploy1001: Synchronized wmf-config/config/hrwiki.yaml: d53834e: Enable Growth features on hrwiki in stealth mode (3/3; T275684) (duration: 00m 54s)
19:54 urbanecm@deploy1001: sync-file aborted: d53834e: Enable Growth features on hrwiki in stealth mode (3/3; T275684) (duration: 00m 03s)
19:53 urbanecm@deploy1001: Synchronized dblists/growthexperiments.dblist: d53834e: Enable Growth features on hrwiki in stealth mode (2/3; T275684) (duration: 00m 56s)
19:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: d53834e: Enable Growth features on hrwiki in stealth mode (1/3; T275684) (duration: 00m 55s)
19:41 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: EventLoggingSchemas: Bump HomepageVisit version (T275615) (duration: 00m 56s)
19:34 phuedx@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "Revert "vector: Stage 2 of WVUI search treatment A/B test"" (T249297) (duration: 00m 54s)
19:20 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:02 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 599b739: Simplify deployment of Growth team features (3/3; T276091) (duration: 01m 00s)
19:01 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: de0f741: Simplify deployment of Growth team features (2/3; T276091) (duration: 00m 57s)
18:56 pt1979@cumin2001: START - Cookbook sre.dns.netbox
18:54 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: e991806: Simplify deployment of Growth team features (1/3; T276091) (duration: 00m 57s)
18:42 mutante: mwmaint2002.mgmt - racadm serveraction powerup
18:26 ryankemper: [Relforge] Lifting downtime on `relforge1004` now that T275658 is done
18:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1307.eqiad.wmnet
18:24 mutante: mw1307 - back to stretch now
18:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1307.eqiad.wmnet
18:20 mutante: mwmaint2002 - shutting down for maintenance
18:12 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1098.eqiad.wmnet with reason: REIMAGE
18:10 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1098.eqiad.wmnet with reason: REIMAGE
18:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mwmaint2002.codfw.wmnet with reason: new install
18:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mwmaint2002.codfw.wmnet with reason: new install
18:00 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
18:00 mutante: puppetmaster1001 - generating mcrouter cert for mwmaint2002 T275905
17:58 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
17:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1307.eqiad.wmnet with reason: REIMAGE
17:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1307.eqiad.wmnet with reason: REIMAGE
17:17 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1307.eqiad.wmnet
17:07 mutante: our latest Wikipedia language edition is ready to move on from the Incubator: https://tay.wikipedia.org
17:05 mutante: new Wikimedia project language - tay - Atayal is spoken by the Atayal people of Taiwan
17:03 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
16:38 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1097.eqiad.wmnet with reason: REIMAGE
16:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1097.eqiad.wmnet with reason: REIMAGE
16:20 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
15:57 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
15:11 vgutierrez: rolling restart of ats-tls on cp[5007-5011]
14:49 marostegui: Failover m3 proxy back to dbproxy1020
14:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1030.eqiad.wmnet
14:25 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1030.eqiad.wmnet
14:18 effie: upgrade mc1030 mc2030 to memcached 1.6
14:07 marostegui: Upgrade dbproxy1020 kernel
14:05 moritzm: installing openldap security updates on stretch (client-side tools/libs only, slapd instances all on Buster and fixed)
13:22 moritzm: installing docker.io security updates for Buster
12:26 awight: EU config deployments complete
12:10 awight@deploy1001: Synchronized wmf-config: Config: GrowthExperiments: set GELinkRecommendationsUseEventGate (T274198) (duration: 01m 05s)
11:49 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 55s)
11:48 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 55s)
10:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Slowly pool db1168 for the first time', diff saved to https://phabricator.wikimedia.org/P14547 and previous config saved to /var/cache/conftool/dbconfig/20210301-104842-root.json
10:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 85%: Slowly pool db1168 for the first time', diff saved to https://phabricator.wikimedia.org/P14546 and previous config saved to /var/cache/conftool/dbconfig/20210301-103338-root.json
10:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Slowly pool db1168 for the first time', diff saved to https://phabricator.wikimedia.org/P14545 and previous config saved to /var/cache/conftool/dbconfig/20210301-101835-root.json
10:15 vgutierrez: restart ats-tls on cp5012
10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 65%: Slowly pool db1168 for the first time', diff saved to https://phabricator.wikimedia.org/P14544 and previous config saved to /var/cache/conftool/dbconfig/20210301-100331-root.json
09:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Slowly pool db1168 for the first time', diff saved to https://phabricator.wikimedia.org/P14543 and previous config saved to /var/cache/conftool/dbconfig/20210301-094828-root.json
09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 40%: Slowly pool db1168 for the first time', diff saved to https://phabricator.wikimedia.org/P14542 and previous config saved to /var/cache/conftool/dbconfig/20210301-093324-root.json
09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: Repool db1134 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P14541 and previous config saved to /var/cache/conftool/dbconfig/20210301-092536-root.json
09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 30%: Slowly pool db1168 for the first time', diff saved to https://phabricator.wikimedia.org/P14540 and previous config saved to /var/cache/conftool/dbconfig/20210301-091820-root.json
09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 85%: Repool db1134 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P14539 and previous config saved to /var/cache/conftool/dbconfig/20210301-091032-root.json
09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: Slowly pool db1168 for the first time', diff saved to https://phabricator.wikimedia.org/P14538 and previous config saved to /var/cache/conftool/dbconfig/20210301-090317-root.json
08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: Repool db1134 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P14537 and previous config saved to /var/cache/conftool/dbconfig/20210301-085529-root.json
08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 20%: Slowly pool db1168 for the first time', diff saved to https://phabricator.wikimedia.org/P14536 and previous config saved to /var/cache/conftool/dbconfig/20210301-084813-root.json
08:45 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 92f6597: rowiki: Update help panel links (T275130) (duration: 01m 08s)
08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 65%: Repool db1134 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P14535 and previous config saved to /var/cache/conftool/dbconfig/20210301-084025-root.json
08:38 elukey: reboot an-worker1112
08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 15%: Slowly pool db1168 for the first time', diff saved to https://phabricator.wikimedia.org/P14534 and previous config saved to /var/cache/conftool/dbconfig/20210301-083310-root.json
08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: Repool db1134 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P14533 and previous config saved to /var/cache/conftool/dbconfig/20210301-082521-root.json
08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 10%: Slowly pool db1168 for the first time', diff saved to https://phabricator.wikimedia.org/P14532 and previous config saved to /var/cache/conftool/dbconfig/20210301-081806-root.json
08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 40%: Repool db1134 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P14531 and previous config saved to /var/cache/conftool/dbconfig/20210301-081018-root.json
08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 5%: Slowly pool db1168 for the first time', diff saved to https://phabricator.wikimedia.org/P14530 and previous config saved to /var/cache/conftool/dbconfig/20210301-080303-root.json
07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: Repool db1134 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P14529 and previous config saved to /var/cache/conftool/dbconfig/20210301-075514-root.json
07:53 marostegui: Upgrade pc1010 pc2008 pc200 to 10.4.18
07:53 elukey: clean up old logs + apt-get clean + puppet clientbucket on an-coord1001 to free space
07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 4%: Slowly pool db1168 for the first time', diff saved to https://phabricator.wikimedia.org/P14528 and previous config saved to /var/cache/conftool/dbconfig/20210301-074759-root.json
07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 15%: Repool db1134 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P14527 and previous config saved to /var/cache/conftool/dbconfig/20210301-074011-root.json
07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Give some more weight to db1168', diff saved to https://phabricator.wikimedia.org/P14526 and previous config saved to /var/cache/conftool/dbconfig/20210301-072957-marostegui.json
07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: Repool db1134 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P14525 and previous config saved to /var/cache/conftool/dbconfig/20210301-072507-root.json
07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Give some more weight to db1168', diff saved to https://phabricator.wikimedia.org/P14524 and previous config saved to /var/cache/conftool/dbconfig/20210301-071047-marostegui.json
07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: Repool db1134 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P14523 and previous config saved to /var/cache/conftool/dbconfig/20210301-071004-root.json
07:05 marostegui: Stop MySQL on db2082 to clone db2152 - T275633
06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 1%: Repool db1134 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P14521 and previous config saved to /var/cache/conftool/dbconfig/20210301-065500-root.json
06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1168 with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P14520 and previous config saved to /var/cache/conftool/dbconfig/20210301-064704-marostegui.json
06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1168 to dbctl T258361!', diff saved to https://phabricator.wikimedia.org/P14519 and previous config saved to /var/cache/conftool/dbconfig/20210301-064603-marostegui.json
06:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1092.eqiad.wmnet
06:25 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1092.eqiad.wmnet

2021-02-28

14:17 gehel: repooled wdqs1011 - caught up on lag

2021-02-27

21:19 dwisehaupt: ran the following on frdb2002 to allow replication to continue after conversion to utf8mb4 charset: set global slave_type_conversions = ALL_NON_LOSSY;
18:44 gehel: depooled wdqs1011 to catch up on lag
18:37 gehel: powercycling wdqs1011
00:08 mutante: deploy1002 - rsyncing home dirs from deploy1001

2021-02-26

20:29 mutante: deploy2001 - /srv/mediawiki-staging sudo find . -name '*.cdb' -delete - deleted 190 GB of old cdb files (T275826 T265963)
18:31 dwisehaupt: starting the utf8mb4 table alters on frdb2002 under a root screen session
17:59 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mwmaint2002.codfw.wmnet with reason: REIMAGE
17:57 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwmaint2002.codfw.wmnet with reason: REIMAGE
15:04 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
14:59 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
14:57 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
14:51 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
14:49 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
14:44 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
14:43 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
14:38 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
14:37 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
14:31 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
14:30 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
14:25 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
14:22 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
14:17 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
13:56 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
13:51 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
13:51 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
13:45 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
13:44 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
13:38 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
13:05 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1031.eqiad.wmnet
13:00 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1031.eqiad.wmnet
12:59 effie: upgrade memcached on mc1031, mc2031
12:40 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
12:40 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
12:40 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
12:40 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
12:23 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
12:23 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
12:22 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
12:22 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
12:19 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
12:19 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
12:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
12:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
12:14 marostegui@cumin1001: dbctl commit (dc=all): 'Add new vslow,dump host to codfw s4 - T275633', diff saved to https://phabricator.wikimedia.org/P14508 and previous config saved to /var/cache/conftool/dbconfig/20210226-121438-marostegui.json
12:12 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
12:11 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1003.wikimedia.org
12:10 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
12:10 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
12:10 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
12:10 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
12:07 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
12:00 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1003.wikimedia.org
12:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1004.wikimedia.org
11:59 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
11:59 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
11:59 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
11:58 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
11:55 jbond42: delete exim messages in the queue to root@wikimedia.org older than 7200 seconds and younger than 10800 seconds on mx1001
11:54 jbond42: delete exim messages in the queue to root@wikimedia.org older than 7200 seconds and younger than 10800 seconds on mx2001
11:47 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
11:47 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
11:47 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
11:42 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1004.wikimedia.org
11:42 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1005.wikimedia.org
11:41 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
11:38 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
11:38 vgutierrez: rolling restart of ats-tls on cp500[1-5]
11:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
11:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
11:33 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
11:32 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
11:30 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1005.wikimedia.org
11:27 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
11:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1096.eqiad.wmnet with reason: REIMAGE
11:21 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
11:21 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
11:21 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'production' .
11:21 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
11:21 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
11:20 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
11:20 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
11:20 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'production' .
11:20 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
11:20 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
11:20 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
11:19 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
11:19 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1096.eqiad.wmnet with reason: REIMAGE
11:19 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'canary' .
11:19 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
11:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
11:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
11:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
11:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'production' .
11:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
11:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
11:17 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
11:17 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
11:17 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
11:17 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'canary' .
11:17 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
11:17 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
11:17 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
11:16 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
11:16 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
11:16 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
11:16 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
11:16 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
11:15 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
11:15 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
11:12 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
11:10 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
11:05 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
11:05 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
11:05 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
11:05 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
11:04 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'production' .
11:04 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'staging' .
11:04 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
11:04 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
11:03 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'production' .
11:03 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
11:03 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
11:03 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
11:03 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
11:03 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
11:02 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
11:02 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
11:02 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
11:02 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
11:02 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' .
11:02 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'production' .
11:00 dcaro@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
10:59 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephmon2002-dev.codfw.wmnet
10:55 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephmon2002-dev.codfw.wmnet
10:54 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephmon2003-dev.codfw.wmnet
10:50 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephmon2003-dev.codfw.wmnet
10:50 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephmon2003-dev.codfw.wmnet
10:46 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
10:44 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1039.eqiad.wmnet
10:44 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
10:44 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
10:43 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephmon2003-dev.codfw.wmnet
10:41 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephmon2002-dev.codfw.wmnet
10:38 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephmon2002-dev.codfw.wmnet
10:38 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephmon2001-dev.codfw.wmnet
10:38 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt1039.eqiad.wmnet
10:34 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephmon2001-dev.codfw.wmnet
10:33 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
10:33 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
10:33 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'staging' .
10:33 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'production' .
10:33 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
10:33 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
10:33 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
10:33 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'production' .
10:32 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
10:32 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
10:32 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
10:32 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
10:32 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
10:31 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
10:31 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
10:31 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
10:31 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' .
10:31 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'production' .
10:28 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd2003-dev.codfw.wmnet
10:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repool db1169 after cloning db1134', diff saved to https://phabricator.wikimedia.org/P14505 and previous config saved to /var/cache/conftool/dbconfig/20210226-102254-root.json
10:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1096.eqiad.wmnet with reason: REIMAGE
10:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1096.eqiad.wmnet with reason: REIMAGE
10:14 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephosd2003-dev.codfw.wmnet
10:09 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd2002-dev.codfw.wmnet
10:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 85%: Repool db1169 after cloning db1134', diff saved to https://phabricator.wikimedia.org/P14504 and previous config saved to /var/cache/conftool/dbconfig/20210226-100750-root.json
10:06 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb2001-dev.wikimedia.org
10:05 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephosd2002-dev.codfw.wmnet
09:59 aborrero@cumin2001: START - Cookbook sre.hosts.reboot-single for host cloudweb2001-dev.wikimedia.org
09:59 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2003-dev.wikimedia.org
09:55 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2002-dev.wikimedia.org
09:54 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd2001-dev.codfw.wmnet
09:52 aborrero@cumin2001: START - Cookbook sre.hosts.reboot-single for host cloudservices2003-dev.wikimedia.org
09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repool db1169 after cloning db1134', diff saved to https://phabricator.wikimedia.org/P14503 and previous config saved to /var/cache/conftool/dbconfig/20210226-095247-root.json
09:50 aborrero@cumin2001: START - Cookbook sre.hosts.reboot-single for host cloudservices2002-dev.wikimedia.org
09:50 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephosd2001-dev.codfw.wmnet
09:48 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd2001-dev.codfw.wmnet
09:43 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcephosd2001-dev.codfw.wmnet
09:41 aborrero@cumin2001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudcontrol2001-dev.wikimedia.org
09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 65%: Repool db1169 after cloning db1134', diff saved to https://phabricator.wikimedia.org/P14502 and previous config saved to /var/cache/conftool/dbconfig/20210226-093743-root.json
09:33 aborrero@cumin2001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2001-dev.wikimedia.org
09:28 root@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:24 root@cumin1001: START - Cookbook sre.dns.netbox
09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repool db1169 after cloning db1134', diff saved to https://phabricator.wikimedia.org/P14501 and previous config saved to /var/cache/conftool/dbconfig/20210226-092240-root.json
09:13 jbond42: puppet enabled post sudoers fix, running puppet fleet-wide with cumin -b 15 '*' 'run-puppet-agent '
09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 40%: Repool db1169 after cloning db1134', diff saved to https://phabricator.wikimedia.org/P14500 and previous config saved to /var/cache/conftool/dbconfig/20210226-090736-root.json
08:55 jbond42: disabled puppet pending rollback of https://gerrit.wikimedia.org/r/666899
08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repool db1169 after cloning db1134', diff saved to https://phabricator.wikimedia.org/P14498 and previous config saved to /var/cache/conftool/dbconfig/20210226-085233-root.json
08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 15%: Repool db1169 after cloning db1134', diff saved to https://phabricator.wikimedia.org/P14497 and previous config saved to /var/cache/conftool/dbconfig/20210226-083729-root.json
08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repool db1169 after cloning db1134', diff saved to https://phabricator.wikimedia.org/P14496 and previous config saved to /var/cache/conftool/dbconfig/20210226-082226-root.json
08:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1058.eqiad.wmnet with reason: REIMAGE
08:17 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1058.eqiad.wmnet with reason: REIMAGE
08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: Repool db1169 after cloning db1134', diff saved to https://phabricator.wikimedia.org/P14495 and previous config saved to /var/cache/conftool/dbconfig/20210226-080722-root.json
08:04 elukey: run ipmi mc reset cold for analytics1058 - mgmt responding to pings and ipmi, but not to ssh
07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: Repool db1169 after cloning db1134', diff saved to https://phabricator.wikimedia.org/P14494 and previous config saved to /var/cache/conftool/dbconfig/20210226-075219-root.json
07:02 marostegui: Stop MySQL on db2106 to clone db2147 T275633
07:01 elukey: reboot an-worker1099 to clear out kernel soft lockup errors
06:59 elukey: restart datanode on an-worker1099 - soft lockup kernel errors
06:53 kartik@deploy1001: Synchronized php-1.36.0-wmf.32/extensions/ContentTranslation: Bump ContentTranslation to e6b1a7c to include lost {{gerrit|666327}} backport (duration: 00m 58s)
06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1092 from dbctl T275019', diff saved to https://phabricator.wikimedia.org/P14492 and previous config saved to /var/cache/conftool/dbconfig/20210226-063914-marostegui.json
06:32 kartik@deploy1001: Synchronized php-1.36.0-wmf.32/extensions/ContentTranslation: Resync ContentTranslation for {{gerrit|666327}} (duration: 01m 16s)
06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169 to clone db1134 T275343', diff saved to https://phabricator.wikimedia.org/P14490 and previous config saved to /var/cache/conftool/dbconfig/20210226-061705-marostegui.json
05:29 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2045.codfw.wmnet with reason: REIMAGE
05:27 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2045.codfw.wmnet with reason: REIMAGE
05:25 ryankemper: [relforge] Downtimed `relforge1004` until `2021-03-02 07:23:36` (https://phabricator.wikimedia.org/T275658 is in flight to fix broken `kibana.service`)
05:07 ryankemper: T275345 `sudo -i wmf-auto-reimage-host --conftool -p T275345 elastic2045.codfw.wmnet` on `ryankemper@cumin2001` tmux session `elastic_reimage_elastic1065`
04:23 ryankemper: T267927 [WDQS Data Reload] `sudo -i cookbook sre.wdqs.data-reload wdqs2008.codfw.wmnet --task-id T267927 --reload-data wikidata --reason 'T267927: Reload wikidata jnl from fresh dumps' --reuse-downloaded-dump --depool` on `ryankemper@cumin2001` tmux session `wdqs_data_reload_2008`
04:21 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
00:14 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.32/extensions/Graph/: 9d5cf34: Do not log graph errors to WMF servers (T274557) (duration: 01m 36s)

2021-02-25

23:55 mutante: deploy1002, deploy2002 - scap-master-sync deploy1001.eqiad.wmnet (T265963)
23:41 mutante: deploy2001 2/2 - because rsync runs with --delete but also with --exclude="**/cache/l10n/*.cdb" --exclude="*.swp", you can't expect /srv/mediawiki-staging to be the same size on both servers
23:39 mutante: deploy2001 - scap-master-sync from deploy1001 runs and attempts to --delete files to stay in sync, but fails to do so: the *.cdb files stay in the cache dirs and rsync does not delete non-empty directories. This makes /srv/mediawiki-staging grow to 10 times the size of its eqiad counterpart
23:34 mutante: deploy2001 - scap-master-sync from deploy1001
23:13 mutante: deploy1002 - /usr/local/bin/scap-master-sync deploy1001.eqiad.wmnet
23:01 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.30 (duration: 04m 20s)
21:38 legoktm: pushed new version of docker-registry.discovery.wmnet/wikimedia-buster image
21:20 mutante: deploy2001 - rsynced /srv/deployment from deploy1001 after gerrit:666757
20:57 eileen: civicrm revision changed from 604d07c859 to f07390ff87, config revision is 643477b35d
20:35 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.32 refs T274936
20:17 tgr@deploy1001: Synchronized php-1.36.0-wmf.31/extensions/GrowthExperiments/: Backport: Impact module: Add "not rendered" state (T270294, T275615) (duration: 01m 08s)
19:40 tgr@deploy1001: Synchronized php-1.36.0-wmf.32/extensions/GrowthExperiments/: Backport: Impact module: Add "not rendered" state (T270294, T275615) (duration: 01m 26s)
19:16 ryankemper: T267927 Downloading dumps: `sudo https_proxy=webproxy.codfw.wmnet:8080 wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.bz2 -O /srv/wdqs/latest-all.ttl.bz2 && sudo https_proxy=webproxy.codfw.wmnet:8080 wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-lexemes.ttl.bz2 -O /srv/wdqs/latest-lexemes.ttl.bz2` on `ryankemper@wdqs2008` tmux session `download_latest_dumps`
18:59 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
18:59 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
18:59 ryankemper: T267927 Manual puppet run got `wdqs2008` back into puppetdb. Now blocked by the lack of a host key for `wdqs2008` on `cumin2001`, so I'm running puppet on `cumin2001` to get the latest state of `/etc/ssh/ssh_known_hosts`
18:57 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
18:57 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
18:56 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
18:56 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
18:50 ryankemper: T267927 Trying to kick off data reload on `wdqs2008` from `cumin2001` fails because of `spicerack.remote.RemoteError: No hosts provided`. Some spelunking through IRC history suggests this happens when a host is not present in puppetDB. I confirmed `wdqs2008` is absent on puppetboard, so I'm running puppet agent to get it re-registered (hopefully)
18:38 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
18:38 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
18:37 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:37 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
18:37 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
18:36 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
18:36 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-reload
18:31 pt1979@cumin2001: START - Cookbook sre.dns.netbox
18:30 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
18:30 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'production' .
18:30 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
18:30 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
18:30 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'production' .
18:30 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
18:30 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
18:30 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
18:27 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
18:25 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
18:23 bblack: dns[1235]002 - upgrade gdnsd to 3.6.0 (dns4002 and authdns2001 have already been running it for some time!)
18:21 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
18:20 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
18:19 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
18:19 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'canary' .
18:19 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
18:19 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
18:19 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
18:19 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
18:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'production' .
18:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
18:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
18:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
18:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
18:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'canary' .
18:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
18:08 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
18:08 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
17:58 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
17:58 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
17:48 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
17:47 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
17:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
17:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
17:27 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
17:27 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
17:27 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'production' .
17:27 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'staging' .
17:27 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
17:27 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
17:26 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'production' .
17:26 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
17:26 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
17:26 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
17:26 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
17:26 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
17:26 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
17:26 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
17:25 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
17:25 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
17:25 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'production' .
17:25 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' .
17:19 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
17:19 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'production' .
17:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
17:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
17:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'production' .
17:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
17:18 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
17:17 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
17:07 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
17:07 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
17:06 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
17:06 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
17:06 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'canary' .
17:05 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
17:04 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
17:04 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
17:04 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
17:04 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'production' .
17:03 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
17:03 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
17:03 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
17:02 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'canary' .
17:02 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
17:01 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
17:01 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
17:00 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
17:00 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
16:54 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
16:54 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
16:50 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
16:50 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
16:39 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
16:39 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
16:39 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'production' .
16:39 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'staging' .
16:38 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
16:38 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
16:38 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
16:38 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'production' .
16:38 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
16:38 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
16:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
16:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
16:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
16:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
16:36 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
16:36 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
16:34 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' .
16:34 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'production' .
16:28 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
16:23 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
16:17 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
16:16 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
16:16 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
16:16 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
16:16 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
15:38 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-serve-ctrl2002.codfw.wmnet
15:26 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2002.codfw.wmnet with reason: REIMAGE
15:24 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2002.codfw.wmnet with reason: REIMAGE
15:23 elukey@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-serve-ctrl2002.codfw.wmnet
15:23 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-serve-ctrl2001.codfw.wmnet
15:05 elukey@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-serve-ctrl2001.codfw.wmnet
15:00 moritzm: installing libmaxminddb updates from buster 10.8 point release
14:59 vgutierrez: pool cp4032
14:42 vgutierrez: depool cp4032 for ats-tls/NUMA tests
14:35 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-serve-ctrl1002.eqiad.wmnet
14:27 moritzm: installing postgresql security updates on buster
14:24 kormat@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-serve-ctrl1001.eqiad.wmnet
14:22 klausman@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-serve-ctrl1002.eqiad.wmnet
14:20 klausman@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-serve-ctrl1002.eqiad.wmnet
14:17 klausman@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-serve-ctrl1002.eqiad.wmnet
14:16 moritzm: installing cairo security updates on buster
14:14 klausman@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-serve-ctrl1002.eqiad.wmnet
14:10 klausman@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-serve-ctrl1002.eqiad.wmnet
14:09 kormat@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-serve-ctrl1001.eqiad.wmnet
13:57 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2002.codfw.wmnet with reason: REIMAGE
13:55 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2001.codfw.wmnet with reason: REIMAGE
13:55 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2002.codfw.wmnet with reason: REIMAGE
13:53 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2001.codfw.wmnet with reason: REIMAGE
13:15 akosiaris: reinitialize all of staging-codfw. kubestage2* and kubestagemaster* have had downtime scheduled in Icinga.
12:32 moritzm: installing openssl security updates on Buster
12:20 Lucas_WMDE: EU backport&config window done
12:16 phuedx@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [stage 1] Enable WVUI search by default to logged-in modern Vector users except on pilot wikis (T249297) (duration: 01m 31s)
11:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1134.eqiad.wmnet with reason: REIMAGE
11:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1134.eqiad.wmnet with reason: REIMAGE
11:47 jbond42: upload new wmf-laptop package
11:40 marostegui: Stop MySQL on db1134 to reimage it to buster T275343
11:29 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on dborch1001.wikimedia.org with reason: Restart for new kernel
11:29 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on dborch1001.wikimedia.org with reason: Restart for new kernel
11:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host otrs1001.eqiad.wmnet
11:22 moritzm: reset-failed ifup@ens5.service on otrs1001 T273026
11:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host otrs1001.eqiad.wmnet
11:15 moritzm: rebooting otrs1001 (ticket.wikimedia.org) for a kernel update
10:59 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1117-1118].eqiad.wmnet
10:57 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1117-1118].eqiad.wmnet
10:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1118.eqiad.wmnet with reason: REIMAGE
10:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1117.eqiad.wmnet with reason: REIMAGE
10:40 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1118.eqiad.wmnet with reason: REIMAGE
10:38 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1117.eqiad.wmnet with reason: REIMAGE
10:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1088 (re)pooling @ 100%: After cloning db1168', diff saved to https://phabricator.wikimedia.org/P14481 and previous config saved to /var/cache/conftool/dbconfig/20210225-103719-root.json
10:34 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE
10:32 klausman@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE
10:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1088 (re)pooling @ 75%: After cloning db1168', diff saved to https://phabricator.wikimedia.org/P14480 and previous config saved to /var/cache/conftool/dbconfig/20210225-102215-root.json
10:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1088 (re)pooling @ 50%: After cloning db1168', diff saved to https://phabricator.wikimedia.org/P14479 and previous config saved to /var/cache/conftool/dbconfig/20210225-100712-root.json
10:05 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2003.codfw.wmnet with reason: REIMAGE
10:03 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2004.codfw.wmnet with reason: REIMAGE
10:01 klausman@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2003.codfw.wmnet with reason: REIMAGE
10:01 klausman@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2004.codfw.wmnet with reason: REIMAGE
09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1088 (re)pooling @ 25%: After cloning db1168', diff saved to https://phabricator.wikimedia.org/P14477 and previous config saved to /var/cache/conftool/dbconfig/20210225-095208-root.json
09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1088 (re)pooling @ 10%: After cloning db1168', diff saved to https://phabricator.wikimedia.org/P14476 and previous config saved to /var/cache/conftool/dbconfig/20210225-093705-root.json
09:32 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2001.codfw.wmnet with reason: REIMAGE
09:32 klausman@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2001.codfw.wmnet with reason: REIMAGE
09:21 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1032.eqiad.wmnet
09:14 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1032.eqiad.wmnet
09:10 effie: upgrade memcached on mc1032, mc2032, mc2036
08:32 volans@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:29 volans@cumin2001: START - Cookbook sre.dns.netbox
08:15 vgutierrez: restart ats-tls on cp5006 to enable parent proxies support - T274888
08:15 XioNoX: un-drain lumen eqiad-codfw link for BW testing
08:07 XioNoX: drain lumen eqiad-codfw link for BW testing
06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1088 to clone db1168 T258361', diff saved to https://phabricator.wikimedia.org/P14474 and previous config saved to /var/cache/conftool/dbconfig/20210225-065018-marostegui.json
06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092 T275019', diff saved to https://phabricator.wikimedia.org/P14473 and previous config saved to /var/cache/conftool/dbconfig/20210225-063243-marostegui.json
00:29 ryankemper: T274204 Restored service health on `elastic106[0,4,5]` via `sudo apt-get remove --purge wmf-elasticsearch-search-plugins --yes && sudo dpkg -i /var/cache/apt/archives/wmf-elasticsearch-search-plugins_6.5.4-4~stretch_all.deb && sudo puppet agent -tv`. There's some sort of issue with `6.5.4-5~stretch` that we will need to circle back and investigate; for now the fleet is staying on `6.5.4-4~stretch`
00:05 ryankemper: T274204 `Ctrl+C`'d out of the current rolling-upgrade; the 3 hosts that have their elasticsearch systemd units in a failing state are running the latest plugin version, meaning the new version is likely the cause of the failures
00:01 mutante: mwlog1001 - temp disabling puppet to deploy gerrit::661200 because this is a jessie host
00:01 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=97)

2021-02-24

23:42 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
23:30 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=99)
23:18 ryankemper: T274204 `sudo -i cookbook sre.elasticsearch.rolling-upgrade search_eqiad "eqiad cluster restarts" --task-id T274204 --nodes-per-run 3`
23:18 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
23:17 ryankemper: T274204 Beginning rolling-upgrade of `eqiad` CirrusSearch cluster to upgrade to `wmf-elasticsearch-search-plugins/stretch-wikimedia 6.5.4-5~stretch`, see tmux session `elastic_rolling_upgrade` on `ryankemper@cumin1001`
23:13 eileen: civicrm revision is 5e042e6e57, config revision is 8572611a32
22:09 ryankemper: T265113 Unbanned `elastic1063` from both Elasticsearch clusters (`production-search-eqiad` and `production-search-omega-eqiad`)
22:03 Urbanecm: Deploy security patches for T275669
20:59 andrew@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
20:59 andrew@cumin1001: Added views for new wiki: mniwiki T273465
20:43 mstyles@deploy1001: Finished deploy [wikimedia/discovery/analytics@44fba51]: add import ttl dags - T270103 (duration: 02m 33s)
20:40 mstyles@deploy1001: Started deploy [wikimedia/discovery/analytics@44fba51]: add import ttl dags - T270103
20:36 andrew@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
20:35 andrew@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
20:35 andrew@cumin1001: Added views for new wiki: mniwiktionary T273459
20:16 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.32 refs T274936 (duration: 01m 10s)
20:15 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.32 refs T274936
20:12 andrew@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
19:52 mstyles@deploy1001: Finished deploy [wikimedia/discovery/analytics@44fba51]: airflow dags for importing ttl data (duration: 00m 42s)
19:51 mstyles@deploy1001: Started deploy [wikimedia/discovery/analytics@44fba51]: airflow dags for importing ttl data
19:32 andrew@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
19:21 andrew@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
19:14 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: f9f968a: Remove unneeded $wgHiddenPrefs[] = visualeditor-betatempdisable (T273188) (duration: 01m 04s)
19:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: f21fc4a: Enable SecurePoll logging for votewiki, testwiki (T273990) (duration: 01m 08s)
17:40 bblack: authdns2001 - trial upgrade gdnsd to 3.6.0-1~wmf1
16:49 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: REIMAGE
16:47 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: REIMAGE
16:47 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: REIMAGE
16:45 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: REIMAGE
16:45 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: REIMAGE
16:42 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: REIMAGE
16:41 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: REIMAGE
16:40 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: REIMAGE
16:15 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ml-serve[1003-1004].eqiad.wmnet with reason: Reimaging failures due to broken partman recipe
16:15 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ml-serve[1003-1004].eqiad.wmnet with reason: Reimaging failures due to broken partman recipe
15:54 milimetric@deploy1001: Finished deploy [analytics/refinery@5908f27] (test): Train hotfix (duration: 00m 13s)
15:54 milimetric@deploy1001: Started deploy [analytics/refinery@5908f27] (test): Train hotfix
15:54 milimetric@deploy1001: Finished deploy [analytics/refinery@5908f27] (thin): Train hotfix (duration: 00m 06s)
15:54 milimetric@deploy1001: Started deploy [analytics/refinery@5908f27] (thin): Train hotfix
15:54 milimetric@deploy1001: Finished deploy [analytics/refinery@5908f27]: Train hotfix (duration: 11m 36s)
15:42 milimetric@deploy1001: Started deploy [analytics/refinery@5908f27]: Train hotfix
15:38 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate all WMDE Technical Wishes schemas to EventGate on all wikis (duration: 01m 05s)
15:21 milimetric@deploy1001: Finished deploy [analytics/refinery@bcb1a69] (test): Regular analytics weekly train TEST [analytics/refinery@bcb1a69] (duration: 00m 13s)
15:21 milimetric@deploy1001: Started deploy [analytics/refinery@bcb1a69] (test): Regular analytics weekly train TEST [analytics/refinery@bcb1a69]
15:21 milimetric@deploy1001: Finished deploy [analytics/refinery@bcb1a69] (thin): Regular analytics weekly train THIN [analytics/refinery@bcb1a69] (duration: 00m 06s)
15:21 milimetric@deploy1001: Started deploy [analytics/refinery@bcb1a69] (thin): Regular analytics weekly train THIN [analytics/refinery@bcb1a69]
15:21 milimetric@deploy1001: Finished deploy [analytics/refinery@bcb1a69]: Regular analytics weekly train [analytics/refinery@bcb1a69] (duration: 17m 10s)
15:06 godog: bounce icinga on alert1001 - reported high latency
15:06 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate HomepageVisit and ServerSideAccountCreation EL streams to all wikis - T267333 (duration: 01m 05s)
15:03 milimetric@deploy1001: Started deploy [analytics/refinery@bcb1a69]: Regular analytics weekly train [analytics/refinery@bcb1a69]
15:01 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve[1001-1004].eqiad.wmnet with reason: Reimaging for T272918
15:01 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve[1001-1004].eqiad.wmnet with reason: Reimaging for T272918
14:50 bblack: dns4002 - trial upgrade gdnsd to 3.6.0-1~wmf1
14:29 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2152.codfw.wmnet with reason: REIMAGE
14:29 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2152.codfw.wmnet with reason: REIMAGE
14:25 jynus@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2016.codfw.wmnet with reason: REIMAGE
14:25 jynus@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2016.codfw.wmnet with reason: REIMAGE
14:13 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2151.codfw.wmnet with reason: REIMAGE
14:13 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2151.codfw.wmnet with reason: REIMAGE
14:05 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2150.codfw.wmnet with reason: REIMAGE
14:04 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: REIMAGE
13:46 marostegui: Compare data between db1134 and db1163 T275343
13:34 moritzm: restarting FPM/mcrouter on mw canaries to pick up openssl updates
13:11 moritzm: installing openssl security updates on buster
12:32 Urbanecm: Two undeployed patches were reverted to unbreak deployments (666340, 666341), cc marxarelli
12:25 phuedx@deploy1001: Synchronized php-1.36.0-wmf.32/extensions/WikimediaEvents: Backport: Fix dynamically loaded instruments (duration: 01m 11s)
12:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P14465 and previous config saved to /var/cache/conftool/dbconfig/20210224-122043-root.json
12:18 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
12:17 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
12:17 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
12:16 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
12:15 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
12:14 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
12:06 hnowlan: restarting mtail on A:mw-api or A:parsoid or A:mw-jobrunner or A:mw
12:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P14464 and previous config saved to /var/cache/conftool/dbconfig/20210224-120538-root.json
11:54 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
11:53 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
11:51 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'similar-users' for release 'main' .
11:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P14463 and previous config saved to /var/cache/conftool/dbconfig/20210224-115034-root.json
11:45 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
11:44 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
11:42 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
11:39 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
11:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P14462 and previous config saved to /var/cache/conftool/dbconfig/20210224-113531-root.json
11:33 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
11:32 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
11:29 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
11:29 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
11:24 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
11:23 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
11:22 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
11:22 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
11:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P14461 and previous config saved to /var/cache/conftool/dbconfig/20210224-112027-root.json
11:20 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
11:15 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
11:14 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change', diff saved to https://phabricator.wikimedia.org/P14460 and previous config saved to /var/cache/conftool/dbconfig/20210224-111301-marostegui.json
11:12 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
11:06 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudmetrics1002.eqiad.wmnet with reason: server issue
11:06 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudmetrics1002.eqiad.wmnet with reason: server issue
10:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P14459 and previous config saved to /var/cache/conftool/dbconfig/20210224-105204-root.json
10:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P14458 and previous config saved to /var/cache/conftool/dbconfig/20210224-103700-root.json
10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P14457 and previous config saved to /var/cache/conftool/dbconfig/20210224-102157-root.json
10:20 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
10:19 moritzm: installing gnutls28 bugfix updates from Buster 10.8 point release
10:16 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
10:13 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'similar-users' for release 'main' .
10:10 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
10:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P14456 and previous config saved to /var/cache/conftool/dbconfig/20210224-100653-root.json
10:04 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
10:02 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
09:56 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
09:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P14455 and previous config saved to /var/cache/conftool/dbconfig/20210224-095150-root.json
09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1157 for schema change', diff saved to https://phabricator.wikimedia.org/P14454 and previous config saved to /var/cache/conftool/dbconfig/20210224-094523-marostegui.json
09:34 marostegui: Update pc2007, pc2010, db2071
09:31 marostegui: Update db1077
09:27 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1033.eqiad.wmnet
09:20 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1033.eqiad.wmnet
09:19 effie: upgrade memcached on mc1033, mc2033
09:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast1002.wikimedia.org with reason: REIMAGE
09:06 volans: run "sudo find . -user root -exec chown netbox. '{}' \;" in /srv/deployment/netbox/deploy-cache/revs on netbox* hosts to prevent scap failures on cleanup - T265084
09:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on bast1002.wikimedia.org with reason: REIMAGE
09:01 elukey: roll restart druid brokers on druid public
08:58 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
08:53 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
08:52 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
08:52 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
08:50 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
08:50 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
08:48 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
08:48 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
08:35 moritzm: reimaging bast1002 to Buster
08:33 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
08:32 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
08:30 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
08:26 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'production' .
08:04 jynus: restarting db2101, db2139, db2141 T271913
07:56 moritzm: installing remaining openldap updates for buster
06:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1090.eqiad.wmnet
06:18 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1090.eqiad.wmnet
04:10 ryankemper: T267927 [WDQS Data Reload] Running `/srv/deployment/wdqs/wdqs/loadData.sh -n wdq -d /srv/wdqs/munged/ -s 864` on `ryankemper@wdqs2008` tmux session `data_reload`
04:04 ryankemper: [WDQS] Depooled `wdqs2008`
03:16 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2149.codfw.wmnet with reason: REIMAGE
03:13 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: REIMAGE
03:03 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2148.codfw.wmnet with reason: REIMAGE
03:01 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2148.codfw.wmnet with reason: REIMAGE
02:58 ryankemper: [WDQS Data Reload] Restarting reload on test node `wdqs1009` from where it last left off: `/srv/deployment/wdqs/wdqs/loadData.sh -n wdq -d /srv/wdqs/munged/ -s 947`
02:57 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
02:39 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2147.codfw.wmnet with reason: REIMAGE
02:37 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2147.codfw.wmnet with reason: REIMAGE
02:35 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2146.codfw.wmnet with reason: REIMAGE
02:33 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2146.codfw.wmnet with reason: REIMAGE
02:30 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
02:29 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
02:29 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
02:27 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@b5fc9d5]: 0.3.64 (duration: 06m 24s)
02:24 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@25549e7]: ores_bulk_ingest: use backoffs starting at 30sec (duration: 01m 37s)
02:22 gehel@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
02:22 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@25549e7]: ores_bulk_ingest: use backoffs starting at 30sec
02:20 ryankemper@deploy1001: Started deploy [wdqs/wdqs@b5fc9d5]: 0.3.64
02:18 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@b5fc9d5]: 0.3.64 (duration: 11m 22s)
02:09 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2145.codfw.wmnet with reason: REIMAGE
02:07 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.64` on canary `wdqs1003`; proceeding to rest of fleet
02:07 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2145.codfw.wmnet with reason: REIMAGE
02:06 ryankemper@deploy1001: Started deploy [wdqs/wdqs@b5fc9d5]: 0.3.64
02:06 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.64`. Pre-deploy tests passing on canary `wdqs1003`
00:58 volker-e@deploy1001: Finished deploy [design/style-guide@a66b5b6]: Deploy design/style-guide: a66b5b6 “Components”: Add “Dialogs” (#430) (duration: 00m 06s)
00:58 volker-e@deploy1001: Started deploy [design/style-guide@a66b5b6]: Deploy design/style-guide: a66b5b6 “Components”: Add “Dialogs” (#430)
00:47 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@4ee50e3]: ores_bulk_ingest: more retry on error (duration: 01m 37s)
00:45 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@4ee50e3]: ores_bulk_ingest: more retry on error
00:03 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2145.codfw.wmnet with reason: REIMAGE
00:02 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2145.codfw.wmnet with reason: REIMAGE

2021-02-23

22:52 chaomodus: Netbox 2.10 upgrade complete T265084
22:28 crusnov@deploy1001: Finished deploy [netbox/deploy@dabbf5e]: Deploying Netbox 2.10.4-wmf to production T265084 (duration: 06m 11s)
22:25 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
22:25 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
22:23 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
22:23 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
22:22 crusnov@deploy1001: Started deploy [netbox/deploy@dabbf5e]: Deploying Netbox 2.10.4-wmf to production T265084
22:21 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
22:21 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
22:17 chaomodus: deploying Netbox 2.10 to production and associated work
21:48 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix typos in wgEventLoggingSchemas (duration: 01m 05s)
21:38 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.32 refs T274936
21:36 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@1344853]: apply spark env_vars to executors too (duration: 01m 46s)
21:34 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@1344853]: apply spark env_vars to executors too
21:28 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.32 refs T274936 (duration: 36m 52s)
21:03 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
21:03 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
21:00 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@46a8ae1]: ores_bulk_ingest: namespace is not plural (duration: 01m 41s)
21:00 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
21:00 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
20:58 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@46a8ae1]: ores_bulk_ingest: namespace is not plural
20:57 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host gitlab1002.eqiad.wmnet
20:52 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.32 refs T274936
20:44 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: No-op: math enable talking to mathoid directly in labs, T274436 (duration: 00m 57s)
20:38 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix typo in visualeditortemplatedialoguse - T275015 (duration: 01m 01s)
20:13 razzi@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka jumbo cluster: Reboot kafka nodes - razzi@cumin1001
20:04 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host gitlab1002.eqiad.wmnet
19:54 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
19:54 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
19:49 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
19:49 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
19:43 ryankemper: [WDQS Deploy] Disk space low on `wdqs1009`, rolling back so that can be addressed
19:43 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@b5fc9d5]: 0.3.64 (duration: 08m 01s)
19:38 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Declare WMDE Technical Wishes streams and migrate to EventGate on testwiki (duration: 02m 41s)
19:36 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.64` on canary `wdqs1003`; proceeding to rest of fleet
19:35 ryankemper@deploy1001: Started deploy [wdqs/wdqs@b5fc9d5]: 0.3.64
19:35 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.64`. Pre-deploy tests passing on canary `wdqs1003`
19:33 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host gitlab1001.eqiad.wmnet
19:32 legoktm: re-enabling puppet on registry*
19:30 legoktm: pushed new wikimedia-buster image
19:16 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@3969cae]: new dag ores_bulk_ingest (duration: 01m 32s)
19:15 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@3969cae]: new dag ores_bulk_ingest
19:10 dduvall@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
19:08 dduvall@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
19:08 legoktm: disabling puppet on registry* except registry2001 while rolling out https://gerrit.wikimedia.org/r/664683
19:04 dduvall@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
18:41 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host gitlab1001.eqiad.wmnet
18:17 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@c2190da]: environment and venv builder for ores_bulk_ingest (duration: 01m 40s)
18:15 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@c2190da]: environment and venv builder for ores_bulk_ingest
18:15 ebernhardson@deploy1001: deploy aborted: environment and venv builder for ores_bulk_ingest (duration: 00m 16s)
18:15 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@c2190da]: environment and venv builder for ores_bulk_ingest
18:12 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:07 pt1979@cumin2001: START - Cookbook sre.dns.netbox
17:29 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
17:29 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
17:22 longma: wmf/1.36.0-wmf.32 was branched at 03c382f for T274936
17:21 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1034.eqiad.wmnet
17:18 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
17:18 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
17:17 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1034.eqiad.wmnet
17:16 effie: upgrade memcached on mc1034, mc2034 - T270315
17:01 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
17:01 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
16:59 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
16:59 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
16:55 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
16:55 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
16:48 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEvents: Enable session tick instrument on all wikis (T274172) (duration: 00m 58s)
16:46 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
16:46 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
16:42 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
16:42 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
16:25 razzi@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka jumbo cluster: Reboot kafka nodes - razzi@cumin1001
16:02 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Declare TranslationRecommendation event streams - T271163 (duration: 00m 58s)
15:52 jynus: previous message should say 15:38 T267338
15:51 jynus: started swift codfw backup stress test at 14:38 with 10 threads T267338
15:44 elukey: reboot an-launcher1002 for kernel updates
15:35 moritzm: restarting PHP/Apache on mw canaries for gnutls update
15:23 moritzm: installing gnutls28 bugfix updates from Buster 10.8 point release
15:17 elukey: deploy a new term to the analytics-in4 filter on cr1/cr2-eqiad (see https://gerrit.wikimedia.org/r/c/operations/homer/public/+/665814)
14:55 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove wgEventLoggingSchemas overrides for QuickSurvey and NavigationTiming (duration: 00m 56s)
14:51 elukey: drop /srv/backup-1007 on stat1008 to free space
14:41 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate SpecialMuteSubmit to EventGate on all wikis - T268517 (duration: 00m 58s)
14:40 otto@deploy1001: sync-file aborted: Migrate SpecialMuteSubmit to EventGate on all wikis - T268517 (duration: 00m 05s)
14:17 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1002.eqiad.wmnet with reason: REIMAGE
14:15 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1002.eqiad.wmnet with reason: REIMAGE
14:07 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
14:05 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
14:05 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
14:03 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
14:02 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
14:00 moritzm: restarting PHP/Apache on mw canaries for openldap update
13:58 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
13:56 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
13:54 moritzm: installing openldap security updates on buster (just client-side tools/libs, all slapd instances already fixed)
13:54 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
13:50 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
13:49 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
12:54 kharlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
12:54 kharlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
12:48 kharlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
12:48 kharlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
12:44 kharlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
12:40 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.31/extensions/ContentTranslation/app/: ee77c4a: bump ContentTranslation (T275385) (duration: 00m 59s)
12:37 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
12:36 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
12:35 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
12:34 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
12:32 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
12:32 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
12:31 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
12:31 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 8b7ca4c: thwikisource: Add NS 102 and NS 114 as content namespace (T275282) (duration: 00m 56s)
12:30 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
12:29 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
12:27 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
12:26 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
12:19 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' .
12:17 jayme: running puppet on deploy1001
12:15 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add sources to specialSiteLinkGroups Wikibase setting (T138332) (duration: 01m 00s)
11:27 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1035.eqiad.wmnet
11:20 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1035.eqiad.wmnet
11:18 effie: upgrade memcached on mc1035, mc2035 - T270315
10:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dbmonitor2001.wikimedia.org
09:58 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbmonitor2001.wikimedia.org
09:45 vgutierrez: reload nginx on cloudelastic100[56]
09:44 moritzm: installing screen security updates on stretch
09:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on 6 hosts with reason: Restart mariadb to pick up config changes T266913
09:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on 6 hosts with reason: Restart mariadb to pick up config changes T266913
09:35 moritzm: installing bind security updates on buster (client-side tools/libs)
09:10 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
09:10 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
09:06 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
08:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin1001.eqiad.wmnet
08:50 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
08:47 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host cumin1001.eqiad.wmnet
08:40 Urbanecm: [urbanecm@mwmaint1002 ~/altwiki]$ mwscript namespaceDupes.php altwiki --fix
08:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 9f434e2: Add ВП as an alias for NS_PROJECT in altwiki (T271980) (duration: 00m 59s)
08:39 Urbanecm: Run mwscript updateSpecialPages.php --wiki=altwiki
08:02 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
07:56 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
07:56 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
07:13 hashar: Restarting CI Jenkins for plugin upgrade # T271683
05:13 krinkle@deploy1001: Finished deploy [integration/docroot@44d5685]: I307e8f4f6979 (duration: 00m 06s)
05:13 krinkle@deploy1001: Started deploy [integration/docroot@44d5685]: I307e8f4f6979
00:46 eileen: civicrm revision changed from c535ac603a to 5e042e6e57, config revision is ef64f705bb

2021-02-22

23:59 mutante: logstash2031 - systemctl reset-failed
23:53 mutante: stat1007 - same problem and alerts as stat1004
23:52 mutante: stat1004 - systemctl reset-failed to clear icinga alerts for systemd state caused by jupyterhub singleuser services
23:47 dpifke@deploy1001: Finished deploy [performance/arc-lamp@1f3bce1]: Revert https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/664600 (duration: 00m 05s)
23:47 dpifke@deploy1001: Started deploy [performance/arc-lamp@1f3bce1]: Revert https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/664600
23:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1286.eqiad.wmnet
23:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1286.eqiad.wmnet
23:34 milimetric@deploy1001: Finished deploy [analytics/refinery@3de01b5] (thin): Fix camus (duration: 00m 07s)
23:34 milimetric@deploy1001: Started deploy [analytics/refinery@3de01b5] (thin): Fix camus
23:33 milimetric@deploy1001: Finished deploy [analytics/refinery@3de01b5]: Fix camus (duration: 14m 03s)
23:27 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1014.eqiad.wmnet with reason: REIMAGE
23:25 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1014.eqiad.wmnet with reason: REIMAGE
23:22 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
23:22 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
23:19 milimetric@deploy1001: Started deploy [analytics/refinery@3de01b5]: Fix camus
23:18 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
23:18 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
23:09 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
23:09 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
23:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1410.eqiad.wmnet
23:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1412.eqiad.wmnet
23:02 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1412.eqiad.wmnet
23:00 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1410.eqiad.wmnet
22:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1286.eqiad.wmnet with reason: REIMAGE
22:50 legoktm: disabling puppet on mwdebug1001 to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/664903
22:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1286.eqiad.wmnet with reason: REIMAGE
22:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1412.eqiad.wmnet with reason: REIMAGE
22:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1410.eqiad.wmnet with reason: REIMAGE
22:42 krinkle@deploy1001: Synchronized w/fatal-error.php: df694d695 (duration: 00m 56s)
22:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1412.eqiad.wmnet with reason: REIMAGE
22:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1410.eqiad.wmnet with reason: REIMAGE
22:31 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
22:31 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
22:18 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1279.eqiad.wmnet
22:18 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1312.eqiad.wmnet
22:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1314.eqiad.wmnet
21:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1314.eqiad.wmnet
21:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1279.eqiad.wmnet
21:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1312.eqiad.wmnet
21:00 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T273463 T271985 T273468)
20:59 sbassett: Deployed security patch for T274883
20:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1279.eqiad.wmnet with reason: REIMAGE
20:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1279.eqiad.wmnet with reason: REIMAGE
20:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1312.eqiad.wmnet with reason: REIMAGE
20:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1312.eqiad.wmnet with reason: REIMAGE
20:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1314.eqiad.wmnet with reason: REIMAGE
20:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1314.eqiad.wmnet with reason: REIMAGE
20:39 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T273463 T271985 T273468)
20:29 mutante: mw1279 (canary) - reimaging to buster
20:29 mutante: mw1279 (canary) - reimaging to stretch
20:27 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1349.eqiad.wmnet
20:25 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1349.eqiad.wmnet
20:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1316.eqiad.wmnet
20:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1316.eqiad.wmnet
20:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1315.eqiad.wmnet
20:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1315.eqiad.wmnet
20:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1349.eqiad.wmnet with reason: REIMAGE
20:04 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1349.eqiad.wmnet with reason: REIMAGE
19:36 urbanecm@deploy1001: Synchronized wmf-config/config/rowiki.yaml: fc7b071: Enable GrowthExperiments on rowiki (T275130; 3/3) (duration: 00m 55s)
19:35 urbanecm@deploy1001: Synchronized dblists/growthexperiments.dblist: fc7b071: Enable GrowthExperiments on rowiki (T275130; 2/3) (duration: 00m 55s)
19:33 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: fc7b071: Enable GrowthExperiments on rowiki (T275130; 1/3) (duration: 00m 55s)
19:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1315.eqiad.wmnet with reason: REIMAGE
19:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1315.eqiad.wmnet with reason: REIMAGE
19:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1316.eqiad.wmnet with reason: REIMAGE
19:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1316.eqiad.wmnet with reason: REIMAGE
19:08 urbanecm@deploy1001: Synchronized dblists/growthexperiments.dblist: 902b685: Enable GrowthExperiments on thwiki (T274646) (duration: 00m 54s)
19:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 902b685: Enable GrowthExperiments on thwiki (T274646) (duration: 00m 56s)
17:18 ppchelko@deploy1001: Finished deploy [restbase/deploy@c5c4b2d] (dev-cluster): remove graphoid (duration: 03m 09s)
17:15 ppchelko@deploy1001: Started deploy [restbase/deploy@c5c4b2d] (dev-cluster): remove graphoid
16:51 Urbanecm: Run scap pull on mwmaint1002 to clear any local changes
16:50 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 22s)
16:47 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating mniwiktionary (T273457) (duration: 00m 56s)
16:46 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating mniwiktionary (T273457)
16:45 urbanecm@deploy1001: Synchronized dblists: Creating mniwiktionary (T273457) (duration: 00m 56s)
16:44 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
16:44 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
16:44 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating mniwiktionary (T273457) (duration: 00m 56s)
16:42 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating mniwiktionary (T273457) (duration: 00m 55s)
16:37 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
16:36 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
16:26 dpifke@deploy1001: Finished deploy [performance/arc-lamp@1f3bce1]: Deploy ArcLamp fixes for T273565 and T273640 (duration: 00m 05s)
16:26 dpifke@deploy1001: Started deploy [performance/arc-lamp@1f3bce1]: Deploy ArcLamp fixes for T273565 and T273640
16:19 urbanecm@deploy1001: Synchronized langlist: Creating mniwiki (T273456) (duration: 00m 54s)
16:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating mniwiki (T273456) (duration: 00m 56s)
16:17 urbanecm@deploy1001: Synchronized wmf-config/logos.php: Creating mniwiki (T273456) (duration: 00m 56s)
16:15 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating mniwiki (T273456) (duration: 00m 55s)
16:14 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating mniwiki (T273456)
16:13 urbanecm@deploy1001: Synchronized dblists: Creating mniwiki (T273456) (duration: 00m 57s)
16:12 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating mniwiki (T273456) (duration: 00m 55s)
16:11 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating mniwiki (T273456) (duration: 00m 56s)
16:08 urbanecm@deploy1001: Synchronized langlist: Creating altwiki (T271980) (duration: 00m 55s)
16:03 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating altwiki (T271980) (duration: 00m 55s)
16:02 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating altwiki (T271980)
16:00 urbanecm@deploy1001: Synchronized dblists: Creating altwiki (T271980) (duration: 00m 54s)
15:59 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating altwiki (T271980) (duration: 00m 59s)
15:57 Urbanecm: Temporarily replace /srv/mediawiki/php-1.36.0-wmf.31/extensions/WikimediaMaintenance/addWiki.php with /home/urbanecm/addWiki.php at mwmaint1002 to unbreak addWiki.php
15:53 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
15:43 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating altwiki (T271980) (duration: 00m 56s)
15:32 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
14:16 herron: roll restarting kafkamon hosts for updates
13:57 filippo@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=codfw
13:47 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus4001.ulsfo.wmnet
13:40 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.31/extensions/ContentTranslation/app/: f9e823e: CX3 Build 0.1.0+20210216 (fixes missing bits in T271397) (duration: 00m 55s)
13:40 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus3001.esams.wmnet
13:37 moritzm: installing openldap security updates on corp replicas
13:36 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.31/extensions/FlaggedRevs/extension.json: a4cd98e: Grant sysops review and unreviewed pages right by default (apparently i forgot to rebase the first time, resync; T275293) (duration: 00m 57s)
13:32 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus4001.ulsfo.wmnet
13:31 godog: reset-failed ifup@ens14 on prometheus3001 - T273026
13:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cuminunpriv1001.eqiad.wmnet
13:29 akosiaris: repool sessionstore in eqiad after sessionstore certificate refresh. T274564
13:29 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=sessionstore
13:27 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus3001.esams.wmnet
13:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host cuminunpriv1001.eqiad.wmnet
13:17 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on cloudvirt-wdqs[1001-1003].eqiad.wmnet with reason: Restarting cloudcanary instances
13:16 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on cloudvirt-wdqs[1001-1003].eqiad.wmnet with reason: Restarting cloudcanary instances
13:16 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 27 hosts with reason: Restarting cloudcanary instances
13:16 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 27 hosts with reason: Restarting cloudcanary instances
13:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: Slowly repool db1175', diff saved to https://phabricator.wikimedia.org/P14439 and previous config saved to /var/cache/conftool/dbconfig/20210222-131153-root.json
12:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: Slowly repool db1175', diff saved to https://phabricator.wikimedia.org/P14438 and previous config saved to /var/cache/conftool/dbconfig/20210222-125650-root.json
12:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: Slowly repool db1175', diff saved to https://phabricator.wikimedia.org/P14437 and previous config saved to /var/cache/conftool/dbconfig/20210222-124146-root.json
12:32 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=sessionstore
12:28 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=sessionstore
12:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: Slowly repool db1175', diff saved to https://phabricator.wikimedia.org/P14436 and previous config saved to /var/cache/conftool/dbconfig/20210222-122643-root.json
12:24 urbanecm@deploy1001: Synchronized wmf-config//throttle.php: d806f3a: Add a throttle rule for for edit-a-thon (T275237) (duration: 00m 54s)
12:22 akosiaris: depool sessionstore in eqiad for sessionstore certificate refresh. T274564
12:21 akosiaris: repool sessionstore in codfw after sessionstore certificate refresh. T274564
12:21 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=sessionstore
12:20 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.31/extensions/FlaggedRevs/extension.json: a4cd98e: Grant sysops review and unreviewed pages right by default (T275293) (duration: 00m 55s)
12:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 7bd26dc: Add inaturalist-open-data.s3.amazonaws.com to copyupload list (T275318) (duration: 00m 56s)
12:15 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: 391900b: ukwikivoyage: Enable block AbuseFilter action (T275271) (duration: 00m 55s)
12:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a1f8ce4: Enable Section Translation on Bengali Wikipedia (T271397) (duration: 00m 56s)
12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 10%: Slowly repool db1175', diff saved to https://phabricator.wikimedia.org/P14435 and previous config saved to /var/cache/conftool/dbconfig/20210222-121139-root.json
12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175 for schema change', diff saved to https://phabricator.wikimedia.org/P14434 and previous config saved to /var/cache/conftool/dbconfig/20210222-120717-marostegui.json
12:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 4775fb6: Adjust CX MT threshold to 90 for Vietnamese Wikipedia (T275121) (duration: 00m 57s)
12:02 moritzm: installing openldap security updates on serpens/seaborgium
11:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1036.eqiad.wmnet
11:54 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1036.eqiad.wmnet
11:53 effie: upgrading memcached to 1.6 on mc1036
11:50 volans: upgrading python3-wmflib fleet wide to 0.0.7-1+deb10u1
11:27 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 27 hosts with reason: Restarting cloudcanary instances
11:27 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 27 hosts with reason: Restarting cloudcanary instances
11:26 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt-wdqs[1001-1003].eqiad.wmnet with reason: Restarting cloudcanary instances
11:26 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt-wdqs[1001-1003].eqiad.wmnet with reason: Restarting cloudcanary instances
11:26 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt-wdqs[1001-1003].eqiad.wmnet with reason: Restarting cloudcanary instances
11:26 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt-wdqs[1001-1003].eqiad.wmnet with reason: Restarting cloudcanary instances
11:22 godog: roll restart prometheus on cloudmetrics*
11:21 godog: roll restart prometheus on prometheus*
11:12 godog: restart prometheus on prometheus2004 to apply changes - T273278
11:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Slowly repool db1166', diff saved to https://phabricator.wikimedia.org/P14433 and previous config saved to /var/cache/conftool/dbconfig/20210222-111032-root.json
10:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Slowly repool db1166', diff saved to https://phabricator.wikimedia.org/P14432 and previous config saved to /var/cache/conftool/dbconfig/20210222-105528-root.json
10:49 _joe_: removing stray old builds from compiler1003
10:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Slowly repool db1166', diff saved to https://phabricator.wikimedia.org/P14431 and previous config saved to /var/cache/conftool/dbconfig/20210222-104025-root.json
10:36 _joe_: manually removed the restbase-http ipvs entry from the load balancers
10:30 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=sessionstore
10:29 akosiaris: depool sessionstore in codfw for sessionstore certificate refresh. T274564
10:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Slowly repool db1166', diff saved to https://phabricator.wikimedia.org/P14430 and previous config saved to /var/cache/conftool/dbconfig/20210222-102521-root.json
10:16 _joe_: restarting pybal on lvs1015 to pick up restbase http removal
10:12 _joe_: restarting pybal on lvs1016 to pick up restbase http removal
10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: Slowly repool db1166', diff saved to https://phabricator.wikimedia.org/P14429 and previous config saved to /var/cache/conftool/dbconfig/20210222-101018-root.json
10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 for schema change', diff saved to https://phabricator.wikimedia.org/P14428 and previous config saved to /var/cache/conftool/dbconfig/20210222-100653-marostegui.json
09:51 _joe_: restarting low-traffic pybals in codfw to remove the restbase http endpoint
09:35 marostegui: Deploy schema change on s3 codfw master, there will be lag on s3 codfw - T273359
09:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1008.eqiad.wmnet
09:20 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1008.eqiad.wmnet
09:08 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1005.eqiad.wmnet
09:04 moritzm: installing screen security updates on Buster
09:00 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1005.eqiad.wmnet
08:40 godog: swift codfw-prod: more weight to ms-be20[58-61] - T269337
08:39 gehel: depool elastic2045 and ban from clusters - T275345
08:12 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: cea41a2: fiwiki: Assign stablesettings to reviewers in IS.php rather than FR-specific file (T275017; 2/2) (duration: 00m 55s)
08:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cea41a2: fiwiki: Assign stablesettings to reviewers in IS.php rather than FR-specific file (T275017; 1/2) (duration: 01m 08s)
07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1090* from dbctl T274333', diff saved to https://phabricator.wikimedia.org/P14426 and previous config saved to /var/cache/conftool/dbconfig/20210222-075437-marostegui.json
07:38 moritzm: installing openldap security updates on LDAP replicas
07:29 hashar: Restarting CI Jenkins to downgrade plugin # T271683
07:14 hashar: Restarting CI Jenkins for plugin upgrade # T271683
07:11 elukey: powercycle elastic2045 - com2 available, no ssh, no root login (hangs indefinitely), no prometheus metrics reported

2021-02-21

16:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162 - crashed', diff saved to https://phabricator.wikimedia.org/P14424 and previous config saved to /var/cache/conftool/dbconfig/20210221-160258-marostegui.json
10:07 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1008.eqiad.wmnet with reason: REIMAGE
10:05 ariel@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1008.eqiad.wmnet with reason: REIMAGE
09:32 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1008.eqiad.wmnet with reason: REIMAGE
09:30 ariel@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1008.eqiad.wmnet with reason: REIMAGE
09:29 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1002.eqiad.wmnet
09:23 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1002.eqiad.wmnet

2021-02-20

00:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1317.eqiad.wmnet
00:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1317.eqiad.wmnet
00:15 ebernhardson: start batch processing images through MachineVision fetchSuggestions.php for T274220 on mwmaint1002
00:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1333.eqiad.wmnet
00:13 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1333.eqiad.wmnet
00:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1339.eqiad.wmnet
00:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1342.eqiad.wmnet
00:11 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1342.eqiad.wmnet

2021-02-19

23:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1339.eqiad.wmnet
23:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1317.eqiad.wmnet with reason: REIMAGE
23:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1317.eqiad.wmnet with reason: REIMAGE
22:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1342.eqiad.wmnet with reason: REIMAGE
22:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1342.eqiad.wmnet with reason: REIMAGE
22:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1333.eqiad.wmnet with reason: REIMAGE
22:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1333.eqiad.wmnet with reason: REIMAGE
22:33 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1340.eqiad.wmnet
22:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1339.eqiad.wmnet with reason: REIMAGE
22:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1339.eqiad.wmnet with reason: REIMAGE
22:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1340.eqiad.wmnet
22:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1320.eqiad.wmnet
22:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1320.eqiad.wmnet
22:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1262.eqiad.wmnet
22:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1261.eqiad.wmnet
22:09 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1261.eqiad.wmnet
22:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1261.eqiad.wmnet
21:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2336.codfw.wmnet
21:44 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2336.codfw.wmnet
21:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1340.eqiad.wmnet with reason: REIMAGE
21:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1340.eqiad.wmnet with reason: REIMAGE
21:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1320.eqiad.wmnet with reason: REIMAGE
21:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1262.eqiad.wmnet with reason: REIMAGE
21:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1320.eqiad.wmnet with reason: REIMAGE
21:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2336.codfw.wmnet with reason: REIMAGE
21:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1262.eqiad.wmnet with reason: REIMAGE
21:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2336.codfw.wmnet with reason: REIMAGE
20:59 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1287.eqiad.wmnet
20:57 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1287.eqiad.wmnet
20:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2257.codfw.wmnet
20:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2257.codfw.cwmnet
20:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1270.eqiad.wmnet
20:48 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1261.eqiad.wmnet
20:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1270.eqiad.wmnet
20:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1261.eqiad.wmnet
20:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1287.eqiad.wmnet
20:33 mutante: mw1261, mw1270 - scap pull
20:33 cdanis: cdanis@cumin1001.eqiad.wmnet ~ sudo cumin 'mw1261*,mw1270*,mw1287*' 'depool'
20:32 mutante: mw1287 - scap pull
20:17 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2257.codfw.wmnet
20:17 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1270.eqiad.wmnet
20:17 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1261.eqiad.wmnet
20:15 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.29 (duration: 01m 42s)
20:06 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.28 (duration: 01m 50s)
20:04 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.27 (duration: 02m 12s)
20:01 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.26 (duration: 02m 12s)
19:57 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.25 (duration: 04m 09s)
19:48 marxarelli: 1.36.0-wmf.31 re-rolled to all wikis (T271345)
19:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1287.eqiad.wmnet with reason: REIMAGE
19:22 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1129.eqiad.wmnet with reason: REIMAGE
19:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1270.eqiad.wmnet with reason: REIMAGE
19:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1287.eqiad.wmnet with reason: REIMAGE
19:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1261.eqiad.wmnet with reason: REIMAGE
19:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1270.eqiad.wmnet with reason: REIMAGE
19:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1261.eqiad.wmnet with reason: REIMAGE
19:11 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.31
19:01 dduvall@deploy1001: Synchronized php-1.36.0-wmf.31/extensions/Echo/includes/model/Event.php: backport: Echo::create: Convert UserIdentityValue to plain User (T275161) (duration: 01m 20s)
18:52 marxarelli: fetching backport https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Echo/+/665177 for sync prior to all wikis (re)deploy (T275161)
18:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1367.eqiad.wmnet
18:36 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1341.eqiad.wmnet
18:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1367.eqiad.wmnet
18:32 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2272.codfw.wmnet
18:30 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1341.eqiad.wmnet
18:30 mutante: mw1367 - powercycled - stuck in reboot
18:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2272.codfw.wmnet
18:07 Urbanecm: Password reset for User:Kolyma (T274737)
17:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1341.eqiad.wmnet with reason: REIMAGE
17:34 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1341.eqiad.wmnet with reason: REIMAGE
17:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2272.codfw.wmnet with reason: REIMAGE
17:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2272.codfw.wmnet with reason: REIMAGE
17:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1367.eqiad.wmnet with reason: REIMAGE
17:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1367.eqiad.wmnet with reason: REIMAGE
16:57 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1141.eqiad.wmnet with reason: REIMAGE
16:55 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1140.eqiad.wmnet with reason: REIMAGE
16:55 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1141.eqiad.wmnet with reason: REIMAGE
16:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1134.eqiad.wmnet with reason: REIMAGE
16:53 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1140.eqiad.wmnet with reason: REIMAGE
16:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1134.eqiad.wmnet with reason: REIMAGE
14:29 mbsantos@deploy1001: Finished deploy [tilerator/deploy@937deb5]: (no justification provided) (duration: 00m 15s)
14:28 mbsantos@deploy1001: Started deploy [tilerator/deploy@937deb5]: (no justification provided)
14:00 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'production' .
14:00 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'staging' .
13:43 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'staging' .
13:43 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'production' .
13:43 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'echostore' for release 'production' .
13:43 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'echostore' for release 'staging' .
13:43 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'production' .
13:43 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'staging' .
13:43 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'staging' .
13:43 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'production' .
13:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'staging' .
13:42 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'production' .
13:41 godog: reset-failed ifup@ens13 on prometheus5001 - T273026
13:39 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus5001.eqsin.wmnet
13:31 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1010.eqiad.wmnet with reason: REIMAGE
13:29 gehel@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1010.eqiad.wmnet with reason: REIMAGE
13:22 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus5001.eqsin.wmnet
09:27 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop backup cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
09:16 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster for Hadoop backup cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
08:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1001.eqiad.wmnet
08:34 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-airflow1001.eqiad.wmnet
08:06 godog: swift codfw-prod: more weight to ms-be20[58-61] - T269337
08:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1108.eqiad.wmnet
07:47 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1108.eqiad.wmnet
02:26 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1133.eqiad.wmnet with reason: REIMAGE
02:24 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1133.eqiad.wmnet with reason: REIMAGE
01:22 mutante: mwmaint2001 back on buster and back in scap dsh groups (if anything pops up you can revert 665175)
01:19 mutante: deleting my huge puppet-compiler build (run on *), which failed because it made the compiler instance run out of disk
01:03 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.30/includes/ProtectionForm.php: d305308: field descriptors in HTMLForm must have keys (T275018; T274980) (duration: 01m 08s)
01:02 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.31/includes/ProtectionForm.php: 2487c25: field descriptors in HTMLForm must have keys (T275018; T274980) (duration: 01m 10s)
00:54 mutante: mwmaint2001 - back from reimage - scap pull
00:26 urbanecm@deploy1001: Synchronized static/images/project-logos/wikimedia-cloud-services.svg: 686acba: Restore logos on Vector (classic version) and use cloud icon for labs (T274210) (duration: 01m 07s)
00:14 dpifke@deploy1001: Synchronized wmf-config/PhpAutoPrepend.php: Deploying excimer-wall profiler pipeline T253160 (duration: 01m 03s)
00:12 dpifke@deploy1001: Synchronized wmf-config/profiler.php: Deploying excimer-wall profiler pipeline T253160 (duration: 01m 02s)

2021-02-18

23:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwmaint2001.codfw.wmnet with reason: REIMAGE
23:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwmaint2001.codfw.wmnet with reason: REIMAGE
23:26 dancy@deploy1001: Synchronized wmf-config/: Syncing https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/634552 (duration: 01m 07s)
23:22 dancy@deploy1001: Synchronized wmf-config/CommonSettings.php: Syncing https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/634551 (duration: 01m 08s)
23:15 dancy@deploy1001: Synchronized src/ServiceConfig.php: (no justification provided) (duration: 03m 21s)
23:11 mutante: mwmaint2001 - will be rebooted for OS upgrade - T267607
23:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mwmaint2001.codfw.wmnet with reason: OS upgrade
23:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mwmaint2001.codfw.wmnet with reason: OS upgrade
23:04 mutante: mwmaint1002 - rsyncing data from mwmaint2001
22:30 mutante: mwmaint2001 - tar-gzipping a lot of old user home data I keep finding, some of it museum-worthy, left over from several maintenance hosts ago, in places like /root/home-mwmaint1001/username/home-terbium/iron/ :p
21:29 marxarelli: 1.36.0-wmf.31 rolled back due to T275161 and new logspam (T271345)
21:26 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Revert "all wikis to 1.36.0-wmf.31"
20:09 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.31
19:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: f33f9f7: Make DiscussionTools replytool available for everyone on gomwiktionary (T258554) (duration: 01m 05s)
19:25 mutante: mwmaint2001 - deleting 'home-terbium' from all home directories (yes, it's in Bacula if you really used that, hope you didn't, it's been years since terbium)
19:25 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: da7b812: Enable DiscussionTools beta feature for newtopictool on arwiki, cswiki, huwiki (T273145) (duration: 01m 12s)
19:20 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.31/extensions/DiscussionTools/: 1cc29df: 6b88aff: DiscussionTools backports (T272666; T274949) (duration: 01m 08s)
19:19 urbanecm@deploy1001: sync-file aborted: 1cc29df DiscussionTools backports (T272666; T274949) (duration: 00m 00s)
19:17 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.30/extensions/DiscussionTools/: 9c6cdf5: 97acef6: DiscussionTools backports (T272666; T274949) (duration: 01m 26s)
19:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwmaint2001.codfw.wmnet with reason: OS upgrade
19:04 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwmaint2001.codfw.wmnet with reason: OS upgrade
16:51 volans: uploaded python3-wmflib_0.0.7 to apt.wikimedia.org buster-wikimedia
16:23 shdubsh: restart ircecho on kraz -- deploying new metrics endpoint T216611
16:05 moritzm: installing libmaxminddb updates from buster 10.8 point release
15:33 _joe_: rebuilding base images for stretch,buster
15:30 moritzm: installing PHP 7.3 security updates on buster
15:06 godog: swift codfw-prod decrease HDD weight for ms-be20[16-27] - T272837
14:35 moritzm: installing libzstd security updates on Buster
13:59 moritzm: installing intel-microcode security updates on buster
13:49 jynus: restart db1150 T271913
12:20 jynus: restart db1140 T271913
12:01 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.31/includes/HookContainer/DeprecatedHooks.php: 28aa871: Silent deprecate ProtectionForm::buildForm (T274889) (duration: 01m 14s)
11:49 jynus: restart db1102 T271913
11:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1009 (duration: 01m 09s)
11:04 marostegui: Upgrade and reboot pc1009
11:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1009 (duration: 01m 08s)
10:47 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 33ab68f: Add https://seer.ufrgs.br to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T270962) (duration: 01m 09s)
10:45 urbanecm@deploy1001: Synchronized static/images: d1db300: Revert "Temporarily add cswiki-black-ribbon.png as a static resource" (duration: 01m 09s)
10:42 jynus: restarting dbprov* hosts T271913
10:34 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudmetrics1001.eqiad.wmnet
10:30 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: Switch restbase calls to envoy (duration: 01m 15s)
10:27 dcaro@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudmetrics1001.eqiad.wmnet
09:48 jynus: restarting backup* hosts T271913
09:46 elukey: upgrade presto to 0.246-wmf on an-coord1001, an-presto*, stat100x
08:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3312, db1090:3317 T274333', diff saved to https://phabricator.wikimedia.org/P14408 and previous config saved to /var/cache/conftool/dbconfig/20210218-084758-marostegui.json
08:31 marostegui: Upgrade kernel on db1154 and db1155 (sanitarium running buster hosts)
08:23 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1003.eqiad.wmnet with reason: REIMAGE
08:21 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1003.eqiad.wmnet with reason: REIMAGE
08:01 godog: upgrade grafana* to 7.4.2 - T263747
07:59 marostegui: Reboot es2029, es2030, es2031, es2032, es2033, es2034 for kernel upgrade
07:32 marostegui: Reboot es2026, es2027, es2028 for kernel upgrade
06:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host archiva1002.wikimedia.org
06:54 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host archiva1002.wikimedia.org
06:53 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host zookeeper-test1002.eqiad.wmnet
06:49 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host zookeeper-test1002.eqiad.wmnet
06:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1075.eqiad.wmnet
06:10 marostegui: Reboot dbproxy1014 for kernel upgrade
01:56 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: fe64695: hewikisource: Allow sysops to grant/revoke reviewer (T274796) (duration: 01m 07s)
01:38 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:32 robh@cumin1001: START - Cookbook sre.dns.netbox
00:58 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:49 robh@cumin1001: START - Cookbook sre.dns.netbox
00:34 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.30/extensions/CentralNotice/resources/ext.centralNotice.display/state.js: dd64e44: Remove optedOutCampaigns property from impression data (T275054) (duration: 01m 08s)
00:32 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.31/extensions/CentralNotice/resources/ext.centralNotice.display/state.js: ff444c2: Remove optedOutCampaigns property from impression data (T275054) (duration: 01m 09s)
00:31 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 08b32c4: Remove wgCentralNoticeImpressionEventSampleRate; will default to 0 (T275054) (duration: 02m 17s)
00:28 urbanecm@deploy1001: sync-file aborted: 08b32c4: Remove wgCentralNoticeImpressionEventSampleRate; will default to 0 (T275054) (duration: 00m 00s)
00:03 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@8ca6884]: cirrus_namespace_map: Use retries when fetching (duration: 01m 21s)
00:02 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@8ca6884]: cirrus_namespace_map: Use retries when fetching

2021-02-17

20:31 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1034.eqiad.wmnet with reason: REIMAGE
20:29 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1034.eqiad.wmnet with reason: REIMAGE
20:23 marxarelli: 1.36.0-wmf.31 rolled to group1. no new errors for wmf.31 (T271345)
20:17 dduvall@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.31 (duration: 01m 15s)
20:15 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.31
19:45 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 2e521f7: hewikisource: Allow reviewers to rollback (T274796) (duration: 01m 10s)
19:42 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 88e6ebc: hewikisource: Add bureaucrats the ability to grant/revoke (trans)import (T274796) (duration: 01m 09s)
19:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 6c5c5f0: arbcom_ruwiki: Add arbcom user group (T274844) (duration: 01m 12s)
19:34 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1033.eqiad.wmnet with reason: REIMAGE
19:32 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1033.eqiad.wmnet with reason: REIMAGE
19:27 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=tlwikibooks --fix # T274976 # P14404
19:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c37fa01: tlwikibooks: Add Wikijunior namespace (T274976) (duration: 01m 09s)
19:24 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=tlwikibooks --fix # T274977 # P14403
19:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a7eb726: tlwikibooks: Add WB as an alias to NS_PROJECT (T274977) (duration: 01m 09s)
19:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 352dd72: Enable GlobalWatchlist extension on metawiki (T260862) (duration: 01m 07s)
19:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 6ac78bd: Remove uses of removed VisualEditor config variables (T273177; 2/2) (duration: 01m 07s)
19:06 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 6ac78bd: Remove uses of removed VisualEditor config variables (T273177; 1/2) (duration: 01m 14s)
18:40 ppchelko@deploy1001: Finished deploy [restbase/deploy@c5c4b2d]: Remove graphoid T242855 (duration: 19m 54s)
18:27 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1350.eqiad.wmnet
18:26 effie: enable puppet on mw*
18:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1344.eqiad.wmnet
18:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1343.eqiad.wmnet
18:21 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1275.eqiad.wmnet
18:20 ppchelko@deploy1001: Started deploy [restbase/deploy@c5c4b2d]: Remove graphoid T242855
18:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1350.eqiad.wmnet
18:14 mutante: mw1350 - powercycled via mgmt
18:11 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1343.eqiad.wmnet
18:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1344.eqiad.wmnet
18:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1275.eqiad.wmnet
18:07 effie: disable puppet on mw* in eqiad
17:36 godog: roll-restart logstash7 in codfw/eqiad to apply ulogd filters - T234565
17:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1035.eqiad.wmnet with reason: REIMAGE
17:26 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1035.eqiad.wmnet with reason: REIMAGE
17:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1343.eqiad.wmnet with reason: REIMAGE
17:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1344.eqiad.wmnet with reason: REIMAGE
17:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1343.eqiad.wmnet with reason: REIMAGE
17:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1275.eqiad.wmnet with reason: REIMAGE
17:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1344.eqiad.wmnet with reason: REIMAGE
17:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1275.eqiad.wmnet with reason: REIMAGE
17:07 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1350.eqiad.wmnet with reason: REIMAGE
17:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1350.eqiad.wmnet with reason: REIMAGE
16:58 jiji@cumin1001: START - Cookbook sre.dns.netbox
16:46 godog: roll-restart logstash to apply ulogd filter - T234565
16:42 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
16:41 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
16:33 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
16:32 moritzm: installing intel-microcode security updates on buster
16:23 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
16:08 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
16:06 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@b5f4a3e]: (no justification provided) (duration: 00m 30s)
16:05 oblivian@deploy1001: Started deploy [docker-pkg/deploy@b5f4a3e]: (no justification provided)
15:36 cdanis: T275028 rolling restart done; check for fetch failures once caches re-fill
15:34 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host htmldumper1001.eqiad.wmnet
15:31 moritzm: uploaded jasper 1.900.1-debian1-2.4+deb8u6+wmf3 to apt.wikimedia.org
15:28 root@cumin1001: START - Cookbook sre.hosts.reboot-single for host htmldumper1001.eqiad.wmnet
15:26 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1003.eqiad.wmnet
15:20 root@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1003.eqiad.wmnet
15:17 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1001.eqiad.wmnet
15:13 root@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1001.eqiad.wmnet
14:26 cdanis: starting rolling restart of cp-upload@eqsin varnish-fe T275028
13:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 100%: Slowly repool db1172', diff saved to https://phabricator.wikimedia.org/P14396 and previous config saved to /var/cache/conftool/dbconfig/20210217-135533-root.json
13:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 80%: Slowly repool db1172', diff saved to https://phabricator.wikimedia.org/P14395 and previous config saved to /var/cache/conftool/dbconfig/20210217-134030-root.json
13:30 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
13:30 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
13:28 moritzm: installing libzstd security updates on Buster
13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 60%: Slowly repool db1172', diff saved to https://phabricator.wikimedia.org/P14393 and previous config saved to /var/cache/conftool/dbconfig/20210217-132526-root.json
13:19 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Wikibase Repo ID generator rate limiting on Wikidata (T272032) (duration: 01m 11s)
13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 50%: Slowly repool db1172', diff saved to https://phabricator.wikimedia.org/P14392 and previous config saved to /var/cache/conftool/dbconfig/20210217-131022-root.json
13:06 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
13:05 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
12:55 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
12:55 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 40%: Slowly repool db1172', diff saved to https://phabricator.wikimedia.org/P14391 and previous config saved to /var/cache/conftool/dbconfig/20210217-125519-root.json
12:50 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
12:49 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
12:45 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
12:45 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
12:42 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
12:42 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
12:40 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 20%: Slowly repool db1172', diff saved to https://phabricator.wikimedia.org/P14390 and previous config saved to /var/cache/conftool/dbconfig/20210217-124015-root.json
12:40 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
12:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 6eeee95: vector: Enable search treatment AB test on test wikis (T259798) (duration: 01m 08s)
12:10 urbanecm@deploy1001: Synchronized dblists/desktop-improvements.dblist: 7872251: Revert "Revert "vector: Enable WVUI search on test wikis"" (T259798) (duration: 01m 09s)
12:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 7872251: Revert "Revert "vector: Enable WVUI search on test wikis"" (T259798) (duration: 01m 25s)
11:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2001.wikimedia.org
11:55 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host netbox-dev2001.wikimedia.org
11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly pool db1172 in s8 - T258361', diff saved to https://phabricator.wikimedia.org/P14389 and previous config saved to /var/cache/conftool/dbconfig/20210217-112422-marostegui.json
11:08 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
11:08 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
11:04 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
11:04 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
11:04 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
11:03 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
10:52 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cloudnet1004.eqiad.wmnet with reason: hardware failure
10:52 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on cloudnet1004.eqiad.wmnet with reason: hardware failure
10:13 _joe_: depooling mw1331 to perform some tests for T266855
10:08 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:01 aborrero@cumin1001: START - Cookbook sre.dns.netbox
09:32 elukey: reboot dbstore100[3-5] for kernel upgrades
08:44 marostegui: upgrade es2020 es2021 es2022's kernel
08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly pool db1172 in s8 - T258361', diff saved to https://phabricator.wikimedia.org/P14388 and previous config saved to /var/cache/conftool/dbconfig/20210217-084120-marostegui.json
08:08 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1004.eqiad.wmnet
08:04 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1004.eqiad.wmnet
07:41 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly pool db1172 in s8 - T258361', diff saved to https://phabricator.wikimedia.org/P14387 and previous config saved to /var/cache/conftool/dbconfig/20210217-074107-marostegui.json
07:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1007.eqiad.wmnet
07:33 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1007.eqiad.wmnet
07:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1006.eqiad.wmnet
07:23 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1006.eqiad.wmnet
07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1172 in s8 for the first time - T258361', diff saved to https://phabricator.wikimedia.org/P14386 and previous config saved to /var/cache/conftool/dbconfig/20210217-072131-marostegui.json
07:21 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1004.eqiad.wmnet
07:16 marostegui: Add x1 to orchestrator
07:04 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1004.eqiad.wmnet
07:01 marostegui: Restart db1103 (x1) primary master DONE - T273758
07:00 marostegui: Restart db1103 (x1) primary master - T273758
06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1172 to dbctl, but not pooled yet T258361', diff saved to https://phabricator.wikimedia.org/P14385 and previous config saved to /var/cache/conftool/dbconfig/20210217-063915-marostegui.json
01:41 mutante: mwdebug1001 - back on buster and pooled
01:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mwdebug1001.eqiad.wmnet
01:39 mutante: mwdebug1001 - rebooting
01:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1345.eqiad.wmnet
01:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1351.eqiad.wmnet
01:00 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mwdebug1001.eqiad.wmnet
01:00 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mwdebug1001.eqiad.wmnet
00:58 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1345.eqiad.wmnet
00:49 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1351.eqiad.wmnet
00:33 mutante: mw1351 - powercycled
00:27 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mwdebug1001.eqiad.wmnet
00:17 legoktm@deploy1001: Synchronized php-1.36.0-wmf.30/extensions/timeline/: Add $wgTimelineFontDirectory to be passed as GDFONTPATH (T274822) (duration: 01m 06s)
00:15 legoktm@deploy1001: Synchronized php-1.36.0-wmf.31/extensions/timeline/: Add $wgTimelineFontDirectory to be passed as GDFONTPATH (T274822) (duration: 01m 02s)
00:13 legoktm@deploy1001: Synchronized wmf-config/timeline.php: Set $wgTimelineFontDirectory (T274822) (duration: 01m 05s)
00:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1345.eqiad.wmnet with reason: REIMAGE
00:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1345.eqiad.wmnet with reason: REIMAGE

2021-02-16

23:54 mutante: puppetmaster1001 - puppet cert clean mwdebug1001, sign new request, initial puppet run, now on buster (T274023)
23:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1351.eqiad.wmnet with reason: REIMAGE
23:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1351.eqiad.wmnet with reason: REIMAGE
23:44 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mwdebug1001.eqiad.wmnet
23:44 mutante: reimaging mwdebug1001 with buster
23:43 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mwdebug1001.eqiad.wmnet
23:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwdebug1001.eqiad.wmnet with reason: OS upgrade
23:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwdebug1001.eqiad.wmnet with reason: OS upgrade
23:09 twentyafterfour@deploy1001: Synchronized php-1.36.0-wmf.30/includes/HookContainer/DeprecatedHooks.php: silence deprecation refs T274889 (duration: 01m 14s)
22:52 jgleeson: updated payments-wiki config to 3d1b4564a2
22:39 gehel: restarting wdqs-updater on wdqs2001
22:35 bstorm@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
22:23 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
22:22 akosiaris: re-enable puppet and squid on install1003. wdqs seems to be mildly related to the outage, restart it
22:09 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop analytics cluster
21:45 akosiaris: stop squid as a stopgap on install1003 and disable puppet so that it is not restarted while we figure out what wdqs updater is doing to cause issues for mediawiki
20:47 marxarelli: 1.36.0-wmf.31 rolled to group0. no new errors for wmf.31 (T271345)
20:33 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.31
20:20 mutante: mwdebug1002 has been recreated on buster and has been repooled after scap pull - you can find a .tar.gz in your home with the contents of your home before reimaging, fingerprint at T274023#6835116
20:18 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1297.eqiad.wmnet
20:18 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1290.eqiad.wmnet
20:18 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1289.eqiad.wmnet
20:18 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1288.eqiad.wmnet
20:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mwdebug1002.eqiad.wmnet
20:15 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mwdebug1002.eqiad.wmnet
20:04 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1297.eqiad.wmnet
20:04 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
20:04 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1289.eqiad.wmnet
20:03 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1288.eqiad.wmnet
19:58 ryankemper: [WDQS] De-pooled `wdqs100[4,7]` to catch up on lag, and pooled `wdqs100[5,6]`
19:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mwdebug1002.eqiad.wmnet with reason: OS upgrade
19:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mwdebug1002.eqiad.wmnet with reason: OS upgrade
19:06 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1297.eqiad.wmnet with reason: REIMAGE
19:04 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1290.eqiad.wmnet with reason: REIMAGE
19:03 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1297.eqiad.wmnet with reason: REIMAGE
19:02 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1289.eqiad.wmnet with reason: REIMAGE
19:01 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1290.eqiad.wmnet with reason: REIMAGE
19:00 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1288.eqiad.wmnet with reason: REIMAGE
18:59 mutante: puppetmaster1002 - puppet cert clean mwdebug1002.eqiad.wmnet, sign new request, initial puppet run (T274023)
18:59 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1289.eqiad.wmnet with reason: REIMAGE
18:58 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1288.eqiad.wmnet with reason: REIMAGE
18:52 mutante: re-creating mwdebug1002
18:49 dduvall@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.31 (duration: 49m 37s)
18:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1346.eqiad.wmnet
18:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1352.eqiad.wmnet
18:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1347.eqiad.wmnet
18:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1346.eqiad.wmnet
18:33 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1352.eqiad.wmnet
18:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1347.eqiad.wmnet
18:28 mutante: mw1352 - powercycle via mgmt
18:04 dduvall@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.31
17:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1346.eqiad.wmnet with reason: REIMAGE
17:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1347.eqiad.wmnet with reason: REIMAGE
17:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1346.eqiad.wmnet with reason: REIMAGE
17:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1347.eqiad.wmnet with reason: REIMAGE
17:36 marxarelli: 1.36.0-wmf.31 was branched at c49ac6d (T271345)
17:33 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
17:32 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
17:31 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
17:30 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
17:30 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
17:30 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
17:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1352.eqiad.wmnet with reason: REIMAGE
17:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1352.eqiad.wmnet with reason: REIMAGE
17:24 godog: swift codfw-prod decrease HDD weight for ms-be20[16-27] - T272837
17:23 jforrester@deploy1001: Finished deploy [integration/docroot@8ab9125]: Update docroot with Special:MyLanguage links. (duration: 00m 11s)
17:23 jforrester@deploy1001: Started deploy [integration/docroot@8ab9125]: Update docroot with Special:MyLanguage links.
17:21 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
17:21 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
17:18 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
16:25 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-etcd[2001-2003].codfw.wmnet with reason: klausman: Pushing new etcd changes from T273071
16:25 klausman@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd[2001-2003].codfw.wmnet with reason: klausman: Pushing new etcd changes from T273071
16:17 moritzm: installing edk2 security updates
16:09 moritzm: installing python-bottle security updates on buster
15:58 papaul: power down ms-be2031 for firmware upgrade
15:44 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-etcd[1001-1003].eqiad.wmnet with reason: klausman: Pushing new etcd changes from T273071
15:44 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd[1001-1003].eqiad.wmnet with reason: klausman: Pushing new etcd changes from T273071
15:27 cdanis: re-enabling Puppet on cp-upload@eqsin to deploy Iab4d211 T274888
15:26 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
15:25 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
15:25 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
15:25 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
15:17 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
15:17 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
15:16 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
15:16 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
15:15 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEvents: Sample mediawiki.client.session_tick at 1:100 (T274172) (duration: 01m 00s)
15:14 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
15:14 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
15:13 akosiaris@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
15:13 akosiaris@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
15:12 cdanis: previous message was re: T274888
15:11 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕙☕ sudo cumin 'A:cp-upload and A:eqsin' 'disable-puppet "cdanis deploying Iab4d211 T263496"'
14:38 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.30 refs T271344 bfc73b6
14:24 twentyafterfour: MediaWiki train: prepare to promote all wikis to 1.36.0-wmf.30 refs T271344
14:07 akosiaris: rolling restart of cp500[1-6]
13:40 marostegui: Deploy schema change on s2 codfw - T273359
13:13 urbanecm@deploy1001: Synchronized static/images/cswiki-black-ribbon.png: 5d5b5c4: Temporarily add cswiki-black-ribbon.png as a static resource (duration: 01m 07s)
13:02 jayme@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
12:53 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:46 aborrero@cumin1001: START - Cookbook sre.dns.netbox
12:41 jayme@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
12:39 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Enable Wikibase Repo ID generator rate limiting on Test Wikidata (T272032) 2/2 (duration: 01m 06s)
12:38 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Wikibase Repo ID generator rate limiting on Test Wikidata (T272032) 1/2 (duration: 01m 12s)
12:29 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
12:08 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
12:06 marostegui: Deploy schema change on s5 codfw - T273359
11:54 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.30/extensions/DiscussionTools/includes/CommentFormatter.php: 5f4f516: CommentFormatter: Fix problems with editsection and quotes (T274709) (duration: 01m 12s)
11:54 kharlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
11:54 kharlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
11:52 kharlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
11:52 kharlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
11:47 kharlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
11:47 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1023.eqiad.wmnet
11:45 marostegui: Failover m2-master back from dbproxy1015 to dbproxy1013
11:42 effie: upgrade mc2037 to memcached 1.6 - T270315
11:41 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1023.eqiad.wmnet
11:40 marostegui: Reboot dbproxy1013 for kernel upgrade
11:29 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
11:28 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
11:27 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop test cluster
10:53 marostegui: Reboot es2023, es2024 and es2025 for kernel upgrade
10:46 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers for Hadoop test cluster
10:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1092 (re)pooling @ 100%: Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P14373 and previous config saved to /var/cache/conftool/dbconfig/20210216-103730-root.json
10:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1092 (re)pooling @ 80%: Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P14372 and previous config saved to /var/cache/conftool/dbconfig/20210216-102227-root.json
10:19 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
10:18 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'staging' .
10:18 marostegui: Reboot pc1010 for kernel upgrade
10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1075 from dbctl T274235', diff saved to https://phabricator.wikimedia.org/P14371 and previous config saved to /var/cache/conftool/dbconfig/20210216-101710-marostegui.json
10:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1092 (re)pooling @ 60%: Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P14370 and previous config saved to /var/cache/conftool/dbconfig/20210216-100723-root.json
09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1092 (re)pooling @ 40%: Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P14369 and previous config saved to /var/cache/conftool/dbconfig/20210216-095220-root.json
09:40 akosiaris: deploy new certs for apertium
09:40 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'production' .
09:40 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'staging' .
09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1092 (re)pooling @ 20%: Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P14368 and previous config saved to /var/cache/conftool/dbconfig/20210216-093716-root.json
09:28 marostegui: Failover m2-master from dbproxy1013 to dbproxy1015
09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1092 (re)pooling @ 10%: Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P14367 and previous config saved to /var/cache/conftool/dbconfig/20210216-092213-root.json
08:37 godog: swift eqiad-prod: decrease weight for SSDs on ms-be[1019-1026] - T272836
08:30 marostegui: Deploy schema change on s6 codfw - T273359
07:40 dcausse: restarting blazegraph on wdqs1013
07:27 marostegui: Reboot dbproxy1021 for kernel upgrade
07:21 marostegui: Reboot dbproxy1012, 1015, 1016, 1017 for kernel upgrade
07:18 marostegui: Reboot dbproxy2* for kernel upgrade
06:49 marostegui: Reboot pc2010 pc2009 pc2008 pc2007 for kernel upgrade
06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092 to clone db1172 T258361', diff saved to https://phabricator.wikimedia.org/P14365 and previous config saved to /var/cache/conftool/dbconfig/20210216-064602-marostegui.json
06:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
06:37 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1093 from dbctl T273955', diff saved to https://phabricator.wikimedia.org/P14364 and previous config saved to /var/cache/conftool/dbconfig/20210216-063250-marostegui.json
04:17 jforrester@deploy1001: Finished deploy [integration/docroot@864afdb]: Update docroot with changes from this weekend. (duration: 00m 17s)
04:17 jforrester@deploy1001: Started deploy [integration/docroot@864afdb]: Update docroot with changes from this weekend.

2021-02-15

21:33 eileen: civicrm revision changed from dfbb8f41bc to c535ac603a, config revision is ba9b2380b1
16:46 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage1002.eqiad.wmnet
16:39 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestage1002.eqiad.wmnet
16:33 volans: restarted netbox on netbox1001
16:32 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage1001.eqiad.wmnet
16:27 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestage1001.eqiad.wmnet
16:26 jayme: rolled back linkrecommendation helm releases to the most recent revision running chart version linkrecommendation-0.0.4 on clusters codfw and eqiad (cc: kostajh)
16:22 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mwdebug1002.eqiad.wmnet
16:18 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage2002.codfw.wmnet
16:14 hoo: Updated the Wikidata property suggester with data from the 2021-02-01 JSON dump (with pre-applied T132839 workarounds)
16:12 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestage2002.codfw.wmnet
16:12 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage2001.codfw.wmnet
16:09 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet2003-dev.codfw.wmnet
16:07 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestage2001.codfw.wmnet
16:05 aborrero@cumin2001: START - Cookbook sre.hosts.reboot-single for host cloudnet2003-dev.codfw.wmnet
15:58 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema2004.codfw.wmnet
15:53 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host schema2004.codfw.wmnet
15:51 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema2003.codfw.wmnet
15:48 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host schema2003.codfw.wmnet
15:48 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
15:46 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema1004.eqiad.wmnet
15:38 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host schema1004.eqiad.wmnet
15:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast3004.wikimedia.org with reason: REIMAGE
15:36 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
15:36 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
15:36 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
15:34 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on bast3004.wikimedia.org with reason: REIMAGE
15:33 moritzm: installing linux-4.19 update for Stretch on servers which have it installed (no reboots, just updating the kernels)
15:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema1003.eqiad.wmnet
15:17 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host schema1003.eqiad.wmnet
15:09 moritzm: reimaging bast3004 to buster
15:04 godog: upgrade grafana to 7.4.1 on grafana1002 - T263747
14:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 00905c4: Add *.president.az to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T274789) (duration: 01m 09s)
14:08 godog: swift eqiad-prod: add weight back to sdg on ms-be1054 - T273582
13:57 moritzm: installing libonig security update for stretch
13:53 gehel@cumin2001: START - Cookbook sre.wdqs.data-reload
13:38 moritzm: installing subversion security updates
13:33 marostegui: Stop MySQL on db1093 - T273955
13:19 kharlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
13:06 godog: swift eqiad-prod: decrease weight for SSDs on ms-be[1019-1026] - T272836
13:05 Lucas_WMDE: notice: stashbot had issues between 8:19 and 12:50, see https://wm-bot.wmflabs.org/browser/index.php?start=02%2F15%2F2021&end=02%2F15%2F2021&display=%23wikimedia-operations for missed !log messages
13:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4002.wikimedia.org with reason: REIMAGE
13:02 kharlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
13:02 kharlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
13:01 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4002.wikimedia.org with reason: REIMAGE
12:58 kharlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
12:58 kharlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 4%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14343 and previous config saved to /var/cache/conftool/dbconfig/20210215-080435-root.json
07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 3%: Slowly pool db1162', diff saved to https://phabricator.wikimedia.org/P14342 and previous config saved to /var/cache/conftool/dbconfig/20210215-074932-root.json
07:42 elukey@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid analytics cluster: Reboot Druid nodes - elukey@cumin1001
07:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1010.eqiad.wmnet
07:28 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-tool1010.eqiad.wmnet
07:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1009.eqiad.wmnet
07:24 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-tool1009.eqiad.wmnet
07:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1008.eqiad.wmnet
07:20 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-tool1008.eqiad.wmnet
07:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1007.eqiad.wmnet
07:14 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-tool1007.eqiad.wmnet
07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1162 with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P14341 and previous config saved to /var/cache/conftool/dbconfig/20210215-070206-marostegui.json
06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1162 with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P14340 and previous config saved to /var/cache/conftool/dbconfig/20210215-064628-marostegui.json
06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1162 to dbctl - depooled T258361', diff saved to https://phabricator.wikimedia.org/P14339 and previous config saved to /var/cache/conftool/dbconfig/20210215-064001-marostegui.json

2021-02-14

13:13 akosiaris: sudo cumin -b 1 -s 120 'cp500[2,3,5,6].eqsin.wmnet' 'systemctl restart varnish-frontend.service'
13:10 _joe_: restarted varnish-fe on cp5004
13:09 akosiaris: restart varnish-fe on cp5001
09:27 joal@deploy1001: Finished deploy [analytics/refinery@dd5f947] (thin): Hotfix analytics deployment - THIN [analytics/refinery@dd5f947] (duration: 00m 06s)
09:27 joal@deploy1001: Started deploy [analytics/refinery@dd5f947] (thin): Hotfix analytics deployment - THIN [analytics/refinery@dd5f947]
09:27 joal@deploy1001: Finished deploy [analytics/refinery@dd5f947]: Hotfix analytics deployment [analytics/refinery@dd5f947] (duration: 12m 52s)
09:14 joal@deploy1001: Started deploy [analytics/refinery@dd5f947]: Hotfix analytics deployment [analytics/refinery@dd5f947]

2021-02-13

03:23 ryankemper: Depooled `wdqs1006` to catch up on lag
03:23 ryankemper: Restarted blazegraph on `wdqs1006`
01:30 crusnov@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host mwdebug1002.eqiad.wmnet
01:00 crusnov@cumin1001: START - Cookbook sre.ganeti.makevm for new host mwdebug1002.eqiad.wmnet
00:55 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host mwdebug1002.eqiad.wmnet
00:49 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host mwdebug1002.eqiad.wmnet
00:38 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host mwdebug1002.eqiad.wmnet
00:31 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1281.eqiad.wmnet
00:31 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1282.eqiad.wmnet
00:31 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1283.eqiad.wmnet
00:31 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1284.eqiad.wmnet
00:30 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host mwdebug1002.eqiad.wmnet
00:30 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host mwdebug1002.eqiad.wmnet
00:26 mutante: ganeti - attempting to recreate VM mwdebug1002 with cookbook that was previously deleted manually (T274689 T274023)
00:25 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host mwdebug1002.eqiad.wmnet
00:08 mutante: ganeti1011 - manually deleting VM mwdebug1002 - T274689 T274023

2021-02-12

23:59 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1348.eqiad.wmnet
23:58 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1356.eqiad.wmnet
23:43 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1284.eqiad.wmnet
23:42 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1283.eqiad.wmnet
23:42 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1282.eqiad.wmnet
23:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1348.eqiad.wmnet
23:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1356.eqiad.wmnet
23:41 legoktm@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1221.eqiad.wmnet
23:39 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1221.eqiad.wmnet
23:38 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1281.eqiad.wmnet
23:26 dduvall@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
23:24 dduvall@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
23:14 dduvall@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
23:02 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
22:52 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
22:51 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1284.eqiad.wmnet with reason: REIMAGE
22:49 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1283.eqiad.wmnet with reason: REIMAGE
22:48 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1284.eqiad.wmnet with reason: REIMAGE
22:47 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1283.eqiad.wmnet with reason: REIMAGE
22:47 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1282.eqiad.wmnet with reason: REIMAGE
22:45 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1282.eqiad.wmnet with reason: REIMAGE
22:44 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1281.eqiad.wmnet with reason: REIMAGE
22:42 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1281.eqiad.wmnet with reason: REIMAGE
22:32 krinkle@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: Idc385de0 cleanup (duration: 05m 14s)
22:15 krinkle@deploy1001: Synchronized wmf-config/etcd.php: b3447343a cleanup (duration: 05m 20s)
22:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1348.eqiad.wmnet with reason: REIMAGE
21:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1348.eqiad.wmnet with reason: REIMAGE
21:26 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1357.eqiad.wmnet
20:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1357.eqiad.wmnet with reason: REIMAGE
20:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1356.eqiad.wmnet with reason: REIMAGE
20:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1357.eqiad.wmnet with reason: REIMAGE
20:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1356.eqiad.wmnet with reason: REIMAGE
20:36 mutante: mwdebug1003 now on buster - mwdebug1002 rebooting and reimaging to buster
20:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mwdebug1002.eqiad.wmnet with reason: OS upgrade
20:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mwdebug1002.eqiad.wmnet with reason: OS upgrade
20:32 mutante: mw1353, mw1358 - scap pull, repooled
20:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1353.eqiad.wmnet
20:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1358.eqiad.wmnet
20:17 mutante: mwdebug2001 - restarted memcached
20:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1358.eqiad.wmnet
20:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1353.eqiad.wmnet
19:56 mutante: mwdebug2002 - restart memcached
19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw1358.eqiad.wmnet with reason: OS upgrade
19:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw1358.eqiad.wmnet with reason: OS upgrade
19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw1353.eqiad.wmnet with reason: OS upgrade
19:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw1353.eqiad.wmnet with reason: OS upgrade
19:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mwdebug1003.eqiad.wmnet with reason: OS upgrade
19:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mwdebug1003.eqiad.wmnet with reason: OS upgrade
19:43 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: roll back commonswiki to 1.36.0-wmf.27 due to T274589
19:42 mutante: mwdebug2001 now on buster - mwdebug1003 rebooting and reimaging to stretch
19:38 milimetric@deploy1001: Finished deploy [analytics/refinery@e0c09a2] (thin): Fix for mediarequest per file cassandra job - 2 (duration: 00m 06s)
19:38 milimetric@deploy1001: Started deploy [analytics/refinery@e0c09a2] (thin): Fix for mediarequest per file cassandra job - 2
19:38 milimetric@deploy1001: Finished deploy [analytics/refinery@e0c09a2]: Fix for mediarequest per file cassandra job - 2 (duration: 11m 01s)
19:34 twentyafterfour: Train status: Rolling back commonswiki to wmf.27 due to T274589 (refs T271344)
19:29 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1358.eqiad.wmnet with reason: REIMAGE
19:28 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1353.eqiad.wmnet with reason: REIMAGE
19:28 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on aqs1015.eqiad.wmnet with reason: REIMAGE
19:27 milimetric@deploy1001: Started deploy [analytics/refinery@e0c09a2]: Fix for mediarequest per file cassandra job - 2
19:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1358.eqiad.wmnet with reason: REIMAGE
19:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1353.eqiad.wmnet with reason: REIMAGE
19:23 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1015.eqiad.wmnet with reason: REIMAGE
19:20 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1013.eqiad.wmnet with reason: REIMAGE
19:18 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1013.eqiad.wmnet with reason: REIMAGE
19:18 milimetric@deploy1001: Finished deploy [analytics/refinery@366962f]: Fix for mediarequest per file cassandra job (duration: 11m 58s)
19:17 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1012.eqiad.wmnet with reason: REIMAGE
19:15 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1011.eqiad.wmnet with reason: REIMAGE
19:15 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1012.eqiad.wmnet with reason: REIMAGE
19:13 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1010.eqiad.wmnet with reason: REIMAGE
19:13 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1011.eqiad.wmnet with reason: REIMAGE
19:11 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1010.eqiad.wmnet with reason: REIMAGE
19:06 milimetric@deploy1001: Started deploy [analytics/refinery@366962f]: Fix for mediarequest per file cassandra job
19:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mwdebug2001.codfw.wmnet with reason: OS upgrade
19:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mwdebug2001.codfw.wmnet with reason: OS upgrade
19:02 mutante: rebooting and reimaging mwdebug2001 to buster T274023
18:35 mutante: mwdebug2002 now a buster VM; you can find a .tar.gz in your home dir with the contents of your previous home
18:30 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@98264b8]: airflow: review and correct usage of catchup=False (duration: 03m 10s)
18:27 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@98264b8]: airflow: review and correct usage of catchup=False
17:33 elukey@cumin1001: END (PASS) - Cookbook sre.presto.reboot-workers (exit_code=0) for Presto analytics cluster: Reboot Presto nodes - elukey@cumin1001
17:23 bblack: cp*: re-enabling puppet after successful agent run on one host as a test!
17:13 bblack: cp*: disable puppet ahead of https://gerrit.wikimedia.org/r/c/operations/puppet/+/663845
17:08 elukey@cumin1001: START - Cookbook sre.presto.reboot-workers for Presto analytics cluster: Reboot Presto nodes - elukey@cumin1001
17:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1005.eqiad.wmnet
16:48 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-tool1005.eqiad.wmnet
16:45 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-ui1001.eqiad.wmnet
16:43 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-ui1001.eqiad.wmnet
16:43 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-client1001.eqiad.wmnet
16:38 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-client1001.eqiad.wmnet
16:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-druid1001.eqiad.wmnet
16:26 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-druid1001.eqiad.wmnet
16:12 mforns@deploy1001: Finished deploy [analytics/refinery@9cd1297] (hadoop-test): Fix for data quality alarms after BigTop migration TEST [analytics/refinery@9cd129764edbac04b192c922ec0a975bc47455a5] (duration: 04m 05s)
16:11 hnowlan: joining maps2007 to cassandra cluster
16:08 mforns@deploy1001: Started deploy [analytics/refinery@9cd1297] (hadoop-test): Fix for data quality alarms after BigTop migration TEST [analytics/refinery@9cd129764edbac04b192c922ec0a975bc47455a5]
16:08 mforns@deploy1001: Finished deploy [analytics/refinery@9cd1297] (thin): Fix for data quality alarms after BigTop migration THIN [analytics/refinery@9cd129764edbac04b192c922ec0a975bc47455a5] (duration: 00m 06s)
16:07 mforns@deploy1001: Started deploy [analytics/refinery@9cd1297] (thin): Fix for data quality alarms after BigTop migration THIN [analytics/refinery@9cd129764edbac04b192c922ec0a975bc47455a5]
16:07 mforns@deploy1001: Finished deploy [analytics/refinery@9cd1297]: Fix for data quality alarms after BigTop migration [analytics/refinery@9cd129764edbac04b192c922ec0a975bc47455a5] (duration: 38m 56s)
15:28 mforns@deploy1001: Started deploy [analytics/refinery@9cd1297]: Fix for data quality alarms after BigTop migration [analytics/refinery@9cd129764edbac04b192c922ec0a975bc47455a5]
15:22 herron: rolling reboot of alert[12]001 hosts for updates
15:16 elukey: roll restart druid broker on druid-public to pick up new settings
14:39 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1022.eqiad.wmnet
14:32 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1022.eqiad.wmnet
14:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1090.eqiad.wmnet
14:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1089.eqiad.wmnet
14:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3065.esams.wmnet
14:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3064.esams.wmnet
13:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3065.esams.wmnet
13:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3064.esams.wmnet
13:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1090.eqiad.wmnet
13:51 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1089.eqiad.wmnet
13:10 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=maps1005.eqiad.wmnet
12:12 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on maps2007.codfw.wmnet with reason: Resyncing database
12:11 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on maps2007.codfw.wmnet with reason: Resyncing database
11:44 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2042.codfw.wmnet
11:44 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2041.codfw.wmnet
11:32 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2042.codfw.wmnet
11:32 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2041.codfw.wmnet
11:27 moritzm: installing emacs updates from buster point release
11:25 moritzm: installing device-tree-compiler updates from buster point release
11:22 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1088.eqiad.wmnet
11:22 moritzm: installing node-ini security updates
11:22 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1087.eqiad.wmnet
11:22 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3062.esams.wmnet
11:21 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3063.esams.wmnet
11:15 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: REIMAGE
11:14 moritzm: installing golang-1.11 security updates
11:13 aborrero@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: REIMAGE
11:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3063.esams.wmnet
11:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3062.esams.wmnet
11:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1088.eqiad.wmnet
11:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1087.eqiad.wmnet
11:10 jynus@cumin1001: dbctl commit (dc=all): 'Increase db1163 traffic to 100%', diff saved to https://phabricator.wikimedia.org/P14337 and previous config saved to /var/cache/conftool/dbconfig/20210212-111010-jynus.json
11:06 moritzm: installing xcftools security updates
10:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2039.codfw.wmnet
10:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2040.codfw.wmnet
10:50 legoktm: repooled registry1002 after revert
10:42 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2040.codfw.wmnet
10:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2039.codfw.wmnet
10:39 jynus@cumin1001: dbctl commit (dc=all): 'Increase db1163 traffic to 75%', diff saved to https://phabricator.wikimedia.org/P14336 and previous config saved to /var/cache/conftool/dbconfig/20210212-103921-jynus.json
10:24 moritzm: installing wireshark security updates for stretch
10:22 legoktm: depooled registry1002 while fixing/debugging nginx config
10:22 Urbanecm: mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Victorgrigas . # T274608
10:18 jynus@cumin1001: dbctl commit (dc=all): 'Increase db1163 traffic to 50%', diff saved to https://phabricator.wikimedia.org/P14335 and previous config saved to /var/cache/conftool/dbconfig/20210212-101814-jynus.json
10:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2004.codfw.wmnet
10:06 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4032.ulsfo.wmnet
10:05 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4026.ulsfo.wmnet
10:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1086.eqiad.wmnet
10:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3061.esams.wmnet
10:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3060.esams.wmnet
10:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1085.eqiad.wmnet
10:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2038.codfw.wmnet
10:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5012.eqsin.wmnet
10:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2037.codfw.wmnet
10:02 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5006.eqsin.wmnet
10:01 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2004.codfw.wmnet
09:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2003.codfw.wmnet
09:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5006.eqsin.wmnet
09:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5012.eqsin.wmnet
09:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4026.ulsfo.wmnet
09:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4032.ulsfo.wmnet
09:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3061.esams.wmnet
09:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3060.esams.wmnet
09:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2038.codfw.wmnet
09:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2037.codfw.wmnet
09:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1086.eqiad.wmnet
09:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1085.eqiad.wmnet
09:46 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2003.codfw.wmnet
09:45 jynus@cumin1001: dbctl commit (dc=all): 'Increase db1163 traffic to 30%', diff saved to https://phabricator.wikimedia.org/P14334 and previous config saved to /var/cache/conftool/dbconfig/20210212-094520-jynus.json
09:32 jynus@cumin1001: dbctl commit (dc=all): 'Increase db1163 traffic to 20%', diff saved to https://phabricator.wikimedia.org/P14333 and previous config saved to /var/cache/conftool/dbconfig/20210212-093211-jynus.json
09:31 moritzm: installing node-y18n security updates
08:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast2002.wikimedia.org with reason: REIMAGE
08:29 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on bast2002.wikimedia.org with reason: REIMAGE
08:25 jynus@cumin1001: dbctl commit (dc=all): 'Increase db1163 traffic to 10%', diff saved to https://phabricator.wikimedia.org/P14331 and previous config saved to /var/cache/conftool/dbconfig/20210212-082526-jynus.json
08:15 moritzm: reimaging bast2002 to buster
07:54 elukey: roll restart of druid brokers on druid-public - locked after scheduled datasource deletion
03:36 krinkle@deploy1001: Finished deploy [integration/docroot@3c943ba]: I89e1ec881 (duration: 00m 08s)
03:36 krinkle@deploy1001: Started deploy [integration/docroot@3c943ba]: I89e1ec881
01:21 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1329.eqiad.wmnet
01:21 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1330.eqiad.wmnet
01:21 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1331.eqiad.wmnet
01:21 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1332.eqiad.wmnet
01:08 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1332.eqiad.wmnet
01:08 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1331.eqiad.wmnet
01:08 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1330.eqiad.wmnet
01:07 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1329.eqiad.wmnet
01:06 Urbanecm: Evening B&C done
01:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 389f7f1: Enable DiscussionTools Reply Tool A/B test (T273554) (duration: 01m 08s)
01:02 urbanecm@deploy1001: sync-file aborted: 389f7f1: Enable DiscussionTools Reply Tool A/B test (duration: 00m 48s)
01:01 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/VisualEditor/: c86cd00: de4a562: VE backports (T273096) (duration: 01m 15s)
00:56 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 5d92ed1: Add import sources for zh_yuewiki (T274597) (duration: 01m 13s)
00:34 foks: removing 2 files for legal compliance
00:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a022f2b: Oversample DiscussionTools EditAttemptStep logging (T273946) (duration: 01m 08s)
00:30 Urbanecm: mwscript namespaceDupes.php itwikiquote --fix --add-prefix=BROKEN # T273362
00:29 Urbanecm: mwscript namespaceDupes.php itwikiquote --fix # T273362
00:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: f051c6c: Adding WQ as namespace alias for itwikiquote (T273362) (duration: 01m 10s)
00:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 53229b0: Enabling extension SandboxLink on ltwiki (T273957) (duration: 01m 07s)
00:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mwdebug2002.codfw.wmnet with reason: OS upgrade
00:11 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mwdebug2002.codfw.wmnet with reason: OS upgrade
00:07 ejegg: updated fundraising civicrm from b81cb5e702 to dfbb8f41bc

2021-02-11

23:50 Urbanecm: Deploy security patch for T274514
23:47 mutante: reimaged mwdebug2002 with buster - since this is a VM: manually cleaned puppet cert on puppetmaster1001, signed new cert for same hostname, initial puppet run etc (T274023)
23:44 twentyafterfour: Train status for wmf.30 (T271344) is blocked until Monday. leaving wmf.30 on group1 and wmf.27 on group2 in spite of T260401
23:34 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwlog1002.eqiad.wmnet with reason: REIMAGE
23:32 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwlog1002.eqiad.wmnet with reason: REIMAGE
23:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwdebug2002.codfw.wmnet with reason: OS upgrade
23:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwdebug2002.codfw.wmnet with reason: OS upgrade
23:20 mutante: reimaging mwdebug2002 - stretch -> buster
22:57 Urbanecm: Run scap pull at mwmaint1002
22:53 mutante: powercycling crashed mwmaint1002
22:53 Urbanecm: Deploy security patch for T274514
22:11 legoktm@deploy1001: Synchronized php-1.36.0-wmf.30/extensions/GlobalWatchlist: GlobalWatchlist backports (duration: 01m 11s)
22:05 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1332.eqiad.wmnet with reason: REIMAGE
22:03 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1331.eqiad.wmnet with reason: REIMAGE
22:03 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1332.eqiad.wmnet with reason: REIMAGE
22:01 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1330.eqiad.wmnet with reason: REIMAGE
22:01 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1331.eqiad.wmnet with reason: REIMAGE
21:59 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1329.eqiad.wmnet with reason: REIMAGE
21:59 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1330.eqiad.wmnet with reason: REIMAGE
21:57 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1329.eqiad.wmnet with reason: REIMAGE
21:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1354.eqiad.wmnet
21:50 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1354.eqiad.wmnet
21:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1355.eqiad.wmnet
21:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1355.eqiad.wmnet
21:44 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1359.eqiad.wmnet
21:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1359.eqiad.wmnet
21:37 mutante: mw1355, mw1359 - power cycling
21:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1354.eqiad.wmnet with reason: REIMAGE
21:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1354.eqiad.wmnet with reason: REIMAGE
21:20 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1360.eqiad.wmnet
21:12 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1360.eqiad.wmnet
21:05 mutante: mw1360 - powercycling
21:01 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1364.eqiad.wmnet
20:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1364.eqiad.wmnet
20:52 mutante: mw1364 - powercycled
20:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1355.eqiad.wmnet with reason: REIMAGE
20:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1355.eqiad.wmnet with reason: REIMAGE
20:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1359.eqiad.wmnet with reason: REIMAGE
20:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1359.eqiad.wmnet with reason: REIMAGE
20:26 twentyafterfour: new train blocker preventing deploy of 1.36.0-wmf.30 to all wikis. T274589 blocks T271344
20:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1365.eqiad.wmnet
20:23 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1365.eqiad.wmnet
20:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1360.eqiad.wmnet with reason: REIMAGE
20:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1360.eqiad.wmnet with reason: REIMAGE
20:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1361.eqiad.wmnet
20:09 mutante: mw1365 - powercycle - reboot issue
20:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1361.eqiad.wmnet
20:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1364.eqiad.wmnet with reason: REIMAGE
20:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1364.eqiad.wmnet with reason: REIMAGE
19:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1362.eqiad.wmnet
19:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1362.eqiad.wmnet
19:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1368.eqiad.wmnet
19:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1361.eqiad.wmnet with reason: REIMAGE
19:40 mutante: mw1368 - had the reboot via IPMI issue, did DRAC reset and repeated wmf-auto-reimage, issue did not happen again
19:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1368.eqiad.wmnet
19:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1361.eqiad.wmnet with reason: REIMAGE
19:32 urbanecm@deploy1001: Synchronized wmf-config/logos.php: noop: a1244df: Add inline documentation to configuration about updating logos regarding labs (duration: 01m 08s)
19:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1365.eqiad.wmnet with reason: REIMAGE
19:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 93e168c: Added Kokebok namespace to nowikibooks (T274265) (duration: 01m 20s)
19:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1362.eqiad.wmnet with reason: REIMAGE
19:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1365.eqiad.wmnet with reason: REIMAGE
19:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1362.eqiad.wmnet with reason: REIMAGE
19:20 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1363.eqiad.wmnet
19:13 robh@cumin1001: START - Cookbook sre.dns.netbox
19:13 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1363.eqiad.wmnet
19:13 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
19:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE
19:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE
19:04 mutante: mw1363 - powercycled, reboot issue
18:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1374.eqiad.wmnet
18:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1374.eqiad.wmnet
18:46 mutante: mw1368 - racadm racreset
18:46 mutante: mw1368 - reboot via IPMI issue & can't powercycle "Unable to perform requested operation." - racreset
18:43 mutante: mw1374 - powercycled, reboot via IPMI issue
18:19 robh@cumin1001: START - Cookbook sre.dns.netbox
18:18 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:11 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
18:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE
17:59 bblack: lvs2007 - downtimes ended, back in service - T274571
17:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1363.eqiad.wmnet with reason: REIMAGE
17:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE
17:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1363.eqiad.wmnet with reason: REIMAGE
17:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1374.eqiad.wmnet with reason: REIMAGE
17:54 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1374.eqiad.wmnet with reason: REIMAGE
17:52 bblack: lvs2007 - starting up puppet + pybal - T274571
17:36 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1375.eqiad.wmnet
17:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1375.eqiad.wmnet
17:32 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1376.eqiad.wmnet
17:31 bblack: lvs2007 - shutting down host - T274571
17:27 bblack: lvs2007 - stopping pybal - T274571
17:26 bblack: lvs2007 - puppet disabled, downtimed in icinga - T274571
17:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:11 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1376.eqiad.wmnet
17:09 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
17:07 mutante: mw1375 - powercycle - stuck at reboot
17:03 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1376.eqiad.wmnet
16:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1368.eqiad.wmnet with reason: cumin execution failed during wmf-reimaged
16:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1368.eqiad.wmnet with reason: cumin execution failed during wmf-reimaged
16:38 mutante: mw1368 - File "/usr/lib/python3/dist-packages/spicerack/remote.py", line 637, in _execute raise RemoteExecutionError(ret, 'Cumin execution failed')
16:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE
16:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1375.eqiad.wmnet with reason: REIMAGE
16:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1376.eqiad.wmnet with reason: REIMAGE
16:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1375.eqiad.wmnet with reason: REIMAGE
16:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE
16:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1376.eqiad.wmnet with reason: REIMAGE
16:24 ejegg: updated payments-wiki from a232fc3438 to 4b7b195c8a
16:13 kormat@cumin1001: dbctl commit (dc=all): 'Pool db1163 at 1%, again T258361', diff saved to https://phabricator.wikimedia.org/P14323 and previous config saved to /var/cache/conftool/dbconfig/20210211-161308-kormat.json
15:52 jynus: deploying fixed grants to db1163
15:50 gehel: ban elastic2054 from shard allocation - T274555
15:49 jynus@cumin1001: dbctl commit (dc=all): 'Depool 1163', diff saved to https://phabricator.wikimedia.org/P14321 and previous config saved to /var/cache/conftool/dbconfig/20210211-154902-jynus.json
15:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host serpens.wikimedia.org
15:46 gehel: depooling elastic2054 - T274555
15:45 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host serpens.wikimedia.org
15:45 kormat@cumin1001: dbctl commit (dc=all): 'Pool db1163 at 1% T258361', diff saved to https://phabricator.wikimedia.org/P14320 and previous config saved to /var/cache/conftool/dbconfig/20210211-154501-kormat.json
15:39 gehel: powercycle elastic2054 - T274555
15:39 gehel: powercycle elastic2054
14:44 kormat@cumin1001: dbctl commit (dc=all): 'Add db1163 to s1 T258361', diff saved to https://phabricator.wikimedia.org/P14318 and previous config saved to /var/cache/conftool/dbconfig/20210211-144445-kormat.json
14:24 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreams: Update sampling config syntax for test.instrumentation.sampled (duration: 01m 08s)
14:11 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon2001.wikimedia.org
14:02 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host netmon2001.wikimedia.org
13:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1003.eqiad.wmnet
13:48 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1003.eqiad.wmnet
13:48 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1002.eqiad.wmnet
13:41 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1002.eqiad.wmnet
13:28 godog: test grafana 7.4.1 upgrade on grafana2001 - T263747
13:27 moritzm: re-adding ganeti5002 to the eqsin Ganeti cluster following mainboard replacement/reinstall T261130
13:22 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1001.eqiad.wmnet
13:16 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
13:08 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2003.codfw.wmnet
13:04 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'similar-users' for release 'main' .
13:03 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2003.codfw.wmnet
13:00 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'similar-users' for release 'main' .
12:57 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
12:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2002.codfw.wmnet
12:45 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2002.codfw.wmnet
12:41 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana2001.codfw.wmnet
12:40 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
12:40 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: d2b1df1: Changing frwiktionary wmgBabelMainCategory (T274137) (duration: 01m 08s)
12:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host grafana2001.codfw.wmnet
12:35 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
12:18 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: wikidata: post edit constraint jobs on 50% of edits (T204031) (up from 40%) (duration: 01m 08s)
12:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: wikidata: add Dagbani to wmgExtraLanguageNames (T272242) (duration: 01m 29s)
12:06 jynus: restart-failed systemd on cumin1001 after s5 eqiad snapshot failed
11:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2002.codfw.wmnet
11:45 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
11:41 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1163.eqiad.wmnet with reason: REIMAGE
11:40 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
11:39 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1163.eqiad.wmnet with reason: REIMAGE
11:39 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2002.codfw.wmnet
11:35 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
11:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2001.codfw.wmnet
11:25 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2001.codfw.wmnet
11:25 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor1004.eqiad.wmnet
11:17 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
11:13 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
11:06 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
11:04 kormat@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: changed binlog_format T274472', diff saved to https://phabricator.wikimedia.org/P14315 and previous config saved to /var/cache/conftool/dbconfig/20210211-110447-kormat.json
11:03 moritzm: installing firejail security updates on Stretch
10:56 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2004.codfw.wmnet
10:50 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2004.codfw.wmnet
10:49 kormat@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 66%: changed binlog_format T274472', diff saved to https://phabricator.wikimedia.org/P14314 and previous config saved to /var/cache/conftool/dbconfig/20210211-104943-kormat.json
10:48 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet
10:40 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet
10:39 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2002.codfw.wmnet
10:34 kormat@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 33%: changed binlog_format T274472', diff saved to https://phabricator.wikimedia.org/P14313 and previous config saved to /var/cache/conftool/dbconfig/20210211-103440-kormat.json
10:33 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2002.codfw.wmnet
10:25 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
10:20 kormat@cumin1001: dbctl commit (dc=all): 'db1118 depooling: change binlog_format', diff saved to https://phabricator.wikimedia.org/P14312 and previous config saved to /var/cache/conftool/dbconfig/20210211-101959-kormat.json
10:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1118.eqiad.wmnet with reason: Depooling to change binlog_format T274472
10:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1118.eqiad.wmnet with reason: Depooling to change binlog_format T274472
10:18 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
10:15 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1004.eqiad.wmnet
10:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4025.ulsfo.wmnet
10:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4031.ulsfo.wmnet
10:13 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2035.codfw.wmnet
10:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1084.eqiad.wmnet
10:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3059.esams.wmnet
10:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1083.eqiad.wmnet
10:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3058.esams.wmnet
10:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2036.codfw.wmnet
10:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5005.eqsin.wmnet
10:08 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5011.eqsin.wmnet
10:07 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1004.eqiad.wmnet
10:02 jynus: switching db1118 to binlog_format=STATEMENT as new s1 master candidate
10:00 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5005.eqsin.wmnet
10:00 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5011.eqsin.wmnet
10:00 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4025.ulsfo.wmnet
09:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4031.ulsfo.wmnet
09:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3059.esams.wmnet
09:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3058.esams.wmnet
09:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2036.codfw.wmnet
09:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2035.codfw.wmnet
09:58 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1084.eqiad.wmnet
09:58 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1083.eqiad.wmnet
09:53 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host thumbor1004.eqiad.wmnet
09:44 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
09:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
09:43 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
09:38 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
09:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2001.codfw.wmnet
09:12 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
09:10 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1003.eqiad.wmnet
09:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host rpki2001.codfw.wmnet
09:03 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
09:00 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1002.eqiad.wmnet
08:59 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1004.eqiad.wmnet
08:59 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor1003.eqiad.wmnet
08:52 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1002.eqiad.wmnet
08:48 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1001.eqiad.wmnet
08:41 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1003.eqiad.wmnet
08:38 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1001.eqiad.wmnet
08:35 godog: swift codfw-prod decrease HDD weight for ms-be20[16-27] - T272837
08:29 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1003.eqiad.wmnet
08:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3005.wikimedia.org
08:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host bast3005.wikimedia.org
08:11 legoktm@deploy1001: Synchronized php-1.36.0-wmf.30/vendor/wikimedia/shellbox/src/Command/BashWrapper.php: wikimedia/shellbox: Don't unconditionally allowPath( 'limit.sh' ) - T274474 (duration: 01m 32s)
08:09 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1002.eqiad.wmnet
08:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4003.wikimedia.org
08:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host bast4003.wikimedia.org
07:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5002.wikimedia.org
07:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host bast5002.wikimedia.org
07:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1021.eqiad.wmnet
07:44 XioNoX: push improved loopback dhcp term to all routers
07:39 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1021.eqiad.wmnet
07:25 effie: pool thumbor1001
07:06 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=thumbor1001.eqiad.wmnet
07:06 elukey: powercycle thumbor1001 - no SSH, no mgmt serial tty available, no racadm getsel info
06:45 kart_: Updated cxserver to 2021-02-10-134029-production (T274133, T273456, T271980)
06:41 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
06:35 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
06:33 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
03:10 rzl@cumin1001: dbctl commit (dc=all): 'depool db1134', diff saved to https://phabricator.wikimedia.org/P14310 and previous config saved to /var/cache/conftool/dbconfig/20210211-031048-rzl.json
03:10 rzl: depooled db1134
02:18 milimetric@deploy1001: Finished deploy [analytics/refinery@01d811f] (thin): Fix spelling error in mediacounts job (duration: 00m 06s)
02:18 milimetric@deploy1001: Started deploy [analytics/refinery@01d811f] (thin): Fix spelling error in mediacounts job
02:18 milimetric@deploy1001: Finished deploy [analytics/refinery@01d811f]: Fix spelling error in mediacounts job (duration: 11m 06s)
02:07 milimetric@deploy1001: Started deploy [analytics/refinery@01d811f]: Fix spelling error in mediacounts job
02:05 dwisehaupt: move payments1* and frpig1* out of maintenance mode
02:04 eileen: process-control config revision is 726db3446a
02:02 dwisehaupt: move civi1001 out of maintenance mode
01:54 eileen: civicrm revision changed from 3776363c90 to b81cb5e702, config revision is f216d8fe8e
01:35 dwisehaupt: applying new civicrm triggers to frdb1002
01:14 eileen: civicrm revision changed from 2ce8194c07 to 3776363c90, config revision is f216d8fe8e
01:06 dwisehaupt: stopping mariadb replication on frdev1001 and frdb1004
01:05 dwisehaupt: Move payments/civi/frpig into maint mode for civi upgrade
01:04 eileen: process-control config revision is f216d8fe8e
00:26 legoktm@deploy1001: Synchronized wmf-config/profiler.php: Revert "profiler: Send data to excimer-buster pipeline" (duration: 02m 00s)
00:03 milimetric@deploy1001: Finished deploy [analytics/refinery@3da19b6] (thin): More fixes for jobs after cluster upgrade (duration: 00m 07s)
00:03 milimetric@deploy1001: Started deploy [analytics/refinery@3da19b6] (thin): More fixes for jobs after cluster upgrade

2021-02-10

23:53 milimetric@deploy1001: Finished deploy [analytics/refinery@3da19b6]: More fixes for jobs after cluster upgrade (duration: 14m 23s)
23:49 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1328.eqiad.wmnet
23:49 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1327.eqiad.wmnet
23:49 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1326.eqiad.wmnet
23:49 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1325.eqiad.wmnet
23:38 milimetric@deploy1001: Started deploy [analytics/refinery@3da19b6]: More fixes for jobs after cluster upgrade
23:36 eileen: civicrm revision changed from ae24f87158 to 2ce8194c07, config revision is a48a7db0a2
22:37 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1328.eqiad.wmnet
22:37 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1327.eqiad.wmnet
22:37 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1326.eqiad.wmnet
22:37 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1325.eqiad.wmnet
22:32 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@d97f7d9]: query_clicks: Remove result file merging (duration: 01m 27s)
22:30 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@d97f7d9]: query_clicks: Remove result file merging
22:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1377.eqiad.wmnet
22:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1369.eqiad.wmnet
22:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1377.eqiad.wmnet
22:13 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1369.eqiad.wmnet
22:07 mutante: mw1369, mw1377 - all servers in this section now consistently fail to reboot when triggered as the last step of the wmf-reimage script
21:43 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1328.eqiad.wmnet with reason: REIMAGE
21:41 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1327.eqiad.wmnet with reason: REIMAGE
21:41 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1328.eqiad.wmnet with reason: REIMAGE
21:39 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1326.eqiad.wmnet with reason: REIMAGE
21:39 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1327.eqiad.wmnet with reason: REIMAGE
21:37 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1325.eqiad.wmnet with reason: REIMAGE
21:37 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1326.eqiad.wmnet with reason: REIMAGE
21:35 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1325.eqiad.wmnet with reason: REIMAGE
21:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1377.eqiad.wmnet with reason: REIMAGE
21:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1369.eqiad.wmnet with reason: REIMAGE
21:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1377.eqiad.wmnet with reason: REIMAGE
21:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1369.eqiad.wmnet with reason: REIMAGE
20:39 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1293.eqiad.wmnet
20:37 eileen: civicrm revision changed from f161a34266 to ae24f87158, config revision is a48a7db0a2
20:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1293.eqiad.wmnet
20:36 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1370.eqiad.wmnet
20:36 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1378.eqiad.wmnet
20:31 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1378.eqiad.wmnet
20:31 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1370.eqiad.wmnet
20:23 mutante: mw1370, mw1378 - powercycling via DRAC
20:21 mutante: mw1370, mw1378 - again failing to reboot as the last step of reimaging script
20:19 jgleeson: updated civicrm from 1e9a86dd6e to f161a34266
20:13 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.30 (duration: 01m 02s)
20:12 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.30
20:05 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1324.eqiad.wmnet
20:01 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@1c5477d]: query_clicks: timestamp is now a reserved keyword (duration: 02m 19s)
20:01 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1323.eqiad.wmnet
20:00 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1322.eqiad.wmnet
20:00 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1321.eqiad.wmnet
19:59 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@1c5477d]: query_clicks: timestamp is now a reserved keyword
19:54 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1324.eqiad.wmnet
19:54 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1323.eqiad.wmnet
19:54 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1322.eqiad.wmnet
19:54 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1321.eqiad.wmnet
19:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1293.eqiad.wmnet with reason: REIMAGE
19:47 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
19:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1293.eqiad.wmnet with reason: REIMAGE
19:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1378.eqiad.wmnet with reason: REIMAGE
19:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1378.eqiad.wmnet with reason: REIMAGE
19:20 thcipriani@deploy1001: Synchronized wmf-config/ProductionServices.php: Remove a couple of useless DNS lookups from mediawiki-config T231025 (duration: 01m 10s)
19:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1294.eqiad.wmnet
19:17 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1294.eqiad.wmnet
19:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1379.eqiad.wmnet
19:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1370.eqiad.wmnet with reason: REIMAGE
19:15 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1379.eqiad.wmnet
19:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1370.eqiad.wmnet with reason: REIMAGE
19:12 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1294.eqiad.wmnet
19:04 mutante: mw1379 - racadm racreset - host did not come back from reboot, and DRAC says it can't powercycle it, even though it is ALREADY ON
19:00 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1324.eqiad.wmnet with reason: REIMAGE
19:00 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1379.eqiad.wmnet
18:58 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1323.eqiad.wmnet with reason: REIMAGE
18:58 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1324.eqiad.wmnet with reason: REIMAGE
18:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1371.eqiad.wmnet
18:56 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1322.eqiad.wmnet with reason: REIMAGE
18:56 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1323.eqiad.wmnet with reason: REIMAGE
18:54 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1321.eqiad.wmnet with reason: REIMAGE
18:54 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1322.eqiad.wmnet with reason: REIMAGE
18:52 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1321.eqiad.wmnet with reason: REIMAGE
18:36 andrew@deploy1001: Finished deploy [horizon/deploy@02cb8a4]: security group dashboard policy updates, now after doing a submodule update! (duration: 03m 31s)
18:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1379.eqiad.wmnet
18:33 andrew@deploy1001: Started deploy [horizon/deploy@02cb8a4]: security group dashboard policy updates, now after doing a submodule update!
18:32 andrew@deploy1001: Finished deploy [horizon/deploy@4f5a5a7]: security group dashboard policy updates (duration: 00m 07s)
18:32 andrew@deploy1001: Started deploy [horizon/deploy@4f5a5a7]: security group dashboard policy updates
18:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1371.eqiad.wmnet
18:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1294.eqiad.wmnet with reason: REIMAGE
18:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1294.eqiad.wmnet with reason: REIMAGE
18:14 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host thumbor1001.eqiad.wmnet
17:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1295.eqiad.wmnet
17:43 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1295.eqiad.wmnet
17:19 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1001.eqiad.wmnet
17:18 shdubsh: restart pybal on low-traffic lvs1015
17:13 shdubsh: restart pybal on backup lvs1016
17:13 andrew@deploy1001: Finished deploy [horizon/deploy@4f5a5a7]: puppet dashboard policy updates (duration: 03m 53s)
17:09 andrew@deploy1001: Started deploy [horizon/deploy@4f5a5a7]: puppet dashboard policy updates
16:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1295.eqiad.wmnet with reason: REIMAGE
16:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1295.eqiad.wmnet with reason: REIMAGE
16:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1379.eqiad.wmnet with reason: REIMAGE
16:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1379.eqiad.wmnet with reason: REIMAGE
16:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1371.eqiad.wmnet with reason: REIMAGE
16:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1371.eqiad.wmnet with reason: REIMAGE
16:20 moritzm: installing unzip security updates
16:12 moritzm: installing atftp security updates
16:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database, still
16:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database, still
15:26 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Do not produce canary events for rdf-streaming-updater streams - T269619 (duration: 01m 13s)
15:11 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.30
15:05 hashar: group0 wikis to 1.36.0-wmf.30 T271344
14:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2033.codfw.wmnet
14:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4024.ulsfo.wmnet
14:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3057.esams.wmnet
14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1082.eqiad.wmnet
14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1081.eqiad.wmnet
14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5010.eqsin.wmnet
14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4030.ulsfo.wmnet
14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3056.esams.wmnet
14:51 jynus: updating puppet-compiler-facts
14:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5004.eqsin.wmnet
14:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2034.codfw.wmnet
14:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2033.codfw.wmnet
14:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5004.eqsin.wmnet
14:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5010.eqsin.wmnet
14:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4024.ulsfo.wmnet
14:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4030.ulsfo.wmnet
14:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3057.esams.wmnet
14:39 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3056.esams.wmnet
14:39 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2034.codfw.wmnet
14:39 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1082.eqiad.wmnet
14:39 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1081.eqiad.wmnet
12:26 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T269619: [wdqs] Add flink sideoutput stream definitions (duration: 01m 06s)
12:20 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove Wikibase.NewItemIdFormatter log channel (T268870) 2/2 (prod no-op) (duration: 01m 08s)
12:18 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Remove Wikibase.NewItemIdFormatter log channel (T268870) 1/2 (duration: 01m 07s)
12:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: e8214ee: Enable GrowthExperiments on bnwiki (T266020) (duration: 01m 08s)
12:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 2d8cb10: Set wgGEHelpPanelAskMentor to true for several wikis (T272753) (duration: 01m 21s)
12:05 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5003.eqsin.wmnet
12:01 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4029.ulsfo.wmnet
11:56 vgutierrez: powercycle cp5003
11:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3055.esams.wmnet
11:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5009.eqsin.wmnet
11:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1080.eqiad.wmnet
11:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3054.esams.wmnet
11:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1079.eqiad.wmnet
11:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2032.codfw.wmnet
11:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2031.codfw.wmnet
11:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5003.eqsin.wmnet
11:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5009.eqsin.wmnet
11:44 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4029.ulsfo.wmnet
11:44 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3055.esams.wmnet
11:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3054.esams.wmnet
11:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2032.codfw.wmnet
11:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2031.codfw.wmnet
11:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1080.eqiad.wmnet
11:42 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1079.eqiad.wmnet
11:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4023.ulsfo.wmnet
11:22 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1404.eqiad.wmnet
11:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4023.ulsfo.wmnet
11:00 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5008.eqsin.wmnet
10:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14301 and previous config saved to /var/cache/conftool/dbconfig/20210210-104649-root.json
10:42 vgutierrez: powercycle cp5008
10:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4028.ulsfo.wmnet
10:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5002.eqsin.wmnet
10:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4022.ulsfo.wmnet
10:39 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3052.esams.wmnet
10:39 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2030.codfw.wmnet
10:38 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3053.esams.wmnet
10:38 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2029.codfw.wmnet
10:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1078.eqiad.wmnet
10:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1077.eqiad.wmnet
10:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 80%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14300 and previous config saved to /var/cache/conftool/dbconfig/20210210-103146-root.json
10:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5002.eqsin.wmnet
10:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5008.eqsin.wmnet
10:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4022.ulsfo.wmnet
10:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4028.ulsfo.wmnet
10:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3053.esams.wmnet
10:26 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3052.esams.wmnet
10:26 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2030.codfw.wmnet
10:25 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2029.codfw.wmnet
10:25 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1078.eqiad.wmnet
10:25 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1077.eqiad.wmnet
10:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 60%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14299 and previous config saved to /var/cache/conftool/dbconfig/20210210-101642-root.json
10:16 moritzm: installing firejail security updates
10:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
10:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4021.ulsfo.wmnet
10:05 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
10:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 40%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14298 and previous config saved to /var/cache/conftool/dbconfig/20210210-100139-root.json
10:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 100%: Slowly repooling db1076 after cloning db1162', diff saved to https://phabricator.wikimedia.org/P14297 and previous config saved to /var/cache/conftool/dbconfig/20210210-100111-root.json
10:00 vgutierrez: power cycling cp4021
09:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5007.eqsin.wmnet
09:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5001.eqsin.wmnet
09:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4027.ulsfo.wmnet
09:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3051.esams.wmnet
09:47 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3050.esams.wmnet
09:47 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2027.codfw.wmnet
09:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 20%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14296 and previous config saved to /var/cache/conftool/dbconfig/20210210-094635-root.json
09:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2028.codfw.wmnet
09:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 75%: Slowly repooling db1076 after cloning db1162', diff saved to https://phabricator.wikimedia.org/P14295 and previous config saved to /var/cache/conftool/dbconfig/20210210-094608-root.json
09:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1075.eqiad.wmnet
09:45 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1076.eqiad.wmnet
09:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5001.eqsin.wmnet
09:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5007.eqsin.wmnet
09:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4021.ulsfo.wmnet
09:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4027.ulsfo.wmnet
09:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3051.esams.wmnet
09:36 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3050.esams.wmnet
09:35 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2028.codfw.wmnet
09:34 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2027.codfw.wmnet
09:33 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1076.eqiad.wmnet
09:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1075.eqiad.wmnet
09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 10%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14294 and previous config saved to /var/cache/conftool/dbconfig/20210210-093132-root.json
09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 50%: Slowly repooling db1076 after cloning db1162', diff saved to https://phabricator.wikimedia.org/P14293 and previous config saved to /var/cache/conftool/dbconfig/20210210-093104-root.json
09:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14292 and previous config saved to /var/cache/conftool/dbconfig/20210210-093011-root.json
09:23 vgutierrez: rolling restart of cp nodes to catch up on kernel upgrades
09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 25%: Slowly repooling db1076 after cloning db1162', diff saved to https://phabricator.wikimedia.org/P14290 and previous config saved to /var/cache/conftool/dbconfig/20210210-091601-root.json
09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 80%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14289 and previous config saved to /var/cache/conftool/dbconfig/20210210-091507-root.json
09:11 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
09:10 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 10%: Slowly repooling db1076 after cloning db1162', diff saved to https://phabricator.wikimedia.org/P14288 and previous config saved to /var/cache/conftool/dbconfig/20210210-090057-root.json
09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 60%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14287 and previous config saved to /var/cache/conftool/dbconfig/20210210-090004-root.json
08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 40%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14286 and previous config saved to /var/cache/conftool/dbconfig/20210210-084500-root.json
08:41 legoktm: depooling mw1404.eqiad.wmnet for perf benchmarking (T274041)
08:41 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1404.eqiad.wmnet
08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 20%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14285 and previous config saved to /var/cache/conftool/dbconfig/20210210-082957-root.json
08:19 godog: swift eqiad-prod: decrease weight for SSDs on ms-be[1019-1026] - T272836
08:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 10%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14284 and previous config saved to /var/cache/conftool/dbconfig/20210210-081453-root.json
08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127 T266483', diff saved to https://phabricator.wikimedia.org/P14283 and previous config saved to /var/cache/conftool/dbconfig/20210210-080512-marostegui.json
06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool db1170:3312, db1170:3317 T258361', diff saved to https://phabricator.wikimedia.org/P14282 and previous config saved to /var/cache/conftool/dbconfig/20210210-064330-marostegui.json
06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1170:3312, db1170:3317 T258361', diff saved to https://phabricator.wikimedia.org/P14281 and previous config saved to /var/cache/conftool/dbconfig/20210210-063534-marostegui.json
06:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
06:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1170:3312, db1170:3317 with minimal weight for the first time T258361', diff saved to https://phabricator.wikimedia.org/P14279 and previous config saved to /var/cache/conftool/dbconfig/20210210-061924-marostegui.json
06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1170:3312 and db1170:3317 to dbctl, depooled T258361', diff saved to https://phabricator.wikimedia.org/P14278 and previous config saved to /var/cache/conftool/dbconfig/20210210-061638-marostegui.json
06:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1020.eqiad.wmnet
06:04 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1020.eqiad.wmnet
05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076 to clone db1162 T258361', diff saved to https://phabricator.wikimedia.org/P14277 and previous config saved to /var/cache/conftool/dbconfig/20210210-055846-marostegui.json
03:46 ryankemper: `ryankemper@wdqs1012:~$ sudo systemctl restart wdqs-blazegraph.service`
01:54 krinkle@deploy1001: Finished deploy [integration/docroot@0234db2]: Unbreak doc.wm.o (2) - Ib67da9 (duration: 00m 06s)
01:54 krinkle@deploy1001: Started deploy [integration/docroot@0234db2]: Unbreak doc.wm.o (2) - Ib67da9
01:43 krinkle@deploy1001: Finished deploy [integration/docroot@fddc7c9]: Unbreak doc.wm.o - Ibf28e02ec03 (duration: 00m 06s)
01:43 krinkle@deploy1001: Started deploy [integration/docroot@fddc7c9]: Unbreak doc.wm.o - Ibf28e02ec03
01:06 milimetric@deploy1001: Finished deploy [analytics/refinery@b539bf6] (thin): Job fixes after Hadoop upgrade (duration: 00m 06s)
01:06 milimetric@deploy1001: Started deploy [analytics/refinery@b539bf6] (thin): Job fixes after Hadoop upgrade
01:06 milimetric@deploy1001: Finished deploy [analytics/refinery@b539bf6]: Job fixes after Hadoop upgrade (duration: 10m 55s)
00:58 mutante: doc1001 - reloaded apache2
00:55 milimetric@deploy1001: Started deploy [analytics/refinery@b539bf6]: Job fixes after Hadoop upgrade
00:42 Amir1: changing frwiki to wmf.30 in mwdebug1002 to test T264391
00:33 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.30/extensions/FeaturedFeeds: Fix issues with recent caching update (T264391) (duration: 01m 10s)
00:22 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.30 (duration: 24m 10s)
00:01 twentyafterfour: train status: wmf.28 and wmf.29 are undeployed. wmf.27 is everywhere except testwikis, which are at wmf.30 refs T271344

2021-02-09

23:58 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.30
23:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2250.codfw.wmnet
23:55 ryankemper: Depooled `wdqs1005` - it's catching up on hours of lag
23:55 twentyafterfour@deploy1001: Finished scap: (no justification provided) (duration: 08m 43s)
23:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2250.codfw.wmnet
23:50 mutante: mw1383,mw1385 - scap pull, php
23:48 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1296.eqiad.wmnet
23:47 twentyafterfour: running scap sync-world
23:47 twentyafterfour@deploy1001: Started scap: (no justification provided)
23:46 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.27
23:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1296.eqiad.wmnet
23:33 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1380.eqiad.wmnet
23:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1380.eqiad.wmnet
23:28 mutante: mw1380 - powercycling after it did not come back from normal reboot during reimaging
23:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1372.eqiad.wmnet
23:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1372.eqiad.wmnet
23:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2250.codfw.wmnet with reason: REIMAGE
23:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2250.codfw.wmnet with reason: REIMAGE
22:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1296.eqiad.wmnet with reason: REIMAGE
22:54 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1296.eqiad.wmnet with reason: REIMAGE
22:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1372.eqiad.wmnet with reason: REIMAGE
22:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1372.eqiad.wmnet with reason: REIMAGE
22:34 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2259.codfw.wmnet
22:31 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2259.codfw.wmnet
22:29 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1373.eqiad.wmnet
22:28 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1373.eqiad.wmnet
22:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1298.eqiad.wmnet
22:23 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1298.eqiad.wmnet
22:23 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GlobalWatchlist extension on testwiki (T260862) (duration: 02m 51s)
22:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2259.codfw.wmnet with reason: REIMAGE
22:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1380.eqiad.wmnet with reason: REIMAGE
22:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2259.codfw.wmnet with reason: REIMAGE
21:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1373.eqiad.wmnet with reason: REIMAGE
21:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1380.eqiad.wmnet with reason: REIMAGE
21:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1373.eqiad.wmnet with reason: REIMAGE
21:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2260.codfw.wmnet
21:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1381.eqiad.wmnet
21:39 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1384.eqiad.wmnet
21:30 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2260.codfw.wmnet
21:30 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1384.eqiad.wmnet
21:30 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1381.eqiad.wmnet
21:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1298.eqiad.wmnet with reason: REIMAGE
21:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1298.eqiad.wmnet with reason: REIMAGE
21:10 elukey: Analytics Hadoop cluster upgrade completed
21:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2260.codfw.wmnet with reason: REIMAGE
21:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1381.eqiad.wmnet with reason: REIMAGE
21:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1384.eqiad.wmnet with reason: REIMAGE
21:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2260.codfw.wmnet with reason: REIMAGE
21:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1381.eqiad.wmnet with reason: REIMAGE
21:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1384.eqiad.wmnet with reason: REIMAGE
20:58 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1299.eqiad.wmnet
20:56 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1299.eqiad.wmnet
20:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2263.codfw.wmnet
20:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1382.eqiad.wmnet
20:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1385.eqiad.wmnet
20:33 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1382.eqiad.wmnet
20:33 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1385.eqiad.wmnet
20:33 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2263.codfw.wmnet
20:21 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
20:13 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
20:12 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
20:12 otto@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - otto@cumin1001
20:11 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
20:11 otto@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - otto@cumin1001
20:10 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
20:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1299.eqiad.wmnet with reason: REIMAGE
20:08 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
20:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1299.eqiad.wmnet with reason: REIMAGE
20:06 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
20:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1385.eqiad.wmnet with reason: REIMAGE
20:00 twentyafterfour: prepping 1.36.0-wmf.30
20:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1382.eqiad.wmnet with reason: REIMAGE
19:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1385.eqiad.wmnet with reason: REIMAGE
19:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2263.codfw.wmnet with reason: REIMAGE
19:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1382.eqiad.wmnet with reason: REIMAGE
19:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2263.codfw.wmnet with reason: REIMAGE
19:35 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2264.codfw.wmnet
19:35 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
19:27 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1383.eqiad.wmnet
19:26 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
19:23 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
19:21 ryankemper: T262211 `sudo cumin 'P{relforge*}' 'sudo run-puppet-agent'` on `ryankemper@cumin1001`
19:19 ryankemper: T262211 Attempting to bring `relforge100[3,4]` into service; merging https://gerrit.wikimedia.org/r/661229
19:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1300.eqiad.wmnet
19:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2220.codfw.wmnet
19:10 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
19:08 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
19:04 elukey@cumin1001: END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - elukey@cumin1001
19:04 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - elukey@cumin1001
19:02 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
19:01 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1383.eqiad.wmnet
19:01 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
19:01 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2264.codfw.wmnet
18:57 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
18:57 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
18:46 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
18:45 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
18:42 ryankemper: T267927 [WDQS Data Reload] `sudo cookbook sre.wdqs.data-reload wdqs1010.eqiad.wmnet --reuse-downloaded-dump --reload-data wikidata --reason 'T267927: Reload wikidata jnl from fresh dumps' --task-id T267927` on `ryankemper@cumin1001` tmux session `wdqs_data_reload_1010`
18:41 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
18:40 ryankemper: T267927 [WDQS Data Reload] `sudo cookbook sre.wdqs.data-reload wdqs1009.eqiad.wmnet --reuse-downloaded-dump --reload-data wikidata --skolemize --reason 'T267927: Reload wikidata jnl from fresh dumps' --task-id T267927` on `ryankemper@cumin1001` tmux session `wdqs_data_reload_1009`
18:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
18:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
18:39 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
18:37 ryankemper: T267927 [WDQS Data Reload] Clearing old wikidata journal file to free disk space before beginning data reload: `sudo systemctl status wdqs-blazegraph && sudo systemctl stop wdqs-blazegraph && sudo rm -fv /srv/wdqs/wikidata.jnl && sudo systemctl start wdqs-blazegraph` on `wdqs100[9,10]`
18:37 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1300.eqiad.wmnet
18:37 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2220.codfw.wmnet
18:32 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
18:29 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
18:21 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
18:20 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
18:14 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
17:43 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database, still
17:43 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database, still
17:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1300.eqiad.wmnet with reason: REIMAGE
17:35 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2220.codfw.wmnet with reason: REIMAGE
17:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1300.eqiad.wmnet with reason: REIMAGE
17:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2220.codfw.wmnet with reason: REIMAGE
17:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1001.eqiad.wmnet
17:07 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1001.eqiad.wmnet
17:01 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
16:47 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.29
16:21 moritzm: installing wireshark security updates
16:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:14 godog: swift eqiad-prod: decrease weight for SSDs on ms-be[1019-1026] - T272836
16:11 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
15:59 volker-e@deploy1001: Finished deploy [design/style-guide@b9b7ee6]: Deploy design/style-guide: b9b7ee6 “Components”: Fix components overview SVG rendering glitch (#439) (duration: 00m 07s)
15:59 volker-e@deploy1001: Started deploy [design/style-guide@b9b7ee6]: Deploy design/style-guide: b9b7ee6 “Components”: Fix components overview SVG rendering glitch (#439)
15:32 papaul: power down logstash2035 for relocation
15:23 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 95 hosts with reason: upgrading openstack
15:22 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 95 hosts with reason: upgrading openstack
15:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 95 hosts with reason: upgrading openstack
15:22 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 10 hosts with reason: upgrading openstack
15:22 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 10 hosts with reason: upgrading openstack
15:21 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 95 hosts with reason: upgrading openstack
15:15 papaul: power down mw2220 for maintenance
15:11 hashar@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.29 (duration: 01m 11s)
15:10 moritzm: re-adding ganeti5002 to the eqsin Ganeti cluster following mainboard replacement/reinstall T261130
15:10 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.29
15:06 hashar@deploy1001: Synchronized php-1.36.0-wmf.29/extensions/FeaturedFeeds: Revert "Caching fixes" T264391 (duration: 01m 25s)
14:57 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 10 hosts with reason: upgrading openstack
14:57 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 10 hosts with reason: upgrading openstack
14:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14270 and previous config saved to /var/cache/conftool/dbconfig/20210209-145206-root.json
14:50 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2001.codfw.wmnet
14:48 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host pybal-test2001.codfw.wmnet
14:43 gehel: rebooting wdqs1009 / 1010 for kernel upgrade
14:37 hashar@deploy1001: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.36.0-wmf.29"
14:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 85%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14269 and previous config saved to /var/cache/conftool/dbconfig/20210209-143703-root.json
14:29 hashar@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.29 (duration: 01m 06s)
14:28 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.29
14:26 volans: cd /srv/external-monitoring; git fetch/status/pull on wikitech-static - T273951
14:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14268 and previous config saved to /var/cache/conftool/dbconfig/20210209-142159-root.json
14:21 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.29
14:14 gehel: depooling wdqs1005, catching up on lag
14:10 hashar@deploy1001: Synchronized php-1.36.0-wmf.29/includes/libs/objectcache/wancache/WANObjectCache.php: WANObjectCache: throw on Closure - T273242 (duration: 01m 08s)
14:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 60%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14267 and previous config saved to /var/cache/conftool/dbconfig/20210209-140655-root.json
13:52 Urbanecm: Deploy security patch (T274152)
13:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14266 and previous config saved to /var/cache/conftool/dbconfig/20210209-135152-root.json
13:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 40%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14265 and previous config saved to /var/cache/conftool/dbconfig/20210209-133648-root.json
13:25 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
13:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 30%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14264 and previous config saved to /var/cache/conftool/dbconfig/20210209-132145-root.json
13:08 twentyafterfour: restart phabricator daemons to free 3.5 GB of RAM (memory leak?)
13:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14263 and previous config saved to /var/cache/conftool/dbconfig/20210209-130641-root.json
12:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 20%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14262 and previous config saved to /var/cache/conftool/dbconfig/20210209-125138-root.json
12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 15%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14261 and previous config saved to /var/cache/conftool/dbconfig/20210209-123634-root.json
12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 13%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14260 and previous config saved to /var/cache/conftool/dbconfig/20210209-122131-root.json
12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 10%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14259 and previous config saved to /var/cache/conftool/dbconfig/20210209-120627-root.json
12:05 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
12:02 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop analytics cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
11:58 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2010.codfw.wmnet
11:58 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2009.codfw.wmnet
11:58 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2008.codfw.wmnet
11:58 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2006.codfw.wmnet
11:57 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2005.codfw.wmnet
11:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1013.eqiad.wmnet
11:52 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=maps1010.eqiad.wmnet
11:52 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=maps1008.eqiad.wmnet
11:51 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=maps1007.eqiad.wmnet
11:51 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=maps1006.eqiad.wmnet
11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 8%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14258 and previous config saved to /var/cache/conftool/dbconfig/20210209-115124-root.json
11:51 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1013.eqiad.wmnet
11:50 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1001.eqiad.wmnet
11:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1014.eqiad.wmnet
11:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1014.eqiad.wmnet
11:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 5%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14257 and previous config saved to /var/cache/conftool/dbconfig/20210209-113620-root.json
11:34 elukey: start the upgrade process for Hadoop Analytics
11:33 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster for Hadoop analytics cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
11:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1015.eqiad.wmnet
11:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1015.eqiad.wmnet
11:23 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1016.eqiad.wmnet
11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 4%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14256 and previous config saved to /var/cache/conftool/dbconfig/20210209-112116-root.json
11:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1016.eqiad.wmnet
11:17 vgutierrez: rolling restart of eqiad LVS instances to catch up on kernel upgrades
11:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3005.esams.wmnet
11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 3%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14255 and previous config saved to /var/cache/conftool/dbconfig/20210209-110613-root.json
11:02 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3005.esams.wmnet
10:57 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database, still
10:57 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database, still
10:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3006.esams.wmnet
10:53 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin2001.codfw.wmnet
10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 2%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14254 and previous config saved to /var/cache/conftool/dbconfig/20210209-105109-root.json
10:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3006.esams.wmnet
10:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3007.esams.wmnet
10:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3007.esams.wmnet
10:41 vgutierrez: rolling restart of esams LVS instances to catch up on kernel upgrades
10:40 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2001.codfw.wmnet
10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3317 (re)pooling @ 100%: Slowly repooling db1090:3317 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14253 and previous config saved to /var/cache/conftool/dbconfig/20210209-103443-root.json
10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3312 (re)pooling @ 100%: Slowly repooling db1090:3312 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14252 and previous config saved to /var/cache/conftool/dbconfig/20210209-103414-root.json
10:21 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1157 for the first time in s3 T258361', diff saved to https://phabricator.wikimedia.org/P14251 and previous config saved to /var/cache/conftool/dbconfig/20210209-102109-marostegui.json
10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3317 (re)pooling @ 75%: Slowly repooling db1090:3317 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14250 and previous config saved to /var/cache/conftool/dbconfig/20210209-101939-root.json
10:19 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1019.eqiad.wmnet
10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3312 (re)pooling @ 75%: Slowly repooling db1090:3312 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14249 and previous config saved to /var/cache/conftool/dbconfig/20210209-101911-root.json
10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1157 to dbctl, depooled T258361', diff saved to https://phabricator.wikimedia.org/P14248 and previous config saved to /var/cache/conftool/dbconfig/20210209-101556-marostegui.json
10:13 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1019.eqiad.wmnet
10:12 gehel@cumin1001: START - Cookbook sre.wdqs.reboot
10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3317 (re)pooling @ 50%: Slowly repooling db1090:3317 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14247 and previous config saved to /var/cache/conftool/dbconfig/20210209-100436-root.json
10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3312 (re)pooling @ 50%: Slowly repooling db1090:3312 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14246 and previous config saved to /var/cache/conftool/dbconfig/20210209-100407-root.json
09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3317 (re)pooling @ 25%: Slowly repooling db1090:3317 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14245 and previous config saved to /var/cache/conftool/dbconfig/20210209-094932-root.json
09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3312 (re)pooling @ 25%: Slowly repooling db1090:3312 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14244 and previous config saved to /var/cache/conftool/dbconfig/20210209-094904-root.json
09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3317 (re)pooling @ 10%: Slowly repooling db1090:3317 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14243 and previous config saved to /var/cache/conftool/dbconfig/20210209-093429-root.json
09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3312 (re)pooling @ 10%: Slowly repooling db1090:3312 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14242 and previous config saved to /var/cache/conftool/dbconfig/20210209-093400-root.json
09:22 godog: swift eqiad-prod: decrease weight for SSDs on ms-be[1019-1026] - T272836
08:44 XioNoX: repool esams - T272342
08:30 XioNoX: rollback redirect ns2 to authdns1001 - T252631
08:09 XioNoX: alright, brace yourself, esams switch stack is going to go down
08:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on 32 hosts with reason: switch upgrade
08:02 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on 32 hosts with reason: switch upgrade
07:54 XioNoX: redirect ns2 to authdns1001 - T252631
07:47 hashar@deploy1001: Finished deploy [integration/docroot@672e79f]: build: Add /scap/log to gitignore (duration: 00m 06s)
07:47 hashar@deploy1001: Started deploy [integration/docroot@672e79f]: build: Add /scap/log to gitignore
07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1081 from dbctl T273040', diff saved to https://phabricator.wikimedia.org/P14241 and previous config saved to /var/cache/conftool/dbconfig/20210209-073455-marostegui.json
07:20 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: Slowly repooling db1111 after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P14240 and previous config saved to /var/cache/conftool/dbconfig/20210209-072038-root.json
07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: Slowly repooling db1111 after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P14239 and previous config saved to /var/cache/conftool/dbconfig/20210209-070534-root.json
07:04 XioNoX: depool disable 2 uplinks on asw2-esams - T272342
06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: Slowly repooling db1111 after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P14238 and previous config saved to /var/cache/conftool/dbconfig/20210209-065031-root.json
06:48 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
06:48 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
06:48 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
06:47 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@582b070]: 0.3.63 (duration: 06m 46s)
06:44 XioNoX: depool esams for network maintenance - T272342
06:41 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.63` on canary `wdqs1003`; proceeding to rest of fleet
06:40 ryankemper@deploy1001: Started deploy [wdqs/wdqs@582b070]: 0.3.63
06:40 ryankemper: Pooled `wdqs1007` and depooled `wdqs1005` (`1005` is ~12 hours behind)
06:38 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.63`. Pre-deploy tests passing on canary `wdqs1003`
06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: Slowly repooling db1111 after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P14237 and previous config saved to /var/cache/conftool/dbconfig/20210209-063527-root.json
06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 10%: Slowly repooling db1111 after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P14236 and previous config saved to /var/cache/conftool/dbconfig/20210209-062024-root.json
06:20 marostegui: Stop mysql on s2 and s7 on db1090 to clone db1170 T258361
06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3312, db1090:3317 T258361', diff saved to https://phabricator.wikimedia.org/P14234 and previous config saved to /var/cache/conftool/dbconfig/20210209-061822-marostegui.json
06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 5%: Slowly repooling db1111 after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P14233 and previous config saved to /var/cache/conftool/dbconfig/20210209-060520-root.json
05:02 krinkle@deploy1001: Finished deploy [integration/docroot@fdfb265]: I271e6054880, T273247 (duration: 00m 06s)
05:02 krinkle@deploy1001: Started deploy [integration/docroot@fdfb265]: I271e6054880, T273247
01:56 tstarling@deploy1001: Synchronized php-1.36.0-wmf.29/extensions/FeaturedFeeds: probable fix for UBN T273242 (duration: 01m 06s)
01:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1302.eqiad.wmnet
01:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1301.eqiad.wmnet
00:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1302.eqiad.wmnet
00:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1301.eqiad.wmnet
00:28 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1387.eqiad.wmnet
00:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1386.eqiad.wmnet
00:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1386.eqiad.wmnet
00:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1387.eqiad.wmnet
00:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1301.eqiad.wmnet with reason: REIMAGE
00:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1302.eqiad.wmnet with reason: REIMAGE

2021-02-08

23:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1301.eqiad.wmnet with reason: REIMAGE
23:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1302.eqiad.wmnet with reason: REIMAGE
23:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on mw2220.codfw.wmnet with reason: T273803
23:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on mw2220.codfw.wmnet with reason: T273803
23:50 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2220.codfw.wmnet
23:49 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2220.codfw.wmnet
23:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1386.eqiad.wmnet with reason: REIMAGE
23:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1386.eqiad.wmnet with reason: REIMAGE
23:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1387.eqiad.wmnet with reason: REIMAGE
23:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1387.eqiad.wmnet with reason: REIMAGE
23:29 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1388.eqiad.wmnet
23:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1303.eqiad.wmnet
23:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2245.codfw.wmnet
23:14 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1397.eqiad.wmnet
22:51 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1274.eqiad.wmnet
22:51 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1273.eqiad.wmnet
22:51 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1272.eqiad.wmnet
22:51 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1271.eqiad.wmnet
22:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2245.codfw.wmnet
22:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1388.eqiad.wmnet
22:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1303.eqiad.wmnet
22:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1397.eqiad.wmnet
22:29 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1274.eqiad.wmnet
22:29 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1273.eqiad.wmnet
22:29 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1272.eqiad.wmnet
22:29 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1271.eqiad.wmnet
21:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1303.eqiad.wmnet with reason: REIMAGE
21:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1303.eqiad.wmnet with reason: REIMAGE
21:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2245.codfw.wmnet with reason: REIMAGE
21:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2245.codfw.wmnet with reason: REIMAGE
21:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1397.eqiad.wmnet with reason: REIMAGE
21:34 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1397.eqiad.wmnet with reason: REIMAGE
21:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1388.eqiad.wmnet with reason: REIMAGE
21:29 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw1273.eqiad.wmnet with reason: reimaging
21:29 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw1273.eqiad.wmnet with reason: reimaging
21:28 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw1271.eqiad.wmnet with reason: reimaging
21:28 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw1271.eqiad.wmnet with reason: reimaging
21:28 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1274.eqiad.wmnet with reason: REIMAGE
21:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1388.eqiad.wmnet with reason: REIMAGE
21:25 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1272.eqiad.wmnet with reason: REIMAGE
21:25 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1274.eqiad.wmnet with reason: REIMAGE
21:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1304.eqiad.wmnet
21:24 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1271.eqiad.wmnet with reason: REIMAGE
21:24 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1273.eqiad.wmnet with reason: REIMAGE
21:24 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1271.eqiad.wmnet with reason: REIMAGE
21:24 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1273.eqiad.wmnet with reason: REIMAGE
21:23 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1272.eqiad.wmnet with reason: REIMAGE
21:20 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1305.eqiad.wmnet
21:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1304.eqiad.wmnet
21:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1305.eqiad.wmnet
21:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1389.eqiad.wmnet
21:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1389.eqiad.wmnet
21:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1390.eqiad.wmnet
21:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1390.eqiad.wmnet
20:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1304.eqiad.wmnet with reason: REIMAGE
20:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1304.eqiad.wmnet with reason: REIMAGE
20:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1305.eqiad.wmnet with reason: REIMAGE
20:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1305.eqiad.wmnet with reason: REIMAGE
20:20 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
20:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1389.eqiad.wmnet with reason: REIMAGE
20:17 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Undo migration of SpecialMuteSubmit on all wikis except testwiki - T268517 (duration: 01m 06s)
20:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1389.eqiad.wmnet with reason: REIMAGE
20:16 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
20:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1390.eqiad.wmnet with reason: REIMAGE
20:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1390.eqiad.wmnet with reason: REIMAGE
20:11 jgiannelos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
19:59 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1391.eqiad.wmnet
19:55 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2003.codfw.wmnet
19:50 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2003.codfw.wmnet
19:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2002.codfw.wmnet
19:49 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1391.eqiad.wmnet
19:48 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@ca9bba1]: cirrus_namespace_map: only overwrite on success (duration: 01m 19s)
19:47 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@ca9bba1]: cirrus_namespace_map: only overwrite on success
19:44 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2002.codfw.wmnet
19:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2001.codfw.wmnet
19:38 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2001.codfw.wmnet
19:20 urbanecm@deploy1001: Synchronized wmf-config/config/dawiki.yaml: 3f39eef: Enable GrowthExperiments at dawiki (T256126; 3/3) (duration: 01m 04s)
19:18 urbanecm@deploy1001: Synchronized dblists/growthexperiments.dblist: 3f39eef: Enable GrowthExperiments at dawiki (T256126; 2/3) (duration: 01m 03s)
19:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 3f39eef: Enable GrowthExperiments at dawiki (T256126) (duration: 01m 05s)
19:13 dduvall@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
19:11 dduvall@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
19:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1391.eqiad.wmnet with reason: REIMAGE
19:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1391.eqiad.wmnet with reason: REIMAGE
19:08 dduvall@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
19:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 3e94e21: Make DiscussionTools newtopictool available on testwiki (duration: 01m 07s)
18:52 mutante: mw1391 - reimaging
18:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2037.codfw.wmnet
18:29 ppchelko@deploy1001: Finished deploy [restbase/deploy@a458845]: Add trwikivoyage T271262 and restore restbase2009 (duration: 17m 13s)
18:27 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2037.codfw.wmnet
18:27 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2036.codfw.wmnet
18:21 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2036.codfw.wmnet
18:21 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2035.codfw.wmnet
18:16 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2035.codfw.wmnet
18:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2034.codfw.wmnet
18:12 ppchelko@deploy1001: Started deploy [restbase/deploy@a458845]: Add trwikivoyage T271262 and restore restbase2009
18:10 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2034.codfw.wmnet
18:10 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2033.codfw.wmnet
18:04 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2033.codfw.wmnet
17:57 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database
17:57 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database
17:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2032.codfw.wmnet
17:38 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2032.codfw.wmnet
17:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2031.codfw.wmnet
17:31 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2031.codfw.wmnet
17:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2030.codfw.wmnet
17:25 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2030.codfw.wmnet
17:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2029.codfw.wmnet
17:23 otto@deploy1001: Synchronized wmf-config/CommonSettings.php: CommonSettings - Add eventgate-analytics-external - T272863 (no-op) (duration: 01m 06s)
17:21 otto@deploy1001: Synchronized wmf-config/ProductionServices.php: ProductionServices - Add eventgate-analytics-external - T272863 (no-op) (duration: 01m 06s)
17:20 otto@deploy1001: sync-file aborted: ProductionServices - Add eventgate-analytics-external - T272863 (no-op) (duration: 00m 02s)
17:20 otto@deploy1001: Synchronized wmf-config/LabsServices.php: LabsServices - Add eventgate-analytics-external - T272998 (duration: 01m 08s)
17:19 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2029.codfw.wmnet
17:19 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2027.codfw.wmnet
17:12 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2027.codfw.wmnet
17:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2026.codfw.wmnet
17:06 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2026.codfw.wmnet
16:30 XioNoX: adding option-82 to all prod vlans DHCP - T269855
16:02 Urbanecm: Deploy security patch (T71367)
15:49 gehel: repool wdqs1012 - caught up on lag
15:44 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
15:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
15:37 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on maps1001.eqiad.wmnet with reason: Server being relocated
15:37 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on maps1001.eqiad.wmnet with reason: Server being relocated
15:32 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
15:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2001.codfw.wmnet
15:26 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
15:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host krb2001.codfw.wmnet
15:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1002.wikimedia.org
15:22 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ldap-replica1002.wikimedia.org
15:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
15:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1001.wikimedia.org
15:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ldap-replica1001.wikimedia.org
15:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2004.wikimedia.org
15:13 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.29/extensions/Wikibase/build/travis/install.sh: Backport: Fix Travis CI build on release branches (prod no-op, syncing only to avoid drift) (duration: 01m 08s)
15:11 ottomata: set kafka topic retention to 31 days for (eqiad|codfw.rdf-streaming-updater.mutation) in kafka main-eqiad and main-codfw - T269619
15:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ldap-replica2004.wikimedia.org
15:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2003.wikimedia.org
15:04 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1001.eqiad.wmnet
15:04 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on maps1001.eqiad.wmnet with reason: Server being relocated
15:04 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on maps1001.eqiad.wmnet with reason: Server being relocated
15:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ldap-replica2003.wikimedia.org
14:50 herron: stopped ES on logstash1020 in prep for re-rack T273984
14:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host seaborgium.wikimedia.org
14:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host seaborgium.wikimedia.org
14:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2026.codfw.wmnet
14:34 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database
14:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database
14:31 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2026.codfw.wmnet
14:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2024.codfw.wmnet
14:17 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2024.codfw.wmnet
14:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2023.codfw.wmnet
14:10 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2023.codfw.wmnet
14:10 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2022.codfw.wmnet
14:08 Urbanecm: Deploy security patch for T223654
14:05 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2022.codfw.wmnet
14:05 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2021.codfw.wmnet
13:59 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2021.codfw.wmnet
13:54 godog: swift eqiad-prod: decrease weight for SSDs on ms-be[1019-1026] - T272836
13:26 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5001.eqsin.wmnet
13:20 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5001.eqsin.wmnet
13:09 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/SyntaxHighlight_GeSHi/modules/pygments.wrapper.less: Move position:relative to inner wrapper (T272853) (duration: 01m 08s)
13:06 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.29/extensions/Wikibase/repo/includes/Store/Sql/SqlChangeDispatchCoordinator.php: gerrit:662666: Cast chd_seen as signed integer (duration: 01m 10s)
12:55 daniel@deploy1001: Synchronized php-1.36.0-wmf.29/includes/libs/objectcache/wancache/WANObjectCache.php: Backport: objectcache: Log more info when WANObjectCache async refresh fails (phab:T264391) (duration: 01m 07s)
12:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
12:22 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
12:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5002.eqsin.wmnet
12:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5002.eqsin.wmnet
12:09 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5003.eqsin.wmnet
12:07 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [cirrus] rename ores_articletopics -> weighted_tags (duration: 01m 07s)
12:04 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5003.eqsin.wmnet
12:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2007.codfw.wmnet
11:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2007.codfw.wmnet
11:53 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2008.codfw.wmnet
11:47 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2008.codfw.wmnet
11:41 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2009.codfw.wmnet
11:39 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 07s)
11:38 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 07s)
11:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2020.codfw.wmnet
11:36 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2009.codfw.wmnet
11:30 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2020.codfw.wmnet
11:30 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2019.codfw.wmnet
11:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2010.codfw.wmnet
11:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database
11:25 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database
11:25 Urbanecm: Deploy security patch for T71617
11:25 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2019.codfw.wmnet
11:23 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
11:23 hnowlan: resyncing postgres on maps1005
11:22 hnowlan: resyncing postgres on maps1001
11:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2010.codfw.wmnet
11:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4005.ulsfo.wmnet
11:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4005.ulsfo.wmnet
11:11 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4006.ulsfo.wmnet
11:07 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4006.ulsfo.wmnet
11:00 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4007.ulsfo.wmnet
10:55 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4007.ulsfo.wmnet
10:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2025.codfw.wmnet
10:05 moritzm: updating netboot images to Buster 10.8 T274099
10:05 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2025.codfw.wmnet
09:43 XioNoX: failover pfw3-eqiad RG1 to node 0 T263833
09:42 marostegui: Stop MySQL on db1111 T273982
09:36 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4007.ulsfo.wmnet
09:23 vgutierrez: restart varnish-fe on cp1087
09:21 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4007.ulsfo.wmnet
09:20 vgutierrez: rolling restart of LVS instances to catch up on kernel upgrades
09:00 gehel: depool and restart blazegraph on wdqs1005 / wdqs1012
08:56 XioNoX: push pfw policies T273989
08:33 godog: swift codfw-prod decrease HDD weight for ms-be20[16-27] - T272837
07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 T273982', diff saved to https://phabricator.wikimedia.org/P14229 and previous config saved to /var/cache/conftool/dbconfig/20210208-070858-marostegui.json
06:50 effie: Removed mc1024 from mcrouter, some resharding is expected
06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1094 from dbctl T273710', diff saved to https://phabricator.wikimedia.org/P14228 and previous config saved to /var/cache/conftool/dbconfig/20210208-061319-marostegui.json

2021-02-07

22:58 Urbanecm: Reset password for TheresNoTime (T274087)

2021-02-06

08:59 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
08:58 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
08:52 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
08:52 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
03:40 ryankemper: Deleted dump taking up disk space on `wdqs1009`, disk space warning will resolve now
01:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1319.eqiad.wmnet
01:29 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1313.eqiad.wmnet
01:25 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1319.eqiad.wmnet
01:25 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1313.eqiad.wmnet
01:00 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2265.codfw.wmnet
00:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1366.eqiad.wmnet
00:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1366.eqiad.wmnet
00:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2265.codfw.wmnet
00:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1313.eqiad.wmnet with reason: REIMAGE
00:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1313.eqiad.wmnet with reason: REIMAGE
00:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1319.eqiad.wmnet with reason: REIMAGE
00:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1319.eqiad.wmnet with reason: REIMAGE
00:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2265.codfw.wmnet with reason: REIMAGE
00:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2265.codfw.wmnet with reason: REIMAGE
00:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1366.eqiad.wmnet with reason: REIMAGE
00:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1366.eqiad.wmnet with reason: REIMAGE

2021-02-05

23:37 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1285.eqiad.wmnet
23:35 ryankemper: T267927 Re-downloading latest dumps (main database, lexeme) in tmux session `downloads_dumps` on `ryankemper@wdqs1009.eqiad.wmnet`
23:15 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1285.eqiad.wmnet
22:56 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
22:56 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
22:50 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
22:50 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
22:46 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
22:46 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
22:42 ryankemper: T267927 `sudo cookbook sre.wdqs.data-reload wdqs1009.eqiad.wmnet --reuse-downloaded-dump --reload-data wikidata --skolemize --reason 'T267927: Reload wikidata jnl from fresh dumps' --task-id T267927` failing with `ERROR org.wikidata.query.rdf.tool.Munge - Fatal error munging RDF`
22:41 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
22:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
22:38 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
22:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
22:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1269.eqiad.wmnet
22:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1269.eqiad.wmnet
22:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1306.eqiad.wmnet
22:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1306.eqiad.wmnet
22:03 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1285.eqiad.wmnet with reason: REIMAGE
22:01 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1285.eqiad.wmnet with reason: REIMAGE
21:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1393.eqiad.wmnet
21:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1393.eqiad.wmnet
21:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1392.eqiad.wmnet
21:49 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1392.eqiad.wmnet
21:48 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2266.codfw.wmnet
21:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1269.eqiad.wmnet with reason: REIMAGE
21:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1269.eqiad.wmnet with reason: REIMAGE
21:31 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1278.eqiad.wmnet
21:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1306.eqiad.wmnet with reason: REIMAGE
21:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2266.codfw.wmnet
21:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1306.eqiad.wmnet with reason: REIMAGE
21:09 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2254.codfw.wmnet
21:05 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2254.codfw.wmnet
21:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2266.codfw.wmnet with reason: REIMAGE
21:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2266.codfw.wmnet with reason: REIMAGE
20:59 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1307.eqiad.wmnet
20:57 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1307.eqiad.wmnet
20:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1392.eqiad.wmnet with reason: REIMAGE
20:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1397.eqiad.wmnet
20:56 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1397.eqiad.wmnet
20:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1392.eqiad.wmnet with reason: REIMAGE
20:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1393.eqiad.wmnet with reason: REIMAGE
20:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1393.eqiad.wmnet with reason: REIMAGE
20:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2289.codfw.wmnet
20:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1394.eqiad.wmnet
20:32 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1395.eqiad.wmnet
20:30 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2289.codfw.wmnet
20:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1394.eqiad.wmnet
20:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1395.eqiad.wmnet
20:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2254.codfw.wmnet with reason: REIMAGE
20:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2254.codfw.wmnet with reason: REIMAGE
20:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1307.eqiad.wmnet with reason: REIMAGE
20:04 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1307.eqiad.wmnet with reason: REIMAGE
19:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2289.codfw.wmnet with reason: REIMAGE
19:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2289.codfw.wmnet with reason: REIMAGE
19:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1394.eqiad.wmnet with reason: REIMAGE
19:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1395.eqiad.wmnet with reason: REIMAGE
19:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1394.eqiad.wmnet with reason: REIMAGE
19:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1395.eqiad.wmnet with reason: REIMAGE
19:39 mutante: reimaging 2 scap proxies in codfw because there are no deployments today
15:32 cmjohnson1: replacing optics and fiber on pfw3a-eqiad:xe-0/0/17 and fasw-c1a-eqiad:xe-0/2/0 T271295
15:28 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@6b74e78]: (no justification provided) (duration: 00m 26s)
15:28 oblivian@deploy1001: Started deploy [docker-pkg/deploy@6b74e78]: (no justification provided)
14:45 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@7257244]: (no justification provided) (duration: 01m 26s)
14:44 oblivian@deploy1001: Started deploy [docker-pkg/deploy@7257244]: (no justification provided)
13:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1001.eqiad.wmnet
13:28 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host netflow1001.eqiad.wmnet
13:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2001.codfw.wmnet
13:24 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host netflow2001.codfw.wmnet
13:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3001.esams.wmnet
13:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host netflow3001.esams.wmnet
13:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4001.ulsfo.wmnet
12:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host netflow4001.ulsfo.wmnet
12:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5001.eqsin.wmnet
12:57 moritzm: reset ifup on netflow5001 T273026
12:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host netflow5001.eqsin.wmnet
12:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-corp1001.wikimedia.org
12:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testreduce1001.eqiad.wmnet
12:43 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host testreduce1001.eqiad.wmnet
12:42 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ldap-corp1001.wikimedia.org
12:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host releases1002.eqiad.wmnet
12:36 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host releases1002.eqiad.wmnet
12:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host releases2002.codfw.wmnet
12:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host releases2002.codfw.wmnet
12:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-corp2001.wikimedia.org
12:13 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ldap-corp2001.wikimedia.org
12:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host parse2001.codfw.wmnet
12:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1002.eqiad.wmnet
12:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief1001.eqiad.wmnet
12:06 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host people1002.eqiad.wmnet
12:06 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief1001.eqiad.wmnet
12:05 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host parse2001.codfw.wmnet
12:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief2001.codfw.wmnet
12:00 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@7257244]: (no justification provided) (duration: 01m 00s)
11:59 oblivian@deploy1001: Started deploy [docker-pkg/deploy@7257244]: (no justification provided)
11:59 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@7257244]: (no justification provided) (duration: 04m 04s)
11:55 oblivian@deploy1001: Started deploy [docker-pkg/deploy@7257244]: (no justification provided)
11:55 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@7257244]: (no justification provided) (duration: 03m 25s)
11:51 oblivian@deploy1001: Started deploy [docker-pkg/deploy@7257244]: (no justification provided)
11:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief2001.codfw.wmnet
11:44 jayme@deploy1001: Finished deploy [docker-pkg/deploy@7257244]: (no justification provided) (duration: 05m 50s)
11:44 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test1001.eqiad.wmnet
11:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
11:39 ayounsi@cumin1001: START - Cookbook sre.network.cf
11:39 jayme@deploy1001: Started deploy [docker-pkg/deploy@7257244]: (no justification provided)
11:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief-test1001.eqiad.wmnet
11:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test2001.codfw.wmnet
11:30 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief-test2001.codfw.wmnet
11:29 vgutierrez: restart acme-chief instances to catch up on kernel upgrades
11:27 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir3001.esams.wmnet
11:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir3001.esams.wmnet
11:22 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir3002.esams.wmnet
11:16 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir3002.esams.wmnet
11:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir1001.eqiad.wmnet
11:08 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir1001.eqiad.wmnet
11:06 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir1002.eqiad.wmnet
10:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir1002.eqiad.wmnet
10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 100%: Slowly pooling db1075 after cloning db1157', diff saved to https://phabricator.wikimedia.org/P14222 and previous config saved to /var/cache/conftool/dbconfig/20210205-105345-root.json
10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 75%: Slowly pooling db1075 after cloning db1157', diff saved to https://phabricator.wikimedia.org/P14221 and previous config saved to /var/cache/conftool/dbconfig/20210205-103841-root.json
10:32 godog: swift codfw-prod decrease HDD weight for ms-be20[16-27] - T272837
10:27 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
10:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-cluster
10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 50%: Slowly pooling db1075 after cloning db1157', diff saved to https://phabricator.wikimedia.org/P14220 and previous config saved to /var/cache/conftool/dbconfig/20210205-102338-root.json
10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 25%: Slowly pooling db1075 after cloning db1157', diff saved to https://phabricator.wikimedia.org/P14219 and previous config saved to /var/cache/conftool/dbconfig/20210205-100834-root.json
10:06 gehel: repooling wdqs1013 - caught up on lag
09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 10%: Slowly pooling db1075 after cloning db1157', diff saved to https://phabricator.wikimedia.org/P14218 and previous config saved to /var/cache/conftool/dbconfig/20210205-095331-root.json
09:45 dcausse: reloading categories from scratch on wdqs1010
09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 5%: Slowly pooling db1075 after cloning db1157', diff saved to https://phabricator.wikimedia.org/P14217 and previous config saved to /var/cache/conftool/dbconfig/20210205-093827-root.json
08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094 T273710', diff saved to https://phabricator.wikimedia.org/P14214 and previous config saved to /var/cache/conftool/dbconfig/20210205-084625-marostegui.json
08:29 dcausse: reloading categories from scratch on wdqs1009
07:55 gehel: cleanup of left over ttl dumps on wdqs1009 and wdqs1010
07:47 gehel: depooling wdqs1013 and restarting blazegraph
07:28 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
07:28 oblivian@cumin1001: START - Cookbook sre.network.cf
06:36 marostegui: Stop MySQL on db1075 to clone db1157 T258361
06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075 T258361', diff saved to https://phabricator.wikimedia.org/P14212 and previous config saved to /var/cache/conftool/dbconfig/20210205-063554-marostegui.json
03:42 aaron@deploy1001: Synchronized wmf-config/mc.php: af5b0ef (duration: 01m 06s)
03:34 aaron@deploy1001: Synchronized php-1.36.0-wmf.27/includes/libs/rdbms: 4b38666 (duration: 01m 12s)
02:03 Krinkle: krinkle@mwmaint1002 Prune globalimagelinks references on s4 database for the deleted ukwikimedia wiki, ref T218170.
01:01 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@85713c1]: restore data range specifier in extract job partition spec (duration: 01m 12s)
00:59 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@85713c1]: restore data range specifier in extract job partition spec
00:36 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1278.eqiad.wmnet
00:35 legoktm: enabled remote IPMI access on mw1349.mgmt.eqiad.wmnet and mw1380.mgmt.eqiad.wmnet
00:24 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@9858513]: transfer_to_es: Wait for link reco, and write to weighted_tags as well (duration: 02m 43s)
00:21 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@9858513]: transfer_to_es: Wait for link reco, and write to weighted_tags as well

2021-02-04

23:59 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@93bf374]: correct hql in ores_predictions_init_v3 (duration: 01m 06s)
23:58 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@93bf374]: correct hql in ores_predictions_init_v3
23:26 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1278.eqiad.wmnet with reason: REIMAGE
23:24 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1278.eqiad.wmnet with reason: REIMAGE
23:05 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1397.eqiad.wmnet
23:05 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1396.eqiad.wmnet
23:02 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1396.eqiad.wmnet
23:02 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1397.eqiad.wmnet
23:01 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1311.eqiad.wmnet
22:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1263.eqiad.wmnet
22:39 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1263.eqiad.wmnet
22:38 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1311.eqiad.wmnet
22:38 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@700cd49]: partition ores staging tables by data source (duration: 01m 19s)
22:37 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@700cd49]: partition ores staging tables by data source
22:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1396.eqiad.wmnet with reason: REIMAGE
22:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1397.eqiad.wmnet with reason: REIMAGE
22:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1396.eqiad.wmnet with reason: REIMAGE
22:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1397.eqiad.wmnet with reason: REIMAGE
21:59 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1399.eqiad.wmnet
21:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1398.eqiad.wmnet
21:53 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2244.codfw.wmnet
21:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1311.eqiad.wmnet with reason: REIMAGE
21:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1399.eqiad.wmnet
21:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1398.eqiad.wmnet
21:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1311.eqiad.wmnet with reason: REIMAGE
21:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1263.eqiad.wmnet with reason: REIMAGE
21:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1263.eqiad.wmnet with reason: REIMAGE
21:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1398.eqiad.wmnet with reason: REIMAGE
21:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1399.eqiad.wmnet with reason: REIMAGE
21:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1398.eqiad.wmnet with reason: REIMAGE
21:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1399.eqiad.wmnet with reason: REIMAGE
21:20 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1308.eqiad.wmnet
21:17 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1308.eqiad.wmnet
20:59 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1400.eqiad.wmnet
20:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2267.codfw.wmnet
20:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2267.wmnet
20:38 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2244.codfw.wmnet
20:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1400.eqiad.wmnet
20:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2267.wmnet
20:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1308.eqiad.wmnet with reason: REIMAGE
20:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1308.eqiad.wmnet with reason: REIMAGE
20:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1400.eqiad.wmnet with reason: REIMAGE
20:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2267.codfw.wmnet with reason: REIMAGE
20:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1400.eqiad.wmnet with reason: REIMAGE
20:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2267.codfw.wmnet with reason: REIMAGE
19:56 Urbanecm: Purge several recompressed Wikipedia logos
19:52 urbanecm@deploy1001: Synchronized logos/config.yaml: Recompress several Wikipedia logos (2/2) (duration: 01m 05s)
19:51 urbanecm@deploy1001: Synchronized static/images/project-logos/: Recompress several Wikipedia logos (1/2) (duration: 01m 07s)
19:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1309.eqiad.wmnet
19:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 968ae8b: sysop_itwiki: Set wmgUsePopups to false (T259480) (duration: 01m 06s)
19:33 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2244.codfw.wmnet with reason: REIMAGE
19:31 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2244.codfw.wmnet with reason: REIMAGE
19:31 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: a199b83: abusefilter: enwikibooks: Enable block action (T273864) (duration: 01m 06s)
19:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 35e6e40: Remove ruwiki A/B test for WelcomeSurvey (T273900) (duration: 01m 07s)
19:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 74e7f70: wgAbuseFilterAflFilterMigrationStage: Make READ_NEW in production (T269712) (duration: 01m 11s)
19:06 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕜☕ sudo cumin A:cp 'enable-puppet "cdanis deploying I498a0c4af T263496"'
19:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2278.codfw.wmnet
19:00 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1309.eqiad.wmnet
18:59 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1401.eqiad.wmnet
18:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2019.codfw.wmnet
18:52 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:45 cdanis: T263496 deployed I498a0c4af on cp2027 at 18:29; now deploying on cp3060
18:45 robh@cumin1001: START - Cookbook sre.dns.netbox
18:44 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2019.codfw.wmnet
18:28 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕜☕ sudo cumin A:cp 'disable-puppet "cdanis deploying I498a0c4af T263496"'
18:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2278.codfw.wmnet
18:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1401.eqiad.wmnet
18:05 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: revert - Migrate PrefUpdate schema to Event Platform on all wikis - leave on testwiki only, seeing validation errors. T267348 (duration: 01m 01s)
18:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1309.eqiad.wmnet with reason: REIMAGE
18:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1309.eqiad.wmnet with reason: REIMAGE
17:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2278.codfw.wmnet with reason: REIMAGE
17:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2278.codfw.wmnet with reason: REIMAGE
17:51 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate PrefUpdate schema to Event Platform on all wikis - T267348 (duration: 01m 01s)
17:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1401.eqiad.wmnet with reason: REIMAGE
17:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1401.eqiad.wmnet with reason: REIMAGE
17:42 urbanecm@deploy1001: Synchronized wmf-config/logos.php: eed3c8e: Switch enwiki back to standard logo (T272108; resync) (duration: 01m 07s)
17:41 urbanecm@deploy1001: Synchronized logos/config.yaml: eed3c8e: Switch enwiki back to standard logo (T272108; 2/2) (duration: 01m 07s)
17:38 urbanecm@deploy1001: Synchronized wmf-config/logos.php: eed3c8e: Switch enwiki back to standard logo (T272108; 1/2) (duration: 03m 12s)
16:46 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate PrefUpdate schema to Event Platform on testwiki - T267348 (duration: 01m 08s)
16:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3002.esams.wmnet
16:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti3002.esams.wmnet
16:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2023.codfw.wmnet
16:00 moritzm: draining ganeti3002 for eventual reboot
15:57 moritzm: failover ganeti master in esams to ganeti3001
15:56 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2023.codfw.wmnet
15:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2022.codfw.wmnet
15:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3001.esams.wmnet
15:55 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
15:50 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti3001.esams.wmnet
15:40 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2022.codfw.wmnet
15:29 moritzm: draining ganeti3001 for eventual reboot
15:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3003.esams.wmnet
15:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2021.codfw.wmnet
15:23 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti3003.esams.wmnet
15:20 moritzm: draining ganeti3003 for eventual reboot
15:11 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2021.codfw.wmnet
15:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2020.codfw.wmnet
15:01 jynus@cumin1001: START - Cookbook sre.hosts.decommission
14:54 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2020.codfw.wmnet
14:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2019.codfw.wmnet
14:47 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir2001.codfw.wmnet
14:43 jynus: stop db1095 instance in preparation for its decom T273732
14:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir2001.codfw.wmnet
14:38 godog: swift codfw-prod decrease HDD weight for ms-be20[16-27] - T272837
14:37 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2019.codfw.wmnet
14:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5002.eqsin.wmnet
14:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir2002.codfw.wmnet
14:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir2002.codfw.wmnet
14:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti5002.eqsin.wmnet
14:21 godog: roll-restart rsync/swift-object-replicator in codfw to apply memory limits
14:21 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir4001.ulsfo.wmnet
14:18 effie: start rolling reboots of mc[2019-2027,2029-2037].codfw.wmnet T273278
14:16 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@47fc426]: (no justification provided) (duration: 00m 12s)
14:16 mbsantos@deploy1001: Started deploy [kartotherian/deploy@47fc426]: (no justification provided)
14:15 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir4001.ulsfo.wmnet
14:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir4002.ulsfo.wmnet
14:14 moritzm: installing ffmpeg security updates on stretch
14:11 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@0a38bc5]: (no justification provided) (duration: 00m 03s)
14:11 mbsantos@deploy1001: Started deploy [kartotherian/deploy@0a38bc5]: (no justification provided)
14:10 mbsantos@deploy1001: Finished deploy [tilerator/deploy@46a2eaf]: (no justification provided) (duration: 00m 13s)
14:10 mbsantos@deploy1001: Started deploy [tilerator/deploy@46a2eaf]: (no justification provided)
14:07 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir4002.ulsfo.wmnet
14:05 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir5001.eqsin.wmnet
13:58 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: NO-OP: 7c67b2f: bnwiki: wgGEHelpPanelLinks: Remove text in brackets (T266020) (duration: 01m 12s)
13:51 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir5001.eqsin.wmnet
13:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir5002.eqsin.wmnet
13:44 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir5002.eqsin.wmnet
13:44 vgutierrez: rolling restart of ncredir instances (kernel upgrade)
13:36 moritzm: installing openldap security updates on buster (client-side tools/libs only, slapd instance already updated)
13:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1157.eqiad.wmnet with reason: REIMAGE
13:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug1003.eqiad.wmnet
13:31 godog: reboot logstash2005.codfw.wmnet, no ssh / stuck
13:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1157.eqiad.wmnet with reason: REIMAGE
13:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host mwdebug1003.eqiad.wmnet
13:10 jbond42: upload cas_6.2.7 to downgrade cas T273867
13:04 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1010.eqiad.wmnet with reason: REIMAGE
13:02 ariel@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1010.eqiad.wmnet with reason: REIMAGE
12:27 moritzm: installing libdatetime-timezone-perl updates on Buster
12:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 17 hosts with reason: reboot
12:17 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 4:00:00 on 17 hosts with reason: reboot
12:17 moritzm: rebooting mw[1264-1268,1276-1277,1337-1338,1404-1409,1411,1413].eqiad.wmnet for kernel update
12:08 godog: bounce rsyslog on centrallog1001
11:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1009.eqiad.wmnet
11:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian-ssl,name=maps1009.eqiad.wmnet
11:30 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
11:26 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
11:07 elukey@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=eventstreams-internal
10:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 93 hosts with reason: reboot
10:35 moritzm: rebooting mw[2261-2262,2268-2271,2273-2277,2283-2288,2290-2335,2337-2339,2350-2376].codfw.wmnet
10:34 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 4:00:00 on 93 hosts with reason: reboot
10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14204 and previous config saved to /var/cache/conftool/dbconfig/20210204-102312-root.json
10:15 elukey: restart pybal on lvs1015 (low-traffic active) to pick up new changes for eventstreams-internal (new VIP) - T269160
10:13 elukey: restart pybal on lvs2009 (low-traffic active) to pick up new changes for eventstreams-internal (new VIP) - T269160
10:08 elukey: restart pybal on lvs1016 (low-traffic standby) to pick up new changes for eventstreams-internal (new VIP) - T269160
10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14203 and previous config saved to /var/cache/conftool/dbconfig/20210204-100808-root.json
10:05 elukey: restart pybal on lvs2010 (low-traffic standby) to pick up new changes for eventstreams-internal (new VIP) - T269160
09:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 60%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14202 and previous config saved to /var/cache/conftool/dbconfig/20210204-095305-root.json
09:49 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
09:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 37 hosts with reason: reboot
09:44 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 4:00:00 on 37 hosts with reason: reboot
09:41 moritzm: rebooting mw[2215-2219,2221-2243,2246-2249,2251-2253,2255,2258] for kernel update
09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14201 and previous config saved to /var/cache/conftool/dbconfig/20210204-093801-root.json
09:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flowspec1001.eqiad.wmnet
09:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host flowspec1001.eqiad.wmnet
09:24 XioNoX: re-enable ping offload in esams - T273278
09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1078 from dbctl T273597', diff saved to https://phabricator.wikimedia.org/P14199 and previous config saved to /var/cache/conftool/dbconfig/20210204-092414-marostegui.json
09:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping3001.esams.wmnet
09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 30%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14198 and previous config saved to /var/cache/conftool/dbconfig/20210204-092257-root.json
09:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ping3001.esams.wmnet
09:17 XioNoX: disable ping offload in esams (eqiad re-enabled) - T273278
09:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping1001.eqiad.wmnet
09:15 godog: roll restart lvs low-traffic in codfw/eqiad for swift healthcheck updates
09:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ping1001.eqiad.wmnet
09:10 XioNoX: disable ping offload in eqiad (codfw re-enabled) - T273278
09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14197 and previous config saved to /var/cache/conftool/dbconfig/20210204-090754-root.json
09:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping2001.codfw.wmnet
09:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ping2001.codfw.wmnet
09:02 XioNoX: disable ping offload in codfw - T273278
08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 20%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14196 and previous config saved to /var/cache/conftool/dbconfig/20210204-085250-root.json
08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 15%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14195 and previous config saved to /var/cache/conftool/dbconfig/20210204-083747-root.json
08:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
08:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 12%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14194 and previous config saved to /var/cache/conftool/dbconfig/20210204-082243-root.json
08:22 moritzm: reset failed ifup@ens5 on xhgui2001/xhgui1001 T273026
08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: Repool db1137 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14193 and previous config saved to /var/cache/conftool/dbconfig/20210204-081605-root.json
08:10 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1009.eqiad.wmnet with reason: REIMAGE
08:08 ariel@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1009.eqiad.wmnet with reason: REIMAGE
08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 10%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14192 and previous config saved to /var/cache/conftool/dbconfig/20210204-080740-root.json
08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: Repool db1137 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14191 and previous config saved to /var/cache/conftool/dbconfig/20210204-080101-root.json
07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 7%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14190 and previous config saved to /var/cache/conftool/dbconfig/20210204-075236-root.json
07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: Repool db1137 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14189 and previous config saved to /var/cache/conftool/dbconfig/20210204-074558-root.json
07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 5%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14188 and previous config saved to /var/cache/conftool/dbconfig/20210204-073733-root.json
07:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: Repool db1137 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14187 and previous config saved to /var/cache/conftool/dbconfig/20210204-073054-root.json
07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 3%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14186 and previous config saved to /var/cache/conftool/dbconfig/20210204-072229-root.json
07:16 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1117.eqiad.wmnet
07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 20%: Repool db1137 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14185 and previous config saved to /var/cache/conftool/dbconfig/20210204-071551-root.json
07:13 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1117.eqiad.wmnet
07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 2%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14184 and previous config saved to /var/cache/conftool/dbconfig/20210204-070726-root.json
07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: Repool db1137 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14183 and previous config saved to /var/cache/conftool/dbconfig/20210204-070047-root.json
06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 5%: Repool db1137 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14182 and previous config saved to /var/cache/conftool/dbconfig/20210204-064544-root.json
06:42 marostegui: Restart mysql on db1137
06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137 T266483', diff saved to https://phabricator.wikimedia.org/P14181 and previous config saved to /var/cache/conftool/dbconfig/20210204-064157-marostegui.json
06:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 1%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14180 and previous config saved to /var/cache/conftool/dbconfig/20210204-063033-root.json
06:28 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1173 to dbctl - depooled T258361', diff saved to https://phabricator.wikimedia.org/P14179 and previous config saved to /var/cache/conftool/dbconfig/20210204-062836-marostegui.json
02:02 legoktm@deploy1001: Synchronized logos/config.yaml: Update and recompress logos for nowiki, cawiki, fiwiki, ukwiki, cswiki, huwiki, trwiki (2/2) (duration: 01m 06s)
02:00 legoktm@deploy1001: Synchronized static/images/project-logos/: Update and recompress logos for nowiki, cawiki, fiwiki, ukwiki, cswiki, huwiki, trwiki (1/2) (duration: 01m 10s)
01:15 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@4b4872d]: transfer_to_es: Increase timeout waiting for source data to three hours (duration: 01m 16s)
01:13 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@4b4872d]: transfer_to_es: Increase timeout waiting for source data to three hours
01:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1310.eqiad.wmnet
00:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1318.eqiad.wmnet
00:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1310.eqiad.wmnet
00:44 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1318.eqiad.wmnet
00:22 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2279.codfw.wmnet
00:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2280.codfw.wmnet
00:17 eileen: civicrm revision changed from dfb2ea2148 to 1e9a86dd6e, config revision is 01ea3062f4
00:12 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2279.codw.wmnet
00:11 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2280.codfw.wmnet
00:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1310.eqiad.wmnet with reason: REIMAGE
00:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1310.eqiad.wmnet with reason: REIMAGE

2021-02-03

23:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1318.eqiad.wmnet with reason: REIMAGE
23:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1318.eqiad.wmnet with reason: REIMAGE
23:51 mutante: installservers: replacing squid proxy logrotate cron with systemd timer
23:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2279.codfw.wmnet with reason: REIMAGE
23:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2280.codfw.wmnet with reason: REIMAGE
23:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2279.codfw.wmnet with reason: REIMAGE
23:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2280.codfw.wmnet with reason: REIMAGE
22:53 razzi@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka test cluster: Reboot kafka nodes - razzi@cumin1001
22:06 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox1001.wikimedia.org
21:53 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single for host netbox1001.wikimedia.org
21:53 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb1001.eqiad.wmnet
21:46 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single for host netboxdb1001.eqiad.wmnet
21:44 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox2001.wikimedia.org
21:40 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single for host netbox2001.wikimedia.org
21:39 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb2001.codfw.wmnet
21:34 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single for host netboxdb2001.codfw.wmnet
21:33 chaomodus: rebooting Netbox cluster
21:05 razzi@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka test cluster: Reboot kafka nodes - razzi@cumin1001
20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
20:03 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1334.eqiad.wmnet
20:01 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1334.eqiad.wmnet
19:34 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2281.codfw.wmnet
19:31 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2282.codfw.wmnet
19:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2281.codfw.wmnet
19:17 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2282.codfw.wmnet
19:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1334.eqiad.wmnet with reason: REIMAGE
19:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1334.eqiad.wmnet with reason: REIMAGE
19:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 56351f0: kowiki: Fix wgGEHelpPanelHelpDeskTitle (T273799) (duration: 01m 10s)
18:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2282.codfw.wmnet with reason: REIMAGE
18:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2281.codfw.wmnet with reason: REIMAGE
18:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2282.codfw.wmnet with reason: REIMAGE
18:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2281.codfw.wmnet with reason: REIMAGE
18:32 razzi@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka test cluster: Reboot kafka nodes - razzi@cumin1001
18:26 dduvall@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
18:23 dduvall@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
18:13 dduvall@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
17:01 elukey@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=eventstreams-internal
16:44 mbsantos@deploy1001: deploy aborted: (no justification provided) (duration: 00m 00s)
16:44 mbsantos@deploy1001: Started deploy [tilerator/deploy@46a2eaf] (imposm): (no justification provided)
16:44 mbsantos@deploy1001: deploy aborted: (no justification provided) (duration: 00m 01s)
16:44 mbsantos@deploy1001: Started deploy [tilerator/deploy@46a2eaf] (beta): (no justification provided)
16:37 mbsantos@deploy1001: deploy aborted: Deploy Tilerator build for buster machines (duration: 00m 03s)
16:37 mbsantos@deploy1001: Started deploy [tilerator/deploy@46a2eaf] (imposm): Deploy Tilerator build for buster machines
16:37 mbsantos@deploy1001: deploy aborted: imposm Deploy Tilerator build for buster machines (duration: 00m 03s)
16:37 mbsantos@deploy1001: Started deploy [tilerator/deploy@46a2eaf] (nvironment): imposm Deploy Tilerator build for buster machines
16:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2001.codfw.wmnet
16:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host peek2001.codfw.wmnet
16:34 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host people2001.codfw.wmnet
16:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog2002.codfw.wmnet
16:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host peek2001.codfw.wmnet
16:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host planet1002.eqiad.wmnet
16:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host mwlog2002.codfw.wmnet
16:28 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host planet1002.eqiad.wmnet
16:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host planet2002.codfw.wmnet
16:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4002.ulsfo.wmnet
16:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host planet2002.codfw.wmnet
16:21 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host miscweb1002.eqiad.wmnet
16:19 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host miscweb1002.eqiad.wmnet
16:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti4002.ulsfo.wmnet
16:16 moritzm: draining ganeti4002 for eventual reboot
16:13 moritzm: failover ganeti master in ulsfo to ganeti4003
16:13 volans: enabled puppet on install1003 after the test T221388
16:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4003.ulsfo.wmnet
16:08 elukey@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=eventstreams-internal
16:06 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti4003.ulsfo.wmnet
16:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host labweb1002.wikimedia.org
16:00 moritzm: draining ganeti4003 for eventual reboot
15:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host labweb1002.wikimedia.org
15:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host labweb1001.wikimedia.org
15:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti4001.ulsfo.wmnet
15:52 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host labweb1001.wikimedia.org
15:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cuminunpriv1001.eqiad.wmnet
15:49 moritzm: draining ganeti4001 for eventual reboot
15:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host cuminunpriv1001.eqiad.wmnet
15:46 hnowlan: one-off installing imposm3 on maps1009
15:32 volans: disabling puppet on install1003 for a quick test for T221388
15:18 moritzm: installing ca-certificates update for buster (reverting the Symantec CA blacklist, related to GeoTrust CA)
15:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: Repool db1120 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14171 and previous config saved to /var/cache/conftool/dbconfig/20210203-150411-root.json
14:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: Repool db1120 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14170 and previous config saved to /var/cache/conftool/dbconfig/20210203-144908-root.json
14:39 akosiaris@cumin1001: conftool action : set/pooled=True; selector: dnsdisc=linkrecommendation
14:38 akosiaris@cumin1001: conftool action : set/pooled=True; selector: dnsdisc=similar-users
14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: Repool db1120 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14169 and previous config saved to /var/cache/conftool/dbconfig/20210203-143404-root.json
14:20 moritzm: installing openldap security updates on serpens/seaborgium
14:19 godog: test memory limits on swift-object-replicator on ms-be2050 - T221904
14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: Repool db1120 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14168 and previous config saved to /var/cache/conftool/dbconfig/20210203-141901-root.json
14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 20%: Repool db1120 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14167 and previous config saved to /var/cache/conftool/dbconfig/20210203-140357-root.json
13:58 godog: swift codfw-prod decrease HDD weight for ms-be20[16-27] - T272837
13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 10%: Repool db1120 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14166 and previous config saved to /var/cache/conftool/dbconfig/20210203-134854-root.json
13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 5%: Repool db1120 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14165 and previous config saved to /var/cache/conftool/dbconfig/20210203-133350-root.json
13:30 marostegui: Stop mysql on db1120 to enable report_host T266483
13:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host xhgui1001.eqiad.wmnet
13:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 T266483', diff saved to https://phabricator.wikimedia.org/P14164 and previous config saved to /var/cache/conftool/dbconfig/20210203-132938-marostegui.json
13:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host xhgui2001.codfw.wmnet
13:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host xhgui1001.eqiad.wmnet
13:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host xhgui2001.codfw.wmnet
13:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2001.wikimedia.org
13:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1001.wikimedia.org
13:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host urldownloader2001.wikimedia.org
13:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host urldownloader1001.wikimedia.org
12:46 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host puppetmaster2001.codfw.wmnet
12:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetmaster2002.codfw.wmnet
12:38 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetmaster2003.codfw.wmnet
12:35 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetmaster2001.codfw.wmnet
12:35 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host puppetmaster1001.eqiad.wmnet
12:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetdb2002.codfw.wmnet
12:34 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetmaster2002.codfw.wmnet
12:34 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetmaster2003.codfw.wmnet
12:34 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetmaster1002.eqiad.wmnet
12:33 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetmaster1003.eqiad.wmnet
12:29 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetmaster1003.eqiad.wmnet
12:29 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetmaster1002.eqiad.wmnet
12:29 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetmaster1001.eqiad.wmnet
12:28 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host puppetdb1002.eqiad.wmnet
12:26 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetdb2002.codfw.wmnet
12:25 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetdb1002.eqiad.wmnet
12:22 jbond42: disable puppet fleet-wide to reboot puppetmaster,puppetdb
12:19 moritzm: installing openldap security updates on LDAP replicas
11:20 jbond42: update puppetlabs-stdlib to v6.6.0
11:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Slowly pool db1174 into s7', diff saved to https://phabricator.wikimedia.org/P14163 and previous config saved to /var/cache/conftool/dbconfig/20210203-110236-root.json
10:54 elukey@deploy1001: Finished deploy [analytics/refinery@8b8f0cf]: Weekly deployment (duration: 11m 06s)
10:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1093 (re)pooling @ 100%: Slowly pooling db1093 after cloning db1173', diff saved to https://phabricator.wikimedia.org/P14162 and previous config saved to /var/cache/conftool/dbconfig/20210203-105057-root.json
10:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 85%: Slowly pool db1174 into s7', diff saved to https://phabricator.wikimedia.org/P14161 and previous config saved to /var/cache/conftool/dbconfig/20210203-104733-root.json
10:43 elukey@deploy1001: Started deploy [analytics/refinery@8b8f0cf]: Weekly deployment
10:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1093 (re)pooling @ 75%: Slowly pooling db1093 after cloning db1173', diff saved to https://phabricator.wikimedia.org/P14160 and previous config saved to /var/cache/conftool/dbconfig/20210203-103554-root.json
10:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: Slowly pool db1174 into s7', diff saved to https://phabricator.wikimedia.org/P14159 and previous config saved to /var/cache/conftool/dbconfig/20210203-103229-root.json
10:28 vgutierrez: rolling restart of varnish-fe on cp5002 and cp5003
10:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1093 (re)pooling @ 50%: Slowly pooling db1093 after cloning db1173', diff saved to https://phabricator.wikimedia.org/P14158 and previous config saved to /var/cache/conftool/dbconfig/20210203-102050-root.json
10:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 60%: Slowly pool db1174 into s7', diff saved to https://phabricator.wikimedia.org/P14157 and previous config saved to /var/cache/conftool/dbconfig/20210203-101726-root.json
10:16 legoktm: re-enabled puppet on mw2295 (T273726)
10:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1093 (re)pooling @ 25%: Slowly pooling db1093 after cloning db1173', diff saved to https://phabricator.wikimedia.org/P14156 and previous config saved to /var/cache/conftool/dbconfig/20210203-100547-root.json
10:05 gehel: depooling and restarting blazegraph on wdqs1007
10:04 hashar@deploy1001: Synchronized php-1.36.0-wmf.29/extensions/Echo/includes/api/ApiEchoUnreadNotificationPages.php: Add missing isset() check to ApiEchoUnreadNotificationPages - T273479 (duration: 01m 14s)
10:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: Slowly pool db1174 into s7', diff saved to https://phabricator.wikimedia.org/P14155 and previous config saved to /var/cache/conftool/dbconfig/20210203-100222-root.json
09:57 marostegui: m2 master restart - T272964
09:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1093 (re)pooling @ 10%: Slowly pooling db1093 after cloning db1173', diff saved to https://phabricator.wikimedia.org/P14154 and previous config saved to /var/cache/conftool/dbconfig/20210203-095043-root.json
09:50 XioNoX: disable DE-CIX codfw peering sessions
09:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 40%: Slowly pool db1174 into s7', diff saved to https://phabricator.wikimedia.org/P14153 and previous config saved to /var/cache/conftool/dbconfig/20210203-094719-root.json
09:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1093 (re)pooling @ 5%: Slowly pooling db1093 after cloning db1173', diff saved to https://phabricator.wikimedia.org/P14152 and previous config saved to /var/cache/conftool/dbconfig/20210203-093540-root.json
09:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 30%: Slowly pool db1174 into s7', diff saved to https://phabricator.wikimedia.org/P14151 and previous config saved to /var/cache/conftool/dbconfig/20210203-093215-root.json
09:30 vgutierrez: depool cp5006
09:26 vgutierrez: rolling restart varnish-fe on cp5004-5006
09:20 _joe_: restarting varnish-frontend on cp5001
09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: Slowly pool db1174 into s7', diff saved to https://phabricator.wikimedia.org/P14150 and previous config saved to /var/cache/conftool/dbconfig/20210203-091712-root.json
09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 20%: Slowly pool db1174 into s7', diff saved to https://phabricator.wikimedia.org/P14149 and previous config saved to /var/cache/conftool/dbconfig/20210203-090208-root.json
08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 15%: Slowly pool db1174 into s7', diff saved to https://phabricator.wikimedia.org/P14148 and previous config saved to /var/cache/conftool/dbconfig/20210203-084705-root.json
08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 13%: Slowly pool db1174 into s7', diff saved to https://phabricator.wikimedia.org/P14147 and previous config saved to /var/cache/conftool/dbconfig/20210203-083201-root.json
08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: Slowly pool db1174 into s7', diff saved to https://phabricator.wikimedia.org/P14146 and previous config saved to /var/cache/conftool/dbconfig/20210203-081658-root.json
08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 8%: Slowly pool db1174 into s7', diff saved to https://phabricator.wikimedia.org/P14145 and previous config saved to /var/cache/conftool/dbconfig/20210203-080154-root.json
07:49 marostegui: Stop mysql on db1093 to clone db1173 T258361
07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1093 to clone db1173 T258361', diff saved to https://phabricator.wikimedia.org/P14143 and previous config saved to /var/cache/conftool/dbconfig/20210203-074749-marostegui.json
07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 5%: Slowly pool db1174 into s7', diff saved to https://phabricator.wikimedia.org/P14142 and previous config saved to /var/cache/conftool/dbconfig/20210203-074651-root.json
07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Give some more weight to db1174', diff saved to https://phabricator.wikimedia.org/P14141 and previous config saved to /var/cache/conftool/dbconfig/20210203-071310-marostegui.json
07:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
07:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
06:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 - will be decommissioned', diff saved to https://phabricator.wikimedia.org/P14139 and previous config saved to /var/cache/conftool/dbconfig/20210203-064137-marostegui.json
06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1174 with minimal weight for the first time in s7', diff saved to https://phabricator.wikimedia.org/P14138 and previous config saved to /var/cache/conftool/dbconfig/20210203-063812-marostegui.json
00:16 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
00:13 legoktm@deploy1001: Synchronized logos/: Update and recompress logos for nlwiki, eswiki, ptwiki, ruwiki, svwiki, zhwiki (2/2) (duration: 01m 05s)
00:12 legoktm@deploy1001: Synchronized static/images/project-logos/: Update and recompress logos for nlwiki, eswiki, ptwiki, ruwiki, svwiki, zhwiki (1/2) (duration: 01m 10s)
00:10 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .

2021-02-02

23:53 mutante: mw1300 - scap pull (it crashed earlier but is back after powercycling)
23:52 jhuneidi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
23:30 mutante: powercycling crashed mw1300.eqiad.wmnet
21:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1335.eqiad.wmnet
21:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1336.eqiad.wmnet
21:56 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1335.eqiad.wmnet
21:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1336.eqiad.wmnet
21:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1335.eqiad.wmnet with reason: REIMAGE
21:08 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1336.eqiad.wmnet with reason: REIMAGE
21:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1335.eqiad.wmnet with reason: REIMAGE
21:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1336.eqiad.wmnet with reason: REIMAGE
20:12 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕒☕ sudo cumin A:cp 'enable-puppet "cdanis deploying I7003b7b6 and Idd0e124f5 T263496"' # test on cp2027 looks good, perhaps slightly-increased Varnish CPU consumption but hard to be sure
20:00 Lucas_WMDE: Morning backport window done
19:58 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.29/extensions/WikibaseMediaInfo/: Backport: Pass $databaseName into WikiPageEntityDataLoader (T273622) (duration: 01m 07s)
19:57 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.29/extensions/Wikibase/: Backport: Add wiki ID to WikiPageEntityDataLoader (T273622) (duration: 01m 25s)
19:52 cdanis: ❌cdanis@cumin1001.eqiad.wmnet ~ 🕒☕ sudo cumin A:cp 'disable-puppet "cdanis deploying I7003b7b6 and Idd0e124f5 T263496"'
19:00 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
18:48 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
18:43 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
18:23 milimetric@deploy1001: Finished deploy [analytics/turnilo/deploy@052348b]: (no justification provided) (duration: 00m 03s)
18:23 milimetric@deploy1001: Started deploy [analytics/turnilo/deploy@052348b]: (no justification provided)
18:22 milimetric@deploy1001: deploy aborted: (no justification provided) (duration: 00m 10s)
18:22 milimetric@deploy1001: Started deploy [analytics/turnilo/deploy@052348b]: (no justification provided)
18:17 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
18:07 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
18:03 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
16:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host auth2001.codfw.wmnet
16:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host auth1002.eqiad.wmnet
16:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host auth1002.eqiad.wmnet
16:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host auth2001.codfw.wmnet
15:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host miscweb2002.codfw.wmnet
15:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
15:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host miscweb2002.codfw.wmnet
15:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 100%: Repool db1094 after cloning another host', diff saved to https://phabricator.wikimedia.org/P14135 and previous config saved to /var/cache/conftool/dbconfig/20210202-143950-root.json
14:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1001.eqiad.wmnet
14:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host failoid1001.eqiad.wmnet
14:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1002.eqiad.wmnet
14:35 klausman@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-etcd2003.codfw.wmnet
14:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor2002.codfw.wmnet
14:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host debmonitor1002.eqiad.wmnet
14:26 hashar@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.29 (duration: 73m 10s)
14:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host debmonitor2002.codfw.wmnet
14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 75%: Repool db1094 after cloning another host', diff saved to https://phabricator.wikimedia.org/P14134 and previous config saved to /var/cache/conftool/dbconfig/20210202-142446-root.json
14:21 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy1002.eqiad.wmnet
14:21 klausman@cumin2001: START - Cookbook sre.ganeti.makevm for new host ml-etcd2003.codfw.wmnet
14:12 klausman@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-etcd2002.codfw.wmnet
14:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 50%: Repool db1094 after cloning another host', diff saved to https://phabricator.wikimedia.org/P14133 and previous config saved to /var/cache/conftool/dbconfig/20210202-140943-root.json
14:06 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host deploy1002.eqiad.wmnet
13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 25%: Repool db1094 after cloning another host', diff saved to https://phabricator.wikimedia.org/P14132 and previous config saved to /var/cache/conftool/dbconfig/20210202-135439-root.json
13:49 klausman@cumin2001: START - Cookbook sre.ganeti.makevm for new host ml-etcd2002.codfw.wmnet
13:49 klausman@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-etcd2001.codfw.wmnet
13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 10%: Repool db1094 after cloning another host', diff saved to https://phabricator.wikimedia.org/P14128 and previous config saved to /var/cache/conftool/dbconfig/20210202-133936-root.json
13:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy2002.codfw.wmnet
13:32 klausman@cumin2001: START - Cookbook sre.ganeti.makevm for new host ml-etcd2001.codfw.wmnet
13:31 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-etcd1003.eqiad.wmnet
13:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host doc2001.codfw.wmnet
13:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host deploy2002.codfw.wmnet
13:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host doc1002.eqiad.wmnet
13:13 hashar@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.29
13:13 klausman@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-etcd1003.eqiad.wmnet
13:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host doc2001.codfw.wmnet
13:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host doc1002.eqiad.wmnet
13:11 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-etcd1002.eqiad.wmnet
13:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2001.wikimedia.org
13:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2001.codfw.wmnet
13:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host failoid2001.codfw.wmnet
13:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host irc2001.wikimedia.org
13:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1002.wikimedia.org
13:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2002.wikimedia.org
13:05 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host urldownloader2002.wikimedia.org
13:05 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host urldownloader1002.wikimedia.org
12:52 klausman@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-etcd1002.eqiad.wmnet
12:52 klausman@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host ml-etcd1002.eqiad.wmnet
12:51 klausman@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-etcd1002.eqiad.wmnet
12:50 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on malmok.wikimedia.org with reason: rebooting for kernel update
12:50 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:05:00 on malmok.wikimedia.org with reason: rebooting for kernel update
12:47 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on cescout1001.eqiad.wmnet with reason: rebooting for kernel update
12:46 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on cescout1001.eqiad.wmnet with reason: rebooting for kernel update
12:46 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki2001.codfw.wmnet
12:46 klausman@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-etcd1002.eqiad.wmnet
12:46 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1002.eqiad.wmnet
12:44 klausman@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-etcd2001.codfw.wmnet
12:43 klausman@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-etcd1002.eqiad.wmnet
12:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki1001.eqiad.wmnet
12:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2002.codfw.wmnet
12:42 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetboard1002.eqiad.wmnet
12:42 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install4001.wikimedia.org
12:42 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install5001.wikimedia.org
12:41 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-etcd1001.eqiad.wmnet
12:41 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetboard2002.codfw.wmnet
12:40 klausman@cumin2001: START - Cookbook sre.ganeti.makevm for new host ml-etcd2001.codfw.wmnet
12:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install2003.wikimedia.org
12:40 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host pki2001.codfw.wmnet
12:39 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3001.wikimedia.org
12:38 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host pki1001.eqiad.wmnet
12:38 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install1003.wikimedia.org
12:37 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host install5001.wikimedia.org
12:37 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host install4001.wikimedia.org
12:36 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host install3001.wikimedia.org
12:35 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host install2003.wikimedia.org
12:35 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host install1003.wikimedia.org
12:34 klausman@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-etcd2001.codfw.wmnet
12:34 urbanecm@deploy1001: Synchronized docroot/noc/conf/index.php: 995649e: noc: yaml files may be published w/o .txt extension (duration: 00m 57s)
12:34 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp1001.wikimedia.org
12:33 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt1001.wikimedia.org
12:30 klausman@cumin2001: START - Cookbook sre.ganeti.makevm for new host ml-etcd2001.codfw.wmnet
12:30 klausman@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-etcd2001.codfw.wmnet
12:29 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp1001.wikimedia.org
12:29 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host apt1001.wikimedia.org
12:26 urbanecm@deploy1001: Synchronized docroot/noc/createTxtFileSymlinks.sh: 210647e: noc: Publicly expose logos/config.yaml (2/2; T273330) (duration: 00m 55s)
12:23 urbanecm@deploy1001: Synchronized docroot/noc/conf/logos-config.yaml: 210647e: noc: Publicly expose logos/config.yaml (1/2; T273330) (duration: 00m 57s)
12:22 klausman@cumin2001: START - Cookbook sre.ganeti.makevm for new host ml-etcd2001.codfw.wmnet
12:20 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/GrowthExperiments/includes/HomepageModules/Banner.php: da8f328: Banner module: Switch to using activated/unactivated for state (T273084) (duration: 00m 58s)
12:18 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/GrowthExperiments/includes/Specials/SpecialHomepage.php: 18c59d0: SpecialHomepage: Do not load start-startediting if SE arent enabled (T273243) (duration: 01m 01s)
12:18 klausman@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-etcd1001.eqiad.wmnet
12:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt2001.wikimedia.org
12:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2001.wikimedia.org
12:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp2001.wikimedia.org
12:14 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host apt2001.wikimedia.org
12:14 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1001.wikimedia.org
12:13 jbond42: upload cas_6.3 package
12:12 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp2001.wikimedia.org
12:12 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp-test2001.wikimedia.org
12:11 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp-test1001.wikimedia.org
11:06 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1171.eqiad.wmnet with reason: REIMAGE
11:04 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1171.eqiad.wmnet with reason: REIMAGE
10:30 XioNoX: re-enable DE-CIX codfw peering sessions
10:17 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094 to clone db1174 - T258361', diff saved to https://phabricator.wikimedia.org/P14121 and previous config saved to /var/cache/conftool/dbconfig/20210202-100859-marostegui.json
10:08 elukey@cumin1001: START - Cookbook sre.dns.netbox
10:02 hashar: Restarted Gerrit primary on gerrit1001 # T273223
10:00 hashar@deploy1001: Finished deploy [gerrit/gerrit@c3cd63b]: Gerrit primary on gerrit1001 to v3.2.7 T273223 (duration: 00m 09s)
10:00 hashar@deploy1001: Started deploy [gerrit/gerrit@c3cd63b]: Gerrit primary on gerrit1001 to v3.2.7 T273223
10:00 hashar: Restarted Gerrit replica on gerrit2001 # T273223
09:56 hashar@deploy1001: Finished deploy [gerrit/gerrit@c3cd63b]: Gerrit replica on gerrit2001 to v3.2.7 T273223 (duration: 00m 12s)
09:56 hashar@deploy1001: Started deploy [gerrit/gerrit@c3cd63b]: Gerrit replica on gerrit2001 to v3.2.7 T273223
09:27 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1381.eqiad.wmnet
08:56 XioNoX: disable DE-CIX codfw peering session
08:30 godog: swift eqiad-prod: add weight back to sdg on ms-be1054 - T273582
08:02 legoktm: depooled mw1381.eqiad.wmnet for perf testing (T273312)
07:59 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1381.eqiad.wmnet
07:45 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1403.eqiad.wmnet
07:45 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1405.eqiad.wmnet
07:31 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: Repool es1022 after a restart', diff saved to https://phabricator.wikimedia.org/P14118 and previous config saved to /var/cache/conftool/dbconfig/20210202-073105-root.json
07:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
07:16 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: Repool es1022 after a restart', diff saved to https://phabricator.wikimedia.org/P14117 and previous config saved to /var/cache/conftool/dbconfig/20210202-071602-root.json
07:14 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
07:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: Repool es1022 after a restart', diff saved to https://phabricator.wikimedia.org/P14116 and previous config saved to /var/cache/conftool/dbconfig/20210202-070057-root.json
06:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: Repool es1022 after a restart', diff saved to https://phabricator.wikimedia.org/P14115 and previous config saved to /var/cache/conftool/dbconfig/20210202-064553-root.json
06:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: Repool es1022 after a restart', diff saved to https://phabricator.wikimedia.org/P14114 and previous config saved to /var/cache/conftool/dbconfig/20210202-063050-root.json
06:24 marostegui: Restart mysql on es1022
06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1022 T266483', diff saved to https://phabricator.wikimedia.org/P14113 and previous config saved to /var/cache/conftool/dbconfig/20210202-062303-marostegui.json
04:12 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
03:40 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
03:40 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
03:40 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
03:36 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@ad9db35]: 0.3.62 (duration: 06m 59s)
03:29 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.62` on canary `wdqs1003`; proceeding to rest of fleet
03:29 ryankemper@deploy1001: Started deploy [wdqs/wdqs@ad9db35]: 0.3.62
03:26 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.62`. Pre-deploy tests passing on canary `wdqs1003`
03:21 ryankemper: `sudo systemctl restart wdqs-blazegraph` on `wdqs1006`
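The 03:40 entry restarts `wdqs-categories` one node at a time with `cumin -b 1` and the chain `depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool`. The sketch below models only that ordering. It is not cumin's or conftool's code. `rolling_restart` and its `depool`/`restart`/`pool` callbacks are hypothetical stand-ins for the real commands. As with the `&&` chain, a step that raises stops the run, so a host whose restart fails is left depooled rather than returned to the pool.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    hosts: Iterable[str],
    depool: Callable[[str], None],
    restart: Callable[[str], None],
    pool: Callable[[str], None],
    settle_seconds: float = 45.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Restart hosts one at a time (batch size 1), returning the hosts that were repooled.

    Hypothetical sketch: the callbacks stand in for the real depool/systemctl/pool commands.
    An exception from any step propagates and stops the loop, roughly like a failing `&&`.
    """
    completed: List[str] = []
    for host in hosts:
        depool(host)            # take the host out of load balancing first
        sleep(settle_seconds)   # let in-flight requests drain
        restart(host)           # restart the service while it receives no traffic
        sleep(settle_seconds)   # give the service time to come back up
        pool(host)              # only repool once the restart step has succeeded
        completed.append(host)
    return completed
```

Passing a no-op `sleep` makes the ordering easy to check without waiting 90 seconds per host.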

2021-02-01

23:54 legoktm@deploy1001: Synchronized wmf-config/profiler.php: profiler: Send data to excimer-buster pipeline (T273312) (duration: 00m 57s)
23:15 legoktm: depooling mw1403 and mw1405 for perf testing
23:14 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1405.eqiad.wmnet
23:14 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1403.eqiad.wmnet
23:14 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1278.eqiad.wmnet
23:05 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.28/extensions/Collection/includes/Specials/SpecialCollection.php: 3c7864c: Remove unnecessary calls to WikiPage (T273101) (duration: 00m 58s)
22:09 sbassett: Deployed security patch for T272386
22:05 sbassett: Deployed security patch for T270713
22:04 legoktm: depooling mw1278.eqiad.wmnet for perf testing
22:03 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1278.eqiad.wmnet
22:03 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1277.eqiad.wmnet
21:29 dancy@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.27
20:53 andrew@deploy1001: Finished deploy [striker/deploy@b6441b8]: Striker hacked fix for T272410 (duration: 00m 57s)
20:52 andrew@deploy1001: Started deploy [striker/deploy@b6441b8]: Striker hacked fix for T272410
20:27 legoktm: depooling mw1277.eqiad.wmnet for perf testing
19:42 Urbanecm: Morning B&C done
19:41 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.28/extensions/GrowthExperiments/includes/Specials/SpecialHomepage.php: 1acaba4: SpecialHomepage: Do not load start-startediting if SE arent enabled (T273243) (duration: 01m 05s)
19:39 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.28/extensions/GrowthExperiments/includes/HomepageModules/Banner.php: d39746a: Banner module: Switch to using activated/unactivated for state (T273084) (duration: 01m 05s)
19:23 mutante: gerrit2001 - restarting gerrit (replica)
19:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 6360e78: Enable DiscussionTools as a beta feature on 3 wikis per request (T258554; T265829; T273192) (duration: 01m 04s)
19:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a98f08f: Enable DiscussionTools as a beta feature on wikis with language variants (T272639) (duration: 01m 07s)
18:57 mutante: restarting gerrit for change 660030 (no ticket)
18:44 mutante: new Wikimedia project language "mni" added - Meitei is a Sino-Tibetan language and the predominant language and lingua franca of the state of Manipur in northeastern India.
18:03 mutante: ping3001 - apt-get clean; apt-get autoremove; let it finish kernel upgrade; was out of disk
17:59 mutante: ping2001 - apt-get clean; apt autoremove - was out of disk as well
17:52 mutante: ping1001 - apt-get clean gets back 447M - it was out of disk completely, now 84% usage
17:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
17:29 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
17:17 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deneb.codfw.wmnet
17:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install5001.wikimedia.org
17:15 jbond@cumin2001: START - Cookbook sre.hosts.reboot-single for host deneb.codfw.wmnet
17:15 pt1979@cumin2001: START - Cookbook sre.dns.netbox
17:14 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install4001.wikimedia.org
17:14 mutante: decom'ing francium.eqiad.wmnet, formerly HTML dumps server, replaced by htmldumper1001
17:13 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install2003.wikimedia.org
17:12 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
17:12 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install1003.wikimedia.org
17:12 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3001.wikimedia.org
17:12 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host install5001.wikimedia.org
17:10 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host install4001.wikimedia.org
17:10 sukhe: upload dnsdist_1.5.1-3wm1 to apt.wm.o (buster) - T252132
17:09 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host install3001.wikimedia.org
17:09 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host install2003.wikimedia.org
17:09 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host install2003.wikimedia.org
17:08 jbond@cumin2001: START - Cookbook sre.hosts.reboot-single for host install2003.wikimedia.org
17:07 jbond@cumin2001: START - Cookbook sre.hosts.reboot-single for host install1003.wikimedia.org
17:06 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt1001.wikimedia.org
17:03 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp1001.wikimedia.org
17:01 jbond@cumin2001: START - Cookbook sre.hosts.reboot-single for host idp1001.wikimedia.org
17:01 jbond@cumin2001: START - Cookbook sre.hosts.reboot-single for host apt1001.wikimedia.org
16:53 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2001.wikimedia.org
16:52 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp2001.wikimedia.org
16:50 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt2001.wikimedia.org
16:46 jbond@cumin2001: START - Cookbook sre.hosts.reboot-single for host idp2001.wikimedia.org
16:46 jbond@cumin2001: START - Cookbook sre.hosts.reboot-single for host apt2001.wikimedia.org
16:44 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki1001.eqiad.wmnet
16:44 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki2001.codfw.wmnet
16:43 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1001.wikimedia.org
16:42 jbond42: re-enable puppet fleet-wide after reboots
16:35 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp-test2001.wikimedia.org
16:35 jbond@cumin2001: START - Cookbook sre.hosts.reboot-single for host idp-test1001.wikimedia.org
16:34 jbond@cumin2001: START - Cookbook sre.hosts.reboot-single for host pki1001.eqiad.wmnet
16:34 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetmaster2003.codfw.wmnet
16:34 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host pki2001.codfw.wmnet
16:33 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host puppetmaster2001.codfw.wmnet
16:33 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetmaster2002.codfw.wmnet
16:33 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetdb2002.codfw.wmnet
16:28 jbond@cumin2001: START - Cookbook sre.hosts.reboot-single for host puppetdb2002.codfw.wmnet
16:28 jbond@cumin2001: START - Cookbook sre.hosts.reboot-single for host puppetmaster2003.codfw.wmnet
16:28 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetmaster2002.codfw.wmnet
16:28 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetmaster2001.codfw.wmnet
16:28 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host puppetmaster1001.eqiad.wmnet
16:26 jbond@cumin2001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host puppetdb1002.eqiad.wmnet
16:22 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetmaster1002.eqiad.wmnet
16:21 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetmaster1003.eqiad.wmnet
16:15 XioNoX: fail back RG1 to node1 on pfw3-eqiad - T263833
16:14 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetmaster1001.eqiad.wmnet
16:14 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetmaster1002.eqiad.wmnet
16:14 jbond@cumin2001: START - Cookbook sre.hosts.reboot-single for host puppetmaster1003.eqiad.wmnet
16:13 jbond@cumin2001: START - Cookbook sre.hosts.reboot-single for host puppetdb1002.eqiad.wmnet
16:12 jbond42: disable puppet fleet-wide to perform reboots
16:09 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1002.eqiad.wmnet
16:05 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetboard1002.eqiad.wmnet
16:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2002.codfw.wmnet
16:03 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetboard2002.codfw.wmnet
16:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 100%: Repool db1147 after a restart', diff saved to https://phabricator.wikimedia.org/P14110 and previous config saved to /var/cache/conftool/dbconfig/20210201-160122-root.json
15:59 jbond42: install buster kernel update
15:46 XioNoX: failover RG1 back to node0 on pfw3-eqiad - T263833
15:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 80%: Repool db1147 after a restart', diff saved to https://phabricator.wikimedia.org/P14109 and previous config saved to /var/cache/conftool/dbconfig/20210201-154618-root.json
15:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 60%: Repool db1147 after a restart', diff saved to https://phabricator.wikimedia.org/P14108 and previous config saved to /var/cache/conftool/dbconfig/20210201-153115-root.json
15:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 40%: Repool db1147 after a restart', diff saved to https://phabricator.wikimedia.org/P14107 and previous config saved to /var/cache/conftool/dbconfig/20210201-151611-root.json
15:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 20%: Repool db1147 after a restart', diff saved to https://phabricator.wikimedia.org/P14106 and previous config saved to /var/cache/conftool/dbconfig/20210201-150107-root.json
14:53 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=codfw
14:53 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift-ro,name=codfw
14:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 10%: Repool db1147 after a restart', diff saved to https://phabricator.wikimedia.org/P14105 and previous config saved to /var/cache/conftool/dbconfig/20210201-144604-root.json
14:40 marostegui: Restart mysql on db1147 T266483
14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1147 T266483', diff saved to https://phabricator.wikimedia.org/P14104 and previous config saved to /var/cache/conftool/dbconfig/20210201-143925-marostegui.json
14:12 ladsgroup@deploy1001: Finished scap: Add Multilingual Wikisource to list of Wikidata's special sites (T138332) (duration: 21m 52s)
13:50 ladsgroup@deploy1001: Started scap: Add Multilingual Wikisource to list of Wikidata's special sites (T138332)
13:47 ladsgroup@deploy1001: scap sync-l10n completed (1.36.0-wmf.28) (duration: 00m 58s)
13:27 hashar@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.28 (duration: 01m 03s)
13:26 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.28
12:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14102 and previous config saved to /var/cache/conftool/dbconfig/20210201-124308-root.json
12:42 urbanecm@deploy1001: Synchronized wmf-config/logos.php: d70e8ac: Update ombudsmenwiki logo (3/3) (duration: 01m 05s)
12:42 Urbanecm: Purge 'https://en.wikipedia.org/static/images/project-logos/ombudsmenwiki.png' (T273323)
12:41 urbanecm@deploy1001: Synchronized logos/config.yaml: d70e8ac: Update ombudsmenwiki logo (2/3) (duration: 01m 04s)
12:40 urbanecm@deploy1001: Synchronized static/images/project-logos/: d70e8ac: Update ombudsmenwiki logo (1/3) (duration: 01m 05s)
12:38 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cf34936: ombudsmenwiki: Set sitename to "Ombuds Commission" (T273323) (duration: 01m 06s)
12:35 urbanecm@deploy1001: Synchronized static/images/project-logos/: Regenerate a couple of logos from Commons (2/2) (duration: 01m 08s)
12:34 urbanecm@deploy1001: Synchronized logos/config.yaml: Regenerate a couple of logos from Commons (1/2) (duration: 01m 07s)
12:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14100 and previous config saved to /var/cache/conftool/dbconfig/20210201-122804-root.json
12:25 urbanecm@deploy1001: Synchronized docroot/noc/createTxtFileSymlinks.sh: ec5b6d2: Publish logos.php at noc.wikimedia.org (2/2; T273330) (duration: 01m 05s)
12:24 urbanecm@deploy1001: Synchronized docroot/noc/conf/logos.php.txt: ec5b6d2: Publish logos.php at noc.wikimedia.org (1/2; T273330) (duration: 01m 04s)
12:20 Lucas_WMDE: EU backport&config window done
12:19 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: wikidata: post edit constraint jobs on 40% of edits (T204031) (duration: 01m 03s)
12:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 60%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14099 and previous config saved to /var/cache/conftool/dbconfig/20210201-121301-root.json
12:12 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 9836287e0, 424efdcdb: [WikibaseMediaInfo] Set wgMediaInfoMediaSearchHasLtrPlugin & wgMediaInfoMediaSearchConceptChipsSimpleHeuristics (duration: 01m 10s)
11:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14098 and previous config saved to /var/cache/conftool/dbconfig/20210201-115757-root.json
11:50 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift-ro,name=codfw
11:49 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=^swift,name=codfw
11:47 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 04s)
11:46 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 14s)
11:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 30%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14097 and previous config saved to /var/cache/conftool/dbconfig/20210201-114254-root.json
11:28 XioNoX: push pfw policies - T272073
11:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14096 and previous config saved to /var/cache/conftool/dbconfig/20210201-112750-root.json
11:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 20%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14095 and previous config saved to /var/cache/conftool/dbconfig/20210201-111246-root.json
11:04 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.28
11:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14094 and previous config saved to /var/cache/conftool/dbconfig/20210201-110102-root.json
10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 15%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14093 and previous config saved to /var/cache/conftool/dbconfig/20210201-105743-root.json
10:54 hashar@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.28 (duration: 07m 48s)
10:46 hashar@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.28
10:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14092 and previous config saved to /var/cache/conftool/dbconfig/20210201-104559-root.json
10:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be1047.eqiad.wmnet with reason: reboot
10:45 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be1047.eqiad.wmnet with reason: reboot
10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 12%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14091 and previous config saved to /var/cache/conftool/dbconfig/20210201-104240-root.json
10:42 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.28/includes/: Fixing T273317 T273296 (duration: 01m 01s)
10:41 urbanecm@deploy1001: sync-file aborted: Fixing T273317 T273296 (duration: 00m 12s)
10:39 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.28/includes/user//User.php: Fixing T273317 T273296 (duration: 00m 58s)
10:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 60%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14090 and previous config saved to /var/cache/conftool/dbconfig/20210201-103055-root.json
10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14089 and previous config saved to /var/cache/conftool/dbconfig/20210201-102736-root.json
10:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14088 and previous config saved to /var/cache/conftool/dbconfig/20210201-101552-root.json
10:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 9%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14087 and previous config saved to /var/cache/conftool/dbconfig/20210201-101233-root.json
10:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 30%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14086 and previous config saved to /var/cache/conftool/dbconfig/20210201-100048-root.json
09:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 8%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14085 and previous config saved to /var/cache/conftool/dbconfig/20210201-095729-root.json
09:52 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool es4 into writes T266483 (duration: 00m 56s)
09:46 marostegui: Restart mysql on es1021 T266483
09:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool es4 from writes T266483 (duration: 01m 04s)
09:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14084 and previous config saved to /var/cache/conftool/dbconfig/20210201-094545-root.json
09:42 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 7%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14083 and previous config saved to /var/cache/conftool/dbconfig/20210201-094226-root.json
09:39 elukey@cumin1001: START - Cookbook sre.dns.netbox
09:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 20%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14082 and previous config saved to /var/cache/conftool/dbconfig/20210201-093041-root.json
09:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 6%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14081 and previous config saved to /var/cache/conftool/dbconfig/20210201-092722-root.json
09:27 dcausse: restarting blazegraph on wdqs1013
09:24 XioNoX: renumber gr-3/3/0.1 local endpoint on cr1-eqiad
09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 15%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14080 and previous config saved to /var/cache/conftool/dbconfig/20210201-091538-root.json
09:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14079 and previous config saved to /var/cache/conftool/dbconfig/20210201-091218-root.json
09:04 gilles@deploy1001: Finished deploy [performance/navtiming@3215510]: T271208 browser_minor is needed for Mobile Safari allowlist (duration: 00m 05s)
09:04 gilles@deploy1001: Started deploy [performance/navtiming@3215510]: T271208 browser_minor is needed for Mobile Safari allowlist
09:03 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be1054.eqiad.wmnet with reason: reboot
09:03 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be1054.eqiad.wmnet with reason: reboot
09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 12%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14078 and previous config saved to /var/cache/conftool/dbconfig/20210201-090034-root.json
09:00 elukey@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
08:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 3%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14077 and previous config saved to /var/cache/conftool/dbconfig/20210201-085714-root.json
08:56 marostegui: Stop MySQL on db1089 - T273417
08:53 gilles@deploy1001: Finished deploy [performance/navtiming@1e02d76]: T271208 Add more debug logging (duration: 00m 05s)
08:53 gilles@deploy1001: Started deploy [performance/navtiming@1e02d76]: T271208 Add more debug logging
08:53 elukey@cumin1001: START - Cookbook sre.dns.netbox
08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 10%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14075 and previous config saved to /var/cache/conftool/dbconfig/20210201-084531-root.json
08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1089 from dbctl T273417', diff saved to https://phabricator.wikimedia.org/P14074 and previous config saved to /var/cache/conftool/dbconfig/20210201-084523-marostegui.json
08:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 4%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14073 and previous config saved to /var/cache/conftool/dbconfig/20210201-084211-root.json
08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 7%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14072 and previous config saved to /var/cache/conftool/dbconfig/20210201-082933-root.json
08:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 2%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14071 and previous config saved to /var/cache/conftool/dbconfig/20210201-082707-root.json
08:17 godog: swift codfw-prod decrease HDD weight for ms-be20[16-27] - T272837
08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1166 with minimal weight for the first time T258361', diff saved to https://phabricator.wikimedia.org/P14070 and previous config saved to /var/cache/conftool/dbconfig/20210201-081554-marostegui.json
08:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 5%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14069 and previous config saved to /var/cache/conftool/dbconfig/20210201-081429-root.json
08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1166 to dbctl, depooled T258361', diff saved to https://phabricator.wikimedia.org/P14068 and previous config saved to /var/cache/conftool/dbconfig/20210201-080520-marostegui.json
07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 3%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14067 and previous config saved to /var/cache/conftool/dbconfig/20210201-075926-root.json
07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 2%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14066 and previous config saved to /var/cache/conftool/dbconfig/20210201-074422-root.json
07:36 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1175 with some more minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P14065 and previous config saved to /var/cache/conftool/dbconfig/20210201-073603-marostegui.json
07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 100%: After fixing replication', diff saved to https://phabricator.wikimedia.org/P14064 and previous config saved to /var/cache/conftool/dbconfig/20210201-070429-root.json
06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 75%: After fixing replication', diff saved to https://phabricator.wikimedia.org/P14063 and previous config saved to /var/cache/conftool/dbconfig/20210201-064926-root.json
06:39 marostegui: Run analyze table on db2071 and db2102
06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 50%: After fixing replication', diff saved to https://phabricator.wikimedia.org/P14062 and previous config saved to /var/cache/conftool/dbconfig/20210201-063422-root.json
06:24 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1175 to dbctl, depooled T258361', diff saved to https://phabricator.wikimedia.org/P14061 and previous config saved to /var/cache/conftool/dbconfig/20210201-062358-marostegui.json
06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 25%: After fixing replication', diff saved to https://phabricator.wikimedia.org/P14060 and previous config saved to /var/cache/conftool/dbconfig/20210201-061919-root.json
06:10 marostegui: Upgrade db2071 and db2102 to 10.4.18
06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 10%: After fixing replication', diff saved to https://phabricator.wikimedia.org/P14059 and previous config saved to /var/cache/conftool/dbconfig/20210201-060415-root.json
05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094', diff saved to https://phabricator.wikimedia.org/P14058 and previous config saved to /var/cache/conftool/dbconfig/20210201-055851-marostegui.json
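The "(re)pooling @ N%" commits for db1147 above arrive 15 minutes apart: 10% at 14:46, then 20%, 40%, 60%, 80% and 100% at 16:01. The sketch below reproduces that timing. The 15-minute step is inferred from the timestamps, and `repool_schedule` is a hypothetical helper, not the script that produced these commits. It validates each percentage but does not require the list to increase, because the db1166 ramp above logs 4% at 08:42 and 3% at 08:57.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple


def repool_schedule(
    start: datetime,
    percentages: Sequence[int],
    step: timedelta = timedelta(minutes=15),
) -> List[Tuple[str, int]]:
    """Return (HH:MM, percent) pairs for a staged repool starting at `start`.

    Hypothetical sketch: models only the timing seen in the log, not dbctl itself.
    """
    for p in percentages:
        if not 0 < p <= 100:
            raise ValueError(f"pooling percentage out of range: {p}")
    return [
        ((start + i * step).strftime("%H:%M"), p)
        for i, p in enumerate(percentages)
    ]
```

With `start=datetime(2021, 2, 1, 14, 46)` and `[10, 20, 40, 60, 80, 100]`, the result matches the db1147 timestamps logged between 14:46 and 16:01.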

2021-01-29

23:26 razzi@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka test cluster: Reboot kafka nodes - razzi@cumin1001
22:36 dancy@deploy1001: Finished scap: MW servers complaining about l10n files after .27 rollback (duration: 07m 22s)
22:29 dancy@deploy1001: Started scap: MW servers complaining about l10n files after .27 rollback
22:26 dancy@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.27
22:20 reedy@deploy1001: Synchronized php-1.36.0-wmf.27/includes/parser/CacheTime.php: CacheTime: Extra protection for rollback unserialization T273007 (duration: 01m 00s)
22:14 dancy@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.28
22:09 dancy@deploy1001: scap failed: average error rate on 8/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
21:42 razzi: rebalance kafka partitions for codfw.resource_change
21:40 razzi@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka test cluster: Reboot kafka nodes - razzi@cumin1001
19:26 razzi@cumin1001: END (FAIL) - Cookbook sre.kafka.reboot-workers (exit_code=99) for Kafka test cluster: Reboot kafka nodes - razzi@cumin1001
19:26 razzi@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka test cluster: Reboot kafka nodes - razzi@cumin1001
18:50 hashar: CI slightly overloaded due to a surge of library updates but is otherwise processing changes
17:31 reedy@deploy1001: Synchronized php-1.36.0-wmf.28/extensions/WikiEditor/modules/jquery.wikiEditor.toolbar.config.js: T273231 (duration: 01m 02s)
16:56 effie: depool mw1403 and mw1405
15:46 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-test-presto1001.eqiad.wmnet
15:27 elukey@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-test-presto1001.eqiad.wmnet
14:58 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1007.eqiad.wmnet with reason: REIMAGE
14:56 ariel@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1007.eqiad.wmnet with reason: REIMAGE
13:50 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
13:50 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
13:50 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
13:49 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
13:49 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
13:48 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
13:47 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
13:47 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
13:47 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
13:05 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
13:05 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
13:05 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
13:02 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
13:02 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
13:02 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
12:38 hnowlan: uploaded osmborder_0.1.0-2~buster0 package to buster-wikimedia
12:00 gilles@deploy1001: Finished deploy [performance/coal@b0d3b59]: T271208 Filter out canary events (duration: 00m 06s)
12:00 gilles@deploy1001: Started deploy [performance/coal@b0d3b59]: T271208 Filter out canary events
11:42 dcausse@deploy1001: Synchronized wmf-config/unitConversionConfig.json: T270252: Update unitConversionConfig.json (duration: 01m 01s)
11:39 gilles@deploy1001: Finished deploy [performance/navtiming@ae8310a]: T271208 Fix canary event check (duration: 00m 05s)
11:39 gilles@deploy1001: Started deploy [performance/navtiming@ae8310a]: T271208 Fix canary event check
11:26 gilles@deploy1001: Finished deploy [performance/navtiming@e7712c3]: T271208 Log instead of hard error on missing wiki field (duration: 00m 06s)
11:26 gilles@deploy1001: Started deploy [performance/navtiming@e7712c3]: T271208 Log instead of hard error on missing wiki field
11:06 gilles@deploy1001: Finished deploy [performance/navtiming@125f6be]: T271208 Ignore canary events (duration: 00m 05s)
11:06 gilles@deploy1001: Started deploy [performance/navtiming@125f6be]: T271208 Ignore canary events
11:04 elukey: upload presto-* version 0.246-1 packages to buster/stretch-wikimedia
10:54 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
10:45 jynus@cumin1001: START - Cookbook sre.hosts.decommission
10:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 100%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14050 and previous config saved to /var/cache/conftool/dbconfig/20210129-103505-root.json
10:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 75%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14049 and previous config saved to /var/cache/conftool/dbconfig/20210129-102001-root.json
10:18 vgutierrez: pool cp5006
10:17 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 50%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14048 and previous config saved to /var/cache/conftool/dbconfig/20210129-100458-root.json
09:51 jynus@cumin1001: START - Cookbook sre.hosts.decommission
09:50 vgutierrez: reboot cp5006
09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 25%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14047 and previous config saved to /var/cache/conftool/dbconfig/20210129-094954-root.json
09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 10%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14046 and previous config saved to /var/cache/conftool/dbconfig/20210129-093451-root.json
09:32 marostegui: Expand lvs on db1155-db1175 T258361
09:31 vgutierrez: depool cp5006
08:20 marostegui: Change buffer pool sizes on clouddb1013,1015,1017,1019 T267090
07:11 marostegui: Upgrade pc2007 to 10.4.18 T268457
06:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 to clone db1175', diff saved to https://phabricator.wikimedia.org/P14044 and previous config saved to /var/cache/conftool/dbconfig/20210129-065529-marostegui.json
03:35 marostegui: Reload haproxy1018
02:42 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2251.codfw.wmnet
02:42 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2252.codfw.wmnet
02:37 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2252.codfw.wmnet
02:37 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2251.codfw.wmnet
02:04 krinkle@deploy1001: Synchronized wmf-config/profiler.php: If0c71a983772c (duration: 00m 58s)
01:49 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2252.codfw.wmnet with reason: REIMAGE
01:48 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2251.codfw.wmnet with reason: REIMAGE
01:46 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2252.codfw.wmnet with reason: REIMAGE
01:46 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2251.codfw.wmnet with reason: REIMAGE
01:09 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2253.codfw.wmnet
01:07 mutante: repooled mw2248,mw2249 - jobrunners/videoscalers now on buster
01:06 mutante: repooled mw2048,mw2049 - jobrunners/videoscalers now on buster
01:06 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2253.codfw.wmnet
01:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2249.codfw.wmnet
01:05 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2248.codfw.wmnet
01:03 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2249.codfw.wmnet
01:03 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2248.codfw.wmnet
00:19 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2261.codfw.wmnet
00:14 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2262.codfw.wmnet
00:13 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2283.codfw.wmnet

2021-01-28

23:58 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2261.codfw.wmnet
23:58 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2262.codfw.wmnet
23:57 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2283.codfw.wmnet
23:52 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2253.codfw.wmnet with reason: REIMAGE
23:49 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2253.codfw.wmnet with reason: REIMAGE
23:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2248.codfw.wmnet with reason: REIMAGE
23:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2249.codfw.wmnet with reason: REIMAGE
23:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2248.codfw.wmnet with reason: REIMAGE
23:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2249.codfw.wmnet with reason: REIMAGE
23:34 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2283.codfw.wmnet with reason: reimaging
23:34 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2283.codfw.wmnet with reason: reimaging
23:33 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2262.codfw.wmnet with reason: REIMAGE
23:31 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2283.codfw.wmnet with reason: REIMAGE
23:31 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2261.codfw.wmnet with reason: REIMAGE
23:29 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2283.codfw.wmnet with reason: REIMAGE
23:29 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2262.codfw.wmnet with reason: REIMAGE
23:29 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2261.codfw.wmnet with reason: REIMAGE
23:14 mutante: reimaging jobrunners/videoscalers mw2248,mw2249
22:43 brennen@deploy1001: Synchronized php-1.36.0-wmf.27/includes/parser/CacheTime.php: CacheTime: Extra protection for rollback unserialization (T273007) (duration: 00m 57s)
22:41 bblack: eqiad lvs should be back to normal state now with everything working
22:39 bblack: lvs1014 - apply https://gerrit.wikimedia.org/r/659439
22:37 bblack: lvs1013 - testing https://gerrit.wikimedia.org/r/659439 (expect nop, worked on 1015!)
22:36 bblack: lvs1015 - testing https://gerrit.wikimedia.org/r/659439 (expect nop)
22:21 bblack: lvs1016 - trying https://gerrit.wikimedia.org/r/659439 on backup LVS...
22:21 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2287.codfw.wmnet
22:21 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2286.codfw.wmnet
22:20 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2285.codfw.wmnet
22:20 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2284.codfw.wmnet
22:16 bblack: disabling puppet on all eqiad lvs due to risks from https://gerrit.wikimedia.org/r/659439
22:03 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2284.codfw.wmnet
22:03 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2285.codfw.wmnet
22:02 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2286.codfw.wmnet
22:02 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2287.codfw.wmnet
21:33 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
21:32 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1172.eqiad.wmnet with reason: REIMAGE
21:30 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1175.eqiad.wmnet with reason: REIMAGE
21:28 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
21:28 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1172.eqiad.wmnet with reason: REIMAGE
21:28 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1175.eqiad.wmnet with reason: REIMAGE
21:28 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.28
21:28 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2287.codfw.wmnet with reason: reimaging
21:28 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2287.codfw.wmnet with reason: reimaging
21:27 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2285.codfw.wmnet with reason: reimaging
21:27 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2285.codfw.wmnet with reason: reimaging
21:27 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2284.codfw.wmnet with reason: REIMAGE
21:25 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2286.codfw.wmnet with reason: REIMAGE
21:23 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2285.codfw.wmnet with reason: REIMAGE
21:23 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2287.codfw.wmnet with reason: REIMAGE
21:23 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2284.codfw.wmnet with reason: REIMAGE
21:23 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2285.codfw.wmnet with reason: REIMAGE
21:23 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2287.codfw.wmnet with reason: REIMAGE
21:23 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2286.codfw.wmnet with reason: REIMAGE
21:19 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.28 (duration: 01m 05s)
21:17 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.28
21:15 brennen: 1.36.0-wmf.28 train status (T271342): blockers resolved, going to group1, to be followed shortly by all wikis
21:11 brennen@deploy1001: Synchronized php-1.36.0-wmf.28/extensions/CentralAuth/includes/: Backport: Revert CentralAuthCreateLocalAccountJob changes in 9f79de4 (T273205) (duration: 01m 09s)
20:49 brennen@deploy1001: Synchronized php-1.36.0-wmf.28/tests/phpunit/includes/parser/ParserOptionsTest.php: Backport: Make ParserOptions::isSafeToCache more robust (T273120) (duration: 01m 07s)
20:46 brennen@deploy1001: Synchronized php-1.36.0-wmf.28/includes/parser/ParserOptions.php: Backport: Make ParserOptions::isSafeToCache more robust (T273120) (duration: 01m 08s)
20:25 bblack: lvs1014,lvs1016 - all back to "normal" state
20:24 bblack: lvs1014 - restart pybal
20:20 bblack: lvs1016 - restart pybal
20:15 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@911731d]: write articletopic and drafttopic to hourly tables (duration: 01m 44s)
20:13 bblack: lvs1014,lvs1016 - puppet temporarily disabled for new service config deploy - T271476
20:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2223.codfw.wmnet
20:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2247.codfw.wmnet
20:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1264.eqiad.wmnet
20:13 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@911731d]: write articletopic and drafttopic to hourly tables
20:13 mutante: scap pulling and repooling: mw1264, mw2223, mw2247
20:11 bstorm@cumin1001: conftool action : set/pooled=yes; selector: name=dbproxy1019.eqiad.wmnet
20:10 bstorm@cumin1001: conftool action : set/pooled=yes; selector: name=dbproxy1018.eqiad.wmnet
20:01 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2223.codfw.wmnet
20:00 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2247.codfw.wmnet
20:00 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1264.eqiad.wmnet
19:57 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1171.eqiad.wmnet with reason: REIMAGE
19:55 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1171.eqiad.wmnet with reason: REIMAGE
19:53 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@ba1acd6]: airflow: start ores_predictions_daily one day earlier (duration: 01m 09s)
19:52 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@ba1acd6]: airflow: start ores_predictions_daily one day earlier
19:45 Urbanecm: Run mwscript namespaceDupes.php --wiki=frwikisource --add-prefix=BROKEN --fix (T271939)
19:44 Urbanecm: Run mwscript namespaceDupes.php --wiki=frwikisource --fix (T271939)
19:42 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 0ae4909: frwikisource: Add WS as an alias to NS_PROJECT (T271939) (duration: 00m 57s)
19:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: fd18092: Add image.laji.fi to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T270587) (duration: 01m 04s)
19:36 jynus: extending backup1001 /dev/mapper/array1-archive partition to allocate enough space for helium backups T238048
19:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 519350b: frwiktionary: Change babel category names per community request (T270186) (duration: 00m 59s)
19:23 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1163.eqiad.wmnet with reason: REIMAGE
19:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 3d0ca3a: Create patroller user group for thwiki (T272149) (duration: 01m 07s)
19:20 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1163.eqiad.wmnet with reason: REIMAGE
19:19 mforns@deploy1001: Finished deploy [analytics/refinery@1e41f60] (thin): Regular analytics weekly train THIN [analytics/refinery@1e41f608fad96e7a9f77eb28cd1c082a0a01d562] (duration: 00m 08s)
19:19 mforns@deploy1001: Started deploy [analytics/refinery@1e41f60] (thin): Regular analytics weekly train THIN [analytics/refinery@1e41f608fad96e7a9f77eb28cd1c082a0a01d562]
19:15 mforns@deploy1001: Finished deploy [analytics/refinery@1e41f60]: Regular analytics weekly train [analytics/refinery@1e41f608fad96e7a9f77eb28cd1c082a0a01d562] (duration: 16m 53s)
19:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: e914f1e: robots: cawikimedia: Set wgDefaultRobotPolicy to noindex,nofollow (T272871) (duration: 01m 08s)
19:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2247.codfw.wmnet with reason: REIMAGE
19:10 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@0742443]: hourly partitioning for ores tables (duration: 01m 25s)
19:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2223.codfw.wmnet with reason: REIMAGE
19:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2247.codfw.wmnet with reason: REIMAGE
19:09 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@0742443]: hourly partitioning for ores tables
19:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2223.codfw.wmnet with reason: REIMAGE
19:07 cdanis: decom Zayo IP transit on cr2-codfw T272675
19:06 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable canary events for mediawiki_revision_recommendation_create (duration: 01m 12s)
19:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1264.eqiad.wmnet with reason: REIMAGE
19:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1264.eqiad.wmnet with reason: REIMAGE
18:58 cdanis: draining traffic from Zayo OGYX/123447 codfw<>ulsfo in preparation for decommission 🥃 T272675
18:58 mforns@deploy1001: Started deploy [analytics/refinery@1e41f60]: Regular analytics weekly train [analytics/refinery@1e41f608fad96e7a9f77eb28cd1c082a0a01d562]
18:58 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Remove T257687 mitigations (duration: 01m 10s)
18:46 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1159.eqiad.wmnet with reason: REIMAGE
18:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1159.eqiad.wmnet with reason: REIMAGE
18:34 mutante: reimaging another canary appserver, mw1264, so that we will have at least 2 stretch and 2 buster canaries for the transitional period
18:30 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:26 bblack@cumin1001: START - Cookbook sre.dns.netbox
17:49 jgleeson: fundraising-tools tools updated from 41cab089da to d64b2f8cee
17:38 crusnov@deploy1001: Finished deploy [netbox/deploy@52d6fb9]: Test deploy of 2.10.4 to netbox-next T265084 (duration: 01m 18s)
17:37 crusnov@deploy1001: Started deploy [netbox/deploy@52d6fb9]: Test deploy of 2.10.4 to netbox-next T265084
17:35 crusnov@deploy1001: Started deploy [netbox/deploy@52d6fb9]: Test deploy of 2.10.4 to netbox-next T265084
17:28 ebernhardson: ban elastic1063 from production-search-omega-eqiad and production-search-eqiad T265113
17:11 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 01m 06s)
16:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy1002.eqiad.wmnet
16:51 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
16:51 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
16:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host deploy1002.eqiad.wmnet
16:49 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
16:49 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
16:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy2002.codfw.wmnet
16:48 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
16:48 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
16:45 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=99) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
16:44 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
16:44 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
16:44 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
16:41 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
16:41 arturo: running homer on cr*-eqiad* again for reverting latest changes (T271476)
16:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host deploy2002.codfw.wmnet
16:28 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
16:28 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'staging' .
16:28 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'plain' .
16:26 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'production' .
16:25 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'plain' .
16:25 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'staging' .
16:24 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
16:24 akosiaris: stop scraping apertium from prometheus; it doesn't have a prometheus endpoint.
16:23 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'production' .
16:23 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'plain' .
16:23 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' .
16:19 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
16:17 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
16:06 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
16:03 arturo: running homer on cr*-eqiad* for T271476
15:55 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
15:54 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
15:52 cdanis: draining traffic from Zayo OGYX/120003 codfw<>eqiad in preparation for decommission 🥃 T272675
15:49 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
15:49 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@d0a6933]: align threshold path references across days (duration: 01m 15s)
15:49 marostegui: Power off clouddb1019 for memory replacement T272125
15:48 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@d0a6933]: align threshold path references across days
15:25 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate NavigationTiming schemas to Event Platform on all wikis - T271208 (duration: 01m 11s)
15:06 jayme@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
15:05 jayme@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
14:26 jayme@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
14:14 jayme@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1148 after kernel upgrade and enablement of report_host', diff saved to https://phabricator.wikimedia.org/P14039 and previous config saved to /var/cache/conftool/dbconfig/20210128-141425-marostegui.json
13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148 for kernel upgrade and enablement of report_host', diff saved to https://phabricator.wikimedia.org/P14038 and previous config saved to /var/cache/conftool/dbconfig/20210128-135730-marostegui.json
13:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 100%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14037 and previous config saved to /var/cache/conftool/dbconfig/20210128-135612-root.json
13:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14036 and previous config saved to /var/cache/conftool/dbconfig/20210128-135602-root.json
13:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 75%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14035 and previous config saved to /var/cache/conftool/dbconfig/20210128-134109-root.json
13:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14034 and previous config saved to /var/cache/conftool/dbconfig/20210128-134057-root.json
13:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 50%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14033 and previous config saved to /var/cache/conftool/dbconfig/20210128-132605-root.json
13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14032 and previous config saved to /var/cache/conftool/dbconfig/20210128-132553-root.json
13:17 godog: swift codfw-prod decrease HDD weight for ms-be20[16-27] - T272837
13:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 25%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14031 and previous config saved to /var/cache/conftool/dbconfig/20210128-131101-root.json
13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14030 and previous config saved to /var/cache/conftool/dbconfig/20210128-131050-root.json
12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1024's weight', diff saved to https://phabricator.wikimedia.org/P14029 and previous config saved to /var/cache/conftool/dbconfig/20210128-125631-marostegui.json
12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 10%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14028 and previous config saved to /var/cache/conftool/dbconfig/20210128-125558-root.json
12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 10%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14027 and previous config saved to /var/cache/conftool/dbconfig/20210128-125546-root.json
12:48 dcausse: European mid-day backport window done
12:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 100%: Repooling after the restart', diff saved to https://phabricator.wikimedia.org/P14026 and previous config saved to /var/cache/conftool/dbconfig/20210128-123800-root.json
12:32 dcausse@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/CirrusSearch/: Add an option to limit the size of the file_text field: T271493 (duration: 01m 09s)
12:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 80%: Repooling after the restart', diff saved to https://phabricator.wikimedia.org/P14025 and previous config saved to /var/cache/conftool/dbconfig/20210128-122256-root.json
12:22 marostegui: Reboot db1146:3312 db1146:3314
12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3312, db1146:3314 for kernel upgrade and enablement of report_host', diff saved to https://phabricator.wikimedia.org/P14024 and previous config saved to /var/cache/conftool/dbconfig/20210128-122118-marostegui.json
12:12 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T271493: [cirrus] set 50kb limit on file text indexing for commons (duration: 01m 09s)
12:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 70%: Repooling after the restart', diff saved to https://phabricator.wikimedia.org/P14023 and previous config saved to /var/cache/conftool/dbconfig/20210128-120752-root.json
12:07 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T266027: [cirrus] Swith to perfield builder for spaceless languages (duration: 01m 06s)
11:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 50%: Repooling after the restart', diff saved to https://phabricator.wikimedia.org/P14022 and previous config saved to /var/cache/conftool/dbconfig/20210128-115249-root.json
11:45 gilles@deploy1001: Finished deploy [performance/navtiming@446e5df]: (no justification provided) (duration: 00m 05s)
11:45 gilles@deploy1001: Started deploy [performance/navtiming@446e5df]: (no justification provided)
11:37 vgutierrez: upgrade pybal to 1.15.9 in esams
11:30 elukey: disable nginx proxy buffering on archiva.wikimedia.org for a perf test - T252767
11:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 30%: Repooling after the restart', diff saved to https://phabricator.wikimedia.org/P14020 and previous config saved to /var/cache/conftool/dbconfig/20210128-112242-root.json
11:21 vgutierrez: upgrade pybal to 1.15.9 in eqiad
11:20 godog: swift codfw-prod decrease SSD weight for ms-be20[16-27] - T272837
11:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 10%: Repooling after the restart', diff saved to https://phabricator.wikimedia.org/P14019 and previous config saved to /var/cache/conftool/dbconfig/20210128-110739-root.json
11:04 marostegui: Restart mysql on es1025 T266483
11:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1025 T266483', diff saved to https://phabricator.wikimedia.org/P14018 and previous config saved to /var/cache/conftool/dbconfig/20210128-110353-marostegui.json
11:01 _joe_: restarting php-fpm on the appserver,api and jobrunner clusters in eqiad, 10% at a time, for simulating scap rolling restarts T266055
10:52 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool es5 on writes T266483 (duration: 01m 05s)
10:46 marostegui: Restart mysql on es1024 T266483
10:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool es5 from writes T266483 (duration: 01m 09s)
10:33 _joe_: performing a test-run of the rolling restart of php-fpm in codfw, using the same code scap will use T266055. Starting from the api cluster, then proceeding with others
10:15 _joe_: upgrading pybal on lvs2008
10:11 _joe_: upgrading pybal on lvs2009
10:10 vgutierrez: upgrade pybal to 1.15.9 in eqsin
09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14017 and previous config saved to /var/cache/conftool/dbconfig/20210128-095642-root.json
09:48 _joe_: upgrading pybal to 1.15.9 in codfw, starting from lvs2010
09:47 jbond42: upload new cas package to apt
09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 80%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14016 and previous config saved to /var/cache/conftool/dbconfig/20210128-094139-root.json
09:30 _joe_: upgrading pybal on lvs4006
09:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 70%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14015 and previous config saved to /var/cache/conftool/dbconfig/20210128-092635-root.json
09:25 _joe_: upgrading pybal on lvs4005
09:11 _joe_: installing pybal 1.15.9 on lvs4007
09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 60%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14014 and previous config saved to /var/cache/conftool/dbconfig/20210128-091131-root.json
09:08 moritzm: installing perf updates on Stretch
09:06 marostegui: Testing wikitech
09:00 _joe_: uploading pybal 1.15.9 to apt.wikimedia.org
08:58 moritzm: installing perf updates on Buster
08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14013 and previous config saved to /var/cache/conftool/dbconfig/20210128-085627-root.json
08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 40%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14012 and previous config saved to /var/cache/conftool/dbconfig/20210128-084123-root.json
08:34 godog: swift codfw-prod decrease SSD weight for ms-be20[16-27] - T272837
08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 100%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14011 and previous config saved to /var/cache/conftool/dbconfig/20210128-083347-root.json
08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14010 and previous config saved to /var/cache/conftool/dbconfig/20210128-083337-root.json
08:32 vgutierrez: pool cp1087 - T273153
08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 30%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14009 and previous config saved to /var/cache/conftool/dbconfig/20210128-082620-root.json
08:20 vgutierrez: restart purged on cp1087 - T273153
08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 75%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14008 and previous config saved to /var/cache/conftool/dbconfig/20210128-081843-root.json
08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14007 and previous config saved to /var/cache/conftool/dbconfig/20210128-081834-root.json
08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 20%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14006 and previous config saved to /var/cache/conftool/dbconfig/20210128-081116-root.json
08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 50%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14005 and previous config saved to /var/cache/conftool/dbconfig/20210128-080340-root.json
08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14004 and previous config saved to /var/cache/conftool/dbconfig/20210128-080330-root.json
07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 15%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14003 and previous config saved to /var/cache/conftool/dbconfig/20210128-075613-root.json
07:54 moritzm: installing tomcat9 security updates
07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 25%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14002 and previous config saved to /var/cache/conftool/dbconfig/20210128-074836-root.json
07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14001 and previous config saved to /var/cache/conftool/dbconfig/20210128-074827-root.json
07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Give db1169 some more minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P14000 and previous config saved to /var/cache/conftool/dbconfig/20210128-073426-marostegui.json
07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 10%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P13999 and previous config saved to /var/cache/conftool/dbconfig/20210128-073333-root.json
07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 10%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P13998 and previous config saved to /var/cache/conftool/dbconfig/20210128-073323-root.json
07:25 elukey: powercycle cp1087 (after depooling it)
07:24 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1087.eqiad.wmnet
07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315 for kernel upgrade and enablement of report_host', diff saved to https://phabricator.wikimedia.org/P13997 and previous config saved to /var/cache/conftool/dbconfig/20210128-072154-marostegui.json
07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314 for kernel upgrade and enablement of report_host', diff saved to https://phabricator.wikimedia.org/P13996 and previous config saved to /var/cache/conftool/dbconfig/20210128-072120-marostegui.json
07:20 marostegui@cumin1001: dbctl commit (dc=all): 'Give db1169 some more minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P13995 and previous config saved to /var/cache/conftool/dbconfig/20210128-072036-marostegui.json
06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1169 to s1 for the first time, with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P13994 and previous config saved to /var/cache/conftool/dbconfig/20210128-063806-marostegui.json
06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1169 to dbctl T258361', diff saved to https://phabricator.wikimedia.org/P13993 and previous config saved to /var/cache/conftool/dbconfig/20210128-063655-marostegui.json
03:03 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1268.eqiad.wmnet
03:00 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
02:13 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2291.codfw.wmnet
02:13 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2290.codfw.wmnet
02:13 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2288.codfw.wmnet
02:05 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2288.codfw.wmnet
02:05 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2290.codfw.wmnet
02:05 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2291.codfw.wmnet
02:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
02:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
01:35 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2291.codfw.wmnet with reason: REIMAGE
01:35 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2288.codfw.wmnet with reason: reimaging
01:35 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2288.codfw.wmnet with reason: reimaging
01:33 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2291.codfw.wmnet with reason: REIMAGE
01:33 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2290.codfw.wmnet with reason: REIMAGE
01:32 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:00:00 on mw2288.codfw.wmnet with reason: reimaging
01:32 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2288.codfw.wmnet with reason: reimaging
01:32 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:00:00 on mw2288.codfw.wmnet with reason: reimaging
01:32 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2288.codfw.wmnet with reason: reimaging
01:31 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2288.codfw.wmnet with reason: REIMAGE
01:31 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2290.codfw.wmnet with reason: REIMAGE
01:31 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2288.codfw.wmnet with reason: REIMAGE
01:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1268.eqiad.wmnet
01:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
01:10 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2294.codfw.wmnet
01:09 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2293.codfw.wmnet
01:09 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2292.codfw.wmnet
00:56 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2294.codfw.wmnet
00:56 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2293.codfw.wmnet
00:56 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2292.codfw.wmnet
00:50 Urbanecm: Evening B&C done
00:49 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 87c304c: Disable max-width on page namespace for wikisource (T260091; 2nd take) (duration: 01m 00s)
00:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1404.eqiad.wmnet
00:41 foks: reset email for User:Uwe Martens
00:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1404.eqiad.wmnet
00:39 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1404.wmnet
00:33 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/includes/: c5c39ba: Fix fetching ipblock-exempt within BlockManager::getUserBlock (T271551, T270145) (duration: 01m 04s)
00:32 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2293.codfw.wmnet with reason: reimaging
00:32 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2293.codfw.wmnet with reason: reimaging
00:31 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.28/includes/: a67fe4f: Fix fetching ipblock-exempt within BlockManager::getUserBlock (T271551, T270145) (duration: 01m 07s)
00:28 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2292.codfw.wmnet with reason: REIMAGE
00:26 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.28/extensions/GrowthExperiments/includes/HomepageModules/BaseModule.php: 5417e0c: Fix BaseModule::BASE_CSS_CLASS visibility (T273099) (duration: 01m 00s)
00:26 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2294.codfw.wmnet with reason: REIMAGE
00:24 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2293.codfw.wmnet with reason: REIMAGE
00:24 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2292.codfw.wmnet with reason: REIMAGE
00:24 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2293.codfw.wmnet with reason: REIMAGE
00:24 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2294.codfw.wmnet with reason: REIMAGE
00:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
00:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
00:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1404.eqiad.wmnet with reason: REIMAGE
00:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1404.eqiad.wmnet with reason: REIMAGE
00:12 urbanecm@deploy1001: Scap failed!: Call to mwscript eval.php stderr: not empty

2021-01-27

23:30 shdubsh: reboot logstash2006
22:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2246.codfw.wmnet
22:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2246.codfw.wmnet
22:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2222.codfw.wmnet
22:44 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2222.codfw.wmnet
22:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1405.eqiad.wmnet
22:39 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1405.eqiad.wmnet
21:57 dancy@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.28
21:51 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@ae24e12]: repoint ores thresholds to yesterday (duration: 02m 23s)
21:48 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@ae24e12]: repoint ores thresholds to yesterday
21:17 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@1c9d487]: airflow: hourly tasks must wait for yesterdays daily task (duration: 07m 54s)
21:09 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@1c9d487]: airflow: hourly tasks must wait for yesterdays daily task
21:09 ebernhardson@deploy1001: deploy aborted: airflow: hourly tasks must wait for yesterdays daily tank (duration: 00m 00s)
21:09 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@1c9d487]: airflow: hourly tasks must wait for yesterdays daily tank
20:58 brennen@deploy1001: Synchronized php-1.36.0-wmf.28/includes/libs/objectcache/RedisBagOStuff.php: Backport: objectcache: fix broken for loop in RedisBagOStuff::doSetMulti() (T273006) (duration: 01m 07s)
20:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2246.codfw.wmnet with reason: REIMAGE
20:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2246.codfw.wmnet with reason: REIMAGE
20:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2222.codfw.wmnet with reason: REIMAGE
20:45 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2222.codfw.wmnet with reason: REIMAGE
20:44 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2299.codfw.wmnet
20:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2217.codfw.wmnet
20:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2217.codfw.wmnet
20:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2221.codfw.wmnet
20:37 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2221.codfw.wmnet
20:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1405.eqiad.wmnet with reason: REIMAGE
20:30 brennen: 1.36.0-wmf.28 (T271342): taking over train while dancy is afk; waiting on gerrit:658939 to merge and will sync for verification on testwikis
20:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1405.eqiad.wmnet with reason: REIMAGE
20:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2216.codfw.wmnet
20:18 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2218.codfw.wmnet
20:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2219.codfw.wmnet
20:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1263.eqiad.wmnet
20:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2216.codfw.wmnet
20:07 urbanecm@deploy1001: Synchronized logos/config.yaml: 6c5dd65: Undeploy cswiki birthday logo (duration: 01m 05s)
20:06 urbanecm@deploy1001: Synchronized wmf-config/logos.php: 6c5dd65: Undeploy cswiki birthday logo (duration: 01m 06s)
20:05 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2218.codfw.wmnet
20:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2219.codfw.wmnet
20:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1263.eqiad.wmnet
19:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2221.codfw.wmnet with reason: REIMAGE
19:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2221.codfw.wmnet with reason: REIMAGE
19:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 53419ab: arwiki: Configure wgGEHomepageManualAssignmentMentorsList (T273060) (duration: 00m 59s)
19:19 elukey: reboot an-launcher1002 for kernel upgrades
19:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cabb2e2: Declare 6 more NavigationTiming eventlogging streams and migrate on testwiki (T271208) (duration: 01m 00s)
19:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 9382a98: Migrate WebUIActionsTracking schemas to Event Platform on all wikis (T267347,T271164) (duration: 01m 03s)
19:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2215.codfw.wmnet
18:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2215.codfw.wmnet
18:50 mutante: testreduce1001 - making nginx listen on IPv6 and restarting it T266509
18:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1263.eqiad.wmnet with reason: REIMAGE
18:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1263.eqiad.wmnet with reason: REIMAGE
18:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2219.codfw.wmnet with reason: REIMAGE
18:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2219.codfw.wmnet with reason: REIMAGE
18:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2218.codfw.wmnet with reason: REIMAGE
18:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2218.codfw.wmnet with reason: REIMAGE
18:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2217.codfw.wmnet with reason: REIMAGE
18:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2217.codfw.wmnet with reason: REIMAGE
18:30 Tchanders: Creating the table securepoll_log in votewiki and testwiki (T271270)
18:25 hashar@deploy1001: Finished deploy [integration/docroot@da43ad4]: Add Shellbox to doc.wm.o , misc build related changes fdf0917..da43ad4 (duration: 00m 07s)
18:25 hashar@deploy1001: Started deploy [integration/docroot@da43ad4]: Add Shellbox to doc.wm.o , misc build related changes fdf0917..da43ad4
18:25 hashar@deploy1001: Finished deploy [integration/docroot@da43ad4]: Add Shellbox to doc.wm.o , misc build related changes fdf0917..da43ad4 (duration: 00m 10s)
18:25 hashar@deploy1001: Started deploy [integration/docroot@da43ad4]: Add Shellbox to doc.wm.o , misc build related changes fdf0917..da43ad4
18:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1268.eqiad.wmnet
18:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
18:15 dpifke@deploy1001: Finished deploy [performance/arc-lamp@e24f319]: Re-deploying ArcLamp to webperf1002 (duration: 00m 05s)
18:15 dpifke@deploy1001: Started deploy [performance/arc-lamp@e24f319]: Re-deploying ArcLamp to webperf1002
18:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2301.codfw.wmnet
18:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1406.eqiad.wmnet
18:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2216.codfw.wmnet with reason: REIMAGE
18:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1407.eqiad.wmnet
18:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2216.codfw.wmnet with reason: REIMAGE
18:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1407.eqiad.wmnet
18:03 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2301.codfw.wmnet
18:03 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1406.eqiad.wmnet
17:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2215.codfw.wmnet with reason: REIMAGE
17:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2215.codfw.wmnet with reason: REIMAGE
17:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2301.codfw.wmnet with reason: REIMAGE
17:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2301.codfw.wmnet with reason: REIMAGE
17:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1406.eqiad.wmnet with reason: REIMAGE
17:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1407.eqiad.wmnet with reason: REIMAGE
17:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1406.eqiad.wmnet with reason: REIMAGE
17:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1407.eqiad.wmnet with reason: REIMAGE
17:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
17:25 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
16:54 elukey@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
16:40 elukey@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
16:38 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
16:21 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
16:18 moritzm: installing python-bottle security updates
15:42 elukey: umount /var/hadoop/data/r on an-worker1099 and restart hadoop daemons - T273034
15:31 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate 5 NavigationTiming schemas to Event Platform on group0 and group1 - T271208 (duration: 01m 07s)
15:15 godog: bounce rsyslog on centrallog1001
13:52 kharlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
13:52 kharlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
13:48 kharlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
13:48 kharlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
13:43 kharlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
13:25 godog: swift codfw-prod decrease SSD weight for ms-be20[16-27] - T272837
13:20 godog: swift codfw-prod decrease SSD weight for ms-be20[16-27] - T272837
12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P13989 and previous config saved to /var/cache/conftool/dbconfig/20210127-123300-root.json
12:25 awight: EU bacon done
12:25 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable bracket matching on the first wikis (T270238) (duration: 01m 07s)
12:20 awight@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/CodeMirror: Backport: Improve matchbrackets performance when moving the cursor (T270317) (duration: 01m 06s)
12:19 awight@deploy1001: Synchronized php-1.36.0-wmf.28/extensions/CodeMirror: Backport: Improve matchbrackets performance when moving the cursor (T270317) (duration: 01m 14s)
12:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P13988 and previous config saved to /var/cache/conftool/dbconfig/20210127-121756-root.json
12:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P13987 and previous config saved to /var/cache/conftool/dbconfig/20210127-120253-root.json
11:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P13986 and previous config saved to /var/cache/conftool/dbconfig/20210127-114749-root.json
11:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 10%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P13985 and previous config saved to /var/cache/conftool/dbconfig/20210127-113245-root.json
10:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for kernel upgrade and enablement of report_host', diff saved to https://phabricator.wikimedia.org/P13984 and previous config saved to /var/cache/conftool/dbconfig/20210127-105735-marostegui.json
10:36 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema2004.codfw.wmnet
10:23 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host schema2004.codfw.wmnet
10:23 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema2003.codfw.wmnet
10:20 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1160 with final weight T258361', diff saved to https://phabricator.wikimedia.org/P13982 and previous config saved to /var/cache/conftool/dbconfig/20210127-102042-marostegui.json
10:18 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host schema2003.codfw.wmnet
10:17 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema1004.eqiad.wmnet
10:15 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host schema1004.eqiad.wmnet
10:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema1003.eqiad.wmnet
10:12 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host schema1003.eqiad.wmnet
10:05 elukey: reboot matomo1002 for kernel upgrades
10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1160 with more weight T258361', diff saved to https://phabricator.wikimedia.org/P13981 and previous config saved to /var/cache/conftool/dbconfig/20210127-100220-marostegui.json
09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1160 with more weight T258361', diff saved to https://phabricator.wikimedia.org/P13980 and previous config saved to /var/cache/conftool/dbconfig/20210127-093802-marostegui.json
09:19 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1160 with more weight T258361', diff saved to https://phabricator.wikimedia.org/P13979 and previous config saved to /var/cache/conftool/dbconfig/20210127-091909-marostegui.json
09:04 jbond42: deploy fix to enable-puppet
09:03 godog: swift codfw-prod decrease SSD weight for ms-be20[16-27] - T272837
08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1160 with more weight T258361', diff saved to https://phabricator.wikimedia.org/P13978 and previous config saved to /var/cache/conftool/dbconfig/20210127-083618-marostegui.json
08:29 marostegui: Stop mysql on db1089 to clone db1169 T258361
08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089 to clone db1169 T258361', diff saved to https://phabricator.wikimedia.org/P13976 and previous config saved to /var/cache/conftool/dbconfig/20210127-082826-marostegui.json
08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121', diff saved to https://phabricator.wikimedia.org/P13975 and previous config saved to /var/cache/conftool/dbconfig/20210127-081150-marostegui.json
08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P13974 and previous config saved to /var/cache/conftool/dbconfig/20210127-080753-marostegui.json
08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: After moving clouddb replicas', diff saved to https://phabricator.wikimedia.org/P13973 and previous config saved to /var/cache/conftool/dbconfig/20210127-080645-root.json
07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Give db1160 some more small weight T258361', diff saved to https://phabricator.wikimedia.org/P13972 and previous config saved to /var/cache/conftool/dbconfig/20210127-075715-marostegui.json
07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: After moving clouddb replicas', diff saved to https://phabricator.wikimedia.org/P13971 and previous config saved to /var/cache/conftool/dbconfig/20210127-075142-root.json
07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: After moving clouddb replicas', diff saved to https://phabricator.wikimedia.org/P13970 and previous config saved to /var/cache/conftool/dbconfig/20210127-073638-root.json
07:26 elukey: powercycle analytics1073 - kernel soft lockup bug registered, OS needs a reboot
07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: After moving clouddb replicas', diff saved to https://phabricator.wikimedia.org/P13969 and previous config saved to /var/cache/conftool/dbconfig/20210127-072135-root.json
07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 T272008', diff saved to https://phabricator.wikimedia.org/P13968 and previous config saved to /var/cache/conftool/dbconfig/20210127-070502-marostegui.json
06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Give db1160 some more small weight T258361', diff saved to https://phabricator.wikimedia.org/P13967 and previous config saved to /var/cache/conftool/dbconfig/20210127-065715-marostegui.json
06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Give db1160 some more small weight T258361', diff saved to https://phabricator.wikimedia.org/P13966 and previous config saved to /var/cache/conftool/dbconfig/20210127-063930-marostegui.json
06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1160 with minimal weight T258361', diff saved to https://phabricator.wikimedia.org/P13965 and previous config saved to /var/cache/conftool/dbconfig/20210127-061336-marostegui.json
06:03 twentyafterfour: phabricator appears to be up and running fine
06:03 twentyafterfour: phabricator is read-write
06:01 twentyafterfour: phabricator is read-only
06:00 marostegui: m3 master restart, phabricator will go on read only - T272596
05:50 marostegui: Deploy schema change on s3 T270055
03:48 ryankemper: (Restarted `wdqs-blazegraph` on `wdqs1012`)
02:24 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@9c85a21]: transfer_to_es: start date 2020 -> 2021 (duration: 02m 59s)
02:21 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@9c85a21]: transfer_to_es: start date 2020 -> 2021
01:58 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
01:57 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
01:57 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
01:56 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@6c6b2cb]: 0.3.61 (duration: 07m 50s)
01:50 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.61` on canary `wdqs1003`; proceeding to rest of fleet
01:48 ryankemper@deploy1001: Started deploy [wdqs/wdqs@6c6b2cb]: 0.3.61
01:48 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.61`. Pre-deploy tests passing on canary `wdqs1003`
01:39 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@ee948e0]: transfer_to_es: Enable catchup (duration: 01m 11s)
01:38 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@ee948e0]: transfer_to_es: Enable catchup
01:25 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2296.codfw.wmnet
01:25 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2295.codfw.wmnet
01:24 ryankemper: T272713 [Deploy envoy for `wdqs-internal`] Roll-out complete. Will monitor `wdqs-internal` for any issues. All the remaining `WDQS SPARQL` alerts should clear shortly
01:21 ryankemper: T272713 [Deploy envoy for `wdqs-internal`] Test queries to `wdqs1003.eqiad.wmnet` passed, and metrics in Grafana (https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs-internal&from=1611706751381&to=1611710190405) look good. Rolling out to rest of fleet
01:21 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2296.codfw.wmnet
01:20 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2295.codfw.wmnet
01:14 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@246b640]: remove link recommendations from hourly transfer deps (duration: 03m 31s)
01:10 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@246b640]: remove link recommendations from hourly transfer deps
00:54 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2296.codfw.wmnet with reason: REIMAGE
00:52 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2295.codfw.wmnet with reason: REIMAGE
00:51 ryankemper: T272713 [Deploy envoy for `wdqs-internal`] Fixed typo in private key in commit `ea152df802b55e939d34494a4965ed83a80a24f2`. Puppet run on `wdqs1003` was successful as a result. Monitoring...
00:49 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2295.codfw.wmnet with reason: REIMAGE
00:49 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2296.codfw.wmnet with reason: REIMAGE
00:45 ryankemper: T272713 [Deploy envoy for `wdqs-internal`] Discovered source of the above failure; the secret key in the puppetmaster `/srv/private` repo has a typo in its name (my error): it had `wqds` instead of `wdqs`. Opening up a patch now
00:45 ryankemper: T272713 [Deploy envoy for `wdqs-internal`] `...Error while evaluating a Function Call, secret(): invalid secret ssl/wdqs-internal.discovery.wmnet.key (file: /etc/puppet/modules/sslcert/manifests/certificate.pp, line: 91, column: 26) (file: /etc/puppet/modules/profile/manifests/tlsproxy/envoy.pp, line: 129) on node wdqs1003.eqiad.wmnet`
00:36 ryankemper: [Deploy envoy for `wdqs-internal`] `...Error while evaluating a Function Call, secret(): invalid secret ssl/wdqs-internal.discovery.wmnet.key (file: /etc/puppet/modules/sslcert/manifests/certificate.pp, line: 91, column: 26) (file: /etc/puppet/modules/profile/manifests/tlsproxy/envoy.pp, line: 129) on node wdqs1003.eqiad.wmnet`
00:20 ryankemper: T272713 [Deploy envoy for `wdqs-internal`] Disabled puppet on all `wdqs-internal` hosts; merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/657913
00:16 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2008.codfw.wmnet with reason: Enabling envoy for wdqs-internal
00:16 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2008.codfw.wmnet with reason: Enabling envoy for wdqs-internal
00:15 ryankemper: T272713 [Deploy envoy for `wdqs-internal`] Downtimed all `wdqs-internal` hosts on icinga
00:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2006.codfw.wmnet with reason: Enabling envoy for wdqs-internal
00:15 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2006.codfw.wmnet with reason: Enabling envoy for wdqs-internal
00:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2005.codfw.wmnet with reason: Enabling envoy for wdqs-internal
00:15 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2005.codfw.wmnet with reason: Enabling envoy for wdqs-internal
00:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2004.codfw.wmnet with reason: Enabling envoy for wdqs-internal
00:15 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2004.codfw.wmnet with reason: Enabling envoy for wdqs-internal
00:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1011.eqiad.wmnet with reason: Enabling envoy for wdqs-internal
00:15 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1011.eqiad.wmnet with reason: Enabling envoy for wdqs-internal
00:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1008.eqiad.wmnet with reason: Enabling envoy for wdqs-internal
00:15 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1008.eqiad.wmnet with reason: Enabling envoy for wdqs-internal
00:14 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1003.eqiad.wmnet with reason: Enabling envoy for wdqs-internal
00:14 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1003.eqiad.wmnet with reason: Enabling envoy for wdqs-internal

2021-01-26

23:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2297.codfw.wmnet
23:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2298.codfw.wmnet
23:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2302.codfw.wmnet
23:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1264.eqiad.wmnet
23:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2297.codfw.wmnet
23:31 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1264.eqiad.wmnet
23:31 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2298.codfw.wmnet
23:30 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2299.codfw.wmnet
23:30 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2302.codfw.wmnet
22:35 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@a276626]: correct execution_date_fn in ores_predictions_hourly (duration: 01m 07s)
22:34 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@a276626]: correct execution_date_fn in ores_predictions_hourly
22:30 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2300.codfw.wmnet
22:27 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2300.codfw.wmnet
22:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1264.eqiad.wmnet with reason: REIMAGE
22:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1264.eqiad.wmnet with reason: REIMAGE
22:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2297.codfw.wmnet with reason: REIMAGE
22:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2297.codfw.wmnet with reason: REIMAGE
22:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2298.codfw.wmnet with reason: REIMAGE
22:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2299.codfw.wmnet with reason: REIMAGE
22:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2298.codfw.wmnet with reason: REIMAGE
22:15 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2299.codfw.wmnet with reason: REIMAGE
21:58 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2300.codfw.wmnet with reason: REIMAGE
21:56 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2300.codfw.wmnet with reason: REIMAGE
21:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2306.codfw.wmnet
21:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2304.codfw.wmnet
21:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2302.codfw.wmnet with reason: REIMAGE
21:50 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2304.codfw.wmnet
21:50 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2306.codfw.wmnet
21:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2302.codfw.wmnet with reason: REIMAGE
21:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1338.eqiad.wmnet
21:33 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1337.eqiad.wmnet
21:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1338.eqiad.wmnet
21:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw13388.eqiad.wmnet
21:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1337.eqiad.wmnet
21:31 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@a87a69a]: correct alter table syntax to create wbitem table (duration: 03m 09s)
21:29 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2308.codfw.wmnet
21:28 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2308.codfw.wmnet
21:27 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@a87a69a]: correct alter table syntax to create wbitem table
21:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2304.codfw.wmnet with reason: REIMAGE
21:15 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2304.codfw.wmnet with reason: REIMAGE
21:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
21:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2306.codfw.wmnet with reason: REIMAGE
21:11 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2306.codfw.wmnet with reason: REIMAGE
21:06 ebernhardson: restart airflow-scheduler and airflow-webserver on an-airflow1001 post-deploy
21:05 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@2662ca2]: ship hourly link recommendations (duration: 08m 30s)
20:57 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@2662ca2]: ship hourly link recommendations
20:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1413.eqiad.wmnet
20:53 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate 5 NavigationTiming schemas to Event Platform on testwiki - T271208 (duration: 01m 17s)
20:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2308.codfw.wmnet with reason: REIMAGE
20:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1413.eqiad.wmnet
20:52 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission
20:52 ryankemper: T272444 (Decommission relforge100[1,2]) Beginning decommission of `relforge1002`: `sudo -i cookbook sre.hosts.decommission relforge1002.eqiad.wmnet -t T272444`
20:51 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
20:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2308.codfw.wmnet with reason: REIMAGE
20:50 dancy: group0 rolled back to 1.36.0-wmf.27
20:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1411.eqiad.wmnet
20:50 dancy@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1338.eqiad.wmnet with reason: REIMAGE
20:47 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1411.eqiad.wmnet
20:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1337.eqiad.wmnet with reason: REIMAGE
20:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1338.eqiad.wmnet with reason: REIMAGE
20:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1337.eqiad.wmnet with reason: REIMAGE
20:42 dancy@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.28
20:40 ryankemper: T272444 (Decommission relforge100[1,2]) Beginning decommission of `relforge1001`: `sudo -i cookbook sre.hosts.decommission relforge1001.eqiad.wmnet -t T272444`
20:40 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission
20:39 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
20:39 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission
20:37 ryankemper: T272444 (Decommission relforge100[1,2]) Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/657453 prior to running decom cookbook
20:36 ryankemper: T272444 (Decommission relforge100[1,2]) Downtimed `relforge100[1,2]` in Icinga cookbook for the next 26 hours
20:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1413.eqiad.wmnet with reason: REIMAGE
20:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1413.eqiad.wmnet with reason: REIMAGE
20:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1411.eqiad.wmnet with reason: REIMAGE
20:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1411.eqiad.wmnet with reason: REIMAGE
20:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2321.codfw.wmnet
20:12 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2321.codfw.wmnet
20:03 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1409.eqiad.wmnet
20:01 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1408.eqiad.wmnet
19:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1408.eqiad.wmnet
19:53 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2317.codfw.wmnet
19:49 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2317.codfw.wmnet
19:43 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1409.eqiad.wmnet
19:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1337.eqiad.wmnet
19:18 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2317.codfw.wmnet with reason: REIMAGE
19:16 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2317.codfw.wmnet with reason: REIMAGE
19:13 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2321.codfw.wmnet
18:58 moritzm: installing sudo security updates on Jessie
18:57 moritzm: uploaded sudo 1.8.10p3-1+deb8u7+wmf1 to apt.wikimedia.org
18:46 dancy@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.28 (duration: 40m 09s)
18:37 moritzm: installing sudo security updates on Stretch
18:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Downtiming after rebuild
18:22 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Downtiming after rebuild
18:15 moritzm: installing sudo security updates on Buster
18:07 dancy@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.28
17:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1408.eqiad.wmnet with reason: REIMAGE
17:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1409.eqiad.wmnet with reason: REIMAGE
17:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1408.eqiad.wmnet with reason: REIMAGE
17:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1409.eqiad.wmnet with reason: REIMAGE
17:19 mutante: ms-be1028 - running puppet to clear ferm icinga alert
17:08 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1337.eqiad.wmnet with reason: REIMAGE
17:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2321.codfw.wmnet with reason: REIMAGE
17:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1337.eqiad.wmnet with reason: REIMAGE
17:04 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2321.codfw.wmnet with reason: REIMAGE
16:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1408.eqiad.wmnet with reason: REIMAGE
16:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1409.eqiad.wmnet with reason: REIMAGE
16:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1408.eqiad.wmnet with reason: REIMAGE
16:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1409.eqiad.wmnet with reason: REIMAGE
16:50 marostegui: Deploy schema change on testwiki - T272953
16:44 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
16:43 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
16:43 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
16:42 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1009.eqiad.wmnet with reason: REIMAGE
16:42 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
16:42 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
16:42 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
16:42 mutante: reimaging l33t jobrunner mw1337
16:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1009.eqiad.wmnet with reason: REIMAGE
16:38 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
16:38 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
16:38 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
16:02 moritzm: installing mutt security updates on buster
14:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1009.eqiad.wmnet with reason: REIMAGE
14:56 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1009.eqiad.wmnet with reason: REIMAGE
14:44 hnowlan: reimaging maps1009 as new buster master
14:23 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
14:23 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
14:23 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
14:22 akosiaris: restart pybal on lvs1015, lvs1016, lvs2009, lvs2010 for picking up linkrecommendation, similar-users, apertium-tls LVS services.
14:21 marostegui: Install mariadb 10.4.18 on pc2010 - T268457
14:13 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'similar-users' for release 'main' .
14:07 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'similar-users' for release 'main' .
14:05 marostegui: Restart db1077
14:03 godog: swift codfw-prod decrease SSD weight for ms-be20[16-27] - T272837
13:41 arturo: admin update some kubernetes-related packages in buster-wikimedia/thirdparty/kubeadm-k8s-1-17 (T263284)
13:30 hashar: Upgraded and restarting Jenkins on releases1002 / releases2002 / contint1001 and contint2001
12:34 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=zhwiki --fix # T271612 # P13960
12:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 11cfef4: Add WikiProject and WikiProject_talk namespace and its aliases for zh.wikipedia (T271612) (duration: 01m 01s)
12:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 080389d: Add localized Wikivoyage wordmark for the mobile view of Turkish Wikivoyage (T272776; 2/2) (duration: 01m 02s)
12:24 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikivoyage-wordmark-tr.svg: 080389d: Add localized Wikivoyage wordmark for the mobile view of Turkish Wikivoyage (T272776; 1/2) (duration: 01m 01s)
12:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 4dfc28a: Add Turkish Powered by MediaWiki and A Wikimedia project icons for Turkish Wikivoyage (T272781) (duration: 01m 00s)
12:12 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=trwikivoyage --cluster=all
12:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: eab535f: Add namespace aliases to Turkish Wikivoyage (T272782) (duration: 01m 00s)
11:47 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
11:46 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
11:44 elukey@cumin1001: START - Cookbook sre.hosts.decommission
11:29 moritzm: imported jenkins 2.263.3 to apt.wikimedia.org (thirdparty/ci)
09:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1002.eqiad.wmnet
09:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host debmonitor1002.eqiad.wmnet
09:37 elukey: reboot dbstore1005 for kernel upgrades
09:34 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: Resync: Some mw2xxx hosts have old version (duration: 00m 55s)
09:32 godog: disable mdadm check emails on ms-be1022 / known, and host is going to be decom'd - T267870
09:29 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: Restart mariadb to pick up config changes T272957
09:29 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 6 hosts with reason: Restart mariadb to pick up config changes T272957
09:28 elukey: reboot dbstore1003 for kernel upgrades
09:24 urbanecm@deploy1001: Synchronized wmf-config/logos.php: Resyncing to fix mw2xxx apache loading (duration: 00m 57s)
09:14 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
09:14 elukey: reboot dbstore1004 for kernel upgrades
09:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: eab87780: frwiki: Fix tagline height and width (T272907) (duration: 00m 58s)
09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1078 (db1175 isn't ready yet)', diff saved to https://phabricator.wikimedia.org/P13959 and previous config saved to /var/cache/conftool/dbconfig/20210126-091236-marostegui.json
09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 to clone db1175 T258361', diff saved to https://phabricator.wikimedia.org/P13958 and previous config saved to /var/cache/conftool/dbconfig/20210126-091149-marostegui.json
09:06 elukey@cumin1001: START - Cookbook sre.hosts.decommission
08:53 marostegui: Stop mysql on db1081 to clone db1160
08:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flerovium.eqiad.wmnet
08:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host flerovium.eqiad.wmnet
08:38 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1119,1131].eqiad.wmnet
08:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host furud.codfw.wmnet
08:36 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1119,1131].eqiad.wmnet
08:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host furud.codfw.wmnet
08:31 godog: swift start decom for ms-be20[17,19,21,23,24,25,26,27] - T272837
08:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1119.eqiad.wmnet with reason: REIMAGE
08:26 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-worker1131.eqiad.wmnet with reason: REIMAGE
08:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1131.eqiad.wmnet with reason: REIMAGE
08:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1119.eqiad.wmnet with reason: REIMAGE
08:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1160.eqiad.wmnet with reason: REIMAGE
08:18 moritzm: upgrading OpenJDK on aqs and Hadoop systems
08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1160.eqiad.wmnet with reason: REIMAGE
07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1081 (s4 old master) - T271427', diff saved to https://phabricator.wikimedia.org/P13955 and previous config saved to /var/cache/conftool/dbconfig/20210126-070443-marostegui.json
07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1138 to s4 master and remove read-only from s4 T271427', diff saved to https://phabricator.wikimedia.org/P13954 and previous config saved to /var/cache/conftool/dbconfig/20210126-070152-marostegui.json
07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s4 as read-only for maintenance T271427', diff saved to https://phabricator.wikimedia.org/P13953 and previous config saved to /var/cache/conftool/dbconfig/20210126-070037-marostegui.json
07:00 marostegui: Starting s4 eqiad failover from db1081 to db1138 - T271427
06:55 ryankemper: Restarted `wdqs-blazegraph` on `wdqs1005` - its blazegraph was deadlocked (based on the presence of null values for the blazegraph metrics for that host)
05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Set candidate master to weight 0 before the failover T271427', diff saved to https://phabricator.wikimedia.org/P13952 and previous config saved to /var/cache/conftool/dbconfig/20210126-054337-marostegui.json
00:48 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2331.codfw.wmnet
00:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2318.codfw.wmnet
00:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2319.codfw.wmnet
00:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2320.codfw.wmnet
00:44 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2331.codfw.wmnet
00:43 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2318.codfw.wmnet
00:43 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2319.codfw.wmnet
00:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2320.codfw.wmnet
00:34 legoktm@deploy1001: Synchronized wmf-config/CommonSettings.php: Invalidate configuration cache when logos.php is touched too (duration: 00m 56s)
00:32 legoktm@deploy1001: Synchronized wmf-config/logos.php: Add script to mostly automate logo management (duration: 00m 55s)
00:16 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Split $wmgSiteLogo{1,1_5,2}x to a separate logos.php (1/2) (duration: 01m 00s)
00:14 legoktm@deploy1001: Synchronized wmf-config/logos.php: Split $wmgSiteLogo{1,1_5,2}x to a separate logos.php (1/2) (duration: 00m 56s)
00:08 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T272920: arbcom_enwiki: Change favicon to a renamed copy of arbcom_ruwiki.ico (2/2) (duration: 00m 58s)
00:07 legoktm@deploy1001: Synchronized static/favicon/arbcom_enwiki.ico: T272920: arbcom_enwiki: Change favicon to a renamed copy of arbcom_ruwiki.ico (1/2) (duration: 01m 00s)

2021-01-25

23:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2318.codfw.wmnet with reason: REIMAGE
23:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2319.codfw.wmnet with reason: REIMAGE
23:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2318.codfw.wmnet with reason: REIMAGE
23:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2319.codfw.wmnet with reason: REIMAGE
23:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2331.codfw.wmnet with reason: REIMAGE
23:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2320.codfw.wmnet with reason: REIMAGE
23:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2331.codfw.wmnet with reason: REIMAGE
22:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2320.codfw.wmnet with reason: REIMAGE
22:44 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1338.eqiad.wmnet
22:34 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2322.codfw.wmnet
22:34 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2323.codfw.wmnet
22:30 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2322.codfw.wmnet
22:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2323.codfw.wmnet
22:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1338.eqiad.wmnet
21:45 cstone: civicrm revision changed from 3afb54f6f9 to dfb2ea2148
21:11 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: REIMAGE
21:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1338.eqiad.wmnet with reason: REIMAGE
21:08 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: REIMAGE
21:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1338.eqiad.wmnet with reason: REIMAGE
20:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2326.codfw.wmnet
20:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2326.codfw.wmnet
20:44 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1410.eqiad.wmnet
20:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1410.eqiad.wmnet
20:35 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
20:35 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
20:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2323.codfw.wmnet with reason: REIMAGE
20:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2322.codfw.wmnet with reason: REIMAGE
20:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2323.codfw.wmnet with reason: REIMAGE
20:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2322.codfw.wmnet with reason: REIMAGE
20:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2326.codfw.wmnet with reason: REIMAGE
20:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2326.codfw.wmnet with reason: REIMAGE
20:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1410.eqiad.wmnet with reason: REIMAGE
20:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1410.eqiad.wmnet with reason: REIMAGE
20:00 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2324.codfw.wmnet
19:54 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
19:54 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
19:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1412.eqiad.wmnet
19:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1411.eqiad.wmnet
19:52 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
19:49 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2324.codfw.wmnet
19:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2326.codfw.wmnet
19:48 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
19:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw14124.eqiad.wmnet
19:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1411.eqiad.wmnet
19:44 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
19:44 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
19:44 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
19:41 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
19:41 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
19:41 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
19:41 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
19:40 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
19:37 tgr_: Morning deploys done
19:36 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
19:36 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
19:29 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
19:29 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
19:25 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enables MediaWiki client error instrument on English Wikipedia (T255585) (duration: 01m 01s)
19:20 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
19:16 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [beta] GrowthExperiments: set link recommendation feature flags () (duration: 01m 06s)
19:00 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
18:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2324.codfw.wmnet with reason: REIMAGE
18:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2326.codfw.wmnet with reason: REIMAGE
18:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2326.codfw.wmnet with reason: REIMAGE
18:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1412.eqiad.wmnet with reason: REIMAGE
18:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1411.eqiad.wmnet with reason: REIMAGE
18:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1412.eqiad.wmnet with reason: REIMAGE
18:34 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1411.eqiad.wmnet with reason: REIMAGE
16:40 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:07 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Downtiming for rebuild
16:07 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Downtiming for rebuild
15:44 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Downtiming for rebuild
15:44 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Downtiming for rebuild
15:42 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Downtiming for rebuild
15:42 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Downtiming for rebuild
15:23 dcausse@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/CirrusSearch/: revert: Add an option to limit the size of the file_text field: T271493 (duration: 01m 05s)
15:20 dcausse@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/CirrusSearch/: Add an option to limit the size of the file_text field: T271493 (duration: 00m 58s)
15:16 dcausse: re-opening EU Backport window to ship pending patches
15:10 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
15:09 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
14:37 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove 2 Remove migrated EventLoggingSchemas overrides - T259163, T267352 (duration: 00m 56s)
14:35 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
14:34 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
14:33 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
14:31 akosiaris@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
14:28 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
14:28 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
14:26 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
14:25 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
12:47 urbanecm@deploy1001: Synchronized static/images/project-logos/: 6a4cbe6: Revert "Switch fiwiki to their 500k temporary logo!": delete temporary logo files (duration: 00m 57s)
12:41 urbanecm@deploy1001: Synchronized wmf-config/MetaContactPages.php: 7a6a60f: Create Contact page for Ombuds commission at Meta-Wiki (T271828) (duration: 01m 00s)
12:41 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=arbcom_ruwiki --fix # T272292
12:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 8338333: Adding namespace aliases on arbcom-ruwiki (T272292) (duration: 00m 57s)
12:30 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript updateCollation.php --wiki=trwikivoyage --previous-collation=uppercase # T272783
12:29 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: bcc7ad7: Set $wgCategoryCollation = uca-tr on trwikivoyage (T272783) (duration: 00m 57s)
12:27 urbanecm@deploy1001: Synchronized static/images/project-logos/: d34cb32: Resize the logo of Turkish Wikivoyage (T272784) (duration: 00m 54s)
12:23 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 177339d: Defining wgSitename for trwikivoyage (T272779) (duration: 01m 00s)
12:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 89d0723: Enable SandboxLink on Turkish Wikivoyage (T272780) (duration: 01m 05s)
12:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 75aa32f: frwiki: Change back to normal logo (T272700) (duration: 01m 07s)
12:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 693eaec: Add bidgee.id.au to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T272202) (duration: 01m 01s)
11:40 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 55s)
11:39 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
11:35 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
11:33 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
11:11 godog: thanos delete old orphaned blocks with replica=unset label
10:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host scandium.eqiad.wmnet
10:52 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host scandium.eqiad.wmnet
10:44 godog: swift decrease weight for ms-be20[16,18,20,22] - T272837
10:00 moritzm: installing imagemagick security updates on stretch
09:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1002.wikimedia.org
09:52 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host urldownloader1002.wikimedia.org
09:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2002.wikimedia.org
09:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host urldownloader2002.wikimedia.org
09:40 godog: bounce apache2 on logstash1024, stuck on high cpu
09:21 marostegui@deploy1001: Synchronized wmf-config/etcd.php: Add x2 to the mapping array T269324 (duration: 00m 58s)
09:17 moritzm: installing samba security updates on stretch
09:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Add x2 to the mapping array T269324 (duration: 01m 01s)
09:06 ema: cp3054: install varnish 6.0.1-1wm2 -- 6.0.1 without https://github.com/varnishcache/varnish-cache/pull/2705 T264398
08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 100%: After upgrading its kernel', diff saved to https://phabricator.wikimedia.org/P13944 and previous config saved to /var/cache/conftool/dbconfig/20210125-084715-root.json
08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 75%: After upgrading its kernel', diff saved to https://phabricator.wikimedia.org/P13943 and previous config saved to /var/cache/conftool/dbconfig/20210125-083211-root.json
08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 50%: After upgrading its kernel', diff saved to https://phabricator.wikimedia.org/P13942 and previous config saved to /var/cache/conftool/dbconfig/20210125-081708-root.json
08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 25%: After upgrading its kernel', diff saved to https://phabricator.wikimedia.org/P13941 and previous config saved to /var/cache/conftool/dbconfig/20210125-080204-root.json
07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P13940 and previous config saved to /var/cache/conftool/dbconfig/20210125-073322-marostegui.json
06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Add x2 eqiad to dbctl T269324', diff saved to https://phabricator.wikimedia.org/P13939 and previous config saved to /var/cache/conftool/dbconfig/20210125-064419-marostegui.json
06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Populate x2 eqiad hosts into dbctl T269324', diff saved to https://phabricator.wikimedia.org/P13938 and previous config saved to /var/cache/conftool/dbconfig/20210125-064305-marostegui.json

2021-01-23

22:21 volker-e@deploy1001: Finished deploy [design/style-guide@63e39e7]: Deploy design/style-guide: 63e39e7 “Components”: Amend button groups states SVG font stack (#427) (duration: 00m 06s)
22:21 volker-e@deploy1001: Started deploy [design/style-guide@63e39e7]: Deploy design/style-guide: 63e39e7 “Components”: Amend button groups states SVG font stack (#427)
04:05 ryankemper: Depooled `wdqs1013` (it has ~50 mins of lag to catch up on, and also the bad gateway above)
04:03 ryankemper: Restarted `wdqs-blazegraph` on `wdqs1013`: `sudo systemctl restart wdqs-blazegraph`
01:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2332.codfw.wmnet
01:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2328.codfw.wmnet
01:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2332.codfw.wmnet
01:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2328.codfw.wmnet
01:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2330.codfw.wmnet
01:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2334.codfw.wmnet
01:48 foks: reset user email for Davey2010
01:44 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1413.eqiad.wmnet
01:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1268.eqiad.wmnet
01:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2330.codfw.wmnet
01:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2334.codfw.wmnet
01:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
01:39 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1413.eqiad.wmnet
00:46 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch enwiki to use enwiki20 "Option A" logo variant (T272526) (duration: 00m 57s)
00:36 legoktm@deploy1001: Synchronized static/images/project-logos/: Add enwiki20 "Option A" fixed logos (T272526) (duration: 00m 59s)

2021-01-22

22:41 reedy@deploy1001: Synchronized invalid.json: (no justification provided) (duration: 00m 58s)
20:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
20:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
20:05 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2330.codfw.wmnet with reason: REIMAGE
20:05 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2332.codfw.wmnet with reason: REIMAGE
20:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2328.codfw.wmnet with reason: REIMAGE
20:01 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1413.eqiad.wmnet with reason: REIMAGE
20:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2334.codfw.wmnet with reason: REIMAGE
20:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1413.eqiad.wmnet with reason: REIMAGE
20:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2328.codfw.wmnet with reason: REIMAGE
20:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2330.codfw.wmnet with reason: REIMAGE
20:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2332.codfw.wmnet with reason: REIMAGE
19:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2334.codfw.wmnet with reason: REIMAGE
19:39 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2356.codfw.wmnet
19:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2354.codfw.wmnet
19:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2352.codfw.wmnet
19:36 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2350.codfw.wmnet
19:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2352.codfw.wmnet
19:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2350.codfw.wmnet
19:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2354.codfw.wmnet
19:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2356.codfw.wmnet
19:15 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2350.codfw.wmnet with reason: REIMAGE
19:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2352.codfw.wmnet with reason: REIMAGE
19:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2354.codfw.wmnet with reason: REIMAGE
19:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2350.codfw.wmnet with reason: REIMAGE
19:09 mutante: releases1002 systemctl reset-failed
19:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2356.codfw.wmnet with reason: REIMAGE
19:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2352.codfw.wmnet with reason: REIMAGE
19:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2354.codfw.wmnet with reason: REIMAGE
19:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2356.codfw.wmnet with reason: REIMAGE
18:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2364.codfw.wmnet
18:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2362.codfw.wmnet
18:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2360.codfw.wmnet
18:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2358.codfw.wmnet
18:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2362.codfw.wmnet
18:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2364.codfw.wmnet
18:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2360.codfw.wmnet
18:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2358.codfw.wmnet
18:17 mutante: releases2002 - rebooting to confirm it works now and that the new disk gets auto-mounted
18:03 mutante: releases1002 - replaced ens5 with ens6 in /etc/network/interfaces and rebooted again
18:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on releases1002.eqiad.wmnet with reason: fixing networking - added disk
18:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on releases1002.eqiad.wmnet with reason: fixing networking - added disk
17:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2360.codfw.wmnet with reason: new install on buster
17:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2360.codfw.wmnet with reason: new install on buster
17:57 mutante: releases1002 (releases.wm.org active backend) - rebooting - hopefully it does not run into T272555, but if it does, it is now known how to fix it
17:55 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2364.codfw.wmnet with reason: REIMAGE
17:54 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2360.codfw.wmnet with reason: REIMAGE
17:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2358.codfw.wmnet with reason: REIMAGE
17:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2362.codfw.wmnet with reason: REIMAGE
17:52 mutante: releases2001 - created a new partition table with fdisk, made an ext4 filesystem on /dev/vdb1
17:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2364.codfw.wmnet with reason: REIMAGE
17:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2362.codfw.wmnet with reason: REIMAGE
17:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2360.codfw.wmnet with reason: REIMAGE
17:49 ppchelko@deploy1001: Finished deploy [restbase/deploy@e54225d]: T270411 T270415 T270281 T270277 (duration: 65m 37s)
17:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2358.codfw.wmnet with reason: REIMAGE
17:29 mforns@deploy1001: Finished deploy [analytics/refinery@eea071d] (thin): Extra bug-fix train THIN [analytics/refinery@eea071def90a8a856b1e04dda23b77a850134253] (duration: 00m 07s)
17:29 mforns@deploy1001: Started deploy [analytics/refinery@eea071d] (thin): Extra bug-fix train THIN [analytics/refinery@eea071def90a8a856b1e04dda23b77a850134253]
17:23 mforns@deploy1001: Finished deploy [analytics/refinery@eea071d]: Extra bug-fix train [analytics/refinery@eea071def90a8a856b1e04dda23b77a850134253] (duration: 10m 03s)
17:13 mforns@deploy1001: Started deploy [analytics/refinery@eea071d]: Extra bug-fix train [analytics/refinery@eea071def90a8a856b1e04dda23b77a850134253]
16:44 ppchelko@deploy1001: Started deploy [restbase/deploy@e54225d]: T270411 T270415 T270281 T270277
16:40 cmjohnson1: replacing optics/fiber pfw3a-eqiad:xe-0/0/17 and fasw-c1a-eqiad:xe-0/2/0 T271295
16:19 jynus: restart of backup source hosts on codfw T271913
15:54 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
15:40 moritzm: installing puppetboard1002
15:24 moritzm: installing puppetboard2002
13:44 kormat@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 100%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13932 and previous config saved to /var/cache/conftool/dbconfig/20210122-134444-kormat.json
13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121', diff saved to https://phabricator.wikimedia.org/P13931 and previous config saved to /var/cache/conftool/dbconfig/20210122-133341-marostegui.json
13:31 marostegui: Stop replication on db1121
13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P13930 and previous config saved to /var/cache/conftool/dbconfig/20210122-133044-marostegui.json
13:29 kormat@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 75%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13929 and previous config saved to /var/cache/conftool/dbconfig/20210122-132939-kormat.json
13:21 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetboard2002.codfw.wmnet
13:20 kormat@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 100%: Reboot T272121', diff saved to https://phabricator.wikimedia.org/P13927 and previous config saved to /var/cache/conftool/dbconfig/20210122-132028-kormat.json
13:14 kormat@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 50%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13926 and previous config saved to /var/cache/conftool/dbconfig/20210122-131436-kormat.json
13:05 kormat@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 75%: Reboot T272121', diff saved to https://phabricator.wikimedia.org/P13925 and previous config saved to /var/cache/conftool/dbconfig/20210122-130525-kormat.json
12:59 kormat@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 25%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13924 and previous config saved to /var/cache/conftool/dbconfig/20210122-125932-kormat.json
12:54 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host puppetboard2002.codfw.wmnet
12:53 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetboard1002.eqiad.wmnet
12:50 kormat@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 50%: Reboot T272121', diff saved to https://phabricator.wikimedia.org/P13923 and previous config saved to /var/cache/conftool/dbconfig/20210122-125021-kormat.json
12:47 kormat@cumin1001: dbctl commit (dc=all): 'db1149 depooling: Rebooting for T272255', diff saved to https://phabricator.wikimedia.org/P13922 and previous config saved to /var/cache/conftool/dbconfig/20210122-124748-kormat.json
12:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1149.eqiad.wmnet with reason: Rebooting for T272255
12:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1149.eqiad.wmnet with reason: Rebooting for T272255
12:43 kormat@cumin1001: dbctl commit (dc=all): 'Remove db1110 from api group T272255', diff saved to https://phabricator.wikimedia.org/P13921 and previous config saved to /var/cache/conftool/dbconfig/20210122-124310-kormat.json
12:38 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host puppetboard1002.eqiad.wmnet
12:38 kormat@cumin1001: dbctl commit (dc=all): 'Remove db1127 from api group T272255', diff saved to https://phabricator.wikimedia.org/P13920 and previous config saved to /var/cache/conftool/dbconfig/20210122-123832-kormat.json
12:35 kormat@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 25%: Reboot T272121', diff saved to https://phabricator.wikimedia.org/P13919 and previous config saved to /var/cache/conftool/dbconfig/20210122-123518-kormat.json
12:33 volker-e@deploy1001: Finished deploy [design/style-guide@9a811b8]: Deploy design/style-guide: 9a811b8 Add Language selectors to component overview Sketch document (#424) (duration: 00m 07s)
12:33 volker-e@deploy1001: Started deploy [design/style-guide@9a811b8]: Deploy design/style-guide: 9a811b8 Add Language selectors to component overview Sketch document (#424)
12:10 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1135,1137].eqiad.wmnet
12:08 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1135,1137].eqiad.wmnet
12:00 kormat@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13918 and previous config saved to /var/cache/conftool/dbconfig/20210122-120011-kormat.json
11:54 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
11:51 kormat@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 100%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13917 and previous config saved to /var/cache/conftool/dbconfig/20210122-115113-kormat.json
11:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on es1023.eqiad.wmnet with reason: Extended reboot for T272121
11:50 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on es1023.eqiad.wmnet with reason: Extended reboot for T272121
11:46 kormat@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13916 and previous config saved to /var/cache/conftool/dbconfig/20210122-114642-kormat.json
11:45 kormat@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13915 and previous config saved to /var/cache/conftool/dbconfig/20210122-114507-kormat.json
11:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug1003.eqiad.wmnet
11:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1023.eqiad.wmnet with reason: Reboot for T272121
11:36 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on es1023.eqiad.wmnet with reason: Reboot for T272121
11:36 kormat@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 75%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13914 and previous config saved to /var/cache/conftool/dbconfig/20210122-113610-kormat.json
11:31 kormat@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13913 and previous config saved to /var/cache/conftool/dbconfig/20210122-113139-kormat.json
11:30 kormat@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13912 and previous config saved to /var/cache/conftool/dbconfig/20210122-113004-kormat.json
11:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host mwdebug1003.eqiad.wmnet
11:24 kormat@cumin1001: dbctl commit (dc=all): 'es1023 depooling: enable report_host T271106', diff saved to https://phabricator.wikimedia.org/P13911 and previous config saved to /var/cache/conftool/dbconfig/20210122-112424-kormat.json
11:24 hnowlan: joining restbase2009-a to cluster
11:21 kormat@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 50%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13910 and previous config saved to /var/cache/conftool/dbconfig/20210122-112106-kormat.json
11:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1001.wikimedia.org
11:16 kormat@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13909 and previous config saved to /var/cache/conftool/dbconfig/20210122-111635-kormat.json
11:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host urldownloader1001.wikimedia.org
11:15 kormat@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13908 and previous config saved to /var/cache/conftool/dbconfig/20210122-111500-kormat.json
11:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2001.wikimedia.org
11:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host urldownloader2001.wikimedia.org
11:06 kormat@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 25%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13906 and previous config saved to /var/cache/conftool/dbconfig/20210122-110603-kormat.json
11:05 jbond42: deploy cairo updates to jessie
11:02 kormat@cumin1001: dbctl commit (dc=all): 'db1141 depooling: Rebooting for T272255', diff saved to https://phabricator.wikimedia.org/P13905 and previous config saved to /var/cache/conftool/dbconfig/20210122-110229-kormat.json
11:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1141.eqiad.wmnet with reason: Rebooting for T272255
11:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1141.eqiad.wmnet with reason: Rebooting for T272255
11:01 kormat@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13904 and previous config saved to /var/cache/conftool/dbconfig/20210122-110132-kormat.json
10:59 kormat@cumin1001: dbctl commit (dc=all): 'db1136 depooling: Rebooting for T272255', diff saved to https://phabricator.wikimedia.org/P13903 and previous config saved to /var/cache/conftool/dbconfig/20210122-105952-kormat.json
10:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1136.eqiad.wmnet with reason: Rebooting for T272255
10:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1136.eqiad.wmnet with reason: Rebooting for T272255
10:59 kormat@cumin1001: dbctl commit (dc=all): 'Temporarily add db1127 to api group T272255', diff saved to https://phabricator.wikimedia.org/P13902 and previous config saved to /var/cache/conftool/dbconfig/20210122-105921-kormat.json
10:56 kormat@cumin1001: dbctl commit (dc=all): 'db1134 depooling: Rebooting for T272255', diff saved to https://phabricator.wikimedia.org/P13901 and previous config saved to /var/cache/conftool/dbconfig/20210122-105636-kormat.json
10:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1134.eqiad.wmnet with reason: Rebooting for T272255
10:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1134.eqiad.wmnet with reason: Rebooting for T272255
10:53 kormat@cumin1001: dbctl commit (dc=all): 'Remove db1088 from api group T271106', diff saved to https://phabricator.wikimedia.org/P13900 and previous config saved to /var/cache/conftool/dbconfig/20210122-105345-kormat.json
10:52 kormat@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13899 and previous config saved to /var/cache/conftool/dbconfig/20210122-105244-kormat.json
10:37 kormat@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13898 and previous config saved to /var/cache/conftool/dbconfig/20210122-103741-kormat.json
10:36 kormat@cumin1001: dbctl commit (dc=all): 'db1093 (re)pooling @ 100%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13897 and previous config saved to /var/cache/conftool/dbconfig/20210122-103609-kormat.json
10:22 kormat@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13895 and previous config saved to /var/cache/conftool/dbconfig/20210122-102237-kormat.json
10:21 kormat@cumin1001: dbctl commit (dc=all): 'db1093 (re)pooling @ 75%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13894 and previous config saved to /var/cache/conftool/dbconfig/20210122-102105-kormat.json
10:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host archiva1002.wikimedia.org
10:16 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host archiva1002.wikimedia.org
10:07 kormat@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13893 and previous config saved to /var/cache/conftool/dbconfig/20210122-100734-kormat.json
10:06 kormat@cumin1001: dbctl commit (dc=all): 'db1093 (re)pooling @ 50%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13892 and previous config saved to /var/cache/conftool/dbconfig/20210122-100602-kormat.json
10:03 kormat@cumin1001: dbctl commit (dc=all): 'db1130 depooling: Rebooting for T272255', diff saved to https://phabricator.wikimedia.org/P13891 and previous config saved to /var/cache/conftool/dbconfig/20210122-100307-kormat.json
10:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1130.eqiad.wmnet with reason: Rebooting for T272255
10:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1130.eqiad.wmnet with reason: Rebooting for T272255
10:02 kormat@cumin1001: dbctl commit (dc=all): 'Temporarily add db1110 to api group T272255', diff saved to https://phabricator.wikimedia.org/P13890 and previous config saved to /var/cache/conftool/dbconfig/20210122-100233-kormat.json
09:52 moritzm: uploaded cairo 1.14.0-2.1+deb8u2+wmf1 to apt.wikimedia.org
09:50 kormat@cumin1001: dbctl commit (dc=all): 'db1093 (re)pooling @ 25%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13889 and previous config saved to /var/cache/conftool/dbconfig/20210122-095058-kormat.json
09:44 kormat@cumin1001: dbctl commit (dc=all): 'db1093 depooling: Rebooting for T272255', diff saved to https://phabricator.wikimedia.org/P13888 and previous config saved to /var/cache/conftool/dbconfig/20210122-094453-kormat.json
09:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1093.eqiad.wmnet with reason: Rebooting for T272255
09:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1093.eqiad.wmnet with reason: Rebooting for T272255
09:43 kormat@cumin1001: dbctl commit (dc=all): 'Temporarily add db1088 to api group T271106', diff saved to https://phabricator.wikimedia.org/P13887 and previous config saved to /var/cache/conftool/dbconfig/20210122-094337-kormat.json
08:49 moritzm: installing PIP security updates for stretch
08:44 moritzm: installing mutt updates for stretch
08:35 XioNoX: Remove BGP for Zayo transit in ulsfo, eqiad, eqord
08:33 elukey: update puppet compiler's facts
07:26 ryankemper: [WDQS Deploy] WDQS deploy complete; service is healthy
06:59 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
06:59 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
06:59 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
06:58 ryankemper: [WDQS Deploy] Initial deploy complete, `query.wikidata.org` handles queries fine, proceeding to post-deploy steps
06:57 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@70f9d37]: 0.3.60 (duration: 10m 43s)
06:50 ryankemper: [WDQS Deploy] All tests passing on canary `wdqs1003` following canary WDQS deploy, proceeding to rest of fleet
06:46 ryankemper@deploy1001: Started deploy [wdqs/wdqs@70f9d37]: 0.3.60
06:46 ryankemper: [WDQS Deploy] All tests passing on canary `wdqs1003` before WDQS deploy, beginning deploy
06:45 ryankemper: [wdqs] re-pooled `wdqs1013` (all caught up on lag)
06:16 marostegui: Stop MySQL on db1117 db2133 db2078 T272614
06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2143 and db2144 as x2 codfw slaves T269324', diff saved to https://phabricator.wikimedia.org/P13885 and previous config saved to /var/cache/conftool/dbconfig/20210122-060147-marostegui.json
06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2142 into x2 as codfw master T269324', diff saved to https://phabricator.wikimedia.org/P13884 and previous config saved to /var/cache/conftool/dbconfig/20210122-060007-marostegui.json
05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db1118 weight', diff saved to https://phabricator.wikimedia.org/P13883 and previous config saved to /var/cache/conftool/dbconfig/20210122-054330-marostegui.json
01:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2368.codfw.wmnet
01:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2366.codfw.wmnet
01:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2374.codfw.wmnet
01:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2368.codfw.wmnet
01:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2366.codfw.wmnet
01:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2374.codfw.wmnet
01:19 Urbanecm: Evening B&C window finished
01:18 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/AbuseFilter/: 7d8ab70: Dont return the status of doBlockInternal when processing block actions (duration: 00m 59s)
01:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 376cba1: Enroll idwiki in the DiscussionTools a/b test (T268191) (duration: 00m 55s)
01:14 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/DiscussionTools/: 513a786: A/B test output when a specific feature is being tested (T268191) (duration: 00m 55s)
01:12 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/WikibaseMediaInfo/: 4b0259b: Distinguish between null continue value and unknown one (T272548) (duration: 00m 59s)
01:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2376.codfw.wmnet
01:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2366.codfw.wmnet with reason: REIMAGE
01:02 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2376.codfw.wmnet
01:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2368.codfw.wmnet with reason: REIMAGE
01:00 Urbanecm: Evening B&C still in progress, waiting on Zuul
00:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2366.codfw.wmnet with reason: REIMAGE
00:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2374.codfw.wmnet with reason: REIMAGE
00:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2368.codfw.wmnet with reason: REIMAGE
00:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2374.codfw.wmnet with reason: REIMAGE
00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1156.eqiad.wmnet with reason: REIMAGE
00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1174.eqiad.wmnet with reason: REIMAGE
00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1161.eqiad.wmnet with reason: REIMAGE
00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1157.eqiad.wmnet with reason: REIMAGE
00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1167.eqiad.wmnet with reason: REIMAGE
00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1163.eqiad.wmnet with reason: REIMAGE
00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1168.eqiad.wmnet with reason: REIMAGE
00:48 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1165.eqiad.wmnet with reason: REIMAGE
00:48 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1160.eqiad.wmnet with reason: REIMAGE
00:48 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1170.eqiad.wmnet with reason: REIMAGE
00:48 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1169.eqiad.wmnet with reason: REIMAGE
00:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
00:46 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1164.eqiad.wmnet with reason: REIMAGE
00:46 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1166.eqiad.wmnet with reason: REIMAGE
00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1174.eqiad.wmnet with reason: REIMAGE
00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: REIMAGE
00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1161.eqiad.wmnet with reason: REIMAGE
00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1157.eqiad.wmnet with reason: REIMAGE
00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1167.eqiad.wmnet with reason: REIMAGE
00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1163.eqiad.wmnet with reason: REIMAGE
00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1165.eqiad.wmnet with reason: REIMAGE
00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1160.eqiad.wmnet with reason: REIMAGE
00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: REIMAGE
00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1169.eqiad.wmnet with reason: REIMAGE
00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1168.eqiad.wmnet with reason: REIMAGE
00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1164.eqiad.wmnet with reason: REIMAGE
00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1166.eqiad.wmnet with reason: REIMAGE
00:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2376.codfw.wmnet with reason: REIMAGE
00:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2376.codfw.wmnet with reason: REIMAGE
00:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2372.codfw.wmnet
00:35 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2370.codfw.wmnet
00:31 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.27
00:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: d4f5d6f: Temporarily amend ukwiki AF configuration (T272330) (duration: 01m 03s)
00:20 brennen@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/MobileFrontend: Backport: Fix toggling storage cleanup (T272638) (duration: 01m 07s)
00:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2372.codfw.wmnet
00:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2370.codfw.wmnet
00:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2370.codfw.wmnet with reason: new install on buster
00:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2370.codfw.wmnet with reason: new install on buster

2021-01-21

23:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2372.codfw.wmnet with reason: REIMAGE
23:55 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2370.codfw.wmnet with reason: REIMAGE
23:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2374.codfw.wmnet with reason: REIMAGE
23:54 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2370.codfw.wmnet with reason: REIMAGE
23:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2372.codfw.wmnet with reason: REIMAGE
23:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2374.codfw.wmnet with reason: REIMAGE
22:10 brennen: 1.36.0-wmf.27 train status: for avoidance of doubt, no deploys until further notice - sorting out T272638
21:27 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.36.0-wmf.26
20:06 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.27
20:04 razzi@cumin1001: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid public cluster: Reboot Druid nodes - razzi@cumin1001
19:51 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ac99da7: Migrate WebUIActionsTracking schemas to Event Platform on testwiki (T267347; T271164) (duration: 01m 03s)
19:41 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 4bb9e5d: Enables the Wikisource extension on oldwikisource (T272163) (duration: 01m 04s)
19:40 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/EventLogging/: ee830a5: f7152a7: EventLogging backport, see commits for details (T253121) (duration: 01m 05s)
19:39 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2226.codfw.wmnet
19:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2375.codfw.wmnet
19:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2373.codfw.wmnet
19:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2371.codfw.wmnet
19:38 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2226.codfw.wmnet
19:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 62c9c35: Migrate SuggestedTagsAction to Event Platform on all wikis (T267351) (duration: 01m 03s)
19:21 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 0b46c9f: [no-op] Add notes about load order of Wikisource and Collection extensions (T255790) (duration: 01m 11s)
19:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2375.codfw.wmnet
19:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2373.codfw.wmnet
19:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2371.codfw.wmnet
19:02 cstone: civicrm revision changed from a4caad22b1 to 3afb54f6f9
18:53 razzi@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid public cluster: Reboot Druid nodes - razzi@cumin1001
18:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2226.codfw.wmnet with reason: REIMAGE
18:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2226.codfw.wmnet with reason: REIMAGE
18:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2373.codfw.wmnet with reason: REIMAGE
18:35 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2375.codfw.wmnet with reason: REIMAGE
18:35 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2371.codfw.wmnet with reason: REIMAGE
18:34 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2375.codfw.wmnet with reason: REIMAGE
18:34 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2373.codfw.wmnet with reason: REIMAGE
18:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2371.codfw.wmnet with reason: REIMAGE
18:21 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:14 pt1979@cumin2001: START - Cookbook sre.dns.netbox
18:12 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:08 pt1979@cumin2001: START - Cookbook sre.dns.netbox
18:08 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
18:02 pt1979@cumin2001: START - Cookbook sre.dns.netbox
17:42 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:36 pt1979@cumin2001: START - Cookbook sre.dns.netbox
17:35 ryankemper: [wdqs] Depooled `wdqs1013` to allow it to catch up on lag
16:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
16:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
16:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
16:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2001.codfw.wmnet
15:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host krb2001.codfw.wmnet
15:13 moritzm: installing cairo security updates on stretch
15:12 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
15:11 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
14:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5001.wikimedia.org
14:17 godog: roll-restart swift-object in eqiad to apply new concurrency
14:14 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host bast5001.wikimedia.org
14:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4002.wikimedia.org
14:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host bast4002.wikimedia.org
14:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3004.wikimedia.org
13:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host bast3004.wikimedia.org
13:38 XioNoX: put eqiad/esams lumen link back in service
12:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13872 and previous config saved to /var/cache/conftool/dbconfig/20210121-122043-root.json
12:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13871 and previous config saved to /var/cache/conftool/dbconfig/20210121-120540-root.json
11:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13870 and previous config saved to /var/cache/conftool/dbconfig/20210121-115036-root.json
11:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13868 and previous config saved to /var/cache/conftool/dbconfig/20210121-113533-root.json
11:29 marostegui: Stop replication on db1085 to move wiki replicas under the other sanitarium host
11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085', diff saved to https://phabricator.wikimedia.org/P13867 and previous config saved to /var/cache/conftool/dbconfig/20210121-112849-marostegui.json
11:12 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
11:12 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
09:44 hoo: Updated the Wikidata property suggester with data from the 2021-01-11 JSON dump and applied the T132839 workarounds
09:00 marostegui: m1 master restart - T271540
08:51 jynus: stopping puppet and bacula for backup1001 T271540
08:43 godog: swift codfw-prod: more weight to ms-be20[58-61] - T269337
08:37 marostegui: Silence m1 hosts in preparation for the restart T271540
08:34 godog: roll-restart swift-object in codfw to apply new concurrency
07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P13864 and previous config saved to /var/cache/conftool/dbconfig/20210121-072101-marostegui.json
07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repoool db1099:3318', diff saved to https://phabricator.wikimedia.org/P13863 and previous config saved to /var/cache/conftool/dbconfig/20210121-070346-marostegui.json
06:55 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repoool db1099:3318', diff saved to https://phabricator.wikimedia.org/P13862 and previous config saved to /var/cache/conftool/dbconfig/20210121-065459-marostegui.json
06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087', diff saved to https://phabricator.wikimedia.org/P13861 and previous config saved to /var/cache/conftool/dbconfig/20210121-065408-marostegui.json
06:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 and pool db1099:3318 into s8 vslow', diff saved to https://phabricator.wikimedia.org/P13860 and previous config saved to /var/cache/conftool/dbconfig/20210121-064903-marostegui.json
03:54 milimetric@deploy1001: deploy aborted: Minor typo fix (duration: 01m 39s)
03:52 milimetric@deploy1001: Started deploy [analytics/refinery@57589e7]: Minor typo fix
01:27 ryankemper: [WDQS Deploy] Rollback complete, service health of `wdqs1003` is restored. Need to investigate source of 404 (possibly related to some recent changes we made in the `gui` repo)
01:26 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@70f9d37]: 0.3.60 (duration: 02m 53s)
01:26 ryankemper: [WDQS Deploy] Rollback of canary `wdqs1003` initiated
01:25 ryankemper: [WDQS Deploy] Automated tests passing on canary `wdqs1003` but manually visiting `http://localhost:9999` (my tunnel to `wdqs1003`) gives `404 Not Found` from nginx; aborting deploy
01:23 ryankemper@deploy1001: Started deploy [wdqs/wdqs@70f9d37]: 0.3.60
01:22 ryankemper: [WDQS Deploy] Tests on canary `wdqs1003` passing before start of deploy, proceeding with deploy of wdqs `0.3.60` to canary
00:44 legoktm: legoktm@mwmaint1002:~$ mwscript initSiteStats.php --wiki=trwikivoyage --update
00:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2369.codfw.wmnet
00:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2367.codfw.wmnet
00:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2365.codfw.wmnet
00:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2363.codfw.wmnet
00:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2369.codfw.wmnet
00:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2365.codfw.wmnet
00:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2367.codfw.wmnet
00:17 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2363.codfw.wmnet

2021-01-20

23:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2369.codfw.wmnet with reason: REIMAGE
23:51 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2365.codfw.wmnet with reason: REIMAGE
23:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2367.codfw.wmnet with reason: REIMAGE
23:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2363.codfw.wmnet with reason: REIMAGE
23:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2369.codfw.wmnet with reason: REIMAGE
23:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2367.codfw.wmnet with reason: REIMAGE
23:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2365.codfw.wmnet with reason: REIMAGE
23:45 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2363.codfw.wmnet with reason: REIMAGE
23:30 mutante: releases2002 - rebooting VM
23:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2361.codfw.wmnet
23:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2359.codfw.wmnet
23:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2355.codfw.wmnet
23:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2357.codfw.wmnet
23:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on releases2002.codfw.wmnet with reason: rebooting to add a disk
23:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on releases2002.codfw.wmnet with reason: rebooting to add a disk
23:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2357.codfw.wmnet
23:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2361.codfw.wmnet
23:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2359.codfw.wmnet
23:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2355.codfw.wmnet
23:03 legoktm: updated docker-registry.discovery.wmnet/wikimedia-buster image
23:01 mutante: mw2331, mw2333 - scap pull
22:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2359.codfw.wmnet with reason: new install on buster
22:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2359.codfw.wmnet with reason: new install on buster
22:48 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2359.codfw.wmnet with reason: REIMAGE
22:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2361.codfw.wmnet with reason: REIMAGE
22:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2357.codfw.wmnet with reason: REIMAGE
22:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2361.codfw.wmnet with reason: REIMAGE
22:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2359.codfw.wmnet with reason: REIMAGE
22:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2357.codfw.wmnet with reason: REIMAGE
22:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2355.codfw.wmnet with reason: REIMAGE
22:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2355.codfw.wmnet with reason: REIMAGE
22:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2327.codfw.wmnet with reason: new install on buster
22:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2327.codfw.wmnet with reason: new install on buster
22:18 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2353.codfw.wmnet
22:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2351.codfw.wmnet
22:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2339.codfw.wmnet
22:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2338.codfw.wmnet
22:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2353.codfw.wmnet
22:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2351.codfw.wmnet
22:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2339.codfw.wmnet
22:15 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2338.codfw.wmnet
21:35 milimetric@deploy1001: Finished deploy [analytics/refinery@1313244] (thin): Regular analytics weekly train THIN [analytics/refinery@1313244] (duration: 00m 07s)
21:35 milimetric@deploy1001: Started deploy [analytics/refinery@1313244] (thin): Regular analytics weekly train THIN [analytics/refinery@1313244]
21:34 milimetric@deploy1001: Finished deploy [analytics/refinery@1313244]: Regular analytics weekly train [analytics/refinery@1313244] (duration: 10m 52s)
21:24 milimetric@deploy1001: Started deploy [analytics/refinery@1313244]: Regular analytics weekly train [analytics/refinery@1313244]
21:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2351.codfw.wmnet with reason: REIMAGE
21:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2353.codfw.wmnet with reason: REIMAGE
21:19 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2003.codfw.wmnet
21:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2338.codfw.wmnet with reason: REIMAGE
21:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2353.codfw.wmnet with reason: REIMAGE
21:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2351.codfw.wmnet with reason: REIMAGE
21:15 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2339.codfw.wmnet with reason: REIMAGE
21:15 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2339.codfw.wmnet with reason: REIMAGE
21:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2338.codfw.wmnet with reason: REIMAGE
21:13 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2003.codfw.wmnet
21:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2002.codfw.wmnet
20:56 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2002.codfw.wmnet
20:48 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2001.codfw.wmnet
20:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2337.codfw.wmnet
20:46 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕞🍵 sudo cumin A:cp 'enable-puppet "cdanis deploying I558346d T272330"'
20:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2335.codfw.wmnet
20:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2333.codfw.wmnet
20:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2331.codfw.wmnet
20:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2337.codfw.wmnet
20:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2335.codfw.wmnet
20:44 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2333.codfw.wmnet
20:44 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2331.codfw.wmnet
20:41 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2001.codfw.wmnet
20:41 effie: restart mc-gp2001, mc-gp2002, mc-gp2003 for T269596
20:31 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.27 (duration: 03m 05s)
20:28 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.27
20:23 brennen: 1.36.0-wmf.27 (T271341) train: proceeding to group1
20:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:17 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕒🍵 sudo cumin A:cp 'disable-puppet "cdanis deploying I558346d T272330"'
20:16 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
20:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:06 brennen: 1.36.0-wmf.27 (T271341) train status as of deploy window: currently blocked at group0 on T272508
20:06 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
19:50 bblack: lvs1015: bringing pybal back online
19:47 bblack: lvs1015: stopping pybal to try to fix a lingering ifup service state issue on the host, which may require downing an interface
19:33 urbanecm@deploy1001: Synchronized static/images/project-logos: 5c94167: Revert: [enwiki] Update celebration logo to "option A" (T272526) (duration: 01m 04s)
19:24 effie: depool and repool thumbor* to upgrade python-thumbor-wikimedia to v2.9
19:24 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2337.codfw.wmnet with reason: REIMAGE
19:22 urbanecm@deploy1001: Synchronized static/images/project-logos: 13fb338: [enwiki] Update celebration logo to "option A" (T272526) (duration: 01m 05s)
19:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2335.codfw.wmnet with reason: REIMAGE
19:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2333.codfw.wmnet with reason: REIMAGE
19:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2337.codfw.wmnet with reason: REIMAGE
19:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2331.codfw.wmnet with reason: REIMAGE
19:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2335.codfw.wmnet with reason: REIMAGE
19:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2333.codfw.wmnet with reason: REIMAGE
19:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2331.codfw.wmnet with reason: REIMAGE
19:12 urbanecm@deploy1001: Synchronized wmf-config/config/kuwiki.yaml: a736d97: Enable visualeditor on kuwiki by default (T270841; 2/2) (duration: 01m 05s)
19:11 XioNoX: add BGP to Lumen in eqiad
19:11 urbanecm@deploy1001: Synchronized dblists/visualeditor-nondefault.dblist: a736d97: Enable visualeditor on kuwiki by default (T270841; 1/2) (duration: 01m 04s)
18:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2325.codfw.wmnet
18:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2327.codfw.wmnet
18:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2329.codfw.wmnet
18:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2316.codfw.wmnet
18:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2329.codfw.wmnet
18:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2327.codfw.wmnet
18:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2325.codfw.wmnet
18:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2316.codfw.wmnet
18:42 brennen@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/AbuseFilter/includes/View/AbuseFilterViewDiff.php: Backport: Catch ClosestFilterVersionNotFoundException in ViewDiff (T272505) (duration: 01m 06s)
18:29 bblack: lvs1015: re-enabling puppet + pybal - T272258
18:25 XioNoX: draining esams-eqiad link
18:24 mutante: ganeti - creating 150G virtual hard disk and adding it to releases2002 for T272092
18:22 mutante: ganeti - creating 105G virtual hard disk and adding it to releases1002 for T272092
18:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2327.codfw.wmnet with reason: new install on buster
18:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2327.codfw.wmnet with reason: new install on buster
18:15 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2327.codfw.wmnet with reason: REIMAGE
18:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2329.codfw.wmnet with reason: REIMAGE
18:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2325.codfw.wmnet with reason: REIMAGE
18:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2329.codfw.wmnet with reason: REIMAGE
18:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2316.codfw.wmnet with reason: REIMAGE
18:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2327.codfw.wmnet with reason: REIMAGE
18:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2325.codfw.wmnet with reason: REIMAGE
18:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2316.codfw.wmnet with reason: REIMAGE
18:01 bblack: lvs1015 - shutdown for T272258
17:58 volans@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:54 bblack: lvs1015: stopping pybal with puppet disabled for T272258
17:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
17:40 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
17:31 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
17:24 volans@cumin2001: START - Cookbook sre.dns.netbox
16:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2061.codfw.wmnet
16:37 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2061.codfw.wmnet
16:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2060.codfw.wmnet
16:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2060.codfw.wmnet
16:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2059.codfw.wmnet
16:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2059.codfw.wmnet
16:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2058.codfw.wmnet
15:59 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
15:58 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
15:57 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2058.codfw.wmnet
15:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
15:55 elukey@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
15:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
15:47 kormat@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 100%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13858 and previous config saved to /var/cache/conftool/dbconfig/20210120-154726-kormat.json
15:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2056.codfw.wmnet
15:46 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
15:46 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
15:46 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
15:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
15:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
15:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
15:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2056.codfw.wmnet
15:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
15:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
15:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
15:32 kormat@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 75%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13857 and previous config saved to /var/cache/conftool/dbconfig/20210120-153223-kormat.json
15:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
15:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
15:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
15:24 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.27
15:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2054.codfw.wmnet
15:18 brennen: 1.36.0-wmf.27 train unblocked, proceeding to group0 (T271341)
15:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2054.codfw.wmnet
15:17 kormat@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 50%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13856 and previous config saved to /var/cache/conftool/dbconfig/20210120-151719-kormat.json
15:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2053.codfw.wmnet
15:15 kormat@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 100%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13855 and previous config saved to /var/cache/conftool/dbconfig/20210120-151555-kormat.json
15:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2053.codfw.wmnet
15:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2052.codfw.wmnet
15:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2052.codfw.wmnet
15:02 kormat@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 25%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13854 and previous config saved to /var/cache/conftool/dbconfig/20210120-150216-kormat.json
15:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2051.codfw.wmnet
15:00 kormat@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 66%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13853 and previous config saved to /var/cache/conftool/dbconfig/20210120-150051-kormat.json
14:59 elukey@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
14:57 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate QuickSurveys schemas to EventGate on all wikis - T271165, T271166 (duration: 01m 05s)
14:56 kormat@cumin1001: dbctl commit (dc=all): 'db1109 depooling: Rebooting for T272255', diff saved to https://phabricator.wikimedia.org/P13852 and previous config saved to /var/cache/conftool/dbconfig/20210120-145605-kormat.json
14:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1109.eqiad.wmnet with reason: Rebooting for T272255
14:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1109.eqiad.wmnet with reason: Rebooting for T272255
14:55 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2051.codfw.wmnet
14:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
14:47 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate QuickSurveys schemas to EventGate on testwiki - T271165, T271166 (duration: 01m 06s)
14:46 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
14:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2049.codfw.wmnet
14:45 kormat@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 33%: Reboot T272255', diff saved to https://phabricator.wikimedia.org/P13851 and previous config saved to /var/cache/conftool/dbconfig/20210120-144547-kormat.json
14:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2049.codfw.wmnet
14:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2048.codfw.wmnet
14:34 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2048.codfw.wmnet
14:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2047.codfw.wmnet
14:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
14:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
14:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
14:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2047.codfw.wmnet
14:26 kormat@cumin1001: dbctl commit (dc=all): 'db1076 depooling: Rebooting for T272255', diff saved to https://phabricator.wikimedia.org/P13850 and previous config saved to /var/cache/conftool/dbconfig/20210120-142636-kormat.json
14:26 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1076.eqiad.wmnet with reason: Rebooting for T272255
14:26 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1076.eqiad.wmnet with reason: Rebooting for T272255
14:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2046.codfw.wmnet
14:21 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T272255', diff saved to https://phabricator.wikimedia.org/P13849 and previous config saved to /var/cache/conftool/dbconfig/20210120-142139-kormat.json
14:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2046.codfw.wmnet
14:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2045.codfw.wmnet
14:14 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2045.codfw.wmnet
14:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2044.codfw.wmnet
14:12 kormat@cumin1001: dbctl commit (dc=all): 'db1075 depooling: Rebooting for T272255', diff saved to https://phabricator.wikimedia.org/P13848 and previous config saved to /var/cache/conftool/dbconfig/20210120-141230-kormat.json
14:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1075.eqiad.wmnet with reason: Rebooting for T272255
14:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1075.eqiad.wmnet with reason: Rebooting for T272255
14:07 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2044.codfw.wmnet
14:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2043.codfw.wmnet
14:00 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2043.codfw.wmnet
14:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2042.codfw.wmnet
13:55 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2042.codfw.wmnet
13:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2041.codfw.wmnet
13:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2041.codfw.wmnet
13:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2040.codfw.wmnet
13:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2040.codfw.wmnet
13:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2039.codfw.wmnet
13:28 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
13:28 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
13:28 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
13:26 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
13:26 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
13:26 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
13:20 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/Translate/: 20decbd: Add flag to toggle the usage of the group synchronization cache (T272428; T182433) (duration: 01m 10s)
13:14 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2039.codfw.wmnet
13:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
13:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
13:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
13:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2038.codfw.wmnet
13:02 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
13:02 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
13:02 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
12:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2038.codfw.wmnet
12:49 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2087.codfw.wmnet with reason: Schema change T267767
12:49 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2087.codfw.wmnet with reason: Schema change T267767
12:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2037.codfw.wmnet
12:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2037.codfw.wmnet
12:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2036.codfw.wmnet
12:36 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2036.codfw.wmnet
12:31 godog: bounce icinga on alert1001
12:28 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
12:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
12:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
12:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2035.codfw.wmnet
12:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2035.codfw.wmnet
12:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2034.codfw.wmnet
12:10 matthiasmullie: EU config window done
12:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2034.codfw.wmnet
12:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2033.codfw.wmnet
12:08 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 2fc57b259: Remove MediaSearch survey (duration: 01m 10s)
12:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2033.codfw.wmnet
12:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2032.codfw.wmnet
11:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2032.codfw.wmnet
11:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13847 and previous config saved to /var/cache/conftool/dbconfig/20210120-112808-root.json
11:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13846 and previous config saved to /var/cache/conftool/dbconfig/20210120-111305-root.json
10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13845 and previous config saved to /var/cache/conftool/dbconfig/20210120-105801-root.json
10:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2030.codfw.wmnet
10:51 XioNoX: Discard the non-whitelisted 172.16.0.0/12 traffic - T209082
10:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2030.codfw.wmnet
10:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2029.codfw.wmnet
10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13844 and previous config saved to /var/cache/conftool/dbconfig/20210120-104257-root.json
10:37 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2029.codfw.wmnet
10:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2028.codfw.wmnet
10:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 to stop replication T272008', diff saved to https://phabricator.wikimedia.org/P13842 and previous config saved to /var/cache/conftool/dbconfig/20210120-103449-marostegui.json
10:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2028.codfw.wmnet
10:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2027.codfw.wmnet
10:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2027.codfw.wmnet
10:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2026.codfw.wmnet
10:07 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2026.codfw.wmnet
10:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2025.codfw.wmnet
09:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2025.codfw.wmnet
09:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2024.codfw.wmnet
09:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2024.codfw.wmnet
09:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2023.codfw.wmnet
09:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2023.codfw.wmnet
09:39 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2021.codfw.wmnet
09:32 moritzm: installing cuminunpriv1001
09:32 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2021.codfw.wmnet
09:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2020.codfw.wmnet
09:24 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2020.codfw.wmnet
09:19 XioNoX: configure Lumen interfaces
09:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2019.codfw.wmnet
09:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2019.codfw.wmnet
09:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2018.codfw.wmnet
09:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2018.codfw.wmnet
00:43 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Update /analytics/legacy/homepagemodule/ schema version to 1.1.0 (T270309) (duration: 01m 03s)
00:30 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: (no-op) GrowthExperiments: Disable link recommendations (T261408) (duration: 01m 05s)
00:09 legoktm: uploaded docker-report 0.0.4-1~deb9u1 to stretch-wikimedia (T179696)

2021-01-19

21:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2314.codfw.wmnet
21:51 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group0 wikis to 1.36.0-wmf.26
21:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2313.codfw.wmnet
21:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2312.codfw.wmnet
21:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2315.codfw.wmnet
21:46 ottomata: wiping kafka-test cluster data and starting from scratch - T255973
21:00 Urbanecm: Start of `foreachwikiindblist group2 extensions/AbuseFilter/maintenance/MigrateAflFilter.php --batch-size=1000` (T269713)
20:09 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.27
20:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2315.codfw.wmnet
20:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2314.codfw.wmnet
20:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2313.codfw.wmnet
20:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2312.codfw.wmnet
19:46 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
19:30 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
19:27 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
19:22 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
18:58 brennen@deploy1001: Pruned MediaWiki: 1.36.0-wmf.22 (duration: 03m 53s)
18:47 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
18:43 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
18:42 brennen@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.27 (duration: 41m 57s)
18:39 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
18:01 brennen@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.27
17:59 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on restbase2009.codfw.wmnet with reason: REIMAGE
17:59 brennen: starting deploy-promote to testwikis for 1.36.0-wmf.27 (T271341)
17:58 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2009.codfw.wmnet with reason: REIMAGE
17:30 Urbanecm: Start of `foreachwikiindblist group1 extensions/AbuseFilter/maintenance/MigrateAflFilter.php --batch-size=1000` (T269713)
17:08 Urbanecm: Run extensions/AbuseFilter/maintenance/MigrateAflFilter.php for all group0 wikis (T269713)
17:06 Urbanecm: mwscript extensions/AbuseFilter/maintenance/MigrateAflFilter.php --wiki=test2wiki --batch-size=1000 # T269713
17:04 Urbanecm: mwscript extensions/AbuseFilter/maintenance/MigrateAflFilter.php --wiki=testwiki --batch-size=1000 # T269713
16:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2314.codfw.wmnet with reason: new install on buster
16:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2314.codfw.wmnet with reason: new install on buster
16:50 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2314.codfw.wmnet with reason: REIMAGE
16:48 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2313.codfw.wmnet with reason: REIMAGE
16:47 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
16:47 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
16:47 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
16:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2315.codfw.wmnet with reason: REIMAGE
16:46 brennen: 1.36.0-wmf.27 was branched at fbb516d for T271341
16:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2312.codfw.wmnet with reason: REIMAGE
16:45 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2315.codfw.wmnet with reason: REIMAGE
16:45 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2314.codfw.wmnet with reason: REIMAGE
16:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2313.codfw.wmnet with reason: REIMAGE
16:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2312.codfw.wmnet with reason: REIMAGE
16:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
16:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
16:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
16:41 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ms-be1046.eqiad.wmnet
16:39 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
16:39 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
16:39 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
16:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13838 and previous config saved to /var/cache/conftool/dbconfig/20210119-163637-root.json
16:30 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
16:30 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
16:30 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
16:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
16:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
16:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
16:22 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
16:21 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
16:21 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
16:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13837 and previous config saved to /var/cache/conftool/dbconfig/20210119-162134-root.json
16:14 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
16:07 moritzm: powercycling ms-be1046, stuck during boot
16:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13836 and previous config saved to /var/cache/conftool/dbconfig/20210119-160630-root.json
15:58 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
15:58 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
15:51 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
15:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13835 and previous config saved to /var/cache/conftool/dbconfig/20210119-155127-root.json
15:47 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1046.eqiad.wmnet
15:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1045.eqiad.wmnet
15:45 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
15:45 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
15:45 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
15:43 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host cuminunpriv1001.eqiad.wmnet
15:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1045.eqiad.wmnet
15:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1044.eqiad.wmnet
15:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1044.eqiad.wmnet
15:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1043.eqiad.wmnet
15:26 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host cuminunpriv1001.eqiad.wmnet
15:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
15:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
15:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
15:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1043.eqiad.wmnet
15:15 Urbanecm: Run `foreachwikiindblist closed extensions/AbuseFilter/maintenance/MigrateAflFilter.php` (T269713)
15:06 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
15:06 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
15:06 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
15:03 Jeff_Green: authdns-update DNS adjustments for frdata-(eqiad|codfw)
14:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1042.eqiad.wmnet
14:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1042.eqiad.wmnet
14:19 marostegui: Sanitize trwikivoyage on db2094:3315, db1124:3315, db1154:3315 T271261
14:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1041.eqiad.wmnet
14:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1041.eqiad.wmnet
14:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1040.eqiad.wmnet
14:08 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T271264)
14:03 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1040.eqiad.wmnet
13:49 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T271264)
13:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1039.eqiad.wmnet
13:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1039.eqiad.wmnet
13:39 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1038.eqiad.wmnet
13:39 Urbanecm: trwikivoyage is created
13:39 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 53s)
13:38 godog: bounce logstash on logstash1025 to debug unindexable logs
13:37 urbanecm@deploy1001: update-interwiki-cache aborted: Update interwiki cache (duration: 00m 05s)
13:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating trwikivoyage (T271260) (duration: 00m 55s)
13:35 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating trwikivoyage (T271260) (duration: 00m 55s)
13:34 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating trwikivoyage (T271260)
13:32 urbanecm@deploy1001: Synchronized dblists: Creating trwikivoyage (T271260) (duration: 00m 55s)
13:31 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating trwikivoyage (T271260) (duration: 00m 55s)
13:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1038.eqiad.wmnet
13:30 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating trwikivoyage (T271260) (duration: 00m 56s)
13:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1037.eqiad.wmnet
13:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1037.eqiad.wmnet
13:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1036.eqiad.wmnet
12:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1036.eqiad.wmnet
12:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1035.eqiad.wmnet
12:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1035.eqiad.wmnet
12:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1034.eqiad.wmnet
12:45 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'staging' .
12:45 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'production' .
12:44 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 56s)
12:44 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
12:44 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
12:44 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
12:44 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 56s)
12:38 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1034.eqiad.wmnet
12:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1033.eqiad.wmnet
12:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 338c0f9: wgAbuseFilterAflFilterMigrationStage: Make WRITE_BOTH everywhere (T269712) (duration: 00m 56s)
12:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1033.eqiad.wmnet
12:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1031.eqiad.wmnet
12:25 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1031.eqiad.wmnet
12:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1030.eqiad.wmnet
12:22 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
12:22 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
12:22 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
12:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1030.eqiad.wmnet
12:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1029.eqiad.wmnet
12:12 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
12:12 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
12:12 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
12:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1029.eqiad.wmnet
12:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 6a4cbe6: Revert "Switch fiwiki to their 500k temporary logo!" (T270974) (duration: 00m 56s)
11:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1028.eqiad.wmnet
11:54 moritzm: installing remaining openssl 1.1 updates on stretch
11:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1028.eqiad.wmnet
11:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1026.eqiad.wmnet
11:34 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1026.eqiad.wmnet
11:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
11:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
11:33 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
11:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1025.eqiad.wmnet
11:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1025.eqiad.wmnet
11:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1024.eqiad.wmnet
11:11 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
11:11 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
11:11 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
11:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1024.eqiad.wmnet
11:10 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
11:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1023.eqiad.wmnet
11:06 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
11:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1023.eqiad.wmnet
10:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1021.eqiad.wmnet
10:56 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
10:50 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1021.eqiad.wmnet
10:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1020.eqiad.wmnet
10:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1020.eqiad.wmnet
10:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2017.codfw.wmnet
10:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2017.codfw.wmnet
09:51 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ms-be2017.codfw.wmnet
09:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2017.codfw.wmnet
09:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2016.codfw.wmnet
09:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2016.codfw.wmnet
09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 to stop replication T272008', diff saved to https://phabricator.wikimedia.org/P13828 and previous config saved to /var/cache/conftool/dbconfig/20210119-090100-marostegui.json
08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1078, depooled by mistake', diff saved to https://phabricator.wikimedia.org/P13827 and previous config saved to /var/cache/conftool/dbconfig/20210119-085918-marostegui.json
08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 to stop replication T272008', diff saved to https://phabricator.wikimedia.org/P13826 and previous config saved to /var/cache/conftool/dbconfig/20210119-085856-marostegui.json
08:54 godog: swift codfw-prod: more weight to ms-be20[58-61] - T269337
08:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 100%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13825 and previous config saved to /var/cache/conftool/dbconfig/20210119-080839-root.json
07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 75%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13824 and previous config saved to /var/cache/conftool/dbconfig/20210119-075336-root.json
07:41 oblivian@deploy1001: Synchronized README: Null deployments to test php restarts from scap (duration: 01m 23s)
07:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 50%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13823 and previous config saved to /var/cache/conftool/dbconfig/20210119-073832-root.json
07:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 25%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13822 and previous config saved to /var/cache/conftool/dbconfig/20210119-072329-root.json
07:14 elukey: clean up prometheus es exporter units no longer needed on es-codfw nodes
07:02 marostegui: Stop MySQL on db1082 T272008
06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 to stop replication T272008', diff saved to https://phabricator.wikimedia.org/P13821 and previous config saved to /var/cache/conftool/dbconfig/20210119-065748-marostegui.json
06:04 marostegui: Upgrade kernel on pc2007 pc2008 pc2009 pc2010 T272121
04:39 Krinkle: unlocked per https://phabricator.wikimedia.org/T272215#6755025
04:37 Krinkle: locked scap on deploy1001 as a precaution

2021-01-18

21:33 eileen: civicrm revision changed from 4220fc8177 to a4caad22b1, config revision is f08249ecf9
21:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2311.codfw.wmnet
21:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2310.codfw.wmnet
21:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2309.codfw.wmnet
21:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2307.codfw.wmnet
21:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2309.codfw.wmnet
21:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2307.codfw.wmnet
21:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2310.codfw.wmnet
21:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2311.codfw.wmnet
20:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2311.codfw.wmnet with reason: REIMAGE
20:52 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2309.codfw.wmnet with reason: REIMAGE
20:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2310.codfw.wmnet with reason: REIMAGE
20:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2311.codfw.wmnet with reason: REIMAGE
20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2307.codfw.wmnet with reason: REIMAGE
20:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2310.codfw.wmnet with reason: REIMAGE
20:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2309.codfw.wmnet with reason: REIMAGE
20:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2307.codfw.wmnet with reason: REIMAGE
20:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2305.codfw.wmnet
20:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2303.codfw.wmnet
20:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2277.codfw.wmnet
20:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2276.codfw.wmnet
20:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2303.codfw.wmnet
20:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2305.codfw.wmnet
20:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2277.codfw.wmnet
20:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2276.codfw.wmnet
19:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2305.codfw.wmnet with reason: REIMAGE
19:40 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2303.codfw.wmnet with reason: REIMAGE
19:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2305.codfw.wmnet with reason: REIMAGE
19:38 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2303.codfw.wmnet with reason: REIMAGE
19:35 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2277.codfw.wmnet with reason: REIMAGE
19:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2276.codfw.wmnet with reason: REIMAGE
19:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2277.codfw.wmnet with reason: REIMAGE
19:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2276.codfw.wmnet with reason: REIMAGE
18:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2275.codfw.wmnet
18:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2274.codfw.wmnet
18:39 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2273.codfw.wmnet
18:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2271.codfw.wmnet
18:36 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1136,1138].eqiad.wmnet
18:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2274.codfw.wmnet
18:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2275.codfw.wmnet
18:34 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1136,1138].eqiad.wmnet
18:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2273.codfw.wmnet
18:33 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2271.codfw.wmnet
18:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1138.eqiad.wmnet with reason: REIMAGE
18:22 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1138.eqiad.wmnet with reason: REIMAGE
18:21 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1132.eqiad.wmnet
18:20 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1132.eqiad.wmnet
18:19 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1130.eqiad.wmnet
18:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1136.eqiad.wmnet with reason: REIMAGE
18:17 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1130.eqiad.wmnet
18:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1136.eqiad.wmnet with reason: REIMAGE
18:14 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1128.eqiad.wmnet
18:12 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1128.eqiad.wmnet
17:51 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1124-1127].eqiad.wmnet
17:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2274.codfw.wmnet with reason: REIMAGE
17:49 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1124-1127].eqiad.wmnet
17:49 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2275.codfw.wmnet with reason: REIMAGE
17:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2273.codfw.wmnet with reason: REIMAGE
17:48 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1121-1123].eqiad.wmnet
17:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2271.codfw.wmnet with reason: REIMAGE
17:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2275.codfw.wmnet with reason: REIMAGE
17:46 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1121-1123].eqiad.wmnet
17:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2274.codfw.wmnet with reason: REIMAGE
17:45 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2273.codfw.wmnet with reason: REIMAGE
17:45 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2271.codfw.wmnet with reason: REIMAGE
17:44 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1120.eqiad.wmnet
17:42 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1120.eqiad.wmnet
17:38 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1118.eqiad.wmnet
17:36 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1118.eqiad.wmnet
17:32 mutante: reimaging mw2271,mw2273,mw2274,mw227 (codfw only)
16:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1137.eqiad.wmnet with reason: REIMAGE
16:22 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1137.eqiad.wmnet with reason: REIMAGE
16:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1135.eqiad.wmnet with reason: REIMAGE
16:03 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1135.eqiad.wmnet with reason: REIMAGE
15:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1132.eqiad.wmnet with reason: REIMAGE
15:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1132.eqiad.wmnet with reason: REIMAGE
15:48 moritzm: installing wavpack security updates
15:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1130.eqiad.wmnet with reason: REIMAGE
15:36 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1130.eqiad.wmnet with reason: REIMAGE
15:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1128.eqiad.wmnet with reason: REIMAGE
15:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1128.eqiad.wmnet with reason: REIMAGE
15:10 kormat@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
14:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet
14:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host theemin.codfw.wmnet
14:43 kormat@cumin1001: START - Cookbook sre.ganeti.makevm
14:31 kormat@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
14:30 arturo: updating packages in buster-wikimedia/thirdparty/ceph-nautilus-buster (T272296)
14:26 kormat@cumin1001: START - Cookbook sre.ganeti.makevm
14:26 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
14:18 kormat@cumin1001: START - Cookbook sre.hosts.decommission
13:34 moritzm: uploaded wmf-sre-laptop 0.3.2 to apt.wikimedia.org
13:26 volans: installed spicerack 0.0.48-1+deb10u1 on cumin hosts
13:12 marostegui: Upgrade db2071 to 10.4.17 - T268457
13:08 XioNoX: add NAT rule on pfw3-eqiad - T272066
12:56 XioNoX: add NAT rule on pfw3-codfw - T272066
12:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2008.codfw.wmnet
12:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1127.eqiad.wmnet with reason: REIMAGE
12:32 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe2008.codfw.wmnet
12:31 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1127.eqiad.wmnet with reason: REIMAGE
12:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1126.eqiad.wmnet with reason: REIMAGE
12:25 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1126.eqiad.wmnet with reason: REIMAGE
12:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1125.eqiad.wmnet with reason: REIMAGE
12:21 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2007.codfw.wmnet
12:20 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1125.eqiad.wmnet with reason: REIMAGE
12:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe2007.codfw.wmnet
12:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2006.codfw.wmnet
12:13 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1124.eqiad.wmnet with reason: REIMAGE
12:11 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1124.eqiad.wmnet with reason: REIMAGE
12:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe2006.codfw.wmnet
12:08 volans: uploaded spicerack_0.0.48 to apt.wikimedia.org buster-wikimedia
12:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2005.codfw.wmnet
12:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1123.eqiad.wmnet with reason: REIMAGE
12:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1123.eqiad.wmnet with reason: REIMAGE
11:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe2005.codfw.wmnet
11:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1122.eqiad.wmnet with reason: REIMAGE
11:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1008.eqiad.wmnet
11:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1122.eqiad.wmnet with reason: REIMAGE
11:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe1008.eqiad.wmnet
11:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1007.eqiad.wmnet
11:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe1007.eqiad.wmnet
11:37 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1121.eqiad.wmnet with reason: REIMAGE
11:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1121.eqiad.wmnet with reason: REIMAGE
11:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1006.eqiad.wmnet
11:28 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe1006.eqiad.wmnet
11:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1120.eqiad.wmnet with reason: REIMAGE
11:18 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1120.eqiad.wmnet with reason: REIMAGE
11:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1005.eqiad.wmnet
11:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe1005.eqiad.wmnet
11:10 hashar: Restarting Gerrit main instance on gerrit1001.wikimedia.org
11:08 hashar: Restarting Gerrit replica on gerrit2001.wikimedia.org
10:58 moritzm: installing python2.7 security updates on Stretch
10:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13799 and previous config saved to /var/cache/conftool/dbconfig/20210118-102959-root.json
10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13798 and previous config saved to /var/cache/conftool/dbconfig/20210118-101456-root.json
10:00 _joe_: restarting pybal on lvs1016, not talking to its etcd server
09:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13797 and previous config saved to /var/cache/conftool/dbconfig/20210118-095952-root.json
09:51 kormat@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13796 and previous config saved to /var/cache/conftool/dbconfig/20210118-094449-root.json
09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 to stop replication T272008', diff saved to https://phabricator.wikimedia.org/P13795 and previous config saved to /var/cache/conftool/dbconfig/20210118-092546-marostegui.json
09:24 kormat@cumin1001: START - Cookbook sre.ganeti.makevm
09:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13794 and previous config saved to /var/cache/conftool/dbconfig/20210118-092429-root.json
09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1105:3311 from vslow', diff saved to https://phabricator.wikimedia.org/P13793 and previous config saved to /var/cache/conftool/dbconfig/20210118-092003-marostegui.json
09:13 moritzm: installing openssl 1.1 security updates on stretch
09:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13791 and previous config saved to /var/cache/conftool/dbconfig/20210118-090926-root.json
09:06 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:01 kormat@cumin1001: START - Cookbook sre.dns.netbox
08:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13790 and previous config saved to /var/cache/conftool/dbconfig/20210118-085422-root.json
08:46 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
08:42 kormat@cumin1001: START - Cookbook sre.hosts.decommission
08:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13788 and previous config saved to /var/cache/conftool/dbconfig/20210118-083919-root.json
08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 to stop replication, place db1105:3311 temporarily in vslow T272008', diff saved to https://phabricator.wikimedia.org/P13787 and previous config saved to /var/cache/conftool/dbconfig/20210118-081740-marostegui.json
08:15 moritzm: installing remaining openssl 1.0 security updates on stretch
08:13 elukey: clean up old archiva debs and upload 2.2.4-3 to buster-wikimedia - T272082
08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 100%: After restarting for kernel upgraed', diff saved to https://phabricator.wikimedia.org/P13786 and previous config saved to /var/cache/conftool/dbconfig/20210118-080122-root.json
07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 75%: After restarting for kernel upgraed', diff saved to https://phabricator.wikimedia.org/P13785 and previous config saved to /var/cache/conftool/dbconfig/20210118-074618-root.json
07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 50%: After restarting for kernel upgraed', diff saved to https://phabricator.wikimedia.org/P13784 and previous config saved to /var/cache/conftool/dbconfig/20210118-073115-root.json
07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 25%: After restarting for kernel upgraed', diff saved to https://phabricator.wikimedia.org/P13783 and previous config saved to /var/cache/conftool/dbconfig/20210118-071611-root.json
06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1138 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P13782 and previous config saved to /var/cache/conftool/dbconfig/20210118-065312-marostegui.json
06:35 marostegui: Reboot dbproxy2001, dbproxy2002, dbproxy2003 for kernel upgrade
06:22 marostegui: Reboot db1154 and db1155 for kernel upgrade

2021-01-16

12:18 elukey: elukey@cumin1001:~$ sudo cumin 'A:mw-app-canary and A:mw-eqiad' 'run-puppet-agent' -b 10 - T272215
12:10 elukey: elukey@cumin1001:~$ sudo cumin 'A:mw-eqiad' 'run-puppet-agent' -b 10 - T272215
11:23 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=appserver,dc=eqiad

2021-01-15

23:59 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1118.eqiad.wmnet with reason: REIMAGE
23:57 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1118.eqiad.wmnet with reason: REIMAGE
21:22 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5002.eqsin.wmnet with reason: REIMAGE
21:19 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5002.eqsin.wmnet with reason: REIMAGE
20:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 66e6be3: Set anniversary logo for frwiki (3/3; T272075) (duration: 00m 55s)
20:37 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-tagline-fr-20.svg: 66e6be3: Set anniversary logo for frwiki (2/3; T272075) (duration: 00m 55s)
20:36 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-fr-20.svg: 66e6be3: Set anniversary logo for frwiki (1/3; T272075) (duration: 00m 58s)
20:21 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-fr-20.svg: 66e6be3: Set anniversary logo for frwiki (1/3; T272075) (duration: 01m 54s)
17:17 legoktm: legoktm@contint2001:~$ sudo systemctl reload apache2 # for T272159
16:17 bstorm: canceled downtime for maintain-dbusers on labstore1004 T272127
15:30 elukey: restart archiva to apply hot-fix for T272082
15:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1002.wikimedia.org
15:14 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ldap-replica1002.wikimedia.org
15:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1001.wikimedia.org
15:07 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ldap-replica1001.wikimedia.org
15:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
15:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
14:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2003.wikimedia.org
14:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ldap-replica2003.wikimedia.org
14:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2004.wikimedia.org
14:25 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ldap-replica2004.wikimedia.org
11:30 jynus: rolling restart of eqiad source backup dbs
11:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1002.eqiad.wmnet
11:15 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1002.eqiad.wmnet
11:11 XioNoX: update cloud-in4 firewall rules
11:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2036.codfw.wmnet
10:59 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2036.codfw.wmnet
10:58 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1003.eqiad.wmnet
10:56 jiji@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host mc2036.codfw.wmnet
10:55 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2036.codfw.wmnet
10:53 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1003.eqiad.wmnet
10:53 vgutierrez: re-enable puppet on acme-chief clients
10:53 jynus: rolling restart of dbprov2* hosts
10:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief1001.eqiad.wmnet
10:52 _joe_: rebuilding the docker images coredns,nutcracker,prometheus-statsd-exporter,service-checker,wmfdebug to use wikimedia-buster as a base
10:51 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1001.eqiad.wmnet
10:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief1001.eqiad.wmnet
10:46 vgutierrez: disable puppet on acme-chief clients
10:45 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1001.eqiad.wmnet
10:43 effie: reboot mc2036 - T269596
10:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test1001.eqiad.wmnet
10:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief-test1001.eqiad.wmnet
10:26 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test2001.codfw.wmnet
10:21 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief-test2001.codfw.wmnet
10:10 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-client1001.eqiad.wmnet
10:07 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-client1001.eqiad.wmnet
10:02 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
09:58 reedy@deploy1001: Synchronized php-1.36.0-wmf.26/extensions/GrowthExperiments/includes/NewcomerTasks/TaskSuggester/CacheDecorator.php: T272103 (duration: 00m 57s)
09:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
09:36 vgutierrez: rolling restart acme-chief servers to catch up on kernel upgrades
09:24 jynus: rolling restart of dbprov1* hosts
09:19 godog: swift codfw-prod: more weight to ms-be20[58-61] - T269337
09:07 moritzm: installing bast5002 T257324
08:45 moritzm: installing bast4003 T257324
08:39 marostegui: Restart clouddb1013-clouddb1020
08:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
08:28 ryankemper: WDQS puppet run successful
08:15 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single
08:15 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
08:01 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single
07:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
07:57 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single
03:49 eileen: civicrm revision changed from f417a510a5 to 4220fc8177, config revision is f08249ecf9

2021-01-14

23:35 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2236.codfw.wmnet
23:13 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T272094 Change enwiki logo to 20th Birthday Celebration one (duration: 00m 56s)
23:11 jforrester@deploy1001: Synchronized static/images/project-logos/enwiki20-2x.png: T272094 Sync out logo before going live, 3/3 (duration: 00m 55s)
23:09 jforrester@deploy1001: Synchronized static/images/project-logos/enwiki20-1.5x.png: T272094 Sync out logo before going live, 2/3 (duration: 00m 55s)
23:07 jforrester@deploy1001: Synchronized static/images/project-logos/enwiki20.png: T272094 Sync out logo before going live, 1/3 (duration: 01m 02s)
23:02 mutante: Happy 20th Birthday Wikipedia - https://20.wikipedia.org - https://gerrit.wikimedia.org/r/656268
22:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2236.codfw.wmnet
22:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2270.codfw.wmnet
22:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2268.codfw.wmnet
22:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2269.codfw.wmnet
22:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2269.codfw.wmnet
22:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2270.codfw.wmnet
22:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2268.codfw.wmnet
22:04 thcipriani: restart apache on gerrit1001
21:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2236.codfw.wmnet with reason: REIMAGE
21:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2236.codfw.wmnet with reason: REIMAGE
21:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2270.codfw.wmnet with reason: REIMAGE
21:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2269.codfw.wmnet with reason: REIMAGE
21:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2270.codfw.wmnet with reason: REIMAGE
21:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2268.codfw.wmnet with reason: REIMAGE
21:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2269.codfw.wmnet with reason: REIMAGE
21:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2268.codfw.wmnet with reason: REIMAGE
21:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2258.codfw.wmnet
21:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2255.codfw.wmnet
21:19 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: REIMAGE
21:18 razzi@cumin1001: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid analytics cluster: Reboot Druid nodes - razzi@cumin1001
21:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: REIMAGE
21:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2242.codfw.wmnet
21:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2241.codfw.wmnet
21:16 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: REIMAGE
21:15 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: REIMAGE
21:15 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: REIMAGE
21:14 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: REIMAGE
21:12 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2258.codfw.wmnet
21:12 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2255.codfw.wmnet
21:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2242.codfw.wmnet
21:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2241.codfw.wmnet
20:17 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: REIMAGE
20:17 mutante: ACKing all unhandled crit alerts about systemd on clouddb hosts - notifications are disabled but this cleans up Icinga web UI noise - T267090
20:15 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: REIMAGE
20:05 razzi@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid analytics cluster: Reboot Druid nodes - razzi@cumin1001
19:31 urbanecm@deploy1001: Synchronized dblists/closed.dblist: d3e274e: Close lrcwiki (T272041) (duration: 00m 58s)
19:03 mutante: mc1024 - went down and is powered off, attempting to power on via mgmt
18:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2258.codfw.wmnet with reason: REIMAGE
18:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2255.codfw.wmnet with reason: REIMAGE
18:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2242.codfw.wmnet with reason: REIMAGE
18:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2258.codfw.wmnet with reason: REIMAGE
18:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2255.codfw.wmnet with reason: REIMAGE
18:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2241.codfw.wmnet with reason: REIMAGE
18:38 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2242.codfw.wmnet with reason: REIMAGE
18:38 Amir1: started mass deletion of lrcwiki (T272041) - https://w.wiki/uPV
18:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2241.codfw.wmnet with reason: REIMAGE
18:36 jynus: restarting backup1002, backup2002 T271913
18:05 jynus: restarting backup1001, backup2001 T271913
16:47 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 10 hosts with reason: upgrading openstack
16:47 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 10 hosts with reason: upgrading openstack
16:47 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 93 hosts with reason: upgrading openstack
16:46 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 93 hosts with reason: upgrading openstack
16:32 moritzm: installing php-pear updates on stretch
16:03 moritzm: installing tomcat8 security updates
15:40 moritzm: installing sqlite3 security updates on Stretch
15:30 papaul: power down ms-be2022 for maintenance
15:19 otto@deploy1001: Finished deploy [analytics/refinery@1117f45]: Explicitly set timeout in banner_activity-druid-monthly-coord - T264358 (duration: 02m 16s)
15:16 otto@deploy1001: Started deploy [analytics/refinery@1117f45]: Explicitly set timeout in banner_activity-druid-monthly-coord - T264358
15:11 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
15:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 93 hosts with reason: upgrading openstack
14:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 93 hosts with reason: upgrading openstack
14:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 10 hosts with reason: upgrading openstack
14:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 10 hosts with reason: upgrading openstack
14:56 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single
14:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
14:28 arturo: running homer in asw-b-codfw* (T271519)
14:26 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single
14:24 arturo: running homer in asw-b-codfw* (T271519)
14:10 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.26
14:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
14:06 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
14:06 hashar@deploy1001: Synchronized php-1.36.0-wmf.26/skins/CologneBlue/includes/CologneBlueHooks.php: Edit link may not be present, avoid undefined index notice T271978 (duration: 01m 07s)
13:56 aborrero@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
13:47 marostegui: Restart mysql on db2094 for openssl upgrades test
13:42 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single
13:23 moritzm: restarting mw canaries for openssl update
13:22 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
13:22 aborrero@cumin2001: START - Cookbook sre.dns.netbox
13:17 moritzm: installing openssl1.0 security updates on stretch
13:15 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
13:11 moritzm: installing xerces-c security updates on stretch
12:50 volans: upgraded python3-pynetbox to 5.3.0-1 on all affected hosts - T266487
12:49 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
12:47 aborrero@cumin2001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
12:34 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
12:34 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
12:33 aborrero@cumin2001: START - Cookbook sre.hosts.decommission
12:29 aborrero@cumin2001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
12:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1004.eqiad.wmnet with reason: REIMAGE
12:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1004.eqiad.wmnet with reason: REIMAGE
12:24 XioNoX: push pfw3 firewall rules - T271935
12:16 volans: upgraded python3-pynetbox to 5.3.0-1 on cumin2001
12:16 aborrero@cumin2001: START - Cookbook sre.hosts.decommission
12:14 elukey@cumin1001: END (ERROR) - Cookbook sre.presto.reboot-workers (exit_code=97) for Presto analytics cluster: Reboot Presto nodes - elukey@cumin1001
12:14 volans: built and uploaded python3-pynetbox 5.3.0-1 to apt.wikimedia.org - T266487
12:10 awight: EU config window finished.
12:09 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Remove unused WMDE TeWü QuickSurveys (T253112, T272013) (duration: 01m 07s)
12:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
12:02 moritzm: rebooting miscweb1002
12:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
11:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
11:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
11:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
11:44 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
11:43 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
11:43 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
11:39 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
11:34 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@4164318]: (no justification provided) (duration: 30m 34s)
11:34 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
11:22 elukey@cumin1001: START - Cookbook sre.presto.reboot-workers for Presto analytics cluster: Reboot Presto nodes - elukey@cumin1001
11:17 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
11:04 oblivian@deploy1001: Started deploy [docker-pkg/deploy@4164318]: (no justification provided)
11:04 oblivian@deploy1001: deploy aborted: (no justification provided) (duration: 00m 14s)
11:03 oblivian@deploy1001: Started deploy [docker-pkg/deploy@4164318]: (no justification provided)
11:01 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
10:42 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
10:36 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
10:35 aborrero@cumin2001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
10:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
10:28 jbond42: failover apt.wikimedia.org back to apt1001
10:28 aborrero@cumin2001: START - Cookbook sre.hosts.decommission
10:25 jbond42: reboot apt1001
10:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
10:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
10:16 jbond42: failover apt.wikimedia.org to apt2001
10:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
10:12 jbond42: reboot apt2001
10:01 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
09:55 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
09:44 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
09:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 100%: After restarting mysql', diff saved to https://phabricator.wikimedia.org/P13768 and previous config saved to /var/cache/conftool/dbconfig/20210114-093803-root.json
09:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
09:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 75%: After restarting mysql', diff saved to https://phabricator.wikimedia.org/P13767 and previous config saved to /var/cache/conftool/dbconfig/20210114-092300-root.json
09:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
09:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
09:11 godog: swift eqiad-prod: add weight to ms-be106[0-3] - T268435
09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 50%: After restarting mysql', diff saved to https://phabricator.wikimedia.org/P13766 and previous config saved to /var/cache/conftool/dbconfig/20210114-090756-root.json
09:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
09:01 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
08:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 25%: After restarting mysql', diff saved to https://phabricator.wikimedia.org/P13765 and previous config saved to /var/cache/conftool/dbconfig/20210114-085252-root.json
08:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single
08:51 vgutierrez: rolling restart of ncredir servers to catch up on kernel upgrades
08:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
08:44 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
08:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
08:43 XioNoX: standardize cloudsw interfaces to prepare for switches homerisation
08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2140 T271084', diff saved to https://phabricator.wikimedia.org/P13764 and previous config saved to /var/cache/conftool/dbconfig/20210114-084243-marostegui.json
08:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
08:10 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
08:10 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
00:22 ryankemper: T266492 Restart of `relforge` successful
00:20 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
00:15 chaomodus: completed rebooting Netbox hosts, failure was due to report errors that would not have recovered.
00:14 crusnov@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
00:13 ryankemper: `sudo -i cookbook sre.elasticsearch.rolling-restart relforge "relforge cluster restart" --task-id T266492 --nodes-per-run 1 --without-lvs`
00:13 ryankemper: (Forgot to tell it `relforge` isn't lvs-managed)
00:13 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
00:10 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-restart (exit_code=99)
00:10 ryankemper: T266492 Beginning rolling restart of `relforge`
00:09 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
00:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2240.codfw.wmnet
00:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2239.codfw.wmnet
00:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2238.codfw.wmnet
00:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2237.codfw.wmnet
00:01 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
00:01 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
00:00 ryankemper: T266492 T268779 T265699 Rolling restart of `cloudelastic` was successful

2021-01-13

23:53 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
23:53 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
23:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
23:49 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
23:49 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
23:46 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
23:46 crusnov@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
23:46 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
23:44 chaomodus: rebooting Netbox instances to apply updates
23:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2240.codfw.wmnet
23:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2239.codfw.wmnet
23:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2238.codfw.wmnet
23:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2237.codfw.wmnet
22:53 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
22:53 ryankemper: T266492 T268779 T265699 `sudo -i cookbook sre.elasticsearch.rolling-restart cloudelastic "cloudelastic cluster restart" --task-id T266492 --nodes-per-run 1`
22:53 ryankemper: T266492 T268779 T265699 Restarting cloudelastic to apply new readahead changes, this will also verify cloudelastic support works in our elasticsearch spicerack code. Only going one node at a time because cloudelastic elasticsearch indices only have 1 replica shard per index
21:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2239.codfw.wmnet with reason: new install on buster
21:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2239.codfw.wmnet with reason: new install on buster
21:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2240.codfw.wmnet with reason: REIMAGE
21:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2238.codfw.wmnet with reason: REIMAGE
21:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2240.codfw.wmnet with reason: REIMAGE
21:40 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2239.codfw.wmnet with reason: REIMAGE
21:40 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2237.codfw.wmnet with reason: REIMAGE
21:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2239.codfw.wmnet with reason: REIMAGE
21:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2238.codfw.wmnet with reason: REIMAGE
21:38 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2237.codfw.wmnet with reason: REIMAGE
21:14 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2235.codfw.wmnet
21:14 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2234.codfw.wmnet
21:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2233.codfw.wmnet
21:12 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2232.codfw.wmnet
21:12 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2231.codfw.wmnet
21:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2235.codfw.wmnet
21:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2234.codfw.wmnet
21:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2233.codfw.wmnet
21:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2232.codfw.wmnet
20:40 mutante: DNS - new project language "alt" added. Altai (also Gorno-Altai) is a Turkic language, spoken officially in the Altai Republic, Russia.
20:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2235.codfw.wmnet with reason: REIMAGE
20:05 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: REIMAGE
20:05 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: REIMAGE
20:04 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2235.codfw.wmnet with reason: REIMAGE
20:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: REIMAGE
20:02 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: REIMAGE
20:02 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: REIMAGE
20:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: REIMAGE
19:58 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 726e972: Set import sources for mrwikibooks (T270402) (duration: 01m 04s)
19:47 thcipriani@deploy1001: Synchronized php-1.36.0-wmf.26/extensions/WikibaseMediaInfo/src/Search/MediaSearchProfiles.php: Guard against this file being included twice T271933 (for real -- forgot to submodule update) (duration: 01m 04s)
19:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2234.codfw.wmnet with reason: REIMAGE
19:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2234.codfw.wmnet with reason: REIMAGE
19:42 thcipriani@deploy1001: Synchronized php-1.36.0-wmf.26/extensions/WikibaseMediaInfo/src/Search/MediaSearchProfiles.php: Guard against this file being included twice T271933 (duration: 01m 04s)
19:39 razzi@cumin1001: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid test cluster: Reboot Druid nodes - razzi@cumin1001
19:36 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Undo - Migrate SpecialMuteSubmit to EventGate - T268517 (duration: 01m 06s)
19:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2231.codfw.wmnet
19:20 razzi@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid test cluster: Reboot Druid nodes - razzi@cumin1001
19:14 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2230.codfw.wmnet
19:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2229.codfw.wmnet
19:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2228.codfw.wmnet
19:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2233.codfw.wmnet with reason: REIMAGE
19:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2233.codfw.wmnet with reason: REIMAGE
18:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2232.codfw.wmnet with reason: REIMAGE
18:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2231.codfw.wmnet with reason: REIMAGE
18:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2232.codfw.wmnet with reason: REIMAGE
18:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2231.codfw.wmnet with reason: REIMAGE
18:34 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2227.codfw.wmnet
18:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2228.codfw.wmnet
18:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2230.codfw.wmnet
18:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2229.codfw.wmnet
18:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2230.codfw.wmnet
18:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2227.codfw.wmnet
17:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2228.codfw.wmnet with reason: REIMAGE
17:45 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2228.codfw.wmnet with reason: REIMAGE
17:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2230.codfw.wmnet with reason: REIMAGE
17:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2230.codfw.wmnet with reason: REIMAGE
17:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2229.codfw.wmnet with reason: REIMAGE
17:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2227.codfw.wmnet with reason: REIMAGE
17:25 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2229.codfw.wmnet with reason: REIMAGE
17:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2227.codfw.wmnet with reason: REIMAGE
17:11 herron: beginning cutover of https://logstash.wikimedia.org frontend to ELK7 T234854
17:02 mutante: mw2228 resetting DRAC/BMC - trying to solve remote IPMI issue - bmc-device --cold-reset; echo $?
17:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:53 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
16:39 sukhe: upload pdns-recursor_4.4.2-2wm1 to apt.wm.o (buster) - T252132
16:18 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
16:18 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
16:18 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
16:17 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
16:06 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.26/extensions/ProofreadPage/includes/Special/SpecialProofreadPages.php: d73ba7c: GlobalVarConfig::get should not be provided with the wg prefix (T271932) (duration: 01m 07s)
15:56 volans: upgraded spicerack to 0.0.47-1+deb10u1 on cumin1001 - T257905
15:56 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0)
15:45 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
15:45 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
15:44 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
15:44 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
15:42 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
15:42 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
15:41 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
15:41 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
15:22 hashar: Stopping Jenkins CI on contint2001 to upgrade Jenkins # T271507
15:11 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
15:06 volans@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on cumin2001.codfw.wmnet with reason: volans's test
15:06 volans@cumin2001: START - Cookbook sre.hosts.downtime for 0:10:00 on cumin2001.codfw.wmnet with reason: volans's test
15:05 volans: upgraded spicerack to 0.0.47-1+deb10u1 on cumin2001 - T257905
15:01 hashar: Upgraded Jenkins on releases1002 / releases2002 hosts # T271507
14:57 moritzm: imported jenkins 2.263.2 (security release) to apt.wikimedia.org/buster-wikimedia
14:27 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.26/skins/Vector/includes/templates/legacy/Sidebar.mustache: 5a117de: Use Template:Link-mainpage in legacy sidebar same as new logo (T271873) (duration: 01m 05s)
14:17 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
14:17 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
14:16 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
14:15 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
14:14 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
14:14 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
14:14 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
14:12 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
14:04 liw@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.26 (duration: 01m 03s)
14:03 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.26
13:52 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
13:52 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
13:51 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
13:51 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
13:50 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
13:50 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
13:49 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
13:49 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
13:48 godog: swift codfw-prod: more weight to ms-be20[58-61] - T269337
13:36 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
13:36 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
13:31 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
13:31 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
13:07 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
12:15 dcausse: European mid-day backport window done
12:09 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T239931: Revert "Disable sanity check cirrus jobs for Wikidata" (duration: 01m 16s)
11:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2029.codfw.wmnet with reason: REIMAGE
11:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1029.eqiad.wmnet with reason: REIMAGE
11:49 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2029.codfw.wmnet with reason: REIMAGE
11:47 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1029.eqiad.wmnet with reason: REIMAGE
11:40 kart_: Updated cxserver to 2021-01-12-095820-production (T234220, T270408)
11:37 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
11:33 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
11:23 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
11:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 100%: After restarting mysql', diff saved to https://phabricator.wikimedia.org/P13756 and previous config saved to /var/cache/conftool/dbconfig/20210113-111312-root.json
11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Remove weight on es4 the master', diff saved to https://phabricator.wikimedia.org/P13755 and previous config saved to /var/cache/conftool/dbconfig/20210113-110419-marostegui.json
10:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 75%: After restarting mysql', diff saved to https://phabricator.wikimedia.org/P13754 and previous config saved to /var/cache/conftool/dbconfig/20210113-105809-root.json
10:57 volans: uploaded spicerack_0.0.47 to apt.wikimedia.org buster-wikimedia
10:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 50%: After restarting mysql', diff saved to https://phabricator.wikimedia.org/P13753 and previous config saved to /var/cache/conftool/dbconfig/20210113-104305-root.json
10:35 jbond42: puppet re-enabled on all cp-text hosts
10:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 25%: After restarting mysql', diff saved to https://phabricator.wikimedia.org/P13751 and previous config saved to /var/cache/conftool/dbconfig/20210113-102802-root.json
10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce weight on es1021', diff saved to https://phabricator.wikimedia.org/P13750 and previous config saved to /var/cache/conftool/dbconfig/20210113-102245-marostegui.json
10:18 jbond42: disable puppet on the cp::text to deploy block list changes 651174 + 651171
10:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1020', diff saved to https://phabricator.wikimedia.org/P13749 and previous config saved to /var/cache/conftool/dbconfig/20210113-101606-marostegui.json
10:02 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 25%: After restarting mysql', diff saved to https://phabricator.wikimedia.org/P13748 and previous config saved to /var/cache/conftool/dbconfig/20210113-100253-root.json
09:59 marostegui: Enable report_host on es1020 T271106
09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1020', diff saved to https://phabricator.wikimedia.org/P13747 and previous config saved to /var/cache/conftool/dbconfig/20210113-095834-marostegui.json
09:49 marostegui: Enable report_host on all codfw sby masters - T271106
09:42 godog: swift eqiad-prod: add weight to ms-be106[0-3] - T268435
09:05 ayounsi@deploy1001: Finished deploy [homer/deploy@723ebfe]: Netbox 2.9 changes (duration: 03m 11s)
09:03 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-restart (exit_code=99)
09:02 ayounsi@deploy1001: Started deploy [homer/deploy@723ebfe]: Netbox 2.9 changes
09:02 moritzm: installing efivar bugfix update
09:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
08:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
08:47 moritzm: draining ganeti4003 for eventual reboot
08:46 ema: cp5008: re-enable puppet to undo JIT tslua experiment T265625
08:35 moritzm: failover ganeti master in ulsfo to ganeti4002
08:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
08:23 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
08:19 moritzm: draining ganeti4002 for eventual reboot
08:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
08:13 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
08:04 ryankemper: [WDQS Deploy] Deploy is complete, and the WDQS service is healthy
07:59 moritzm: draining ganeti4001 for eventual reboot
07:29 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
07:29 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
07:28 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts simultaneously: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
07:28 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@fdd2c2f]: 0.3.59 (duration: 14m 23s)
07:15 ryankemper: [WDQS Deploy] All tests passing on canary instance `wdqs1003` following canary deploy. Proceeding to rest of fleet...
07:13 ryankemper@deploy1001: Started deploy [wdqs/wdqs@fdd2c2f]: 0.3.59
07:13 ryankemper: [WDQS Deploy] All tests passing on canary instance `wdqs1003` prior to start of deploy. Proceeding with canary deploy of version `0.3.59`...
07:04 ryankemper: T266492 T268779 T265699 Restarting cloudelastic to apply new readahead changes, this will also verify cloudelastic support works in our elasticsearch spicerack code. Only going one node at a time because cloudelastic elasticsearch indices only have 1 replica shard per index.
07:03 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: After cloning db1155:3317', diff saved to https://phabricator.wikimedia.org/P13745 and previous config saved to /var/cache/conftool/dbconfig/20210113-065535-root.json
06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: After cloning db1155:3317', diff saved to https://phabricator.wikimedia.org/P13744 and previous config saved to /var/cache/conftool/dbconfig/20210113-064031-root.json
06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: After cloning db1155:3317', diff saved to https://phabricator.wikimedia.org/P13743 and previous config saved to /var/cache/conftool/dbconfig/20210113-062528-root.json
06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: After cloning db1155:3317', diff saved to https://phabricator.wikimedia.org/P13742 and previous config saved to /var/cache/conftool/dbconfig/20210113-061024-root.json
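The db1079 entries above bring the host back in at 25%, 50%, 75% and 100%, roughly 15 minutes apart (06:10, 06:25, 06:40, 06:55). A minimal sketch of generating such a staged schedule, assuming a fixed number of equal steps and a fixed interval; `repool_schedule` is a hypothetical helper and does not call dbctl.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def repool_schedule(start: datetime, steps: int = 4,
                    interval_min: int = 15) -> List[Tuple[datetime, int]]:
    """Return (time, percent) pairs raising a host's pooled share in equal steps to 100%."""
    return [
        (start + timedelta(minutes=interval_min * i), 100 * (i + 1) // steps)
        for i in range(steps)
    ]


for when, pct in repool_schedule(datetime(2021, 1, 13, 6, 10)):
    print(f"{when:%H:%M} pool @ {pct}%")
# 06:10 pool @ 25%
# 06:25 pool @ 50%
# 06:40 pool @ 75%
# 06:55 pool @ 100%
```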

2021-01-12

22:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2225.codfw.wmnet
22:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2224.codfw.wmnet
22:46 crusnov@deploy1001: Finished deploy [netbox/deploy@b17db99]: Rerun production deploy of Netbox 2.9 just in case T266487 (duration: 00m 05s)
22:46 crusnov@deploy1001: Started deploy [netbox/deploy@b17db99]: Rerun production deploy of Netbox 2.9 just in case T266487
22:37 chaomodus: Upgrade of Netbox to 2.9 complete, checking support software. T266487
22:33 crusnov@deploy1001: Finished deploy [netbox/deploy@b17db99]: Deploy Netbox 2.9.10 to production T266487 (duration: 02m 33s)
22:30 crusnov@deploy1001: Started deploy [netbox/deploy@b17db99]: Deploy Netbox 2.9.10 to production T266487
22:12 chaomodus: Merged Netbox 2.9 related changes in puppet and -extras; testing on -next T266487
22:07 bblack: reboot authdns1001 - T266746#6741647
22:04 chaomodus: proceeding with Netbox 2.9 upgrade T266487
22:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2225.codfw.wmnet with reason: REIMAGE
22:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2225.codfw.wmnet with reason: REIMAGE
21:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2224.codfw.wmnet with reason: REIMAGE
21:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2224.codfw.wmnet with reason: REIMAGE
21:50 jforrester@deploy1001: Synchronized php-1.36.0-wmf.25/extensions/AbuseFilter/modules/mode-abusefilter.js: T271487 Don't pass protocol-relative URLs to the Ace worker (duration: 01m 06s)
21:41 ottomata: rolling restart of eventgate-analytics-external pods
20:40 tgr_: running 'mwscript extensions/ORES/maintenance/PopulateDatabase.php --wiki=ukwiki' on terbium
19:57 tgr_: backports done
19:52 bblack: dns1001,authdns1001 - upgrade gdnsd to 3.5.0
19:49 tgr_: synced Config: Disable DiscussionTools' upcoming newtopictool (T270119)
19:49 tgr_: synced Config: Migrate HomepageVisit and ServerSideAccountCreation to Event Platform on testwiki (T267333)
19:48 tgr_: synced Config: Migrate SuggestedTagsAction to Event Platform on testwiki (T267351)
19:48 tgr_: synced Config: Alphabetize ORES settings (T256887)
19:46 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable ORES filters on ukwiki (T256887) (duration: 01m 05s)
19:32 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bunch of no-op/testwiki changes: gerrit:654520, gerrit:655301, gerrit:655706, gerrit:655723 (duration: 01m 05s)
19:27 bblack: dns3001,dns4001 - upgrade gdnsd to 3.5.0
19:25 ottomata: rolling restart of eventgate-analytics-external pods to clear schema caches - T267333
19:01 ariel@deploy1001: Synchronized php-1.36.0-wmf.26/includes/api/ApiQueryInfo.php: Backport: (gerrit 655671) Fix undefined index error in ApiQueryInfo (T271815) (duration: 01m 06s)
18:06 bblack: dns2001,dns5001 - upgrade gdnsd to 3.5.0
17:40 bblack: dnsX002 - upgrade gdnsd to 3.5.0
17:20 herron: roll restarting eqiad/codfw low-traffic pybals for kibana-next -> kibana7 rename
17:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
17:09 jynus: shutting down db2132, db2078:m1 for m1 codfw replica reprovisioning T270877
17:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
17:09 moritzm: rebooting people1002 (people.wikimedia.org)
16:56 moritzm: reinstalling bast3005 with correct DHCP settings
16:39 herron@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: cluster=kibana7,service=kibana7
16:37 ema: cp5008: ats-backend-restart to apply jit.off(true, true) to all lua scripts T265625
16:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
16:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
16:18 herron@puppetmaster1001: conftool action : set/weight=10; selector: name=logstash2031.codfw.wmnet
15:56 ema: cp5008: ats-backend-restart to apply jit.off(true, true) in default.lua T265625
15:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
15:53 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
15:52 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-be2055.codfw.wmnet with reason: reboot
15:52 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on ms-be2055.codfw.wmnet with reason: reboot
15:22 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-be2031.codfw.wmnet with reason: test unattended reboot
15:22 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on ms-be2031.codfw.wmnet with reason: test unattended reboot
14:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
14:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
14:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
14:05 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.26
14:00 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
13:55 moritzm: draining ganeti3003 for eventual reboot
13:53 moritzm: failover ganeti master in esams to ganeti3002
13:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
13:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
13:33 moritzm: draining ganeti3002 for eventual reboot
12:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
12:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
12:08 moritzm: draining ganeti3001 for eventual reboot
11:22 moritzm: installing edk2 security updates
10:51 godog: swift eqiad-prod: add weight to ms-be106[0-3] - T268435
10:28 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cloudnet1004.eqiad.wmnet with reason: T271058
10:28 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cloudnet1004.eqiad.wmnet with reason: T271058
10:26 moritzm: installing systemd bugfix update from Buster 10.7 point release
10:15 liw@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.26 (duration: 67m 18s)
10:13 marostegui: Restart mysql on db1138 to pick up new config T271427 T271106
10:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1138', diff saved to https://phabricator.wikimedia.org/P13736 and previous config saved to /var/cache/conftool/dbconfig/20210112-101211-marostegui.json
09:28 liw@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.26
09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13732 and previous config saved to /var/cache/conftool/dbconfig/20210112-090533-root.json
08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13731 and previous config saved to /var/cache/conftool/dbconfig/20210112-085030-root.json
08:49 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: T271755 (duration: 00m 57s)
08:47 liw: 1.36.0-wmf.26 was branched at e6ad9ab for T267419
08:40 marostegui: Sanitize bclwiktionary diqwiktionary niawiki niawiktionary diqwiktionary on db1124 db2094 db1154 T270280 T270276 T270414 T270410 T271261
08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13730 and previous config saved to /var/cache/conftool/dbconfig/20210112-083526-root.json
08:30 moritzm: installing remaining curl security updates on stretch
08:21 marostegui: Deploy schema change on s3 eqiad master - T270187
08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13729 and previous config saved to /var/cache/conftool/dbconfig/20210112-082023-root.json
08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112', diff saved to https://phabricator.wikimedia.org/P13728 and previous config saved to /var/cache/conftool/dbconfig/20210112-080419-marostegui.json
07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13727 and previous config saved to /var/cache/conftool/dbconfig/20210112-070051-root.json
06:53 XioNoX: push CR655445, only configure vlans relevant to a switch
06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13726 and previous config saved to /var/cache/conftool/dbconfig/20210112-064548-root.json
06:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13725 and previous config saved to /var/cache/conftool/dbconfig/20210112-063044-root.json
06:30 jhuneidi@deploy1001: Pruned MediaWiki: 1.36.0-wmf.21 (duration: 03m 21s)
06:16 marostegui: Stop mysql on db1079 to clone db1155:3317 T268742
06:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13724 and previous config saved to /var/cache/conftool/dbconfig/20210112-061541-root.json
06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P13723 and previous config saved to /var/cache/conftool/dbconfig/20210112-060557-marostegui.json
05:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075', diff saved to https://phabricator.wikimedia.org/P13722 and previous config saved to /var/cache/conftool/dbconfig/20210112-055953-marostegui.json

2021-01-11

22:16 eileen: process-control config revision is f08249ecf9; eoy jobs disabled
22:12 eileen: civicrm revision changed from 2df572bdcd to f417a510a5, config revision is f08249ecf9
21:58 Amir1: deleting watchlist entries of Fawikibot in fawiki (1.1M rows)
21:20 mutante: docker images - [deneb:/srv/images/production-images] $ sudo -i build-production-images
21:02 bblack: dns4002 - upgrade gdnsd to 3.5.0 package
20:47 bblack: authdns2001 - upgrade gdnsd to 3.5.0 package
19:53 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate UniversalLanguageSelector to Event Platform - T268517 (duration: 00m 57s)
19:43 Amir1: end of ladsgroup@mwmaint1002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T270417 T270413 T270279)
19:14 Amir1: start of ladsgroup@mwmaint1002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T270417 T270413 T270279)
18:48 dpifke@deploy1001: Finished deploy [performance/arc-lamp@6bbac6d]: Deploying https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/650277 (duration: 00m 04s)
18:48 dpifke@deploy1001: Started deploy [performance/arc-lamp@6bbac6d]: Deploying https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/650277
18:01 reedy@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: T181217 (duration: 00m 56s)
18:00 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: T181217 (duration: 00m 57s)
17:57 reedy@deploy1001: Synchronized wmf-config/extension-list: T181217 (duration: 00m 56s)
17:48 Amir1: manually removing watchlist rows for Dexbot in Wikidata
17:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on deploy2002.codfw.wmnet with reason: new install on buster
17:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on deploy2002.codfw.wmnet with reason: new install on buster
17:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on deploy1002.eqiad.wmnet with reason: new install on buster
17:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on deploy1002.eqiad.wmnet with reason: new install on buster
17:40 mutante: deploy2002 - scap pull
17:39 mutante: deploy1002 - scap pull
17:15 andrew@deploy1001: Finished deploy [striker/deploy@b6441b8]: Striker deploy for T271621 (duration: 01m 59s)
17:13 andrew@deploy1001: Started deploy [striker/deploy@b6441b8]: Striker deploy for T271621
17:12 andrew@deploy1001: Finished deploy [striker/deploy@b6441b8]: Striker deploy for T271621 (duration: 02m 05s)
17:10 andrew@deploy1001: Started deploy [striker/deploy@b6441b8]: Striker deploy for T271621
16:48 Urbanecm: Create new wiki window is completed
16:43 andrew@deploy1001: Finished deploy [striker/deploy@3180f72]: Striker deploy for T271621 (duration: 01m 01s)
16:42 andrew@deploy1001: Started deploy [striker/deploy@3180f72]: Striker deploy for T271621
16:37 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 18s)
16:34 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating bclwiktionary (T270274) (duration: 00m 56s)
16:33 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating bclwiktionary (T270274) (duration: 00m 56s)
16:32 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating bclwiktionary (T270274)
16:30 urbanecm@deploy1001: Synchronized dblists: Creating bclwiktionary (T270274) (duration: 00m 55s)
16:29 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating bclwiktionary (T270274) (duration: 00m 55s)
16:26 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating bclwiktionary (T270274) (duration: 00m 54s)
16:25 moritzm: installing openldap security updates on stretch (client tools/libs only; all slapd installations are on Buster and already fixed)
16:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating diqwiktionary (T270275) (duration: 00m 56s)
16:20 andrew@deploy1001: Finished deploy [striker/deploy@ba6c0ae]: Striker deploy for T271621 (duration: 02m 02s)
16:18 andrew@deploy1001: Started deploy [striker/deploy@ba6c0ae]: Striker deploy for T271621
16:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating diqwiktionary (T270275) (duration: 01m 34s)
16:17 moritzm: installing remaining p11-kit security updates on stretch
16:15 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating diqwiktionary (T270275)
16:14 urbanecm@deploy1001: Synchronized dblists: Creating diqwiktionary (T270275) (duration: 00m 57s)
16:13 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating diqwiktionary (T270275) (duration: 00m 57s)
16:12 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating diqwiktionary (T270275) (duration: 00m 55s)
16:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating niawiktionary (T270409) (duration: 00m 55s)
16:05 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating niawiktionary (T270409) (duration: 00m 56s)
16:04 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating niawiktionary (T270409)
16:03 urbanecm@deploy1001: Synchronized dblists: Creating niawiktionary (T270409) (duration: 00m 55s)
16:02 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating niawiktionary (T270409) (duration: 00m 56s)
16:01 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating niawiktionary (T270409) (duration: 00m 56s)
15:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2037.codfw.wmnet with reason: REIMAGE
15:57 andrew@deploy1001: Finished deploy [striker/deploy@b2804f2]: Striker deploy for T271621 (duration: 02m 05s)
15:56 urbanecm@deploy1001: Synchronized langlist: Creating niawiki (T270408) (duration: 00m 53s)
15:55 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating niawiki (T270408) (duration: 00m 55s)
15:55 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1028.eqiad.wmnet with reason: REIMAGE
15:55 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2037.codfw.wmnet with reason: REIMAGE
15:55 andrew@deploy1001: Started deploy [striker/deploy@b2804f2]: Striker deploy for T271621
15:54 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating niawiki (T270408) (duration: 00m 56s)
15:54 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating niawiki (T270408)
15:53 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1028.eqiad.wmnet with reason: REIMAGE
15:52 urbanecm@deploy1001: Synchronized dblists: Creating niawiki (T270408) (duration: 00m 57s)
15:51 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating niawiki (T270408) (duration: 00m 57s)
15:50 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating niawiki (T270408) (duration: 00m 56s)
15:48 andrew@deploy1001: Finished deploy [striker/deploy@fb85bfd]: Striker deploy for T271621 (duration: 00m 43s)
15:47 andrew@deploy1001: Started deploy [striker/deploy@fb85bfd]: Striker deploy for T271621
15:47 andrew@deploy1001: Finished deploy [striker/deploy@fb85bfd]: Striker deploy for T271621 (duration: 01m 45s)
15:45 andrew@deploy1001: Started deploy [striker/deploy@fb85bfd]: Striker deploy for T271621
15:42 andrew@deploy1001: Finished deploy [striker/deploy@fb85bfd]: Striker deploy for T271621 (duration: 01m 04s)
15:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13720 and previous config saved to /var/cache/conftool/dbconfig/20210111-154123-root.json
15:41 andrew@deploy1001: Started deploy [striker/deploy@fb85bfd]: Striker deploy for T271621
15:36 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate SpecialMuteSubmit to EventGate - T268517 (duration: 00m 58s)
15:32 effie: upgrading python-thumbor-wikimedia to 2.9 on thumbor1001
15:31 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
15:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13719 and previous config saved to /var/cache/conftool/dbconfig/20210111-152619-root.json
15:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13718 and previous config saved to /var/cache/conftool/dbconfig/20210111-151116-root.json
15:06 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
15:05 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
15:05 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
14:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13717 and previous config saved to /var/cache/conftool/dbconfig/20210111-145612-root.json
14:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078', diff saved to https://phabricator.wikimedia.org/P13716 and previous config saved to /var/cache/conftool/dbconfig/20210111-145239-marostegui.json
14:32 XioNoX: add Routinator 0.8.2 to APT repo - T269738
14:22 moritzm: restarting FPM/Apache on app server canaries for curl update
14:13 marostegui: Deploy schema change on s3 codfw master - T270187
13:52 moritzm: installing curl security updates on stretch
13:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13713 and previous config saved to /var/cache/conftool/dbconfig/20210111-134213-root.json
13:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 66%: After schema change', diff saved to https://phabricator.wikimedia.org/P13712 and previous config saved to /var/cache/conftool/dbconfig/20210111-132709-root.json
13:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 33%: After schema change', diff saved to https://phabricator.wikimedia.org/P13711 and previous config saved to /var/cache/conftool/dbconfig/20210111-131206-root.json
11:38 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 03s)
11:37 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 56s)
11:10 XioNoX: upgrade Routinator to 0.8.2 on rpki2001 - T269738
11:10 jbond42: push change to ratelimit vscode-phabricator - T271528
10:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ab9e80d: Enable anniversary logo for cs.wikipedia (T271662; 2/2) (duration: 00m 56s)
10:35 urbanecm@deploy1001: Synchronized static/images/project-logos/: ab9e80d: Enable anniversary logo for cs.wikipedia (T271662; 1/2) (duration: 01m 00s)
10:06 ema: cp3050: restart ats-be to lower lua states from 256 to 64 T265625
09:31 marostegui: Sanitize db1155:3314 - T268742
09:31 marostegui: Deploy schema change on s1 codfw master - T270187
09:02 elukey: force puppet on logstash1007 after ES OOM
08:55 godog: swift codfw-prod: more weight to ms-be20[58-61] - T269337
08:24 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1030.eqiad.wmnet with reason: REIMAGE
08:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2030.codfw.wmnet with reason: REIMAGE
08:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1030.eqiad.wmnet with reason: REIMAGE
08:19 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2030.codfw.wmnet with reason: REIMAGE
07:49 dcausse: depooling & restarting blazegraph on wdqs2007 (T242453)
07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13709 and previous config saved to /var/cache/conftool/dbconfig/20210111-074853-root.json
07:43 dcausse: repool wdqs1007 (wrong machine) (T242453)
07:41 dcausse: depooling & restarting blazegraph on wdqs1007 (T242453)
07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13708 and previous config saved to /var/cache/conftool/dbconfig/20210111-073349-root.json
07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13707 and previous config saved to /var/cache/conftool/dbconfig/20210111-071846-root.json
07:12 marostegui: Deploy schema change on s8 codfw master - T270187
07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13706 and previous config saved to /var/cache/conftool/dbconfig/20210111-070342-root.json
06:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136', diff saved to https://phabricator.wikimedia.org/P13704 and previous config saved to /var/cache/conftool/dbconfig/20210111-065640-marostegui.json
06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13703 and previous config saved to /var/cache/conftool/dbconfig/20210111-065550-root.json
06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13702 and previous config saved to /var/cache/conftool/dbconfig/20210111-064046-root.json
06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P13701 and previous config saved to /var/cache/conftool/dbconfig/20210111-063226-marostegui.json
06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1094', diff saved to https://phabricator.wikimedia.org/P13700 and previous config saved to /var/cache/conftool/dbconfig/20210111-063155-marostegui.json
06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094', diff saved to https://phabricator.wikimedia.org/P13699 and previous config saved to /var/cache/conftool/dbconfig/20210111-063124-marostegui.json
06:04 marostegui: Depool db1121 to clone db1155:3314
06:04 marostegui: Deploy schema change on s7 codfw master - T270187
06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P13698 and previous config saved to /var/cache/conftool/dbconfig/20210111-060342-marostegui.json

2021-01-09

00:11 mutante: puppetmaster2003 - restarted apache after spewing 500s

2021-01-08

19:48 andrew@deploy1001: Finished deploy [striker/deploy@e4db843]: Striker deploy for T269004 (duration: 02m 11s)
19:45 andrew@deploy1001: Started deploy [striker/deploy@e4db843]: Striker deploy for T269004
19:28 andrew@deploy1001: Finished deploy [horizon/deploy@7466703]: Horizon with a bunch of Buster patches (duration: 02m 35s)
19:26 andrew@deploy1001: Started deploy [horizon/deploy@7466703]: Horizon with a bunch of Buster patches
18:02 joal@deploy1001: Finished deploy [analytics/refinery@db9da3c] (thin): Hotfix analytics deployment - THIN [analytics/refinery@db9da3c] (duration: 00m 07s)
18:02 joal@deploy1001: Started deploy [analytics/refinery@db9da3c] (thin): Hotfix analytics deployment - THIN [analytics/refinery@db9da3c]
18:01 joal@deploy1001: Finished deploy [analytics/refinery@db9da3c]: Hotfix analytics deployment [analytics/refinery@db9da3c] (duration: 11m 27s)
17:50 joal@deploy1001: Started deploy [analytics/refinery@db9da3c]: Hotfix analytics deployment [analytics/refinery@db9da3c]
17:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on maps2007.codfw.wmnet with reason: Downtiming while not pooled
17:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on maps2007.codfw.wmnet with reason: Downtiming while not pooled
17:15 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2 days, 12:00:00 on maps2007.codfw.wmnet with reason: Downtiming while not pooled
17:15 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on maps2007.codfw.wmnet with reason: Downtiming while not pooled
17:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on maps1009.eqiad.wmnet with reason: Downtiming while not pooled
17:15 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on maps1009.eqiad.wmnet with reason: Downtiming while not pooled
17:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on labweb1001.wikimedia.org with reason: REIMAGE
17:08 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on labweb1001.wikimedia.org with reason: REIMAGE
16:50 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:43 razzi@cumin1001: START - Cookbook sre.dns.netbox
16:42 andrewbogott: shutting down labweb1001 so I can really believe that all traffic is being served by 1002
16:35 andrew@deploy1001: Finished deploy [horizon/deploy@7466703]: selective disable of problematic compression block (duration: 01m 42s)
16:33 andrew@deploy1001: Started deploy [horizon/deploy@7466703]: selective disable of problematic compression block
16:32 andrew@deploy1001: Finished deploy [horizon/deploy@7466703]: selective disable of problematic compression block (duration: 01m 52s)
16:30 razzi@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
16:30 andrew@deploy1001: Started deploy [horizon/deploy@7466703]: selective disable of problematic compression block
16:24 razzi@cumin1001: START - Cookbook sre.hosts.decommission
15:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
15:58 andrew@deploy1001: Finished deploy [horizon/deploy@ecaad83]: minor django package upgrades -> labweb1002 (duration: 04m 25s)
15:57 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
15:54 andrew@deploy1001: Started deploy [horizon/deploy@ecaad83]: minor django package upgrades -> labweb1002
15:51 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
15:43 andrew@deploy1001: Finished deploy [horizon/deploy@ecaad83]: minor django package upgrades -> codfw1dev (duration: 00m 29s)
15:43 andrew@deploy1001: Started deploy [horizon/deploy@ecaad83]: minor django package upgrades -> codfw1dev
15:39 reedy@deploy1001: Synchronized php-1.36.0-wmf.25/extensions/AbuseFilter/: T271430 T271431 T271432 T271433 (duration: 01m 00s)
15:39 andrew@deploy1001: Finished deploy [horizon/deploy@ecaad83]: minor django package upgrades -> codfw1dev (duration: 01m 39s)
15:38 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
15:38 andrew@deploy1001: Started deploy [horizon/deploy@ecaad83]: minor django package upgrades -> codfw1dev
15:24 andrew@deploy1001: Finished deploy [horizon/deploy@f6c50db]: minor django package upgrades -> codfw1dev (duration: 01m 06s)
15:23 andrew@deploy1001: Started deploy [horizon/deploy@f6c50db]: minor django package upgrades -> codfw1dev
15:18 andrew@deploy1001: Finished deploy [horizon/deploy@f6c50db]: minor django package upgrades + compression (duration: 01m 47s)
15:17 andrew@deploy1001: Started deploy [horizon/deploy@f6c50db]: minor django package upgrades + compression
15:14 andrew@deploy1001: Finished deploy [horizon/deploy@f6c50db]: minor django package upgrades -> codfw1dev (duration: 01m 00s)
15:13 andrew@deploy1001: Started deploy [horizon/deploy@f6c50db]: minor django package upgrades -> codfw1dev
15:12 andrew@deploy1001: Finished deploy [horizon/deploy@f6c50db]: minor django package upgrades -> codfw1dev (duration: 00m 05s)
15:12 andrew@deploy1001: Started deploy [horizon/deploy@f6c50db]: minor django package upgrades -> codfw1dev
15:11 andrew@deploy1001: Finished deploy [horizon/deploy@f6c50db]: minor django package upgrades -> codfw1dev (duration: 01m 30s)
15:09 andrew@deploy1001: Started deploy [horizon/deploy@f6c50db]: minor django package upgrades -> codfw1dev
15:08 andrew@deploy1001: Finished deploy [horizon/deploy@f6c50db]: minor django package upgrades (duration: 01m 49s)
15:06 andrew@deploy1001: Started deploy [horizon/deploy@f6c50db]: minor django package upgrades
15:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13697 and previous config saved to /var/cache/conftool/dbconfig/20210108-150617-root.json
15:03 andrew@deploy1001: Finished deploy [horizon/deploy@f6c50db]: minor django package upgrades (duration: 01m 35s)
15:02 andrew@deploy1001: Started deploy [horizon/deploy@f6c50db]: minor django package upgrades
14:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 66%: After schema change', diff saved to https://phabricator.wikimedia.org/P13696 and previous config saved to /var/cache/conftool/dbconfig/20210108-145113-root.json
14:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 33%: After schema change', diff saved to https://phabricator.wikimedia.org/P13695 and previous config saved to /var/cache/conftool/dbconfig/20210108-143610-root.json
13:42 klausman@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-serve2004.codfw.wmnet with reason: REIMAGE
13:41 klausman@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE
13:39 klausman@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-serve2003.codfw.wmnet with reason: REIMAGE
13:37 klausman@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2004.codfw.wmnet with reason: REIMAGE
13:37 klausman@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2002.codfw.wmnet with reason: REIMAGE
13:37 klausman@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2003.codfw.wmnet with reason: REIMAGE
12:52 klausman@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-serve2001.codfw.wmnet with reason: REIMAGE
12:49 klausman@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2001.codfw.wmnet with reason: REIMAGE
12:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13694 and previous config saved to /var/cache/conftool/dbconfig/20210108-120415-root.json
11:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13693 and previous config saved to /var/cache/conftool/dbconfig/20210108-114912-root.json
11:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13692 and previous config saved to /var/cache/conftool/dbconfig/20210108-113408-root.json
11:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13691 and previous config saved to /var/cache/conftool/dbconfig/20210108-111905-root.json
11:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P13690 and previous config saved to /var/cache/conftool/dbconfig/20210108-111733-marostegui.json
11:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P13689 and previous config saved to /var/cache/conftool/dbconfig/20210108-111345-root.json
10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P13688 and previous config saved to /var/cache/conftool/dbconfig/20210108-105842-root.json
10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P13676 and previous config saved to /var/cache/conftool/dbconfig/20210108-104338-root.json
10:38 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 01m 10s)
10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P13675 and previous config saved to /var/cache/conftool/dbconfig/20210108-102835-root.json
10:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1138', diff saved to https://phabricator.wikimedia.org/P13674 and previous config saved to /var/cache/conftool/dbconfig/20210108-102606-marostegui.json
10:01 elukey: restart varnishkafka-webrequest on cp5001 - timeouts to kafka-jumbo1001, librdkafka does not seem to be recovering very well
10:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: After cloning db1155:3316', diff saved to https://phabricator.wikimedia.org/P13673 and previous config saved to /var/cache/conftool/dbconfig/20210108-100040-root.json
09:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: After cloning db1155:3316', diff saved to https://phabricator.wikimedia.org/P13672 and previous config saved to /var/cache/conftool/dbconfig/20210108-094535-root.json
09:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: After cloning db1155:3316', diff saved to https://phabricator.wikimedia.org/P13671 and previous config saved to /var/cache/conftool/dbconfig/20210108-093032-root.json
09:30 marostegui: Restart mysql on db1115 (tendril/dbtree)
09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: After cloning db1155:3316', diff saved to https://phabricator.wikimedia.org/P13670 and previous config saved to /var/cache/conftool/dbconfig/20210108-091528-root.json
09:08 moritzm: installing libxstream-java security updates on Buster
09:01 godog: swift codfw-prod: more weight to ms-be20[58-61] - T269337
08:12 marostegui: Deploy schema change on s4 codfw master - T270187
07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082', diff saved to https://phabricator.wikimedia.org/P13669 and previous config saved to /var/cache/conftool/dbconfig/20210108-075714-marostegui.json
07:23 marostegui: Deploy schema change on s5 codfw master - T270187
06:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 to clone db1155:3316 T268742 ', diff saved to https://phabricator.wikimedia.org/P13666 and previous config saved to /var/cache/conftool/dbconfig/20210108-063301-marostegui.json
06:18 marostegui: Deploy schema change on s2 codfw master - T270187
04:59 mutante: mw1266 - restart-php7.2-fpm
03:04 ryankemper: [wdqs deploy] Deploy complete, service is healthy. This is done.
02:35 ryankemper: [wdqs deploy] Restarting `wdqs-categories` across load-balanced instances, one host at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
02:35 ryankemper: [wdqs deploy] Restarted `wdqs-categories` across test instances: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
02:34 ryankemper: [wdqs deploy] Restarted `wdqs-updater` across all instances: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
02:27 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@b15fc5c]: 0.3.58 (duration: 18m 04s)
02:15 ryankemper: [wdqs deploy] Never mind - the UI failure I mentioned above is transient. Restarting my ssh tunnel seemed to make the problem go away. Proceeding with deploy
02:12 ryankemper: [wdqs deploy] While queries run fine, it looks like there might be a UI glitch in this version. Digging in to see if it's transient, but I'll likely be aborting this deploy
02:09 ryankemper@deploy1001: Started deploy [wdqs/wdqs@b15fc5c]: 0.3.58
02:09 ryankemper: [wdqs deploy] Tests passing on canary before beginning wdqs deploy, proceeding
01:29 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1267.eqiad.wmnet
01:28 mutante: mw1276, mw1277 - first API appservers on buster, now serving traffic, feel free to depool if there are any issues
01:28 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1277.eqiad.wmnet
01:28 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1276.eqiad.wmnet
01:24 mutante: mw1266 - another buster appserver now serving traffic
01:24 mutante: mw1265 - raised weight to 25 like regular appservers (buster)
01:23 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw1265.eqiad.wmnet
01:18 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1266.eqiad.wmnet
01:17 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1277.eqiad.wmnet
01:17 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1276.eqiad.wmnet
01:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1267.eqiad.wmnet
01:12 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1266.eqiad.wmnet
00:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1277.eqiad.wmnet with reason: REIMAGE
00:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1267.eqiad.wmnet with reason: REIMAGE
00:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1277.eqiad.wmnet with reason: REIMAGE
00:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1276.eqiad.wmnet with reason: REIMAGE
00:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1267.eqiad.wmnet with reason: REIMAGE
00:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1276.eqiad.wmnet with reason: REIMAGE
00:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1266.eqiad.wmnet with reason: REIMAGE
00:15 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1266.eqiad.wmnet with reason: REIMAGE
00:06 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Undeploy graphoid on enwiki T271495 (duration: 00m 57s)

2021-01-07

23:55 mutante: reimaging mw1267,mw1276,mw1277
23:28 mutante: reimaging mw1266
23:14 andrew@deploy1001: Finished deploy [horizon/deploy@25ffdee]: trying to debug a compression error that doesn't happen on the test host (duration: 02m 00s)
23:12 andrew@deploy1001: Started deploy [horizon/deploy@25ffdee]: trying to debug a compression error that doesn't happen on the test host
22:54 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host (duration: 00m 04s)
22:54 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host
22:52 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host (duration: 07m 44s)
22:44 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host
22:41 andrew@deploy1001: Finished deploy [striker/deploy@e4db843]: striker -> labweb1002 (duration: 00m 04s)
22:41 andrew@deploy1001: Started deploy [striker/deploy@e4db843]: striker -> labweb1002
22:39 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host (duration: 00m 06s)
22:39 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host
22:31 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:24 robh@cumin1001: START - Cookbook sre.dns.netbox
22:19 andrew@cumin1001: conftool action : set/pooled=inactive; selector: name=labweb1002.wikimedia.org
22:12 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.25 refs T267418
21:43 jforrester@deploy1001: Synchronized php-1.36.0-wmf.25/extensions/CodeMirror/resources/ext.CodeMirror.js: T271457 Guard against WikiEditor being removed by the time the hook runs (duration: 01m 05s)
21:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on labweb1002.wikimedia.org with reason: REIMAGE
21:14 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on labweb1002.wikimedia.org with reason: REIMAGE
21:10 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: Revert "group[2] wikis to 1.36.0-wmf.22"
20:54 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:44 razzi@cumin1001: START - Cookbook sre.dns.netbox
20:43 razzi@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
20:24 razzi@cumin1001: START - Cookbook sre.hosts.decommission
20:08 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.25 refs T267418
20:01 bstorm: restarting haproxy on dbproxy1018 to pick up new config file
19:56 mutante: removing mongodb PHP extension, config, package from mwdebug* hosts - T180761
19:56 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.25/includes/DefaultSettings.php: 5986673: Revert "Provide native support to dismiss sitenotice in core." (T271365; T259903; 3/3) (duration: 01m 03s)
19:55 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.25/resources/: 5986673: Revert "Provide native support to dismiss sitenotice in core." (T271365; T259903; 2/3) (duration: 01m 05s)
19:53 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.25/includes/skins/: 5986673: Revert "Provide native support to dismiss sitenotice in core." (T271365; T259903; 1/3) (duration: 01m 04s)
19:10 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: 8a849d9: throttle: Cleanup outdated rules (duration: 01m 06s)
19:05 urbanecm@deploy1001: Synchronized wmf-config/Wikibase.php: 90f98c6: Use DisabledSpecialPage to disable ItemDisambiguation (T271389) (duration: 01m 08s)
18:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2027.codfw.wmnet with reason: REIMAGE
18:47 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1027.eqiad.wmnet with reason: REIMAGE
18:46 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2027.codfw.wmnet with reason: REIMAGE
18:45 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1027.eqiad.wmnet with reason: REIMAGE
18:34 volans@deploy1001: Finished deploy [homer/deploy@fe7acbc]: Release v0.2.6 (duration: 04m 25s)
18:30 volans@deploy1001: Started deploy [homer/deploy@fe7acbc]: Release v0.2.6
16:50 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host (duration: 02m 50s)
16:47 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host
16:46 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host (duration: 02m 05s)
16:44 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host
16:44 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host (duration: 00m 08s)
16:44 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host
16:16 moritzm: installing xerces-c security updates on Buster
15:53 moritzm: installing xorg-server security updates on stretch
15:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudweb2001-dev.wikimedia.org with reason: REIMAGE
15:46 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on labweb1002.wikimedia.org with reason: REIMAGE
15:45 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudweb2001-dev.wikimedia.org with reason: REIMAGE
15:44 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on labweb1002.wikimedia.org with reason: REIMAGE
15:14 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:11 kormat@cumin1001: START - Cookbook sre.dns.netbox
15:10 kormat@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
15:09 moritzm: installing libmaxminddb security updates on stretch
15:06 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host (duration: 03m 34s)
15:03 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host
15:01 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host (duration: 02m 25s)
14:59 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host
14:58 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host (duration: 00m 03s)
14:58 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host
14:58 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host (duration: 02m 04s)
14:57 kormat@cumin1001: START - Cookbook sre.ganeti.makevm
14:56 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: trying to debug a compression error that doesn't happen on the test host
14:54 kormat@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
14:54 kormat@cumin1001: START - Cookbook sre.ganeti.makevm
14:54 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: refresh labweb1002 with buster-ready wheels (duration: 02m 05s)
14:52 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: refresh labweb1002 with buster-ready wheels
14:51 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: refresh labweb1002 with buster-ready wheels (duration: 00m 04s)
14:51 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: refresh labweb1002 with buster-ready wheels
14:51 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: refresh cloudweb2001-dev (duration: 01m 53s)
14:49 kormat@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
14:49 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: refresh cloudweb2001-dev
14:46 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: refresh labweb1002 with buster-ready wheels (duration: 03m 39s)
14:42 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: refresh labweb1002 with buster-ready wheels (duration: 00m 04s)
14:42 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: refresh labweb1002 with buster-ready wheels
14:40 kormat@cumin1001: START - Cookbook sre.ganeti.makevm
14:33 jayme@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
14:32 jayme: imported calico 3.17.0-2 to component/calico-future stretch-wikimedia
14:32 jayme@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
14:08 moritzm: installing sqlite3 security updates on buster
12:42 _joe_: running puppet on logstash1007, elasticsearch oomkilled
12:24 marostegui: Deploy schema change on s2 primary master with replication T270053
12:21 kart_: EU-Mid day backport window done.
12:12 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable ContentTranslation in Tsonga Wikipedia as a default tool (T271204) (duration: 01m 09s)
12:02 XioNoX: push "Allow specific flows from 172.16/12 to prod + default permit" - T209082
10:51 godog: swift eqiad-prod: add weight to ms-be106[0-3] - T268435
10:06 godog: bounce apache on prometheus codfw
09:37 moritzm: installing nodejs security updates on buster
09:27 XioNoX: push new pfw policies - T271384
09:17 XioNoX: re-pool eqsin - T267544
08:57 XioNoX: re-enable BGP on cr2-eqsin - T267544
08:14 XioNoX: shutdown cr2-eqsin - T267544
07:43 moritzm: installing libxml2 security updates on buster
07:19 XioNoX: depool eqsin for router replacement - T267544
04:12 andrew@deploy1001: Finished deploy [striker/deploy@e4db843]: update codfw1dev striker (duration: 00m 48s)
04:11 andrew@deploy1001: Started deploy [striker/deploy@e4db843]: update codfw1dev striker
02:57 andrew@deploy1001: Finished deploy [striker/deploy@e120c6c]: update codfw1dev striker (duration: 00m 05s)
02:57 andrew@deploy1001: Started deploy [striker/deploy@e120c6c]: update codfw1dev striker
02:49 andrew@deploy1001: Finished deploy [horizon/deploy@ce4c515]: update codfw1dev deploy from train-buster branch (duration: 01m 48s)
02:48 andrew@deploy1001: Started deploy [horizon/deploy@ce4c515]: update codfw1dev deploy from train-buster branch
00:50 eileen: civicrm revision changed from 95fefcb421 to 2df572bdcd, config revision is 603c0ff4a0
00:44 shdubsh: restart elasticsearch on logstash1011 - oom

2021-01-06

22:34 sbassett: Deployed security patch for T270988 to wmf.22
22:32 sbassett: Deployed security patch for T270988 to wmf.25
22:17 legoktm: forced gerrit to replicate RequestTimeout to github (`ssh gerrit.wikimedia.org replication start mediawiki/libs/RequestTimeout --wait`)
21:59 dwisehaupt: increased batch size for eoy_receipt to 4000 per run when frmx1001 is the primary mx
21:58 dwisehaupt: process-control config revision is 603c0ff4a0
21:25 andrew@deploy1001: Finished deploy [horizon/deploy@b285acd]: update codfw1dev deploy from train-buster branch (duration: 09m 04s)
21:24 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.25 refs T267418 (duration: 01m 04s)
21:22 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.25 refs T267418
21:16 andrew@deploy1001: Started deploy [horizon/deploy@b285acd]: update codfw1dev deploy from train-buster branch
21:16 andrew@deploy1001: Finished deploy [horizon/deploy@b285acd]: update codfw1dev deploy from train-buster branch (duration: 00m 06s)
21:16 andrew@deploy1001: Started deploy [horizon/deploy@b285acd]: update codfw1dev deploy from train-buster branch
21:15 andrew@deploy1001: Finished deploy [horizon/deploy@b285acd]: update codfw1dev deploy from train-buster branch (duration: 00m 05s)
21:15 andrew@deploy1001: Started deploy [horizon/deploy@b285acd]: update codfw1dev deploy from train-buster branch
21:14 andrew@deploy1001: Finished deploy [horizon/deploy@b285acd]: update codfw1dev deploy from train-buster branch (duration: 00m 05s)
21:14 andrew@deploy1001: Started deploy [horizon/deploy@b285acd]: update codfw1dev deploy from train-buster branch
21:13 andrew@deploy1001: Finished deploy [horizon/deploy@b285acd]: update codfw1dev deploy from train-buster branch (duration: 08m 42s)
21:10 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.25 refs T267418
21:04 andrew@deploy1001: Started deploy [horizon/deploy@b285acd]: update codfw1dev deploy from train-buster branch
21:04 andrew@deploy1001: deploy aborted: update codfw1dev deploy from train-buster branch (duration: 00m 07s)
21:04 andrew@deploy1001: Started deploy [horizon/deploy@b285acd]: update codfw1dev deploy from train-buster branch
21:04 andrew@deploy1001: deploy aborted: update codfw1dev deploy from train-buster branch (duration: 00m 07s)
21:04 andrew@deploy1001: Started deploy [horizon/deploy@b285acd]: update codfw1dev deploy from train-buster branch
21:00 andrew@deploy1001: Finished deploy [horizon/deploy@965995d]: update codfw1dev deploy from train-buster branch (duration: 01m 51s)
20:58 andrew@deploy1001: Started deploy [horizon/deploy@965995d]: update codfw1dev deploy from train-buster branch
20:43 andrew@deploy1001: Finished deploy [horizon/deploy@89b308c]: update codfw1dev deploy for Buster (duration: 02m 30s)
20:40 andrew@deploy1001: Started deploy [horizon/deploy@89b308c]: update codfw1dev deploy for Buster
20:40 andrew@deploy1001: Finished deploy [horizon/deploy@89b308c]: update codfw1dev deploy for Buster (duration: 00m 07s)
20:40 andrew@deploy1001: Started deploy [horizon/deploy@89b308c]: update codfw1dev deploy for Buster
20:38 andrew@deploy1001: Finished deploy [horizon/deploy@89b308c]: update codfw1dev deploy for Buster (duration: 00m 05s)
20:38 andrew@deploy1001: Started deploy [horizon/deploy@89b308c]: update codfw1dev deploy for Buster
20:27 andrew@deploy1001: Finished deploy [horizon/deploy@89b308c]: (no justification provided) (duration: 00m 05s)
20:27 andrew@deploy1001: Started deploy [horizon/deploy@89b308c]: (no justification provided)
20:26 andrew@deploy1001: Finished deploy [horizon/deploy@89b308c]: update codfw1dev deploy (duration: 00m 08s)
20:26 andrew@deploy1001: Started deploy [horizon/deploy@89b308c]: update codfw1dev deploy
19:37 eileen: increased batch processes on eoy_receipt - now 9000 per hour; process-control config revision is 7b8319b97d
19:25 dpifke: Morning backport window complete.
19:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudweb2001-dev.wikimedia.org with reason: REIMAGE
19:20 dpifke@deploy1001: Synchronized lib: Removing unused profiler libraries (duration: 01m 03s)
19:19 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudweb2001-dev.wikimedia.org with reason: REIMAGE
19:18 dpifke@deploy1001: Synchronized wmf-config/PhpAutoPrepend.php: Removing unused profiler code (duration: 01m 04s)
19:17 dpifke@deploy1001: Synchronized wmf-config/profiler.php: Removing unused profiler code (duration: 01m 08s)
19:13 eileen: process-control config revision is 2909e8b9ed
19:11 eileen: civicrm revision changed from 1d5f6365ba to 95fefcb421, config revision is 9bc3d67b02
18:14 mutante: creating uz.wikimedia.org - Uzbek language User Group - https://meta.wikimedia.org/wiki/Affiliations_Committee/Resolutions/Wikimedians_of_the_Uzbek_language_User_Group
17:53 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on labweb1002.wikimedia.org with reason: REIMAGE
17:51 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on labweb1002.wikimedia.org with reason: REIMAGE
17:26 andrewbogott: depooling labweb1002 for rebuild
17:00 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2026.codfw.wmnet with reason: REIMAGE
16:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1026.eqiad.wmnet with reason: REIMAGE
16:58 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2026.codfw.wmnet with reason: REIMAGE
16:56 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1026.eqiad.wmnet with reason: REIMAGE
16:42 razzi@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0)
16:32 jayme@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
16:31 jayme@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
16:15 jayme@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
16:15 jayme@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
16:02 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
16:01 moritzm: installing cups security updates on buster (client-side tools/libs)
15:54 moritzm: installing openexr security updates
15:28 jayme: imported calico 3.17.1-1 to component/calico-future stretch-wikimedia
15:20 moritzm: restarting FPM/Apache on mw canaries to pick up p11-kit update
15:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
15:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
15:01 moritzm: installing p11-kit security updates on stretch
14:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
14:55 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
13:03 moritzm: installing tcpdump security updates
12:11 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Create rollbacker group on mrwiki (T270864) (duration: 01m 21s)
11:33 moritzm: installing ruby2.5 security updates
11:18 moritzm: installing libjpeg-turbo security updates on buster
11:14 moritzm: remove cloudceph2002-dev.wikimedia.org and cloudceph2003-dev.wikimedia.org from debmonitor (got reinstalled as .wmnet)
10:40 jmm@cumin2001: dbctl commit (dc=all): 'Depool db2140', diff saved to https://phabricator.wikimedia.org/P13658 and previous config saved to /var/cache/conftool/dbconfig/20210106-104029-jmm.json
10:38 moritzm: depooling db2140 T271084
08:40 moritzm: installing Linux 4.9.246 on stretch hosts (no reboots yet)
01:22 mutante: testreduce1001 rm -rf /srv/deployment/parsoid/deploy
00:55 eileen: process-control config revision is 9bc3d67b02
00:46 eileen: process-control config revision is c38eaa20ed
00:35 eileen: civicrm revision changed from 6be8a130df to 1d5f6365ba, config revision is d8756a45c1

2021-01-05

23:13 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.25/includes/logging/LogPager.php: Check for the index name while it's being renamed (duration: 01m 06s)
22:26 reedy@deploy1001: Synchronized php-1.36.0-wmf.25/extensions/AbuseFilter/extension.json: T271266 (duration: 01m 04s)
21:48 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.36.0-wmf.22"
21:12 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
21:02 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart
20:53 razzi@deploy1001: Finished deploy [analytics/aqs/deploy@5d05f83]: Configure http request timeout and caching for T268809 (duration: 04m 48s)
20:50 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.25 refs T267418
20:48 razzi@deploy1001: Started deploy [analytics/aqs/deploy@5d05f83]: Configure http request timeout and caching for T268809
20:44 razzi: deploy aqs (analytics query service) as part of analytics train
20:38 rzl: rzl@mw1362:~$ sudo -i /usr/local/sbin/restart-php7.2-fpm
20:28 mutante: repooled mw1362
20:20 mutante: mw1344 - /usr/local/sbin/restart-php7.2-fpm
20:04 mutante: mw1344 - restarted apache2 - it was showing the same "partial results" error as mw1362 - no other appservers are showing up in logstash, but these were the #1 and #2 sources of errors
19:47 mutante: depooled mw1362
19:41 mutante: mw1362 - restarted apache2
19:29 razzi@deploy1001: Finished deploy [analytics/refinery@56fb3ff] (thin): Regular analytics weekly train THIN [analytics/refinery@6ce68c950fc339dc3748cf50e6925cd1031287c4] (duration: 00m 08s)
19:29 razzi@deploy1001: Started deploy [analytics/refinery@56fb3ff] (thin): Regular analytics weekly train THIN [analytics/refinery@6ce68c950fc339dc3748cf50e6925cd1031287c4]
19:28 razzi@deploy1001: Finished deploy [analytics/refinery@56fb3ff]: Regular analytics weekly train [analytics/refinery@6ce68c950fc339dc3748cf50e6925cd1031287c4] (duration: 09m 37s)
19:19 razzi@deploy1001: Started deploy [analytics/refinery@56fb3ff]: Regular analytics weekly train [analytics/refinery@6ce68c950fc339dc3748cf50e6925cd1031287c4]
19:17 razzi: deploying refinery for weekly train
19:16 mutante: mwdebug1003 - editing apache2 defaults conf and dropping ServerAdmin address, restarting
18:59 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.25 refs T267418 (duration: 39m 07s)
18:22 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.25 refs T267418
18:21 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
18:18 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
18:13 elukey: run homer on cr1/cr2-eqiad to update the analytics-in4 filter (https://gerrit.wikimedia.org/r/c/operations/homer/public/+/654469)
18:08 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
17:10 longma: 1.36.0-wmf.25 was branched at 083fd09 for T267418
17:00 XioNoX: capture packets on pfw3-eqiad:reth0.1134 - T263833
15:50 jbond42: merging puppetlabs-lvm update
15:41 volans: upgraded wmflib to 0.0.6 on all hosts where it's installed - T257905
15:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2025.codfw.wmnet with reason: REIMAGE
15:35 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2025.codfw.wmnet with reason: REIMAGE
15:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1025.eqiad.wmnet with reason: REIMAGE
15:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1025.eqiad.wmnet with reason: REIMAGE
14:59 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove overrides from wgEventLoggingSchemas (duration: 00m 57s)
13:40 moritzm: installing python-apt security updates on buster/stretch
13:29 moritzm: installing xen security updates on buster
13:01 moritzm: installing lxml security updates for stretch
12:48 elukey: add PXE d-i rescue bootable image config for jessie/stretch/buster to tftp
12:43 jmm@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:29 jmm@cumin2001: START - Cookbook sre.dns.netbox
12:13 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on malmok.wikimedia.org with reason: rebooting for kernel update
12:13 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on malmok.wikimedia.org with reason: rebooting for kernel update
12:12 moritzm: installing p11-kit security updates on buster
12:01 marostegui: Restart db2121 T271106
11:53 moritzm: installing lxml security updates for buster
11:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: After cloning db1155:3312', diff saved to https://phabricator.wikimedia.org/P13656 and previous config saved to /var/cache/conftool/dbconfig/20210105-110246-root.json
10:56 jmm@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:49 jmm@cumin2001: START - Cookbook sre.dns.netbox
10:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: After cloning db1155:3312', diff saved to https://phabricator.wikimedia.org/P13655 and previous config saved to /var/cache/conftool/dbconfig/20210105-104742-root.json
10:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: After cloning db1155:3312', diff saved to https://phabricator.wikimedia.org/P13654 and previous config saved to /var/cache/conftool/dbconfig/20210105-103239-root.json
10:26 godog: swift codfw-prod: more weight to ms-be20[58-61] - T269337
10:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: After cloning db1155:3312', diff saved to https://phabricator.wikimedia.org/P13653 and previous config saved to /var/cache/conftool/dbconfig/20210105-101735-root.json
10:02 hnowlan: stopping stray cpjobqueue processes on scb hosts
09:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:39 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
09:21 ema: cp3054: upgrade varnish to 6.0.1-1wm1 T264398
08:56 moritzm: installing flac security updates
08:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2140 after on-site maintenance', diff saved to https://phabricator.wikimedia.org/P13652 and previous config saved to /var/cache/conftool/dbconfig/20210105-084807-marostegui.json
08:32 elukey: reboot sretest1001 to test some new PXE rescue settings
08:30 marostegui: Restart db2127 T271106
08:27 hashar: Restarted CI Jenkins on contint2001
07:14 elukey: execute 'apt-get clean' on an-airflow1001 to recover disk space (root partition almost saturated)
06:41 marostegui: Stop MySQL on db1074 - this will generate lag on s2 on labs
06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 to clone db1155:3312 T268742 ', diff saved to https://phabricator.wikimedia.org/P13647 and previous config saved to /var/cache/conftool/dbconfig/20210105-064026-marostegui.json
03:42 eileen: eoy receipts off to investigate issue ds has hit with Japanese names; process-control config revision is d8756a45c1
02:55 legoktm@deploy1001: Synchronized php-1.36.0-wmf.22/extensions/AbuseFilter/: Rename maintenance/purgeOldLogIPData.php script (T271182) (duration: 00m 59s)
02:20 ryankemper: [wdqs deploy] Deploy completed without issue
01:51 ryankemper: [wdqs deploy] Restarting `wdqs-categories` across non-test wdqs nodes one at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
01:50 ryankemper: [wdqs deploy] Restarted categories across all wdqs test instances: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
01:50 ryankemper: [wdqs deploy] Restarted `wdqs-updater` across the whole fleet simultaneously: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
01:48 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@0432f8c]: 0.3.57 (duration: 08m 44s)
01:41 ryankemper: [wdqs deploy] Canary `wdqs1003` passing all tests following deploy, proceeding to rest of fleet
01:40 ryankemper@deploy1001: Started deploy [wdqs/wdqs@0432f8c]: 0.3.57
01:38 ryankemper: [wdqs deploy] Pre-deploy tests are all passing, proceeding with deploy shortly
01:20 jgleeson: updated process-control config revision to 276a8ff5b6
00:40 jgleeson: updated civicrm revision from bb8baac617 to 6be8a130df

2021-01-04

23:55 jgleeson: updated process-control config revision to c371242dbc
23:38 jgleeson: updated process-control config revision to 933ec73271
22:02 jgleeson: update process-control config revision to 400eae708d
21:35 razzi@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
21:14 razzi@cumin1001: START - Cookbook sre.hosts.decommission
20:48 ebernhardson: restart airflow-scheduler on an-airflow1001 to maybe resolve kerberos issue ('GSS initiate failed')
19:42 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 8058502: Add localised logos for the Madurese Wikipedia (T270693) (duration: 00m 54s)
19:39 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: 9a5ec62: Enable abusefilter block at zh_yuewiki (T270567) (duration: 00m 54s)
19:36 urbanecm@deploy1001: Synchronized static/images/project-logos/: d5fa55a: Add localised logos for the Madurese Wikipedia (T270693) (duration: 00m 55s)
19:35 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: 57f11b3: Enable abusefilter block at hrwiki (T270997) (duration: 00m 54s)
19:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 9611519: Add wgImportSources for zhwikinews (T266388) (duration: 00m 56s)
19:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 5bb1e32: ukwikisource: Search Archive NS by default (T270627) (duration: 00m 55s)
19:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 19aa23f: mediawikiwiki: Enable PageImages on couple more namespaces (duration: 00m 55s)
19:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 88b9316: hrwiki: Restrict changetags permissions to sysop and bot group (T270996) (duration: 00m 55s)
19:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 7fe2d56: hrwiki: Enable visual editor in the draft (Nacrt) namespace (T270688) (duration: 00m 55s)
19:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: d8670c2: frwiktionary: Mark several namespaces as content namespaces (T270821) (duration: 00m 57s)
19:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 192fe58: ukwikisource: Delete Translation namespace (T270628) (duration: 00m 58s)
19:07 Urbanecm: mwscript namespaceDupes.php --wiki=ukwikisource --fix # T270627
19:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c54783b: ukwikisource: Add Archive namespace (T270627) (duration: 00m 57s)
19:03 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 9af0b01: metawiki: Grant oathauth-view-log to stewards (duration: 00m 57s)
18:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:50 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
18:14 shdubsh: restart elasticsearch on logstash1012 - oom
16:35 jayme: import kubernetes 1.16.15-4 to component/kubernetes-future buster-wikimedia and stretch-wikimedia
16:32 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:25 pt1979@cumin2001: START - Cookbook sre.dns.netbox
15:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2140 ', diff saved to https://phabricator.wikimedia.org/P13644 and previous config saved to /var/cache/conftool/dbconfig/20210104-153339-marostegui.json
15:33 marostegui: Depool db2140 T271084
14:50 ema: cp3058: ats-backend-restart T265625
14:34 marostegui: Upgrade and restart mysql on es2020 and es2024 - T271106
14:31 moritzm: installing openssl updates on buster-based DB hosts
14:15 moritzm: installing libdatetime-timezone-perl updates
14:13 marostegui: Restart mysql on pc2009
12:52 ema: deployment-cache-text06: try out varnish 6.0.1-1wm1 T264398
12:08 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Grant several OATHAuth-related permissions to wmf-supportsafety at Meta (T180896) (duration: 00m 56s)
11:59 volans: uploaded python3-wmflib_0.0.6 to apt.wikimedia.org buster-wikimedia
11:36 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 56s)
11:35 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 14s)
10:50 XioNoX: push pfw policies - T269958
10:19 _joe_: uploading docker-report 0.0.10 to debian buster
09:48 marostegui: Deploy schema change on s6 codfw master (lag will appear on codfw) - T270187
09:02 XioNoX: bounce asw-d-codfw:xe-7/0/8 - T271041

2021-01-03

16:17 arturo: merged change to TLS cert used by slapd/openldap servers https://gerrit.wikimedia.org/r/c/operations/puppet/+/653871
15:49 vgutierrez: reenable puppet on ldap-replica2004.wm.o
15:30 andrewbogott: disabling puppet fleet-wide to avert potential disaster from acme-chief cert rotation T271063
14:42 andrewbogott: restarting slapd on serpens and seaborgium
11:38 elukey: powercycle an-worker1114 (kernel errors in the serial console)
09:07 elukey: reboot ms-be2050 as attempt to recover/fix its broken networking state (started from Dec 30th) - T271041

2021-01-02

19:27 vgutierrez: restart acme-chief on acmechief1001

2021-01-01

14:49 milimetric@deploy1001: Finished deploy [analytics/refinery@f9281dd] (thin): [SAFE, IGNORE] Simple hotfix for a python bug, analytics refinery only, not urgent (duration: 00m 07s)
14:49 milimetric@deploy1001: Started deploy [analytics/refinery@f9281dd] (thin): [SAFE, IGNORE] Simple hotfix for a python bug, analytics refinery only, not urgent
14:48 milimetric@deploy1001: Finished deploy [analytics/refinery@f9281dd]: [SAFE, IGNORE] Simple hotfix for a python bug, analytics refinery only, not urgent (duration: 10m 00s)
14:38 milimetric@deploy1001: Started deploy [analytics/refinery@f9281dd]: [SAFE, IGNORE] Simple hotfix for a python bug, analytics refinery only, not urgent
08:52 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch fiwiki to their 500k temporary logo! (T270974) (duration: 00m 55s)
08:46 legoktm@deploy1001: Synchronized static/images/project-logos/fiwiki-500k-2x.png: Add fiwiki 500k temporary logos (3/3) (duration: 00m 55s)
08:45 legoktm@deploy1001: Synchronized static/images/project-logos/fiwiki-500k-1.5x.png: Add fiwiki 500k temporary logos (2/3) (duration: 00m 54s)
08:44 legoktm@deploy1001: Synchronized static/images/project-logos/fiwiki-500k.png: Add fiwiki 500k temporary logos (1/3) (duration: 00m 58s)