Server Admin Log/Archive 56

2022-08-31

23:31 krinkle@deploy1002: Synchronized wmf-config/: I493b5e4662 (duration: 03m 43s)
23:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
23:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
23:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
23:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
23:17 krinkle@deploy1002: Synchronized private/: (no justification provided) (duration: 03m 42s)
23:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
23:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
23:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
23:13 krinkle@deploy1002: Synchronized wmf-config/: Ibdac0a (duration: 03m 44s)
23:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
23:12 Krinkle: krinkle@deploy1002 Change /srv/mediawiki-staging/private to remove wmgElectronSecret
23:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
23:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
23:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
23:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
22:45 ejegg: payments-wiki upgraded from 80657b06 to 648842f9
22:38 eileen: config revision changed from 4331ef59 to d3696af7
22:00 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1030.eqiad.wmnet: Restart to apply new certificates (T316697) - eevans@cumin1001
21:59 mutante: etherpad (etherpad1003) - rebooting for maintenance
21:58 mutante: mw1383 start php7.2-fpm_check_restart.service
21:52 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1029.eqiad.wmnet: Restart to apply new certificates (T316697) - eevans@cumin1001
21:48 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1030.eqiad.wmnet: Restart to apply new certificates (T316697) - eevans@cumin1001
21:42 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1029.eqiad.wmnet: Restart to apply new certificates (T316697) - eevans@cumin1001
21:42 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1028.eqiad.wmnet: Restart to apply new certificates (T316697) - eevans@cumin1001
21:42 ebernhardson: run search index creation for pcmwiki
21:41 ebernhardson: run search index creation for bjnwiktionary
21:40 ebernhardson: run search index creation for guwwiktionary
21:32 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1028.eqiad.wmnet: Restart to apply new certificates (T316697) - eevans@cumin1001
21:30 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on etherpad1003.eqiad.wmnet with reason: kernel upgrade
21:30 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on etherpad1003.eqiad.wmnet with reason: kernel upgrade
21:30 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on etherpad1003.eqiad.wmnet with reason: kernel upgrade
21:29 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on etherpad1003.eqiad.wmnet with reason: kernel upgrade
21:00 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@94b160c]: drop_old_data: Add new required param --allowed-interval (duration: 02m 07s)
20:58 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@94b160c]: drop_old_data: Add new required param --allowed-interval
20:57 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2003.wikimedia.org with OS bullseye
20:43 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
20:39 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
20:38 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
20:36 eileen: config revision changed from b1bd9422 to 4331ef59
20:31 denisse@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
m: rebooting netmon1003 for a kernel upgrade
20:23 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:08 bking@cumin1001: conftool action : get/pooled; selector: dnsdisc=wdqs,name=codfw
19:56 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2003.wikimedia.org with OS bullseye
19:41 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
19:37 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
19:30 ryankemper: T316719 Rolling upgrade operation complete; all of elastic codfw is now on `7.10.2`. Next week our related cirrus changes will go out with the mediawiki deploy train in `1.39.0-wmf.28`
19:21 ryankemper@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw es7 cluster upgrade - ryankemper@cumin2002 - T316719
19:21 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
19:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T314041)', diff saved to https://phabricator.wikimedia.org/P33725 and previous config saved to /var/cache/conftool/dbconfig/20220831-192120-ladsgroup.json
19:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
19:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
19:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
19:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
19:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T314041)', diff saved to https://phabricator.wikimedia.org/P33724 and previous config saved to /var/cache/conftool/dbconfig/20220831-192032-ladsgroup.json
19:18 dzahn@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host gitlab2003.wikimedia.org with OS bullseye
19:16 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
19:15 mutante: gitlab: reimaging gitlab2003 with cookbook after reverting partman change and comment on gerrit:827578 T274463
19:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P33723 and previous config saved to /var/cache/conftool/dbconfig/20220831-190526-ladsgroup.json
18:56 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw es7 cluster upgrade - ryankemper@cumin2002 - T316719
18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P33722 and previous config saved to /var/cache/conftool/dbconfig/20220831-185020-ladsgroup.json
18:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
18:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T314041)', diff saved to https://phabricator.wikimedia.org/P33721 and previous config saved to /var/cache/conftool/dbconfig/20220831-183513-ladsgroup.json
18:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
18:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
18:27 dduvall@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.27 refs T314188 (duration: 03m 37s)
18:24 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.27 refs T314188
18:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
18:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
18:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
17:06 volans: installing spicerack 3.2.1 on cumin1001
16:52 volans: installing spicerack 3.2.1 on cumin2002
16:00 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6016.drmrs.wmnet,service=varnish-fe
16:00 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6016.drmrs.wmnet,service=ats-be
16:00 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6016.drmrs.wmnet,service=ats-tls
16:00 volans: uploaded spicerack_3.2.1 to apt.wikimedia.org bullseye-wikimedia
15:57 _joe_: updated php 7.4 in all of production T316691
15:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6016.drmrs.wmnet with OS buster
15:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on ganeti2015.codfw.wmnet with reason: Remove node for eventual reimage, T311686
15:54 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on ganeti2015.codfw.wmnet with reason: Remove node for eventual reimage, T311686
15:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
15:27 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
15:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6016.drmrs.wmnet with OS buster
15:06 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp6016.drmrs.wmnet,service=varnish-fe
15:06 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp6016.drmrs.wmnet,service=ats-be
15:06 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp6016.drmrs.wmnet,service=ats-tls
15:04 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6008.drmrs.wmnet,service=varnish-fe
15:04 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6008.drmrs.wmnet,service=ats-be
15:04 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6008.drmrs.wmnet,service=ats-tls
15:01 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-serve-ctrl2001.codfw.wmnet
15:01 klausman@cumin1001: START - Cookbook sre.hosts.remove-downtime for ml-serve-ctrl2001.codfw.wmnet
15:00 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6008.drmrs.wmnet with OS buster
14:56 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-serve-ctrl2001.codfw.wmnet with reason: Reboot to pick up kernel 5.10.136 (T316185)
14:56 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-serve-ctrl2001.codfw.wmnet with reason: Reboot to pick up kernel 5.10.136 (T316185)
14:56 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-serve-ctrl2002.codfw.wmnet
14:56 klausman@cumin1001: START - Cookbook sre.hosts.remove-downtime for ml-serve-ctrl2002.codfw.wmnet
14:50 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-serve-ctrl2002.codfw.wmnet with reason: Reboot to pick up kernel 5.10.136 (T316185)
14:50 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-serve-ctrl2002.codfw.wmnet with reason: Reboot to pick up kernel 5.10.136 (T316185)
14:49 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
14:48 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
14:47 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
14:47 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
14:43 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
14:43 jayme@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
14:42 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
14:42 jayme@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
14:41 klausman@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-codfw
14:40 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6008.drmrs.wmnet with reason: host reimage
14:37 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6008.drmrs.wmnet with reason: host reimage
14:15 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6008.drmrs.wmnet with OS buster
14:11 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp6008.drmrs.wmnet,service=varnish-fe
14:11 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp6008.drmrs.wmnet,service=ats-be
14:11 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp6008.drmrs.wmnet,service=ats-tls
14:09 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp6008.drmrs.wmnet with OS buster
14:08 vgutierrez: deploy trafficserver: Hide non session cookies during cache lookup globally - T316338
14:08 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6008.drmrs.wmnet with OS buster
13:49 Lucas_WMDE: UTC afternoon backport+config window done
13:47 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/SearchSettingsForWikidata.php: Config: Remove unused assignments from SearchSettingsForWikibase.php (2/2) (duration: 03m 33s)
13:43 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/SearchSettingsForWikibase.php: Config: Remove unused assignments from SearchSettingsForWikibase.php (1/2) (duration: 03m 38s)
13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
13:34 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/SearchSettingsForWikidata.php: Config: Directly set WikibaseCirrusSearch settings in IS.php (3/3) (duration: 03m 42s)
13:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
13:31 moritzm: restarting exim on the MXes to pick up zlib update
13:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:30 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Directly set WikibaseCirrusSearch settings in IS.php (2/3) (duration: 03m 39s)
13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:26 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Directly set WikibaseCirrusSearch settings in IS.php (1/3) (duration: 03m 47s)
13:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet
13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:21 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/SearchSettingsForWikibase.php: Config: Only set WikibaseCirrusSearch settings if wmg globals are set (duration: 03m 42s)
13:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet
13:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet
13:13 moritzm: installing zlib security updates on bullseye
13:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet
13:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:11 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: InitialiseSettings.php: Enable Realtime Preview on Group 2 (T314828) (duration: 03m 54s)
13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:07 klausman@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
13:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3002.esams.wmnet
12:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow3002.esams.wmnet
12:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2002.codfw.wmnet
12:57 vgutierrez: test trafficserver: Hide non session cookies during cache lookup in drmrs - T316338
12:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2002.codfw.wmnet
12:39 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-staging-ctrl2002.codfw.wmnet
12:39 klausman@cumin1001: START - Cookbook sre.hosts.remove-downtime for ml-staging-ctrl2002.codfw.wmnet
12:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
12:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
12:34 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-staging-ctrl2002.codfw.wmnet with reason: Reboot to pick up kernel 5.10.136 (T316185)
12:34 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-staging-ctrl2002.codfw.wmnet with reason: Reboot to pick up kernel 5.10.136 (T316185)
12:33 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-staging-ctrl2001.codfw.wmnet
12:33 klausman@cumin1001: START - Cookbook sre.hosts.remove-downtime for ml-staging-ctrl2001.codfw.wmnet
12:28 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-staging-ctrl2001.codfw.wmnet with reason: Reboot to pick up kernel 5.10.136 (T316185)
12:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
12:28 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-staging-ctrl2001.codfw.wmnet with reason: Reboot to pick up kernel 5.10.136 (T316185)
12:27 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2002.codfw.wmnet
12:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
12:17 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging2002.codfw.wmnet
12:16 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2001.codfw.wmnet
12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
12:06 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
12:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
11:58 marostegui: Reboot sanitarium hosts, lag will appear on clouddb* hosts
11:49 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host webperf1004.eqiad.wmnet
11:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1004.eqiad.wmnet
11:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet
11:27 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
11:27 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
11:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet
11:22 moritzm: draining ganeti2015 for eventual reimage T311686
11:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2004.codfw.wmnet
11:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf2004.codfw.wmnet
11:04 vgutierrez: test trafficserver: Hide non session cookies during cache lookup in cp6016 - T316338
11:00 _joe_: updating php 7.4 packages in wikimedia/bustrer T316601
10:42 _joe_: updating php 7.4 on mwdebug1002 to test the new patched packages T316601
10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33715 and previous config saved to /var/cache/conftool/dbconfig/20220831-100853-root.json
10:06 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1002.eqiad.wmnet with OS buster
09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33714 and previous config saved to /var/cache/conftool/dbconfig/20220831-095348-root.json
09:51 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2003.codfw.wmnet
09:44 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2003.codfw.wmnet
09:44 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2002.codfw.wmnet
09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33713 and previous config saved to /var/cache/conftool/dbconfig/20220831-093844-root.json
09:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2002.codfw.wmnet
09:34 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-fe2001.codfw.wmnet
09:33 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1002.eqiad.wmnet with reason: host reimage
09:29 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1002.eqiad.wmnet with reason: host reimage
09:27 moritzm: installing docker.io bugfix updates from Bullseye point release
09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33712 and previous config saved to /var/cache/conftool/dbconfig/20220831-092339-root.json
09:22 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
09:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2003.codfw.wmnet
09:17 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host parse1002.eqiad.wmnet with OS buster
09:17 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1003.eqiad.wmnet
09:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf2003.codfw.wmnet
09:11 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1003.eqiad.wmnet
09:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33711 and previous config saved to /var/cache/conftool/dbconfig/20220831-090834-root.json
08:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33710 and previous config saved to /var/cache/conftool/dbconfig/20220831-085329-root.json
08:51 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1002.eqiad.wmnet
08:43 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1002.eqiad.wmnet
08:39 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1001.eqiad.wmnet
08:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 4%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33709 and previous config saved to /var/cache/conftool/dbconfig/20220831-083824-root.json
08:32 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
08:28 moritzm: upgrading ganeti2016/ganeti2018 to 3.0.2 T312637
08:28 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 24 hosts with reason: Downtiming php7.4 parsoid servers until they are ready to pool
08:27 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 24 hosts with reason: Downtiming php7.4 parsoid servers until they are ready to pool
08:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33708 and previous config saved to /var/cache/conftool/dbconfig/20220831-082319-root.json
08:20 vgutierrez: end test trafficserver: Hide non session cookies during cache lookup in cp6016 - T316338
08:12 vgutierrez: test trafficserver: Hide non session cookies during cache lookup in cp6016 - T316338
08:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 2%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33707 and previous config saved to /var/cache/conftool/dbconfig/20220831-080815-root.json
07:54 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host prometheus2006.codfw.wmnet
07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33706 and previous config saved to /var/cache/conftool/dbconfig/20220831-075310-root.json
07:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2022.codfw.wmnet to cluster codfw and group B
07:50 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
07:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2022.codfw.wmnet to cluster codfw and group B
07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 for upgrade', diff saved to https://phabricator.wikimedia.org/P33705 and previous config saved to /var/cache/conftool/dbconfig/20220831-074748-root.json
07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2022.codfw.wmnet
07:40 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
07:39 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
07:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2022.codfw.wmnet
07:15 godog: bounce thanos-compact on thanos-fe2001
05:00 marostegui: Failover m3 from db1183 to db1159 - T316506
04:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2132,2160].codfw.wmnet,db[1117,1195].eqiad.wmnet with reason: switchover m1 T316506
04:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2132,2160].codfw.wmnet,db[1117,1195].eqiad.wmnet with reason: switchover m1 T316506
03:23 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw es7 cluster upgrade - ryankemper@cumin2002 - T316719
03:23 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw es7 cluster upgrade - ryankemper@cumin2002 - T316719
03:17 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw es7 cluster upgrade - ryankemper@cumin2002 - T316719
02:50 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw es7 cluster upgrade - ryankemper@cumin2002 - T316719
02:49 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw es7 cluster upgrade - ryankemper@cumin2002 - T316719
00:15 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw es7 cluster upgrade - ryankemper@cumin2002 - T316719
00:14 ryankemper: T316719 First elastic host upgraded properly. Cancelling cookbook to kick off a new rolling upgrade that will go 3 nodes at a time (first run was just one host as a sanity check)
00:14 ryankemper@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw es7 cluster upgrade - ryankemper@cumin2002 - T316719
00:08 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw es7 cluster upgrade - ryankemper@cumin2002 - T316719

2022-08-30

23:55 ryankemper: T316719 Merged https://phabricator.wikimedia.org/T316719; running puppet across codfw fleet: `ryankemper@cumin2002:~$ sudo -E cumin -b 6 'A:elastic-codfw' 'run-puppet-agent'`
23:50 ryankemper@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw es7 cluster upgrade - ryankemper@cumin2002 - T316719
23:50 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw es7 cluster upgrade - ryankemper@cumin2002 - T316719
22:02 eileen: civicrm upgraded from a31c7590 to 76308ffb
21:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1138 (T314041)', diff saved to https://phabricator.wikimedia.org/P33703 and previous config saved to /var/cache/conftool/dbconfig/20220830-210218-ladsgroup.json
21:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
21:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
20:43 ryankemper@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw es7 cluster upgrade - ryankemper@cumin2002 - T316719
20:43 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw es7 cluster upgrade - ryankemper@cumin2002 - T316719
20:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:11 urbanecm@deploy1002: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 03m 43s)
19:45 ryankemper: [Relforge] `ryankemper@cumin1001:~$ sudo -E cumin '*relforge*' 'run-puppet-agent --force'`
18:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
18:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
18:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
18:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
18:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
18:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
18:15 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.27 refs T314188
17:46 joal@deploy1002: Finished deploy [analytics/refinery@aa8f88f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aa8f88f] (duration: 08m 19s)
17:38 joal@deploy1002: Started deploy [analytics/refinery@aa8f88f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aa8f88f]
17:38 joal@deploy1002: Finished deploy [analytics/refinery@aa8f88f] (thin): Regular analytics weekly train THIN [analytics/refinery@aa8f88f] (duration: 00m 08s)
17:37 joal@deploy1002: Started deploy [analytics/refinery@aa8f88f] (thin): Regular analytics weekly train THIN [analytics/refinery@aa8f88f]
17:37 joal@deploy1002: Finished deploy [analytics/refinery@aa8f88f]: Regular analytics weekly train [analytics/refinery@aa8f88f] (duration: 26m 10s)
17:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
17:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
17:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
17:12 moritzm: installing logrotate security updates on Bullseye
17:11 joal@deploy1002: Started deploy [analytics/refinery@aa8f88f]: Regular analytics weekly train [analytics/refinery@aa8f88f]
17:08 dduvall@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.27 refs T314188 (duration: 39m 07s)
17:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2022.codfw.wmnet with OS bullseye
17:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
17:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet
16:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host theemin.codfw.wmnet
16:55 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
16:55 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
16:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2005.codfw.wmnet
16:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2022.codfw.wmnet with reason: host reimage
16:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2005.codfw.wmnet
16:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2022.codfw.wmnet with reason: host reimage
16:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2004.codfw.wmnet
16:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T316186)', diff saved to https://phabricator.wikimedia.org/P33701 and previous config saved to /var/cache/conftool/dbconfig/20220830-163619-ladsgroup.json
16:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
16:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
16:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2004.codfw.wmnet
16:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
16:32 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2022.codfw.wmnet with OS bullseye
16:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1010.eqiad.wmnet
16:29 dduvall@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.27 refs T314188
16:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host stat1010.eqiad.wmnet
16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P33700 and previous config saved to /var/cache/conftool/dbconfig/20220830-162113-ladsgroup.json
16:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1005.eqiad.wmnet
16:12 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1005.eqiad.wmnet
16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P33699 and previous config saved to /var/cache/conftool/dbconfig/20220830-160607-ladsgroup.json
15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T316186)', diff saved to https://phabricator.wikimedia.org/P33698 and previous config saved to /var/cache/conftool/dbconfig/20220830-155101-ladsgroup.json
15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1190 (T316186)', diff saved to https://phabricator.wikimedia.org/P33697 and previous config saved to /var/cache/conftool/dbconfig/20220830-154337-ladsgroup.json
15:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
15:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T316186)', diff saved to https://phabricator.wikimedia.org/P33696 and previous config saved to /var/cache/conftool/dbconfig/20220830-154314-ladsgroup.json
15:33 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
15:33 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
15:32 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
15:32 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P33695 and previous config saved to /var/cache/conftool/dbconfig/20220830-152807-ladsgroup.json
15:25 vgutierrez: restarting ats in cp6008
15:25 vgutierrez: restarting ats in cp6007
15:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1009.eqiad.wmnet
15:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on ganeti2022.codfw.wmnet with reason: Remove node for eventual reimage, T311686
15:17 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on ganeti2022.codfw.wmnet with reason: Remove node for eventual reimage, T311686
15:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host stat1009.eqiad.wmnet
15:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2002.codfw.wmnet
15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P33694 and previous config saved to /var/cache/conftool/dbconfig/20220830-151301-ladsgroup.json
15:10 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
15:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2002.codfw.wmnet
15:09 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
15:07 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
15:06 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
15:06 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
15:05 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T316186)', diff saved to https://phabricator.wikimedia.org/P33693 and previous config saved to /var/cache/conftool/dbconfig/20220830-145755-ladsgroup.json
14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T316186)', diff saved to https://phabricator.wikimedia.org/P33691 and previous config saved to /var/cache/conftool/dbconfig/20220830-145035-ladsgroup.json
14:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
14:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T316186)', diff saved to https://phabricator.wikimedia.org/P33690 and previous config saved to /var/cache/conftool/dbconfig/20220830-145011-ladsgroup.json
14:45 cmooney@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1439.eqiad.wmnet
14:45 cmooney@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1437.eqiad.wmnet
14:39 cmooney@cumin1001: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1440.eqiad.wmnet
14:39 cmooney@cumin1001: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1439.eqiad.wmnet
14:39 cmooney@cumin1001: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1437.eqiad.wmnet
14:38 topranks: de-pooling mw1437/mw1439/mw1440 from jobrunner cluster as those hosts are busy running videoscaler tasks
14:35 herron@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1439.eqiad.wmnet
14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P33689 and previous config saved to /var/cache/conftool/dbconfig/20220830-143505-ladsgroup.json
14:33 herron@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1437.eqiad.wmnet
14:28 herron@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1438.eqiad.wmnet
14:27 herron@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1438
14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P33688 and previous config saved to /var/cache/conftool/dbconfig/20220830-141959-ladsgroup.json
14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T316186)', diff saved to https://phabricator.wikimedia.org/P33687 and previous config saved to /var/cache/conftool/dbconfig/20220830-140452-ladsgroup.json
13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T316186)', diff saved to https://phabricator.wikimedia.org/P33686 and previous config saved to /var/cache/conftool/dbconfig/20220830-135733-ladsgroup.json
13:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
13:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T316186)', diff saved to https://phabricator.wikimedia.org/P33685 and previous config saved to /var/cache/conftool/dbconfig/20220830-135658-ladsgroup.json
13:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:52 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.26/extensions/Translate/src/PageTranslation/RenderTranslationPageJob.php: 75d8e6c: RenderTranslationPageJob: Add patrol status for translation page (T315708) (duration: 03m 59s)
13:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P33684 and previous config saved to /var/cache/conftool/dbconfig/20220830-134152-ladsgroup.json
13:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:34 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 25500e5: Make DiscussionTools topicsubscription, autotopicsub opt-out on all wikis (T315714) (duration: 03m 56s)
13:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:29 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ea3228b: Enable reply tool by default on fiwiki (T297533) (duration: 04m 01s)
13:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:28 vgutierrez: Increase roll-out of query-sorting to 100% - T314868
13:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P33683 and previous config saved to /var/cache/conftool/dbconfig/20220830-132646-ladsgroup.json
13:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:21 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: testwiki: Fix language code for Bhojpuri (T313296) (duration: 03m 53s)
13:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:12 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable SectionTranslation on 10 more WPs where ContentTranslation is default (T313300) (duration: 03m 56s)
13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T316186)', diff saved to https://phabricator.wikimedia.org/P33682 and previous config saved to /var/cache/conftool/dbconfig/20220830-131140-ladsgroup.json
13:11 moritzm: installing libxslt security updates for stretch
13:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T316186)', diff saved to https://phabricator.wikimedia.org/P33681 and previous config saved to /var/cache/conftool/dbconfig/20220830-130521-ladsgroup.json
13:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
13:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T316186)', diff saved to https://phabricator.wikimedia.org/P33680 and previous config saved to /var/cache/conftool/dbconfig/20220830-130457-ladsgroup.json
12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P33679 and previous config saved to /var/cache/conftool/dbconfig/20220830-124951-ladsgroup.json
12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P33678 and previous config saved to /var/cache/conftool/dbconfig/20220830-123445-ladsgroup.json
12:31 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host graphite1004.eqiad.wmnet
12:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
12:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
12:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
12:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
12:24 btullis: committing updated switch configuration https://gerrit.wikimedia.org/r/c/operations/homer/public/+/827979
12:20 godog: rollback and reboot graphite1004 with linux-image-5.10.0-16-amd64
12:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T316186)', diff saved to https://phabricator.wikimedia.org/P33677 and previous config saved to /var/cache/conftool/dbconfig/20220830-121938-ladsgroup.json
12:19 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite1004.eqiad.wmnet
12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T316186)', diff saved to https://phabricator.wikimedia.org/P33676 and previous config saved to /var/cache/conftool/dbconfig/20220830-121421-ladsgroup.json
12:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
12:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T316186)', diff saved to https://phabricator.wikimedia.org/P33675 and previous config saved to /var/cache/conftool/dbconfig/20220830-121357-ladsgroup.json
12:04 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host graphite1004.eqiad.wmnet
12:01 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P33674 and previous config saved to /var/cache/conftool/dbconfig/20220830-115851-ladsgroup.json
11:53 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
11:52 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
11:52 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite1004.eqiad.wmnet
11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P33673 and previous config saved to /var/cache/conftool/dbconfig/20220830-114345-ladsgroup.json
11:43 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
11:36 moritzm: uploaded libxslt 1.1.29-2.1+deb9u2+wmf1 to apt.wikimedia.org
11:32 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host idp1002.wikimedia.org
11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T316186)', diff saved to https://phabricator.wikimedia.org/P33672 and previous config saved to /var/cache/conftool/dbconfig/20220830-112838-ladsgroup.json
11:24 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS buster
11:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp1002.wikimedia.org
11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T316186)', diff saved to https://phabricator.wikimedia.org/P33671 and previous config saved to /var/cache/conftool/dbconfig/20220830-112117-ladsgroup.json
11:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
11:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T316186)', diff saved to https://phabricator.wikimedia.org/P33670 and previous config saved to /var/cache/conftool/dbconfig/20220830-112048-ladsgroup.json
11:07 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P33669 and previous config saved to /var/cache/conftool/dbconfig/20220830-110542-ladsgroup.json
11:04 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P33668 and previous config saved to /var/cache/conftool/dbconfig/20220830-105036-ladsgroup.json
10:50 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
10:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling db2096 after maint', diff saved to https://phabricator.wikimedia.org/P33667 and previous config saved to /var/cache/conftool/dbconfig/20220830-104616-ladsgroup.json
10:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2096.codfw.wmnet with reason: Maintenance
10:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2096.codfw.wmnet with reason: Maintenance
10:39 btullis: committing updated switch configuration https://gerrit.wikimedia.org/r/c/operations/homer/public/+/826579
10:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T316186)', diff saved to https://phabricator.wikimedia.org/P33666 and previous config saved to /var/cache/conftool/dbconfig/20220830-103530-ladsgroup.json
10:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1138 (T316186)', diff saved to https://phabricator.wikimedia.org/P33665 and previous config saved to /var/cache/conftool/dbconfig/20220830-103012-ladsgroup.json
10:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
10:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling db1167 (T316186)', diff saved to https://phabricator.wikimedia.org/P33664 and previous config saved to /var/cache/conftool/dbconfig/20220830-102342-ladsgroup.json
10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T316186)', diff saved to https://phabricator.wikimedia.org/P33663 and previous config saved to /var/cache/conftool/dbconfig/20220830-102220-ladsgroup.json
10:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
10:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
10:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
10:21 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1005.eqiad.wmnet
10:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
10:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2001.codfw.wmnet with reason: Switch instance to plain disks, T311686
10:15 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2001.codfw.wmnet with reason: Switch instance to plain disks, T311686
10:11 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
10:08 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite2003.codfw.wmnet
10:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
10:08 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon2002.wikimedia.org
10:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
10:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
10:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
10:03 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1011 to pc1 master (duration: 03m 44s)
10:03 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host netmon2002.wikimedia.org
10:02 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite2003.codfw.wmnet
10:01 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet
09:55 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
09:53 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@ff76338]: Add sd-alerts notifications to image_suggestions_weekly (duration: 02m 05s)
09:53 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host centrallog2002.codfw.wmnet
09:53 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
09:51 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@ff76338]: Add sd-alerts notifications to image_suggestions_weekly
09:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2001.codfw.wmnet with reason: Switch instance to DRBD, T311686
09:51 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2001.codfw.wmnet with reason: Switch instance to DRBD, T311686
09:41 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Stop writing to old templatelinks fields in s6 (T312865) (duration: 03m 57s)
09:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
09:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
09:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
09:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
09:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
09:34 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1014 to pc1 master (duration: 03m 50s)
09:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
09:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
09:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
09:31 moritzm: draining ganeti2022 for eventual reimage T311686
09:18 moritzm: installing perf updates on Bullseye hosts
09:12 moritzm: upgrading ganeti2027,ganeti2028 to 3.0.2 T312637
09:07 jynus: restart dbprov* hosts
08:58 _joe_: powercycling parse1002, blank console
08:58 moritzm: upgrading ganeti2010,ganeti2012,ganeti2024 to 3.0.2 T312637
08:53 moritzm: failover Ganeti master in codfw to ganeti2020 T311686
08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Give some weight to current x1 codfw master', diff saved to https://phabricator.wikimedia.org/P33661 and previous config saved to /var/cache/conftool/dbconfig/20220830-084945-root.json
08:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2096.codfw.wmnet with reason: Maintenance
08:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2096.codfw.wmnet with reason: Maintenance
08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2096 T316522', diff saved to https://phabricator.wikimedia.org/P33660 and previous config saved to /var/cache/conftool/dbconfig/20220830-083845-root.json
08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2115 to x1 codfw primary T316522', diff saved to https://phabricator.wikimedia.org/P33659 and previous config saved to /var/cache/conftool/dbconfig/20220830-083654-root.json
08:36 marostegui: Starting x1 codfw failover from db2096 to db2115 - T316522
08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2115 with weight 0 T316522', diff saved to https://phabricator.wikimedia.org/P33658 and previous config saved to /var/cache/conftool/dbconfig/20220830-083103-root.json
08:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: DC switchover x1 T316522
08:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: DC switchover x1 T316522
08:24 vgutierrez: ATS: enforce per-request timeout globally (205 secs) - T315533
07:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
07:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
07:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
07:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
07:17 tgr: UTC morning deploy window done
07:17 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Declare mediawiki.accountcreation_block stream (T306018) (duration: 04m 11s)
07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
06:53 _joe_: running scap pull on parse1* T316611
06:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T316186)', diff saved to https://phabricator.wikimedia.org/P33657 and previous config saved to /var/cache/conftool/dbconfig/20220830-063332-ladsgroup.json
06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T316186)', diff saved to https://phabricator.wikimedia.org/P33656 and previous config saved to /var/cache/conftool/dbconfig/20220830-062613-ladsgroup.json
06:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
06:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
06:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T316186)', diff saved to https://phabricator.wikimedia.org/P33655 and previous config saved to /var/cache/conftool/dbconfig/20220830-062547-ladsgroup.json
06:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T316186)', diff saved to https://phabricator.wikimedia.org/P33654 and previous config saved to /var/cache/conftool/dbconfig/20220830-061926-ladsgroup.json
06:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
06:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
06:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T316186)', diff saved to https://phabricator.wikimedia.org/P33653 and previous config saved to /var/cache/conftool/dbconfig/20220830-061901-ladsgroup.json
06:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1173.eqiad.wmnet with reason: Maintenance
06:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1173.eqiad.wmnet with reason: Maintenance
06:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T316186)', diff saved to https://phabricator.wikimedia.org/P33652 and previous config saved to /var/cache/conftool/dbconfig/20220830-061243-ladsgroup.json
06:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
06:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
06:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112 (T316186)', diff saved to https://phabricator.wikimedia.org/P33651 and previous config saved to /var/cache/conftool/dbconfig/20220830-061218-ladsgroup.json
06:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
06:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2112 (T316186)', diff saved to https://phabricator.wikimedia.org/P33650 and previous config saved to /var/cache/conftool/dbconfig/20220830-060554-ladsgroup.json
06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1173 T316110 T312984 T312863 T316186', diff saved to https://phabricator.wikimedia.org/P33649 and previous config saved to /var/cache/conftool/dbconfig/20220830-060543-ladsgroup.json
06:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
06:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T316186)', diff saved to https://phabricator.wikimedia.org/P33648 and previous config saved to /var/cache/conftool/dbconfig/20220830-060509-ladsgroup.json
06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1131 to s6 primary and set section read-write T316110', diff saved to https://phabricator.wikimedia.org/P33647 and previous config saved to /var/cache/conftool/dbconfig/20220830-060109-ladsgroup.json
06:00 Amir1: Starting s6 eqiad failover from db1173 to db1131 - T316110
05:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T316186)', diff saved to https://phabricator.wikimedia.org/P33645 and previous config saved to /var/cache/conftool/dbconfig/20220830-055948-ladsgroup.json
05:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
05:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
05:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
05:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
05:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T316186)', diff saved to https://phabricator.wikimedia.org/P33644 and previous config saved to /var/cache/conftool/dbconfig/20220830-055555-ladsgroup.json
05:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T316186)', diff saved to https://phabricator.wikimedia.org/P33643 and previous config saved to /var/cache/conftool/dbconfig/20220830-054924-ladsgroup.json
05:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
05:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
05:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T316186)', diff saved to https://phabricator.wikimedia.org/P33642 and previous config saved to /var/cache/conftool/dbconfig/20220830-054859-ladsgroup.json
05:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T316186)', diff saved to https://phabricator.wikimedia.org/P33641 and previous config saved to /var/cache/conftool/dbconfig/20220830-054242-ladsgroup.json
05:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
05:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
05:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T316186)', diff saved to https://phabricator.wikimedia.org/P33640 and previous config saved to /var/cache/conftool/dbconfig/20220830-054217-ladsgroup.json
05:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T316186)', diff saved to https://phabricator.wikimedia.org/P33639 and previous config saved to /var/cache/conftool/dbconfig/20220830-053559-ladsgroup.json
05:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
05:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
05:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
05:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
05:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T316186)', diff saved to https://phabricator.wikimedia.org/P33638 and previous config saved to /var/cache/conftool/dbconfig/20220830-053529-ladsgroup.json
05:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
05:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
05:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T316186)', diff saved to https://phabricator.wikimedia.org/P33637 and previous config saved to /var/cache/conftool/dbconfig/20220830-052930-ladsgroup.json
05:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
05:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
05:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1131 with weight 0 T316110', diff saved to https://phabricator.wikimedia.org/P33636 and previous config saved to /var/cache/conftool/dbconfig/20220830-051106-ladsgroup.json
05:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s6 T316110
05:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s6 T316110
05:03 ryankemper: T306899 T316496 Deployed WCQS `0.3.115`. That should (hopefully) resolve these tickets.
05:01 ryankemper: [WCQS Deploy] Restarted `wcqs-updater` across all hosts: `sudo -E cumin 'A:wcqs-public' 'systemctl restart wcqs-updater'`
05:00 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@2d34f5c] (wcqs): Deploy 0.3.115 to WCQS (duration: 02m 00s)
04:58 ryankemper@deploy1002: Started deploy [wdqs/wdqs@2d34f5c] (wcqs): Deploy 0.3.115 to WCQS
04:58 ryankemper: [WCQS Deploy] Gearing up for deploy of wcqs `0.3.115`
04:58 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
04:58 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
04:57 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
04:45 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@2d34f5c]: 0.3.115 (duration: 09m 01s)
04:37 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.115` on canary `wdqs1003`; proceeding to rest of fleet
04:36 ryankemper@deploy1002: Started deploy [wdqs/wdqs@2d34f5c]: 0.3.115
04:35 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.115`. Pre-deploy tests passing on canary `wdqs1003`
03:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
03:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
03:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
03:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
02:29 ejegg: payments-wiki upgraded from dc6d899d to 80657b06
02:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
02:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
02:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
01:04 TimStarling: setting scaling_governor=performance on all mediawiki servers, via puppet gerrit 826405
00:12 aaron@deploy1002: Finished deploy [performance/arc-lamp@40cb764]: T315056 (duration: 00m 07s)
00:12 aaron@deploy1002: Started deploy [performance/arc-lamp@40cb764]: T315056

2022-08-29

23:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
23:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
23:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
23:40 krinkle@deploy1002: Synchronized wmf-config/: I9f17d80d9d91 (duration: 03m 53s)
23:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
23:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
23:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
23:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
23:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
23:32 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I15a334 (duration: 03m 42s)
23:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
23:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
23:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
23:24 krinkle@deploy1002: Synchronized wmf-config/: I5e0e5a (duration: 03m 27s)
23:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
23:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
23:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
23:13 krinkle@deploy1002: Synchronized wmf-config/: Id9707d (duration: 03m 48s)
23:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
22:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
22:41 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@57fb704]: force re-deploy HEAD to attempt to get artifacts directory populated on an-airflow1001 (duration: 02m 01s)
22:40 tgr: UTC late backport window done
22:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
22:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
22:39 tgr@deploy1002: Synchronized php-1.39.0-wmf.26/extensions/GrowthExperiments/extension.json: Backport: Fix WelcomeSurvey CentralAuthPostLoginRedirect hook (step 2) (duration: 03m 53s)
22:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
22:39 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@57fb704]: force re-deploy HEAD to attempt to get artifacts directory populated on an-airflow1001
22:16 ejegg: payments-wiki upgraded from a63b300e to dc6d899d
22:13 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@57fb704]: re-deploy HEAD to attempt to get artifacts directory populated on an-airflow1001 (duration: 00m 04s)
22:13 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@57fb704]: re-deploy HEAD to attempt to get artifacts directory populated on an-airflow1001
21:53 tgr@deploy1002: Synchronized static/images/project-logos: Config: Adjust width-height ratio of logos for bewikisource, euwikisource, cswikisource to fix display issue (T310961) (duration: 03m 59s)
21:48 tgr@deploy1002: Synchronized wmf-config/logos.php: Config: Adjust width-height ratio of logos for bewikisource, euwikisource, cswikisource to fix display issue (T310961) (duration: 03m 34s)
21:44 tgr@deploy1002: Synchronized logos/config.yaml: Config: Adjust width-height ratio of logos for bewikisource, euwikisource, cswikisource to fix display issue (T310961) (duration: 03m 45s)
21:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
21:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
21:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
21:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
21:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
21:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
21:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
21:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
21:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
21:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
21:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
21:08 cjming@deploy1002: Synchronized php-1.39.0-wmf.26/extensions/GrowthExperiments/tests/selenium/specs/homepage.js: Backport: Temporarily disable change tag test (T316596) (duration: 03m 49s)
21:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
21:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
21:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
21:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
21:01 cjming@deploy1002: Synchronized php-1.39.0-wmf.26/extensions/GrowthExperiments/includes/WelcomeSurveyHooks.php: Backport: Fix WelcomeSurvey CentralAuthPostLoginRedirect hook (step 1) (T315583 T316311) (duration: 03m 36s)
20:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:53 cjming@deploy1002: Synchronized php-1.39.0-wmf.26/extensions/ConfirmEdit/includes/Auth/CaptchaAuthenticationRequest.php: Backport: Restore auth request ID from before namespacing (T316410) (duration: 03m 45s)
20:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:42 cjming@deploy1002: Synchronized php-1.39.0-wmf.26/skins/Vector: Backport: Fix site notice spacing (T315595) (duration: 03m 46s)
20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:31 cjming@deploy1002: Synchronized php-1.39.0-wmf.26/extensions/DiscussionTools/maintenance: Backport: Fix boilerplate in maintenance scripts for WMF production (T316548) (duration: 03m 41s)
20:27 cjming@deploy1002: sync-file aborted: Backport: Fix boilerplate in maintenance scripts for WMF production (T316548) (duration: 00m 05s)
20:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:14 bblack: Revert of cookie-related changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/827566/ pushing to all cp-text
20:14 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@57fb704]: Deploy mjolnir 1.1 for elasticsearch 7.x compatability (duration: 00m 24s)
20:13 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@57fb704]: Deploy mjolnir 1.1 for elasticsearch 7.x compatability
20:08 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "Enable new Vector skin on select pages (take 2)" (T309973) (duration: 03m 34s)
20:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:04 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@57fb704]: Deploy mjolnir 1.1 for elasticsearch 7.x compatability (duration: 00m 11s)
20:04 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@57fb704]: Deploy mjolnir 1.1 for elasticsearch 7.x compatability
19:34 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@5c0af35]: Update to work with elasticsearch 7.x (duration: 00m 54s)
19:33 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@5c0af35]: Update to work with elasticsearch 7.x
19:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P33634 and previous config saved to /var/cache/conftool/dbconfig/20220829-192608-ladsgroup.json
19:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P33633 and previous config saved to /var/cache/conftool/dbconfig/20220829-191950-ladsgroup.json
19:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
19:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
19:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
19:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T316186)', diff saved to https://phabricator.wikimedia.org/P33632 and previous config saved to /var/cache/conftool/dbconfig/20220829-190444-ladsgroup.json
19:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
19:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
19:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
18:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
18:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T316186)', diff saved to https://phabricator.wikimedia.org/P33631 and previous config saved to /var/cache/conftool/dbconfig/20220829-185723-ladsgroup.json
18:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
18:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T316186)', diff saved to https://phabricator.wikimedia.org/P33630 and previous config saved to /var/cache/conftool/dbconfig/20220829-185659-ladsgroup.json
18:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
18:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P33629 and previous config saved to /var/cache/conftool/dbconfig/20220829-184153-ladsgroup.json
18:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P33628 and previous config saved to /var/cache/conftool/dbconfig/20220829-182646-ladsgroup.json
18:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T316186)', diff saved to https://phabricator.wikimedia.org/P33627 and previous config saved to /var/cache/conftool/dbconfig/20220829-181140-ladsgroup.json
18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T316186)', diff saved to https://phabricator.wikimedia.org/P33626 and previous config saved to /var/cache/conftool/dbconfig/20220829-180421-ladsgroup.json
18:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
18:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
18:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T316186)', diff saved to https://phabricator.wikimedia.org/P33625 and previous config saved to /var/cache/conftool/dbconfig/20220829-180358-ladsgroup.json
17:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P33624 and previous config saved to /var/cache/conftool/dbconfig/20220829-174851-ladsgroup.json
17:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P33623 and previous config saved to /var/cache/conftool/dbconfig/20220829-173345-ladsgroup.json
17:25 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.26/extensions/SecurePoll/includes/Pages/VoterEligibilityPage.php: 2d6c378: Add missing comma (T316150) (duration: 03m 47s)
17:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
17:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T316186)', diff saved to https://phabricator.wikimedia.org/P33622 and previous config saved to /var/cache/conftool/dbconfig/20220829-171839-ladsgroup.json
17:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
17:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
17:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T316186)', diff saved to https://phabricator.wikimedia.org/P33621 and previous config saved to /var/cache/conftool/dbconfig/20220829-171116-ladsgroup.json
17:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
17:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
17:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
17:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
17:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T316186)', diff saved to https://phabricator.wikimedia.org/P33620 and previous config saved to /var/cache/conftool/dbconfig/20220829-171035-ladsgroup.json
17:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
17:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
17:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
17:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
17:03 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on restbase[1031-1033].eqiad.wmnet with reason: New hosts - awaiting cassandra joins
17:03 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on restbase[1031-1033].eqiad.wmnet with reason: New hosts - awaiting cassandra joins
17:02 krinkle@deploy1002: Synchronized wmf-config/: I1f79f21cbf8 (duration: 03m 42s)
16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P33619 and previous config saved to /var/cache/conftool/dbconfig/20220829-165529-ladsgroup.json
16:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P33618 and previous config saved to /var/cache/conftool/dbconfig/20220829-164022-ladsgroup.json
16:38 krinkle@deploy1002: Synchronized wmf-config/: I23c221 (duration: 03m 57s)
16:34 krinkle@deploy1002: sync-file aborted: (no justification provided) (duration: 00m 01s)
16:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
16:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
16:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
16:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
16:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T316186)', diff saved to https://phabricator.wikimedia.org/P33617 and previous config saved to /var/cache/conftool/dbconfig/20220829-162516-ladsgroup.json
16:24 claime: repooled wtp1034.eqiad.wmnet and depooled parse1001.eqiad.wmnet
16:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T316186)', diff saved to https://phabricator.wikimedia.org/P33616 and previous config saved to /var/cache/conftool/dbconfig/20220829-161959-ladsgroup.json
16:12 claime: depooled wtp1034.eqiad.wmnet from parsoid cluster https://phabricator.wikimedia.org/T312638
16:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
16:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
16:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
16:08 claime: pooled parse1001.eqiad.wmnet (php 7.4 only) in parsoid cluster https://phabricator.wikimedia.org/T312638
16:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
16:05 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1033.eqiad.wmnet with OS buster
16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P33615 and previous config saved to /var/cache/conftool/dbconfig/20220829-160452-ladsgroup.json
16:02 cgoubert@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=parsoid,name=parse1001.eqiad.wmnet
16:02 cgoubert@puppetmaster1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1001.eqiad.wmnet
15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P33614 and previous config saved to /var/cache/conftool/dbconfig/20220829-154946-ladsgroup.json
15:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
15:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
15:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
15:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T316186)', diff saved to https://phabricator.wikimedia.org/P33613 and previous config saved to /var/cache/conftool/dbconfig/20220829-153440-ladsgroup.json
15:31 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1033.eqiad.wmnet with reason: host reimage
15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 (T316186)', diff saved to https://phabricator.wikimedia.org/P33612 and previous config saved to /var/cache/conftool/dbconfig/20220829-152741-ladsgroup.json
15:27 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1033.eqiad.wmnet with reason: host reimage
15:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T316186)', diff saved to https://phabricator.wikimedia.org/P33611 and previous config saved to /var/cache/conftool/dbconfig/20220829-152612-ladsgroup.json
15:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
15:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T316186)', diff saved to https://phabricator.wikimedia.org/P33610 and previous config saved to /var/cache/conftool/dbconfig/20220829-152549-ladsgroup.json
15:14 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1033.eqiad.wmnet with OS buster
15:13 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1032.eqiad.wmnet with OS buster
15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P33609 and previous config saved to /var/cache/conftool/dbconfig/20220829-151042-ladsgroup.json
14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P33608 and previous config saved to /var/cache/conftool/dbconfig/20220829-145536-ladsgroup.json
14:43 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1032.eqiad.wmnet with reason: host reimage
14:41 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on restbase1031.eqiad.wmnet with reason: New host
14:41 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on restbase1031.eqiad.wmnet with reason: New host
14:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1032.eqiad.wmnet with reason: host reimage
14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T316186)', diff saved to https://phabricator.wikimedia.org/P33607 and previous config saved to /var/cache/conftool/dbconfig/20220829-144030-ladsgroup.json
14:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T316186)', diff saved to https://phabricator.wikimedia.org/P33606 and previous config saved to /var/cache/conftool/dbconfig/20220829-143319-ladsgroup.json
14:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
14:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T316186)', diff saved to https://phabricator.wikimedia.org/P33605 and previous config saved to /var/cache/conftool/dbconfig/20220829-143255-ladsgroup.json
14:28 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1032.eqiad.wmnet with OS buster
14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P33604 and previous config saved to /var/cache/conftool/dbconfig/20220829-141749-ladsgroup.json
14:06 Lucas_WMDE: UTC afternoon backport+config window done
14:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/: Config: Remove unused SearchSettingsForSDC.php (2/2, no-op; syncing deleted file requires syncing entire directory AFAICT) (duration: 03m 37s)
14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P33603 and previous config saved to /var/cache/conftool/dbconfig/20220829-140243-ladsgroup.json
13:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:56 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/SearchSettingsForWikibase.php: Config: Remove unused SearchSettingsForSDC.php (1/2, no-op) (duration: 03m 32s)
13:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T316186)', diff saved to https://phabricator.wikimedia.org/P33602 and previous config saved to /var/cache/conftool/dbconfig/20220829-134736-ladsgroup.json
13:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T316186)', diff saved to https://phabricator.wikimedia.org/P33601 and previous config saved to /var/cache/conftool/dbconfig/20220829-134014-ladsgroup.json
13:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
13:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
13:33 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable wgDiscussionToolsEnablePermalinksBackend on testwiki (T315353) (duration: 03m 48s)
13:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:31 marostegui: Failover m5 master
13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:29 taavi@deploy1002: Synchronized php-1.39.0-wmf.26/extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php: Backport: persistRevisionThreadItems: Allow processing current revisions only (T315510) (duration: 03m 40s)
13:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:23 taavi: taavi@mwmaint1002 ~ $ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki testwiki discussiontools
13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:21 taavi@deploy1002: Synchronized php-1.39.0-wmf.26/extensions/SecurePoll/: T316150 (duration: 03m 44s)
13:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:14 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Moving 1% of users to php 7.4 (duration: 04m 18s)
13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:12 vgutierrez: Increase roll-out of query-sorting to 75% - T314868
13:06 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1004.wikimedia.org
13:00 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab1004.wikimedia.org
12:14 vgutierrez: rolling restart of ats-be fleet wide to apply "Hide non session cookies during cache lookup" - T316338 T316337
12:08 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host restbase1031.eqiad.wmnet with OS buster
12:03 hnowlan: joining restbase1031-a to cassandra cluster
12:03 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on restbase1031.eqiad.wmnet with reason: New host
12:02 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on restbase1031.eqiad.wmnet with reason: New host
11:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
11:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T316186)', diff saved to https://phabricator.wikimedia.org/P33600 and previous config saved to /var/cache/conftool/dbconfig/20220829-115107-ladsgroup.json
11:37 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1031.eqiad.wmnet with reason: host reimage
11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P33599 and previous config saved to /var/cache/conftool/dbconfig/20220829-113600-ladsgroup.json
11:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1031.eqiad.wmnet with reason: host reimage
11:21 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1031.eqiad.wmnet with OS buster
11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P33598 and previous config saved to /var/cache/conftool/dbconfig/20220829-112054-ladsgroup.json
11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T316186)', diff saved to https://phabricator.wikimedia.org/P33597 and previous config saved to /var/cache/conftool/dbconfig/20220829-110548-ladsgroup.json
10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T316186)', diff saved to https://phabricator.wikimedia.org/P33596 and previous config saved to /var/cache/conftool/dbconfig/20220829-105928-ladsgroup.json
10:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
10:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T316186)', diff saved to https://phabricator.wikimedia.org/P33595 and previous config saved to /var/cache/conftool/dbconfig/20220829-105904-ladsgroup.json
10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P33593 and previous config saved to /var/cache/conftool/dbconfig/20220829-104358-ladsgroup.json
10:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P33592 and previous config saved to /var/cache/conftool/dbconfig/20220829-102851-ladsgroup.json
10:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T316186)', diff saved to https://phabricator.wikimedia.org/P33591 and previous config saved to /var/cache/conftool/dbconfig/20220829-101345-ladsgroup.json
10:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T316186)', diff saved to https://phabricator.wikimedia.org/P33590 and previous config saved to /var/cache/conftool/dbconfig/20220829-101029-ladsgroup.json
10:09 vgutierrez: test trafficserver: Hide non session cookies during cache lookup in drmrs - T316338 T316337
09:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P33589 and previous config saved to /var/cache/conftool/dbconfig/20220829-095523-ladsgroup.json
09:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P33587 and previous config saved to /var/cache/conftool/dbconfig/20220829-094017-ladsgroup.json
09:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T316186)', diff saved to https://phabricator.wikimedia.org/P33586 and previous config saved to /var/cache/conftool/dbconfig/20220829-092511-ladsgroup.json
09:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T316186)', diff saved to https://phabricator.wikimedia.org/P33585 and previous config saved to /var/cache/conftool/dbconfig/20220829-092005-ladsgroup.json
09:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T316186)', diff saved to https://phabricator.wikimedia.org/P33584 and previous config saved to /var/cache/conftool/dbconfig/20220829-091840-ladsgroup.json
09:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
09:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
09:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T316186)', diff saved to https://phabricator.wikimedia.org/P33583 and previous config saved to /var/cache/conftool/dbconfig/20220829-091816-ladsgroup.json
09:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab2002.wikimedia.org
09:10 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab2002.wikimedia.org
09:10 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1003.wikimedia.org
09:03 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab1003.wikimedia.org
09:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P33582 and previous config saved to /var/cache/conftool/dbconfig/20220829-090310-ladsgroup.json
08:55 vgutierrez: test trafficserver: Hide non session cookies during cache lookup in cp6016 - T316338 T316337
08:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P33581 and previous config saved to /var/cache/conftool/dbconfig/20220829-084804-ladsgroup.json
08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T316186)', diff saved to https://phabricator.wikimedia.org/P33580 and previous config saved to /var/cache/conftool/dbconfig/20220829-083258-ladsgroup.json
08:31 marostegui: Failover m2 from db1159 to db1164 - T316202
08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T316186)', diff saved to https://phabricator.wikimedia.org/P33579 and previous config saved to /var/cache/conftool/dbconfig/20220829-082643-ladsgroup.json
08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P33578 and previous config saved to /var/cache/conftool/dbconfig/20220829-081136-ladsgroup.json
08:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
08:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
08:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
08:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
08:05 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Moving 0.1% of users to php 7.4 (duration: 03m 52s)
08:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
08:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
08:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
08:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
07:58 vgutierrez: Increase roll-out of query-sorting to 50% - T314868
07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P33577 and previous config saved to /var/cache/conftool/dbconfig/20220829-075630-ladsgroup.json
07:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
07:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T316186)', diff saved to https://phabricator.wikimedia.org/P33576 and previous config saved to /var/cache/conftool/dbconfig/20220829-074124-ladsgroup.json
07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T316186)', diff saved to https://phabricator.wikimedia.org/P33575 and previous config saved to /var/cache/conftool/dbconfig/20220829-073516-ladsgroup.json
07:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T316186)', diff saved to https://phabricator.wikimedia.org/P33574 and previous config saved to /var/cache/conftool/dbconfig/20220829-073354-ladsgroup.json
07:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
07:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
07:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T316186)', diff saved to https://phabricator.wikimedia.org/P33573 and previous config saved to /var/cache/conftool/dbconfig/20220829-073330-ladsgroup.json
07:30 marostegui: Failover m3-master
07:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
07:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
07:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
07:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
07:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P33572 and previous config saved to /var/cache/conftool/dbconfig/20220829-071824-ladsgroup.json
07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2133,2160].codfw.wmnet,db[1117,1159,1164].eqiad.wmnet with reason: Switchover m2 T316202
07:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2133,2160].codfw.wmnet,db[1117,1159,1164].eqiad.wmnet with reason: Switchover m2 T316202
07:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 88b3ce8: Revert "testwiki: Growth: Assign enrollasmentor to *" (T310905, T314414) (duration: 03m 32s)
07:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
07:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 20d6238: cswiki: fix extendedconfirmed permission for bot group (duration: 03m 43s)
07:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
07:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P33571 and previous config saved to /var/cache/conftool/dbconfig/20220829-070318-ladsgroup.json
06:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T316186)', diff saved to https://phabricator.wikimedia.org/P33570 and previous config saved to /var/cache/conftool/dbconfig/20220829-064811-ladsgroup.json
06:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T316186)', diff saved to https://phabricator.wikimedia.org/P33569 and previous config saved to /var/cache/conftool/dbconfig/20220829-064154-ladsgroup.json
06:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
06:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
06:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
06:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T316186)', diff saved to https://phabricator.wikimedia.org/P33568 and previous config saved to /var/cache/conftool/dbconfig/20220829-064113-ladsgroup.json
06:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
06:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
06:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P33567 and previous config saved to /var/cache/conftool/dbconfig/20220829-062607-ladsgroup.json
06:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
06:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
06:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
06:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
06:22 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Stop writing to old templatelinks fields in commons (T312865) (duration: 03m 43s)
06:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P33566 and previous config saved to /var/cache/conftool/dbconfig/20220829-061100-ladsgroup.json
05:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T316186)', diff saved to https://phabricator.wikimedia.org/P33565 and previous config saved to /var/cache/conftool/dbconfig/20220829-055554-ladsgroup.json
05:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T316186)', diff saved to https://phabricator.wikimedia.org/P33564 and previous config saved to /var/cache/conftool/dbconfig/20220829-054939-ladsgroup.json
05:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
05:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
05:44 hashar: Restarted Gerrit for 3.4.5 upgrade
05:40 hashar@deploy1002: Finished deploy [gerrit/gerrit@f1a820b]: Gerrit to 3.4.5 on gerrit1001 (duration: 00m 09s)
05:40 hashar@deploy1002: Started deploy [gerrit/gerrit@f1a820b]: Gerrit to 3.4.5 on gerrit1001
05:37 hashar@deploy1002: Finished deploy [gerrit/gerrit@f1a820b]: Gerrit to 3.4.5 on gerrit2002 (duration: 00m 11s)
05:36 hashar@deploy1002: Started deploy [gerrit/gerrit@f1a820b]: Gerrit to 3.4.5 on gerrit2002
05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust weights on s1 T316481', diff saved to https://phabricator.wikimedia.org/P33563 and previous config saved to /var/cache/conftool/dbconfig/20220829-051206-marostegui.json
05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2103 as master in dbctl T316481', diff saved to https://phabricator.wikimedia.org/P33562 and previous config saved to /var/cache/conftool/dbconfig/20220829-051020-marostegui.json

2022-08-28

21:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P33561 and previous config saved to /var/cache/conftool/dbconfig/20220828-210336-ladsgroup.json
21:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P33560 and previous config saved to /var/cache/conftool/dbconfig/20220828-210235-ladsgroup.json
20:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P33559 and previous config saved to /var/cache/conftool/dbconfig/20220828-204729-ladsgroup.json
20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T316186)', diff saved to https://phabricator.wikimedia.org/P33558 and previous config saved to /var/cache/conftool/dbconfig/20220828-203223-ladsgroup.json
20:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T316186)', diff saved to https://phabricator.wikimedia.org/P33557 and previous config saved to /var/cache/conftool/dbconfig/20220828-202701-ladsgroup.json
20:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
20:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
20:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T316186)', diff saved to https://phabricator.wikimedia.org/P33556 and previous config saved to /var/cache/conftool/dbconfig/20220828-202638-ladsgroup.json
20:18 ori: mw1411, mw1413, mw1419, mw1429, mw1431, mw1433: set energy-performance preference to 0 via 'x86_energy_perf_policy --hwp-epp 0' T315398
20:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P33555 and previous config saved to /var/cache/conftool/dbconfig/20220828-201131-ladsgroup.json
19:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P33554 and previous config saved to /var/cache/conftool/dbconfig/20220828-195625-ladsgroup.json
19:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T316186)', diff saved to https://phabricator.wikimedia.org/P33553 and previous config saved to /var/cache/conftool/dbconfig/20220828-194119-ladsgroup.json
19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T316186)', diff saved to https://phabricator.wikimedia.org/P33552 and previous config saved to /var/cache/conftool/dbconfig/20220828-193500-ladsgroup.json
19:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
19:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
19:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
19:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T316186)', diff saved to https://phabricator.wikimedia.org/P33551 and previous config saved to /var/cache/conftool/dbconfig/20220828-192705-ladsgroup.json
19:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T316186)', diff saved to https://phabricator.wikimedia.org/P33550 and previous config saved to /var/cache/conftool/dbconfig/20220828-192550-ladsgroup.json
19:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T316186)', diff saved to https://phabricator.wikimedia.org/P33549 and previous config saved to /var/cache/conftool/dbconfig/20220828-192042-ladsgroup.json
19:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T316186)', diff saved to https://phabricator.wikimedia.org/P33548 and previous config saved to /var/cache/conftool/dbconfig/20220828-192016-ladsgroup.json
19:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
19:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
19:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T316186)', diff saved to https://phabricator.wikimedia.org/P33547 and previous config saved to /var/cache/conftool/dbconfig/20220828-191951-ladsgroup.json
19:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T316186)', diff saved to https://phabricator.wikimedia.org/P33546 and previous config saved to /var/cache/conftool/dbconfig/20220828-191440-ladsgroup.json
19:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
19:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
19:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 (T316186)', diff saved to https://phabricator.wikimedia.org/P33545 and previous config saved to /var/cache/conftool/dbconfig/20220828-191414-ladsgroup.json
19:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2107 (T316186)', diff saved to https://phabricator.wikimedia.org/P33544 and previous config saved to /var/cache/conftool/dbconfig/20220828-190849-ladsgroup.json
19:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
19:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
19:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T316186)', diff saved to https://phabricator.wikimedia.org/P33543 and previous config saved to /var/cache/conftool/dbconfig/20220828-190824-ladsgroup.json
19:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T316186)', diff saved to https://phabricator.wikimedia.org/P33542 and previous config saved to /var/cache/conftool/dbconfig/20220828-190303-ladsgroup.json
19:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
19:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
19:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T316186)', diff saved to https://phabricator.wikimedia.org/P33541 and previous config saved to /var/cache/conftool/dbconfig/20220828-190238-ladsgroup.json
18:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T316186)', diff saved to https://phabricator.wikimedia.org/P33540 and previous config saved to /var/cache/conftool/dbconfig/20220828-185606-ladsgroup.json
18:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
18:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
18:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
18:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
18:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T316186)', diff saved to https://phabricator.wikimedia.org/P33539 and previous config saved to /var/cache/conftool/dbconfig/20220828-185536-ladsgroup.json
18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T316186)', diff saved to https://phabricator.wikimedia.org/P33538 and previous config saved to /var/cache/conftool/dbconfig/20220828-185022-ladsgroup.json
18:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
18:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T316186)', diff saved to https://phabricator.wikimedia.org/P33537 and previous config saved to /var/cache/conftool/dbconfig/20220828-184542-ladsgroup.json
18:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T316186)', diff saved to https://phabricator.wikimedia.org/P33536 and previous config saved to /var/cache/conftool/dbconfig/20220828-183915-ladsgroup.json
18:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
18:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
18:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T316186)', diff saved to https://phabricator.wikimedia.org/P33535 and previous config saved to /var/cache/conftool/dbconfig/20220828-183850-ladsgroup.json
18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T316186)', diff saved to https://phabricator.wikimedia.org/P33534 and previous config saved to /var/cache/conftool/dbconfig/20220828-183226-ladsgroup.json
18:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
18:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
18:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
18:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
18:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T316186)', diff saved to https://phabricator.wikimedia.org/P33533 and previous config saved to /var/cache/conftool/dbconfig/20220828-183156-ladsgroup.json
18:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T316186)', diff saved to https://phabricator.wikimedia.org/P33532 and previous config saved to /var/cache/conftool/dbconfig/20220828-182630-ladsgroup.json
18:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
18:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
18:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T316186)', diff saved to https://phabricator.wikimedia.org/P33531 and previous config saved to /var/cache/conftool/dbconfig/20220828-182605-ladsgroup.json
18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T316186)', diff saved to https://phabricator.wikimedia.org/P33530 and previous config saved to /var/cache/conftool/dbconfig/20220828-182350-ladsgroup.json
18:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T316186)', diff saved to https://phabricator.wikimedia.org/P33529 and previous config saved to /var/cache/conftool/dbconfig/20220828-181830-ladsgroup.json
18:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T316186)', diff saved to https://phabricator.wikimedia.org/P33528 and previous config saved to /var/cache/conftool/dbconfig/20220828-181805-ladsgroup.json
18:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
18:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
18:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
18:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T316186)', diff saved to https://phabricator.wikimedia.org/P33527 and previous config saved to /var/cache/conftool/dbconfig/20220828-181421-ladsgroup.json
18:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T316186)', diff saved to https://phabricator.wikimedia.org/P33526 and previous config saved to /var/cache/conftool/dbconfig/20220828-180751-ladsgroup.json
18:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
18:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
18:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T316186)', diff saved to https://phabricator.wikimedia.org/P33525 and previous config saved to /var/cache/conftool/dbconfig/20220828-180725-ladsgroup.json
18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T316186)', diff saved to https://phabricator.wikimedia.org/P33524 and previous config saved to /var/cache/conftool/dbconfig/20220828-180108-ladsgroup.json
18:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
18:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
18:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T316186)', diff saved to https://phabricator.wikimedia.org/P33523 and previous config saved to /var/cache/conftool/dbconfig/20220828-180042-ladsgroup.json
17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2140 (T316186)', diff saved to https://phabricator.wikimedia.org/P33522 and previous config saved to /var/cache/conftool/dbconfig/20220828-175311-ladsgroup.json
17:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
17:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
17:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T316186)', diff saved to https://phabricator.wikimedia.org/P33521 and previous config saved to /var/cache/conftool/dbconfig/20220828-175246-ladsgroup.json
17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T316186)', diff saved to https://phabricator.wikimedia.org/P33520 and previous config saved to /var/cache/conftool/dbconfig/20220828-174655-ladsgroup.json
17:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
17:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T316186)', diff saved to https://phabricator.wikimedia.org/P33519 and previous config saved to /var/cache/conftool/dbconfig/20220828-174630-ladsgroup.json
17:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T316186)', diff saved to https://phabricator.wikimedia.org/P33518 and previous config saved to /var/cache/conftool/dbconfig/20220828-174059-ladsgroup.json
17:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
17:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
17:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling failed', diff saved to https://phabricator.wikimedia.org/P33517 and previous config saved to /var/cache/conftool/dbconfig/20220828-174002-ladsgroup.json
17:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T316186)', diff saved to https://phabricator.wikimedia.org/P33516 and previous config saved to /var/cache/conftool/dbconfig/20220828-173304-ladsgroup.json
17:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
17:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T316186)', diff saved to https://phabricator.wikimedia.org/P33515 and previous config saved to /var/cache/conftool/dbconfig/20220828-173241-ladsgroup.json
17:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P33514 and previous config saved to /var/cache/conftool/dbconfig/20220828-171734-ladsgroup.json
17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P33513 and previous config saved to /var/cache/conftool/dbconfig/20220828-170228-ladsgroup.json
16:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T316186)', diff saved to https://phabricator.wikimedia.org/P33512 and previous config saved to /var/cache/conftool/dbconfig/20220828-164722-ladsgroup.json
16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T316186)', diff saved to https://phabricator.wikimedia.org/P33511 and previous config saved to /var/cache/conftool/dbconfig/20220828-164211-ladsgroup.json
16:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
16:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
16:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
16:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
16:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T316186)', diff saved to https://phabricator.wikimedia.org/P33510 and previous config saved to /var/cache/conftool/dbconfig/20220828-164004-ladsgroup.json
16:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T316186)', diff saved to https://phabricator.wikimedia.org/P33509 and previous config saved to /var/cache/conftool/dbconfig/20220828-163447-ladsgroup.json
16:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
16:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
16:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T316186)', diff saved to https://phabricator.wikimedia.org/P33508 and previous config saved to /var/cache/conftool/dbconfig/20220828-163211-ladsgroup.json
16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T316186)', diff saved to https://phabricator.wikimedia.org/P33507 and previous config saved to /var/cache/conftool/dbconfig/20220828-162906-ladsgroup.json
16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T316186)', diff saved to https://phabricator.wikimedia.org/P33506 and previous config saved to /var/cache/conftool/dbconfig/20220828-162349-ladsgroup.json
16:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
16:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T316186)', diff saved to https://phabricator.wikimedia.org/P33505 and previous config saved to /var/cache/conftool/dbconfig/20220828-162324-ladsgroup.json
16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P33504 and previous config saved to /var/cache/conftool/dbconfig/20220828-160818-ladsgroup.json
15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P33503 and previous config saved to /var/cache/conftool/dbconfig/20220828-155312-ladsgroup.json
15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T316186)', diff saved to https://phabricator.wikimedia.org/P33502 and previous config saved to /var/cache/conftool/dbconfig/20220828-153806-ladsgroup.json
15:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T316186)', diff saved to https://phabricator.wikimedia.org/P33501 and previous config saved to /var/cache/conftool/dbconfig/20220828-153349-ladsgroup.json
15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P33499 and previous config saved to /var/cache/conftool/dbconfig/20220828-150336-ladsgroup.json
14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T316186)', diff saved to https://phabricator.wikimedia.org/P33498 and previous config saved to /var/cache/conftool/dbconfig/20220828-144830-ladsgroup.json
14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T316186)', diff saved to https://phabricator.wikimedia.org/P33497 and previous config saved to /var/cache/conftool/dbconfig/20220828-144319-ladsgroup.json
14:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T316186)', diff saved to https://phabricator.wikimedia.org/P33496 and previous config saved to /var/cache/conftool/dbconfig/20220828-144257-ladsgroup.json
14:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
14:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
14:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T316186)', diff saved to https://phabricator.wikimedia.org/P33495 and previous config saved to /var/cache/conftool/dbconfig/20220828-144232-ladsgroup.json
14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P33494 and previous config saved to /var/cache/conftool/dbconfig/20220828-142726-ladsgroup.json
14:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P33493 and previous config saved to /var/cache/conftool/dbconfig/20220828-141220-ladsgroup.json
13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T316186)', diff saved to https://phabricator.wikimedia.org/P33492 and previous config saved to /var/cache/conftool/dbconfig/20220828-135713-ladsgroup.json
13:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2165 (T316186)', diff saved to https://phabricator.wikimedia.org/P33491 and previous config saved to /var/cache/conftool/dbconfig/20220828-135158-ladsgroup.json
13:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
13:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
13:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T316186)', diff saved to https://phabricator.wikimedia.org/P33490 and previous config saved to /var/cache/conftool/dbconfig/20220828-135133-ladsgroup.json
13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P33489 and previous config saved to /var/cache/conftool/dbconfig/20220828-133627-ladsgroup.json
13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P33488 and previous config saved to /var/cache/conftool/dbconfig/20220828-132120-ladsgroup.json
13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T316186)', diff saved to https://phabricator.wikimedia.org/P33487 and previous config saved to /var/cache/conftool/dbconfig/20220828-130614-ladsgroup.json
13:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T316186)', diff saved to https://phabricator.wikimedia.org/P33486 and previous config saved to /var/cache/conftool/dbconfig/20220828-130059-ladsgroup.json
13:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
13:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
13:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T316186)', diff saved to https://phabricator.wikimedia.org/P33485 and previous config saved to /var/cache/conftool/dbconfig/20220828-130033-ladsgroup.json
12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P33484 and previous config saved to /var/cache/conftool/dbconfig/20220828-124527-ladsgroup.json
12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P33483 and previous config saved to /var/cache/conftool/dbconfig/20220828-123021-ladsgroup.json
12:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T316186)', diff saved to https://phabricator.wikimedia.org/P33482 and previous config saved to /var/cache/conftool/dbconfig/20220828-121515-ladsgroup.json
12:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T316186)', diff saved to https://phabricator.wikimedia.org/P33481 and previous config saved to /var/cache/conftool/dbconfig/20220828-121000-ladsgroup.json
12:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
12:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
12:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
12:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
12:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T316186)', diff saved to https://phabricator.wikimedia.org/P33480 and previous config saved to /var/cache/conftool/dbconfig/20220828-120931-ladsgroup.json
11:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P33479 and previous config saved to /var/cache/conftool/dbconfig/20220828-115424-ladsgroup.json
11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P33478 and previous config saved to /var/cache/conftool/dbconfig/20220828-113918-ladsgroup.json
11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T316186)', diff saved to https://phabricator.wikimedia.org/P33477 and previous config saved to /var/cache/conftool/dbconfig/20220828-112412-ladsgroup.json
11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T316186)', diff saved to https://phabricator.wikimedia.org/P33476 and previous config saved to /var/cache/conftool/dbconfig/20220828-111857-ladsgroup.json
11:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
11:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T316186)', diff saved to https://phabricator.wikimedia.org/P33475 and previous config saved to /var/cache/conftool/dbconfig/20220828-111832-ladsgroup.json
11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P33474 and previous config saved to /var/cache/conftool/dbconfig/20220828-110326-ladsgroup.json
10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P33473 and previous config saved to /var/cache/conftool/dbconfig/20220828-104820-ladsgroup.json
10:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T316186)', diff saved to https://phabricator.wikimedia.org/P33472 and previous config saved to /var/cache/conftool/dbconfig/20220828-103314-ladsgroup.json
10:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T316186)', diff saved to https://phabricator.wikimedia.org/P33471 and previous config saved to /var/cache/conftool/dbconfig/20220828-102800-ladsgroup.json
10:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
10:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
10:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
10:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
10:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T316186)', diff saved to https://phabricator.wikimedia.org/P33470 and previous config saved to /var/cache/conftool/dbconfig/20220828-102423-ladsgroup.json
10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P33469 and previous config saved to /var/cache/conftool/dbconfig/20220828-100917-ladsgroup.json
09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P33468 and previous config saved to /var/cache/conftool/dbconfig/20220828-095411-ladsgroup.json
09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T316186)', diff saved to https://phabricator.wikimedia.org/P33467 and previous config saved to /var/cache/conftool/dbconfig/20220828-093904-ladsgroup.json
09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T316186)', diff saved to https://phabricator.wikimedia.org/P33466 and previous config saved to /var/cache/conftool/dbconfig/20220828-093346-ladsgroup.json
09:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
09:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
09:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
09:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
08:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T316186)', diff saved to https://phabricator.wikimedia.org/P33465 and previous config saved to /var/cache/conftool/dbconfig/20220828-082851-ladsgroup.json
08:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P33464 and previous config saved to /var/cache/conftool/dbconfig/20220828-081344-ladsgroup.json
07:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P33463 and previous config saved to /var/cache/conftool/dbconfig/20220828-075838-ladsgroup.json
07:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T316186)', diff saved to https://phabricator.wikimedia.org/P33462 and previous config saved to /var/cache/conftool/dbconfig/20220828-074332-ladsgroup.json
07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T316186)', diff saved to https://phabricator.wikimedia.org/P33461 and previous config saved to /var/cache/conftool/dbconfig/20220828-074116-ladsgroup.json
07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P33460 and previous config saved to /var/cache/conftool/dbconfig/20220828-072610-ladsgroup.json
07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P33459 and previous config saved to /var/cache/conftool/dbconfig/20220828-071103-ladsgroup.json
06:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T316186)', diff saved to https://phabricator.wikimedia.org/P33458 and previous config saved to /var/cache/conftool/dbconfig/20220828-065557-ladsgroup.json
06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T316186)', diff saved to https://phabricator.wikimedia.org/P33457 and previous config saved to /var/cache/conftool/dbconfig/20220828-064952-ladsgroup.json
06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T316186)', diff saved to https://phabricator.wikimedia.org/P33456 and previous config saved to /var/cache/conftool/dbconfig/20220828-064920-ladsgroup.json
06:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
06:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
06:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2113 (T316186)', diff saved to https://phabricator.wikimedia.org/P33455 and previous config saved to /var/cache/conftool/dbconfig/20220828-064855-ladsgroup.json
06:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2113', diff saved to https://phabricator.wikimedia.org/P33454 and previous config saved to /var/cache/conftool/dbconfig/20220828-063348-ladsgroup.json
06:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2113', diff saved to https://phabricator.wikimedia.org/P33453 and previous config saved to /var/cache/conftool/dbconfig/20220828-061842-ladsgroup.json
06:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2113 (T316186)', diff saved to https://phabricator.wikimedia.org/P33452 and previous config saved to /var/cache/conftool/dbconfig/20220828-060336-ladsgroup.json
05:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2113 (T316186)', diff saved to https://phabricator.wikimedia.org/P33451 and previous config saved to /var/cache/conftool/dbconfig/20220828-055821-ladsgroup.json
05:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
05:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
05:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T316186)', diff saved to https://phabricator.wikimedia.org/P33450 and previous config saved to /var/cache/conftool/dbconfig/20220828-055756-ladsgroup.json
05:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P33449 and previous config saved to /var/cache/conftool/dbconfig/20220828-054249-ladsgroup.json
05:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P33448 and previous config saved to /var/cache/conftool/dbconfig/20220828-052743-ladsgroup.json
05:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T316186)', diff saved to https://phabricator.wikimedia.org/P33447 and previous config saved to /var/cache/conftool/dbconfig/20220828-051237-ladsgroup.json
05:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T316186)', diff saved to https://phabricator.wikimedia.org/P33446 and previous config saved to /var/cache/conftool/dbconfig/20220828-050729-ladsgroup.json
05:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
05:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
05:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T316186)', diff saved to https://phabricator.wikimedia.org/P33445 and previous config saved to /var/cache/conftool/dbconfig/20220828-050704-ladsgroup.json
04:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P33444 and previous config saved to /var/cache/conftool/dbconfig/20220828-045157-ladsgroup.json
04:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P33443 and previous config saved to /var/cache/conftool/dbconfig/20220828-043651-ladsgroup.json
04:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T316186)', diff saved to https://phabricator.wikimedia.org/P33442 and previous config saved to /var/cache/conftool/dbconfig/20220828-042145-ladsgroup.json
04:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T316186)', diff saved to https://phabricator.wikimedia.org/P33441 and previous config saved to /var/cache/conftool/dbconfig/20220828-041622-ladsgroup.json
04:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
04:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
04:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
04:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
04:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
04:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
04:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T316186)', diff saved to https://phabricator.wikimedia.org/P33440 and previous config saved to /var/cache/conftool/dbconfig/20220828-041231-ladsgroup.json
03:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P33439 and previous config saved to /var/cache/conftool/dbconfig/20220828-035725-ladsgroup.json
03:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P33438 and previous config saved to /var/cache/conftool/dbconfig/20220828-034219-ladsgroup.json
03:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T316186)', diff saved to https://phabricator.wikimedia.org/P33437 and previous config saved to /var/cache/conftool/dbconfig/20220828-032713-ladsgroup.json
03:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T316186)', diff saved to https://phabricator.wikimedia.org/P33436 and previous config saved to /var/cache/conftool/dbconfig/20220828-032202-ladsgroup.json
03:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
03:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
03:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T316186)', diff saved to https://phabricator.wikimedia.org/P33435 and previous config saved to /var/cache/conftool/dbconfig/20220828-032137-ladsgroup.json
03:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P33434 and previous config saved to /var/cache/conftool/dbconfig/20220828-030631-ladsgroup.json
02:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P33433 and previous config saved to /var/cache/conftool/dbconfig/20220828-025124-ladsgroup.json
02:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T316186)', diff saved to https://phabricator.wikimedia.org/P33432 and previous config saved to /var/cache/conftool/dbconfig/20220828-023618-ladsgroup.json
02:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T316186)', diff saved to https://phabricator.wikimedia.org/P33431 and previous config saved to /var/cache/conftool/dbconfig/20220828-023111-ladsgroup.json
02:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
02:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
02:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
02:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
02:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T316186)', diff saved to https://phabricator.wikimedia.org/P33430 and previous config saved to /var/cache/conftool/dbconfig/20220828-022620-ladsgroup.json
02:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P33429 and previous config saved to /var/cache/conftool/dbconfig/20220828-021114-ladsgroup.json
01:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P33428 and previous config saved to /var/cache/conftool/dbconfig/20220828-015608-ladsgroup.json
01:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T316186)', diff saved to https://phabricator.wikimedia.org/P33427 and previous config saved to /var/cache/conftool/dbconfig/20220828-014101-ladsgroup.json
01:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T316186)', diff saved to https://phabricator.wikimedia.org/P33426 and previous config saved to /var/cache/conftool/dbconfig/20220828-013558-ladsgroup.json
01:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1110.eqiad.wmnet with reason: Maintenance
01:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1110.eqiad.wmnet with reason: Maintenance
01:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T316186)', diff saved to https://phabricator.wikimedia.org/P33425 and previous config saved to /var/cache/conftool/dbconfig/20220828-013534-ladsgroup.json
01:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P33424 and previous config saved to /var/cache/conftool/dbconfig/20220828-012028-ladsgroup.json
01:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P33423 and previous config saved to /var/cache/conftool/dbconfig/20220828-010522-ladsgroup.json
00:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T316186)', diff saved to https://phabricator.wikimedia.org/P33422 and previous config saved to /var/cache/conftool/dbconfig/20220828-005015-ladsgroup.json
00:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T316186)', diff saved to https://phabricator.wikimedia.org/P33421 and previous config saved to /var/cache/conftool/dbconfig/20220828-004410-ladsgroup.json
00:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
00:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
00:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
00:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
00:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T316186)', diff saved to https://phabricator.wikimedia.org/P33420 and previous config saved to /var/cache/conftool/dbconfig/20220828-004329-ladsgroup.json
00:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P33419 and previous config saved to /var/cache/conftool/dbconfig/20220828-002823-ladsgroup.json
00:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P33418 and previous config saved to /var/cache/conftool/dbconfig/20220828-001317-ladsgroup.json

2022-08-27

23:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T316186)', diff saved to https://phabricator.wikimedia.org/P33417 and previous config saved to /var/cache/conftool/dbconfig/20220827-235810-ladsgroup.json
23:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T316186)', diff saved to https://phabricator.wikimedia.org/P33416 and previous config saved to /var/cache/conftool/dbconfig/20220827-235556-ladsgroup.json
23:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P33415 and previous config saved to /var/cache/conftool/dbconfig/20220827-234050-ladsgroup.json
23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P33414 and previous config saved to /var/cache/conftool/dbconfig/20220827-232544-ladsgroup.json
23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T316186)', diff saved to https://phabricator.wikimedia.org/P33413 and previous config saved to /var/cache/conftool/dbconfig/20220827-231038-ladsgroup.json
23:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T316186)', diff saved to https://phabricator.wikimedia.org/P33412 and previous config saved to /var/cache/conftool/dbconfig/20220827-230339-ladsgroup.json
23:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T316186)', diff saved to https://phabricator.wikimedia.org/P33411 and previous config saved to /var/cache/conftool/dbconfig/20220827-230214-ladsgroup.json
23:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
23:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
23:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T316186)', diff saved to https://phabricator.wikimedia.org/P33410 and previous config saved to /var/cache/conftool/dbconfig/20220827-230150-ladsgroup.json
22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P33408 and previous config saved to /var/cache/conftool/dbconfig/20220827-223137-ladsgroup.json
22:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149', diff saved to https://phabricator.wikimedia.org/P33407 and previous config saved to /var/cache/conftool/dbconfig/20220827-221749-ladsgroup.json
22:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2149.codfw.wmnet with reason: Sad disk
22:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2149.codfw.wmnet with reason: Sad disk
22:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T316186)', diff saved to https://phabricator.wikimedia.org/P33406 and previous config saved to /var/cache/conftool/dbconfig/20220827-221631-ladsgroup.json
22:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1130 (T316186)', diff saved to https://phabricator.wikimedia.org/P33405 and previous config saved to /var/cache/conftool/dbconfig/20220827-221118-ladsgroup.json
22:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
22:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T316186)', diff saved to https://phabricator.wikimedia.org/P33404 and previous config saved to /var/cache/conftool/dbconfig/20220827-205809-ladsgroup.json
20:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P33403 and previous config saved to /var/cache/conftool/dbconfig/20220827-204303-ladsgroup.json
20:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P33402 and previous config saved to /var/cache/conftool/dbconfig/20220827-202757-ladsgroup.json
20:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T316186)', diff saved to https://phabricator.wikimedia.org/P33401 and previous config saved to /var/cache/conftool/dbconfig/20220827-201250-ladsgroup.json
20:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T316186)', diff saved to https://phabricator.wikimedia.org/P33400 and previous config saved to /var/cache/conftool/dbconfig/20220827-200639-ladsgroup.json
20:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
20:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
20:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
20:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
20:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T316186)', diff saved to https://phabricator.wikimedia.org/P33399 and previous config saved to /var/cache/conftool/dbconfig/20220827-200559-ladsgroup.json
19:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P33398 and previous config saved to /var/cache/conftool/dbconfig/20220827-195053-ladsgroup.json
19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P33397 and previous config saved to /var/cache/conftool/dbconfig/20220827-193546-ladsgroup.json
19:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T316186)', diff saved to https://phabricator.wikimedia.org/P33396 and previous config saved to /var/cache/conftool/dbconfig/20220827-192040-ladsgroup.json
19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T316186)', diff saved to https://phabricator.wikimedia.org/P33395 and previous config saved to /var/cache/conftool/dbconfig/20220827-191515-ladsgroup.json
19:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
19:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
19:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T316186)', diff saved to https://phabricator.wikimedia.org/P33394 and previous config saved to /var/cache/conftool/dbconfig/20220827-191450-ladsgroup.json
18:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P33393 and previous config saved to /var/cache/conftool/dbconfig/20220827-185944-ladsgroup.json
18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P33392 and previous config saved to /var/cache/conftool/dbconfig/20220827-184438-ladsgroup.json
18:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T316186)', diff saved to https://phabricator.wikimedia.org/P33391 and previous config saved to /var/cache/conftool/dbconfig/20220827-182931-ladsgroup.json
18:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T316186)', diff saved to https://phabricator.wikimedia.org/P33390 and previous config saved to /var/cache/conftool/dbconfig/20220827-182408-ladsgroup.json
18:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
18:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T316186)', diff saved to https://phabricator.wikimedia.org/P33389 and previous config saved to /var/cache/conftool/dbconfig/20220827-182343-ladsgroup.json
18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P33388 and previous config saved to /var/cache/conftool/dbconfig/20220827-180836-ladsgroup.json
17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P33387 and previous config saved to /var/cache/conftool/dbconfig/20220827-175330-ladsgroup.json
17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T316186)', diff saved to https://phabricator.wikimedia.org/P33386 and previous config saved to /var/cache/conftool/dbconfig/20220827-173824-ladsgroup.json
17:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T316186)', diff saved to https://phabricator.wikimedia.org/P33385 and previous config saved to /var/cache/conftool/dbconfig/20220827-173305-ladsgroup.json
17:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
17:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T316186)', diff saved to https://phabricator.wikimedia.org/P33384 and previous config saved to /var/cache/conftool/dbconfig/20220827-173240-ladsgroup.json
17:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P33383 and previous config saved to /var/cache/conftool/dbconfig/20220827-171734-ladsgroup.json
17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P33382 and previous config saved to /var/cache/conftool/dbconfig/20220827-170227-ladsgroup.json
16:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T316186)', diff saved to https://phabricator.wikimedia.org/P33381 and previous config saved to /var/cache/conftool/dbconfig/20220827-164721-ladsgroup.json
16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T316186)', diff saved to https://phabricator.wikimedia.org/P33380 and previous config saved to /var/cache/conftool/dbconfig/20220827-164156-ladsgroup.json
16:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
16:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
16:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
16:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T316186)', diff saved to https://phabricator.wikimedia.org/P33379 and previous config saved to /var/cache/conftool/dbconfig/20220827-163528-ladsgroup.json
16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P33378 and previous config saved to /var/cache/conftool/dbconfig/20220827-162022-ladsgroup.json
16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P33377 and previous config saved to /var/cache/conftool/dbconfig/20220827-160516-ladsgroup.json
15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T316186)', diff saved to https://phabricator.wikimedia.org/P33376 and previous config saved to /var/cache/conftool/dbconfig/20220827-155010-ladsgroup.json
15:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T316186)', diff saved to https://phabricator.wikimedia.org/P33375 and previous config saved to /var/cache/conftool/dbconfig/20220827-154452-ladsgroup.json
15:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
15:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
15:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
15:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
15:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T316186)', diff saved to https://phabricator.wikimedia.org/P33374 and previous config saved to /var/cache/conftool/dbconfig/20220827-154410-ladsgroup.json
15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P33373 and previous config saved to /var/cache/conftool/dbconfig/20220827-152903-ladsgroup.json
15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P33372 and previous config saved to /var/cache/conftool/dbconfig/20220827-151357-ladsgroup.json
14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T316186)', diff saved to https://phabricator.wikimedia.org/P33371 and previous config saved to /var/cache/conftool/dbconfig/20220827-145851-ladsgroup.json
14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T316186)', diff saved to https://phabricator.wikimedia.org/P33370 and previous config saved to /var/cache/conftool/dbconfig/20220827-145224-ladsgroup.json
14:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
14:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T316186)', diff saved to https://phabricator.wikimedia.org/P33369 and previous config saved to /var/cache/conftool/dbconfig/20220827-145201-ladsgroup.json
14:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P33368 and previous config saved to /var/cache/conftool/dbconfig/20220827-143654-ladsgroup.json
14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P33367 and previous config saved to /var/cache/conftool/dbconfig/20220827-142148-ladsgroup.json
14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T316186)', diff saved to https://phabricator.wikimedia.org/P33366 and previous config saved to /var/cache/conftool/dbconfig/20220827-140642-ladsgroup.json
13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T316186)', diff saved to https://phabricator.wikimedia.org/P33365 and previous config saved to /var/cache/conftool/dbconfig/20220827-135719-ladsgroup.json
13:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
13:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T316186)', diff saved to https://phabricator.wikimedia.org/P33364 and previous config saved to /var/cache/conftool/dbconfig/20220827-135655-ladsgroup.json
13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P33363 and previous config saved to /var/cache/conftool/dbconfig/20220827-134149-ladsgroup.json
13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P33362 and previous config saved to /var/cache/conftool/dbconfig/20220827-132643-ladsgroup.json
13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T316186)', diff saved to https://phabricator.wikimedia.org/P33361 and previous config saved to /var/cache/conftool/dbconfig/20220827-131136-ladsgroup.json
12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T316186)', diff saved to https://phabricator.wikimedia.org/P33360 and previous config saved to /var/cache/conftool/dbconfig/20220827-121121-ladsgroup.json
12:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
12:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
12:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
12:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
12:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
12:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T316186)', diff saved to https://phabricator.wikimedia.org/P33359 and previous config saved to /var/cache/conftool/dbconfig/20220827-120059-ladsgroup.json
11:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P33358 and previous config saved to /var/cache/conftool/dbconfig/20220827-114552-ladsgroup.json
11:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P33357 and previous config saved to /var/cache/conftool/dbconfig/20220827-113046-ladsgroup.json
11:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T316186)', diff saved to https://phabricator.wikimedia.org/P33356 and previous config saved to /var/cache/conftool/dbconfig/20220827-111540-ladsgroup.json
10:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T316186)', diff saved to https://phabricator.wikimedia.org/P33355 and previous config saved to /var/cache/conftool/dbconfig/20220827-101523-ladsgroup.json
10:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
10:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
10:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T316186)', diff saved to https://phabricator.wikimedia.org/P33354 and previous config saved to /var/cache/conftool/dbconfig/20220827-101459-ladsgroup.json
09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P33353 and previous config saved to /var/cache/conftool/dbconfig/20220827-095953-ladsgroup.json
09:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P33352 and previous config saved to /var/cache/conftool/dbconfig/20220827-094446-ladsgroup.json
09:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T316186)', diff saved to https://phabricator.wikimedia.org/P33351 and previous config saved to /var/cache/conftool/dbconfig/20220827-092940-ladsgroup.json
08:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T316186)', diff saved to https://phabricator.wikimedia.org/P33350 and previous config saved to /var/cache/conftool/dbconfig/20220827-082924-ladsgroup.json
08:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
08:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
08:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
08:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
01:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T316186)', diff saved to https://phabricator.wikimedia.org/P33349 and previous config saved to /var/cache/conftool/dbconfig/20220827-014831-ladsgroup.json
01:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P33348 and previous config saved to /var/cache/conftool/dbconfig/20220827-013325-ladsgroup.json
01:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P33347 and previous config saved to /var/cache/conftool/dbconfig/20220827-011819-ladsgroup.json
01:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T316186)', diff saved to https://phabricator.wikimedia.org/P33346 and previous config saved to /var/cache/conftool/dbconfig/20220827-010313-ladsgroup.json
00:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T316186)', diff saved to https://phabricator.wikimedia.org/P33345 and previous config saved to /var/cache/conftool/dbconfig/20220827-005555-ladsgroup.json
00:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
00:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
00:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
00:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
00:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T316186)', diff saved to https://phabricator.wikimedia.org/P33344 and previous config saved to /var/cache/conftool/dbconfig/20220827-005525-ladsgroup.json
00:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P33343 and previous config saved to /var/cache/conftool/dbconfig/20220827-004019-ladsgroup.json
00:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P33342 and previous config saved to /var/cache/conftool/dbconfig/20220827-002513-ladsgroup.json
00:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T316186)', diff saved to https://phabricator.wikimedia.org/P33341 and previous config saved to /var/cache/conftool/dbconfig/20220827-001006-ladsgroup.json
00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T316186)', diff saved to https://phabricator.wikimedia.org/P33340 and previous config saved to /var/cache/conftool/dbconfig/20220827-000442-ladsgroup.json
00:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T316186)', diff saved to https://phabricator.wikimedia.org/P33339 and previous config saved to /var/cache/conftool/dbconfig/20220827-000415-ladsgroup.json

2022-08-26

23:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P33338 and previous config saved to /var/cache/conftool/dbconfig/20220826-234908-ladsgroup.json
23:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P33337 and previous config saved to /var/cache/conftool/dbconfig/20220826-233402-ladsgroup.json
23:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T316186)', diff saved to https://phabricator.wikimedia.org/P33336 and previous config saved to /var/cache/conftool/dbconfig/20220826-231856-ladsgroup.json
23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33335 and previous config saved to /var/cache/conftool/dbconfig/20220826-231540-ladsgroup.json
23:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P33334 and previous config saved to /var/cache/conftool/dbconfig/20220826-230033-ladsgroup.json
22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P33333 and previous config saved to /var/cache/conftool/dbconfig/20220826-224527-ladsgroup.json
22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33332 and previous config saved to /var/cache/conftool/dbconfig/20220826-223021-ladsgroup.json
22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T316186)', diff saved to https://phabricator.wikimedia.org/P33331 and previous config saved to /var/cache/conftool/dbconfig/20220826-222409-ladsgroup.json
22:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33330 and previous config saved to /var/cache/conftool/dbconfig/20220826-222345-ladsgroup.json
22:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
22:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
22:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 (T316186)', diff saved to https://phabricator.wikimedia.org/P33329 and previous config saved to /var/cache/conftool/dbconfig/20220826-222320-ladsgroup.json
22:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118', diff saved to https://phabricator.wikimedia.org/P33328 and previous config saved to /var/cache/conftool/dbconfig/20220826-220814-ladsgroup.json
21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118', diff saved to https://phabricator.wikimedia.org/P33327 and previous config saved to /var/cache/conftool/dbconfig/20220826-215307-ladsgroup.json
21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 (T316186)', diff saved to https://phabricator.wikimedia.org/P33326 and previous config saved to /var/cache/conftool/dbconfig/20220826-213801-ladsgroup.json
21:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2118 (T316186)', diff saved to https://phabricator.wikimedia.org/P33325 and previous config saved to /var/cache/conftool/dbconfig/20220826-213140-ladsgroup.json
21:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
21:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
21:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T316186)', diff saved to https://phabricator.wikimedia.org/P33324 and previous config saved to /var/cache/conftool/dbconfig/20220826-213115-ladsgroup.json
21:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P33323 and previous config saved to /var/cache/conftool/dbconfig/20220826-211608-ladsgroup.json
21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P33322 and previous config saved to /var/cache/conftool/dbconfig/20220826-210102-ladsgroup.json
20:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T316186)', diff saved to https://phabricator.wikimedia.org/P33321 and previous config saved to /var/cache/conftool/dbconfig/20220826-204555-ladsgroup.json
20:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T316186)', diff saved to https://phabricator.wikimedia.org/P33320 and previous config saved to /var/cache/conftool/dbconfig/20220826-203935-ladsgroup.json
20:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
20:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
20:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T316186)', diff saved to https://phabricator.wikimedia.org/P33319 and previous config saved to /var/cache/conftool/dbconfig/20220826-203910-ladsgroup.json
20:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P33318 and previous config saved to /var/cache/conftool/dbconfig/20220826-202404-ladsgroup.json
20:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P33316 and previous config saved to /var/cache/conftool/dbconfig/20220826-200858-ladsgroup.json
19:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T316186)', diff saved to https://phabricator.wikimedia.org/P33315 and previous config saved to /var/cache/conftool/dbconfig/20220826-195351-ladsgroup.json
19:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T316186)', diff saved to https://phabricator.wikimedia.org/P33314 and previous config saved to /var/cache/conftool/dbconfig/20220826-194734-ladsgroup.json
19:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
19:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
19:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T316186)', diff saved to https://phabricator.wikimedia.org/P33313 and previous config saved to /var/cache/conftool/dbconfig/20220826-194709-ladsgroup.json
19:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P33312 and previous config saved to /var/cache/conftool/dbconfig/20220826-193203-ladsgroup.json
19:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P33311 and previous config saved to /var/cache/conftool/dbconfig/20220826-191657-ladsgroup.json
19:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T316186)', diff saved to https://phabricator.wikimedia.org/P33310 and previous config saved to /var/cache/conftool/dbconfig/20220826-190151-ladsgroup.json
18:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T316186)', diff saved to https://phabricator.wikimedia.org/P33309 and previous config saved to /var/cache/conftool/dbconfig/20220826-185527-ladsgroup.json
18:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
18:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
18:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T316186)', diff saved to https://phabricator.wikimedia.org/P33308 and previous config saved to /var/cache/conftool/dbconfig/20220826-185502-ladsgroup.json
18:40 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@c5f46a4]: (no justification provided) (duration: 00m 10s)
18:40 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@c5f46a4]: (no justification provided)
18:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P33307 and previous config saved to /var/cache/conftool/dbconfig/20220826-183956-ladsgroup.json
18:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P33306 and previous config saved to /var/cache/conftool/dbconfig/20220826-182450-ladsgroup.json
18:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T316186)', diff saved to https://phabricator.wikimedia.org/P33305 and previous config saved to /var/cache/conftool/dbconfig/20220826-180943-ladsgroup.json
18:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T316186)', diff saved to https://phabricator.wikimedia.org/P33304 and previous config saved to /var/cache/conftool/dbconfig/20220826-180223-ladsgroup.json
18:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
18:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T316186)', diff saved to https://phabricator.wikimedia.org/P33303 and previous config saved to /var/cache/conftool/dbconfig/20220826-180157-ladsgroup.json
17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P33302 and previous config saved to /var/cache/conftool/dbconfig/20220826-174651-ladsgroup.json
17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P33301 and previous config saved to /var/cache/conftool/dbconfig/20220826-173144-ladsgroup.json
17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T316186)', diff saved to https://phabricator.wikimedia.org/P33300 and previous config saved to /var/cache/conftool/dbconfig/20220826-171638-ladsgroup.json
17:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T316186)', diff saved to https://phabricator.wikimedia.org/P33299 and previous config saved to /var/cache/conftool/dbconfig/20220826-170911-ladsgroup.json
17:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
17:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
17:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T316186)', diff saved to https://phabricator.wikimedia.org/P33298 and previous config saved to /var/cache/conftool/dbconfig/20220826-170538-ladsgroup.json
16:56 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@5d95fe5]: Add job for MediaWiki history dumps. (duration: 00m 13s)
16:56 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@5d95fe5]: Add job for MediaWiki history dumps.
16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P33297 and previous config saved to /var/cache/conftool/dbconfig/20220826-165032-ladsgroup.json
16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P33296 and previous config saved to /var/cache/conftool/dbconfig/20220826-163525-ladsgroup.json
16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T316186)', diff saved to https://phabricator.wikimedia.org/P33295 and previous config saved to /var/cache/conftool/dbconfig/20220826-162019-ladsgroup.json
15:50 jynus: rolling restart of ms-backup1001,2, ms-backup2001,2
15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T316186)', diff saved to https://phabricator.wikimedia.org/P33293 and previous config saved to /var/cache/conftool/dbconfig/20220826-152003-ladsgroup.json
15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T316186)', diff saved to https://phabricator.wikimedia.org/P33292 and previous config saved to /var/cache/conftool/dbconfig/20220826-151921-ladsgroup.json
15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P33291 and previous config saved to /var/cache/conftool/dbconfig/20220826-150415-ladsgroup.json
14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P33290 and previous config saved to /var/cache/conftool/dbconfig/20220826-144908-ladsgroup.json
14:38 jynus: rolling restart of backup1004-9, backup2004-9
14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T316186)', diff saved to https://phabricator.wikimedia.org/P33289 and previous config saved to /var/cache/conftool/dbconfig/20220826-143402-ladsgroup.json
14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33288 and previous config saved to /var/cache/conftool/dbconfig/20220826-142945-ladsgroup.json
14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P33286 and previous config saved to /var/cache/conftool/dbconfig/20220826-141438-ladsgroup.json
13:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P33285 and previous config saved to /var/cache/conftool/dbconfig/20220826-135932-ladsgroup.json
13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33284 and previous config saved to /var/cache/conftool/dbconfig/20220826-134426-ladsgroup.json
13:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T316186)', diff saved to https://phabricator.wikimedia.org/P33283 and previous config saved to /var/cache/conftool/dbconfig/20220826-133318-ladsgroup.json
13:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1194 (re)pooling @ 100%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33281 and previous config saved to /var/cache/conftool/dbconfig/20220826-132817-root.json
13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33280 and previous config saved to /var/cache/conftool/dbconfig/20220826-132751-ladsgroup.json
13:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
13:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
13:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
13:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33279 and previous config saved to /var/cache/conftool/dbconfig/20220826-132304-ladsgroup.json
13:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1194 (re)pooling @ 75%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33278 and previous config saved to /var/cache/conftool/dbconfig/20220826-131312-root.json
13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P33277 and previous config saved to /var/cache/conftool/dbconfig/20220826-130756-ladsgroup.json
12:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1194 (re)pooling @ 50%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33276 and previous config saved to /var/cache/conftool/dbconfig/20220826-125808-root.json
12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P33275 and previous config saved to /var/cache/conftool/dbconfig/20220826-125250-ladsgroup.json
12:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1194 (re)pooling @ 25%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33274 and previous config saved to /var/cache/conftool/dbconfig/20220826-124303-root.json
12:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33273 and previous config saved to /var/cache/conftool/dbconfig/20220826-123743-ladsgroup.json
12:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1194 (re)pooling @ 10%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33272 and previous config saved to /var/cache/conftool/dbconfig/20220826-122758-root.json
12:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T316186)', diff saved to https://phabricator.wikimedia.org/P33271 and previous config saved to /var/cache/conftool/dbconfig/20220826-122527-ladsgroup.json
12:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1194 (re)pooling @ 5%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33270 and previous config saved to /var/cache/conftool/dbconfig/20220826-121253-root.json
12:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P33269 and previous config saved to /var/cache/conftool/dbconfig/20220826-121021-ladsgroup.json
11:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1194 (re)pooling @ 4%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33268 and previous config saved to /var/cache/conftool/dbconfig/20220826-115748-root.json
11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P33267 and previous config saved to /var/cache/conftool/dbconfig/20220826-115514-ladsgroup.json
11:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1194 (re)pooling @ 3%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33266 and previous config saved to /var/cache/conftool/dbconfig/20220826-114243-root.json
11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T316186)', diff saved to https://phabricator.wikimedia.org/P33265 and previous config saved to /var/cache/conftool/dbconfig/20220826-114008-ladsgroup.json
11:37 moritzm: installing intel-microcode updates on stretch hosts
11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33264 and previous config saved to /var/cache/conftool/dbconfig/20220826-113511-ladsgroup.json
11:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T316186)', diff saved to https://phabricator.wikimedia.org/P33263 and previous config saved to /var/cache/conftool/dbconfig/20220826-113347-ladsgroup.json
11:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
11:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
11:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
11:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
11:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T316186)', diff saved to https://phabricator.wikimedia.org/P33262 and previous config saved to /var/cache/conftool/dbconfig/20220826-112946-ladsgroup.json
11:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1194 (re)pooling @ 2%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33261 and previous config saved to /var/cache/conftool/dbconfig/20220826-112739-root.json
11:19 moritzm: uploaded intel-microcode 3.20220510.1~wmf9u1 to apt.wikimedia.org
11:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P33260 and previous config saved to /var/cache/conftool/dbconfig/20220826-111440-ladsgroup.json
11:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1194 (re)pooling @ 1%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33259 and previous config saved to /var/cache/conftool/dbconfig/20220826-111234-root.json
10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P33258 and previous config saved to /var/cache/conftool/dbconfig/20220826-105934-ladsgroup.json
10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T316186)', diff saved to https://phabricator.wikimedia.org/P33257 and previous config saved to /var/cache/conftool/dbconfig/20220826-104427-ladsgroup.json
10:44 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons.
10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T316186)', diff saved to https://phabricator.wikimedia.org/P33256 and previous config saved to /var/cache/conftool/dbconfig/20220826-103707-ladsgroup.json
10:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
10:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
10:33 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons.
10:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
10:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
10:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
10:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
10:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T316186)', diff saved to https://phabricator.wikimedia.org/P33255 and previous config saved to /var/cache/conftool/dbconfig/20220826-102510-ladsgroup.json
10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T316186)', diff saved to https://phabricator.wikimedia.org/P33254 and previous config saved to /var/cache/conftool/dbconfig/20220826-102334-ladsgroup.json
10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T316186)', diff saved to https://phabricator.wikimedia.org/P33253 and previous config saved to /var/cache/conftool/dbconfig/20220826-102117-ladsgroup.json
10:13 vgutierrez: stop testing https://gerrit.wikimedia.org/r/c/operations/puppet/+/826785 in cp6016
10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P33252 and previous config saved to /var/cache/conftool/dbconfig/20220826-100611-ladsgroup.json
09:56 vgutierrez: testing https://gerrit.wikimedia.org/r/c/operations/puppet/+/826785 in cp6016
09:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P33251 and previous config saved to /var/cache/conftool/dbconfig/20220826-095104-ladsgroup.json
09:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T316186)', diff saved to https://phabricator.wikimedia.org/P33250 and previous config saved to /var/cache/conftool/dbconfig/20220826-093558-ladsgroup.json
09:33 vgutierrez: disable origin coalescing in cp6007 and cp6008 - T315911
09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T316186)', diff saved to https://phabricator.wikimedia.org/P33249 and previous config saved to /var/cache/conftool/dbconfig/20220826-093051-ladsgroup.json
09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T316186)', diff saved to https://phabricator.wikimedia.org/P33248 and previous config saved to /var/cache/conftool/dbconfig/20220826-093034-ladsgroup.json
09:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
09:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33247 and previous config saved to /var/cache/conftool/dbconfig/20220826-093000-ladsgroup.json
09:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P33246 and previous config saved to /var/cache/conftool/dbconfig/20220826-091454-ladsgroup.json
08:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P33245 and previous config saved to /var/cache/conftool/dbconfig/20220826-085947-ladsgroup.json
08:47 vgutierrez: Increase roll-out of query-sorting to 30% - T314868
08:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33244 and previous config saved to /var/cache/conftool/dbconfig/20220826-084441-ladsgroup.json
08:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T316186)', diff saved to https://phabricator.wikimedia.org/P33243 and previous config saved to /var/cache/conftool/dbconfig/20220826-083424-ladsgroup.json
08:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2025.codfw.wmnet to cluster codfw and group D
08:20 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2025.codfw.wmnet to cluster codfw and group D
08:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
08:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P33242 and previous config saved to /var/cache/conftool/dbconfig/20220826-081918-ladsgroup.json
08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
08:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P33241 and previous config saved to /var/cache/conftool/dbconfig/20220826-080411-ladsgroup.json
08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2025.codfw.wmnet with OS bullseye
07:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T316186)', diff saved to https://phabricator.wikimedia.org/P33240 and previous config saved to /var/cache/conftool/dbconfig/20220826-074905-ladsgroup.json
07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 100%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P33239 and previous config saved to /var/cache/conftool/dbconfig/20220826-074801-root.json
07:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2025.codfw.wmnet with reason: host reimage
07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 100%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33238 and previous config saved to /var/cache/conftool/dbconfig/20220826-074434-root.json
07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1192 (re)pooling @ 100%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33237 and previous config saved to /var/cache/conftool/dbconfig/20220826-074412-root.json
07:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33236 and previous config saved to /var/cache/conftool/dbconfig/20220826-074252-ladsgroup.json
07:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1185 (re)pooling @ 100%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33235 and previous config saved to /var/cache/conftool/dbconfig/20220826-074140-root.json
07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T316186)', diff saved to https://phabricator.wikimedia.org/P33234 and previous config saved to /var/cache/conftool/dbconfig/20220826-074126-ladsgroup.json
07:41 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2025.codfw.wmnet with reason: host reimage
07:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
07:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
07:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T316186)', diff saved to https://phabricator.wikimedia.org/P33233 and previous config saved to /var/cache/conftool/dbconfig/20220826-074052-ladsgroup.json
07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 75%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P33232 and previous config saved to /var/cache/conftool/dbconfig/20220826-073256-root.json
07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 75%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33231 and previous config saved to /var/cache/conftool/dbconfig/20220826-072929-root.json
07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1192 (re)pooling @ 75%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33230 and previous config saved to /var/cache/conftool/dbconfig/20220826-072908-root.json
07:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1185 (re)pooling @ 75%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33229 and previous config saved to /var/cache/conftool/dbconfig/20220826-072635-root.json
07:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P33228 and previous config saved to /var/cache/conftool/dbconfig/20220826-072545-ladsgroup.json
07:24 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2025.codfw.wmnet with OS bullseye
07:23 vgutierrez: Increase roll-out of query-sorting to 15% - T314868
07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 50%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P33227 and previous config saved to /var/cache/conftool/dbconfig/20220826-071751-root.json
07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 50%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33226 and previous config saved to /var/cache/conftool/dbconfig/20220826-071424-root.json
07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1192 (re)pooling @ 50%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33225 and previous config saved to /var/cache/conftool/dbconfig/20220826-071403-root.json
07:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1185 (re)pooling @ 50%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33224 and previous config saved to /var/cache/conftool/dbconfig/20220826-071131-root.json
07:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P33223 and previous config saved to /var/cache/conftool/dbconfig/20220826-071039-ladsgroup.json
07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 25%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P33222 and previous config saved to /var/cache/conftool/dbconfig/20220826-070247-root.json
06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 25%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33221 and previous config saved to /var/cache/conftool/dbconfig/20220826-065919-root.json
06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1192 (re)pooling @ 25%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33220 and previous config saved to /var/cache/conftool/dbconfig/20220826-065858-root.json
06:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1185 (re)pooling @ 25%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33219 and previous config saved to /var/cache/conftool/dbconfig/20220826-065626-root.json
06:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T316186)', diff saved to https://phabricator.wikimedia.org/P33218 and previous config saved to /var/cache/conftool/dbconfig/20220826-065533-ladsgroup.json
06:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T316186)', diff saved to https://phabricator.wikimedia.org/P33217 and previous config saved to /var/cache/conftool/dbconfig/20220826-065217-ladsgroup.json
06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 10%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P33216 and previous config saved to /var/cache/conftool/dbconfig/20220826-064742-root.json
06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 10%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33215 and previous config saved to /var/cache/conftool/dbconfig/20220826-064414-root.json
06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1192 (re)pooling @ 10%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33214 and previous config saved to /var/cache/conftool/dbconfig/20220826-064353-root.json
06:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1185 (re)pooling @ 10%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33213 and previous config saved to /var/cache/conftool/dbconfig/20220826-064121-root.json
06:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P33212 and previous config saved to /var/cache/conftool/dbconfig/20220826-063711-ladsgroup.json
06:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 5%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P33211 and previous config saved to /var/cache/conftool/dbconfig/20220826-063237-root.json
06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 5%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33210 and previous config saved to /var/cache/conftool/dbconfig/20220826-062910-root.json
06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1192 (re)pooling @ 5%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33209 and previous config saved to /var/cache/conftool/dbconfig/20220826-062849-root.json
06:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1185 (re)pooling @ 5%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33208 and previous config saved to /var/cache/conftool/dbconfig/20220826-062616-root.json
06:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P33207 and previous config saved to /var/cache/conftool/dbconfig/20220826-062205-ladsgroup.json
06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 3%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P33206 and previous config saved to /var/cache/conftool/dbconfig/20220826-061732-root.json
06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 4%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33205 and previous config saved to /var/cache/conftool/dbconfig/20220826-061405-root.json
06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1192 (re)pooling @ 4%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33204 and previous config saved to /var/cache/conftool/dbconfig/20220826-061344-root.json
06:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1185 (re)pooling @ 4%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33203 and previous config saved to /var/cache/conftool/dbconfig/20220826-061112-root.json
06:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P33202 and previous config saved to /var/cache/conftool/dbconfig/20220826-060734-ladsgroup.json
06:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T316186)', diff saved to https://phabricator.wikimedia.org/P33201 and previous config saved to /var/cache/conftool/dbconfig/20220826-060658-ladsgroup.json
06:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 2%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P33200 and previous config saved to /var/cache/conftool/dbconfig/20220826-060227-root.json
06:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T316186)', diff saved to https://phabricator.wikimedia.org/P33199 and previous config saved to /var/cache/conftool/dbconfig/20220826-060203-ladsgroup.json
06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T316186)', diff saved to https://phabricator.wikimedia.org/P33198 and previous config saved to /var/cache/conftool/dbconfig/20220826-060146-ladsgroup.json
06:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
06:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
05:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 3%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33197 and previous config saved to /var/cache/conftool/dbconfig/20220826-055900-root.json
05:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
05:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1192 (re)pooling @ 3%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33196 and previous config saved to /var/cache/conftool/dbconfig/20220826-055839-root.json
05:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1185 (re)pooling @ 3%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33195 and previous config saved to /var/cache/conftool/dbconfig/20220826-055607-root.json
05:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P33194 and previous config saved to /var/cache/conftool/dbconfig/20220826-055553-ladsgroup.json
05:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P33193 and previous config saved to /var/cache/conftool/dbconfig/20220826-055420-ladsgroup.json
05:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P33192 and previous config saved to /var/cache/conftool/dbconfig/20220826-055229-ladsgroup.json
05:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 1%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P33191 and previous config saved to /var/cache/conftool/dbconfig/20220826-054722-root.json
05:47 marostegui: Failover m2-master
05:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 2%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33190 and previous config saved to /var/cache/conftool/dbconfig/20220826-054356-root.json
05:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1192 (re)pooling @ 2%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33189 and previous config saved to /var/cache/conftool/dbconfig/20220826-054334-root.json
05:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1185 (re)pooling @ 2%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33188 and previous config saved to /var/cache/conftool/dbconfig/20220826-054102-root.json
05:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P33186 and previous config saved to /var/cache/conftool/dbconfig/20220826-054048-ladsgroup.json
05:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119', diff saved to https://phabricator.wikimedia.org/P33185 and previous config saved to /var/cache/conftool/dbconfig/20220826-054023-root.json
05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1194 for the first time in s7 T313569', diff saved to https://phabricator.wikimedia.org/P33184 and previous config saved to /var/cache/conftool/dbconfig/20220826-053954-marostegui.json
05:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P33183 and previous config saved to /var/cache/conftool/dbconfig/20220826-053915-ladsgroup.json
05:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P33182 and previous config saved to /var/cache/conftool/dbconfig/20220826-053724-ladsgroup.json
05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1193 to dbctl T313569', diff saved to https://phabricator.wikimedia.org/P33181 and previous config saved to /var/cache/conftool/dbconfig/20220826-052715-marostegui.json
05:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P33180 and previous config saved to /var/cache/conftool/dbconfig/20220826-052544-ladsgroup.json
05:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P33179 and previous config saved to /var/cache/conftool/dbconfig/20220826-052410-ladsgroup.json
05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1192 to dbctl T313569', diff saved to https://phabricator.wikimedia.org/P33178 and previous config saved to /var/cache/conftool/dbconfig/20220826-052233-marostegui.json
05:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P33177 and previous config saved to /var/cache/conftool/dbconfig/20220826-052219-ladsgroup.json
05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1185 for the first time in s5 T313569', diff saved to https://phabricator.wikimedia.org/P33176 and previous config saved to /var/cache/conftool/dbconfig/20220826-051721-marostegui.json
05:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P33175 and previous config saved to /var/cache/conftool/dbconfig/20220826-051039-ladsgroup.json
05:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P33174 and previous config saved to /var/cache/conftool/dbconfig/20220826-050906-ladsgroup.json
05:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling for maintenance', diff saved to https://phabricator.wikimedia.org/P33173 and previous config saved to /var/cache/conftool/dbconfig/20220826-050652-ladsgroup.json
05:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
05:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T312160)', diff saved to https://phabricator.wikimedia.org/P33172 and previous config saved to /var/cache/conftool/dbconfig/20220826-003819-ladsgroup.json
00:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P33171 and previous config saved to /var/cache/conftool/dbconfig/20220826-002313-ladsgroup.json
00:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P33170 and previous config saved to /var/cache/conftool/dbconfig/20220826-000807-ladsgroup.json

2022-08-25

23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T312160)', diff saved to https://phabricator.wikimedia.org/P33169 and previous config saved to /var/cache/conftool/dbconfig/20220825-235300-ladsgroup.json
22:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T316186)', diff saved to https://phabricator.wikimedia.org/P33168 and previous config saved to /var/cache/conftool/dbconfig/20220825-223805-ladsgroup.json
22:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P33167 and previous config saved to /var/cache/conftool/dbconfig/20220825-222259-ladsgroup.json
22:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2131 (T312160)', diff saved to https://phabricator.wikimedia.org/P33165 and previous config saved to /var/cache/conftool/dbconfig/20220825-220937-ladsgroup.json
22:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2131.codfw.wmnet with reason: Maintenance
22:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2131.codfw.wmnet with reason: Maintenance
22:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P33164 and previous config saved to /var/cache/conftool/dbconfig/20220825-220753-ladsgroup.json
21:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T316186)', diff saved to https://phabricator.wikimedia.org/P33163 and previous config saved to /var/cache/conftool/dbconfig/20220825-215247-ladsgroup.json
21:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T316186)', diff saved to https://phabricator.wikimedia.org/P33162 and previous config saved to /var/cache/conftool/dbconfig/20220825-214722-ladsgroup.json
21:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
21:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
21:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T316186)', diff saved to https://phabricator.wikimedia.org/P33161 and previous config saved to /var/cache/conftool/dbconfig/20220825-214649-ladsgroup.json
21:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P33160 and previous config saved to /var/cache/conftool/dbconfig/20220825-213143-ladsgroup.json
21:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P33159 and previous config saved to /var/cache/conftool/dbconfig/20220825-211637-ladsgroup.json
21:12 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin2002 - T316159
21:02 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin2002 - T316159
21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T316186)', diff saved to https://phabricator.wikimedia.org/P33158 and previous config saved to /var/cache/conftool/dbconfig/20220825-210130-ladsgroup.json
20:56 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin2002 - T316159
20:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:47 urbanecm: UTC late B&C window done
20:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1aafdf0: cswiki: Add extendedconfirmed group/protection level (T316283) (duration: 03m 42s)
20:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be2067.codfw.wmnet
20:45 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be2067.codfw.wmnet
20:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:39 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.26/extensions/VisualEditor/: 223e81f: Update VE core submodule to master (d4c438548; T316219) (duration: 03m 42s)
20:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:35 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.26/skins/Timeless/: ba0e981: Hide new associatedPages navigation items (T316196) (duration: 03m 41s)
20:33 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin2002 - T316159
20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:31 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.26/skins/Vector/resources/skins.vector.styles/layouts/screen.less: fe3382e: Add clearfix to .mw-body-subheader (T316134, T316095) (duration: 03m 25s)
20:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T316186)', diff saved to https://phabricator.wikimedia.org/P33157 and previous config saved to /var/cache/conftool/dbconfig/20220825-202716-ladsgroup.json
20:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
20:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
20:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T316186)', diff saved to https://phabricator.wikimedia.org/P33156 and previous config saved to /var/cache/conftool/dbconfig/20220825-202647-ladsgroup.json
20:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: f37eff3: Make DiscussionTools autotopicsub also opt-out on A/B test wikis (T314693) (duration: 03m 37s)
20:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
20:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
20:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2115 (T312160)', diff saved to https://phabricator.wikimedia.org/P33155 and previous config saved to /var/cache/conftool/dbconfig/20220825-201756-ladsgroup.json
20:17 urbanecm: [urbanecm@deploy1002 ~]$ rm /var/lock/scap.operations_mediawiki-config.lock # connection to deploy1002 handled, to let me re-sync
20:14 urandom: re-rebooting ms-be2067 to "fix" disk enumeration(?) -- T314049
20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:11 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin2002 - T316159
20:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P33154 and previous config saved to /var/cache/conftool/dbconfig/20220825-201141-ladsgroup.json
20:07 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin2002 - T316159
20:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2115', diff saved to https://phabricator.wikimedia.org/P33153 and previous config saved to /var/cache/conftool/dbconfig/20220825-200250-ladsgroup.json
19:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P33152 and previous config saved to /var/cache/conftool/dbconfig/20220825-195635-ladsgroup.json
19:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2115', diff saved to https://phabricator.wikimedia.org/P33151 and previous config saved to /var/cache/conftool/dbconfig/20220825-194744-ladsgroup.json
19:42 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin2002 - T316159
19:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T316186)', diff saved to https://phabricator.wikimedia.org/P33150 and previous config saved to /var/cache/conftool/dbconfig/20220825-194129-ladsgroup.json
19:41 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin2002 - T316159
19:37 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudservices1003
19:37 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:36 urandom: rebooting ms-be2067 to "fix" disk enumeration(?) -- T314049
19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T316186)', diff saved to https://phabricator.wikimedia.org/P33149 and previous config saved to /var/cache/conftool/dbconfig/20220825-193513-ladsgroup.json
19:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
19:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
19:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
19:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
19:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T316186)', diff saved to https://phabricator.wikimedia.org/P33148 and previous config saved to /var/cache/conftool/dbconfig/20220825-193430-ladsgroup.json
19:33 andrew@cumin1001: START - Cookbook sre.dns.netbox
19:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2115 (T312160)', diff saved to https://phabricator.wikimedia.org/P33147 and previous config saved to /var/cache/conftool/dbconfig/20220825-193238-ladsgroup.json
19:29 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudservices1003
19:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P33146 and previous config saved to /var/cache/conftool/dbconfig/20220825-191924-ladsgroup.json
19:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P33145 and previous config saved to /var/cache/conftool/dbconfig/20220825-190417-ladsgroup.json
18:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T316186)', diff saved to https://phabricator.wikimedia.org/P33144 and previous config saved to /var/cache/conftool/dbconfig/20220825-184911-ladsgroup.json
18:48 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin2002 - T316159
18:48 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@d00af45]: bump elasticsearch-hadoop to 7.10.2 (duration: 02m 07s)
18:47 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin2002 - T316159
18:45 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@d00af45]: bump elasticsearch-hadoop to 7.10.2
18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T316186)', diff saved to https://phabricator.wikimedia.org/P33143 and previous config saved to /var/cache/conftool/dbconfig/20220825-184301-ladsgroup.json
18:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
18:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T316186)', diff saved to https://phabricator.wikimedia.org/P33142 and previous config saved to /var/cache/conftool/dbconfig/20220825-184233-ladsgroup.json
18:36 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: sync
18:36 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: sync
18:35 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: sync
18:34 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: sync
18:34 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: sync
18:33 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: sync
18:33 ottomata: rolling restart of eventgate-analytics-external to pick up retroactive schema change for android schemas in T316047
18:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P33141 and previous config saved to /var/cache/conftool/dbconfig/20220825-182727-ladsgroup.json
18:19 dancy@deploy1002: rebuilt and synchronized wikiversions files: (no justification provided)
18:18 bmansurov@deploy1002: Finished deploy [airflow-dags/research@5712187]: (no justification provided) (duration: 00m 09s)
18:18 bmansurov@deploy1002: Started deploy [airflow-dags/research@5712187]: (no justification provided)
18:13 dancy@deploy1002: Installation of scap version "4.15.0" completed for 557 hosts
18:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P33140 and previous config saved to /var/cache/conftool/dbconfig/20220825-181221-ladsgroup.json
18:11 dancy@deploy1002: Installing scap version "4.15.0" for 557 hosts
18:11 dancy@deploy1002: install-world aborted: (duration: 00m 02s)
17:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T316186)', diff saved to https://phabricator.wikimedia.org/P33139 and previous config saved to /var/cache/conftool/dbconfig/20220825-175715-ladsgroup.json
17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T316186)', diff saved to https://phabricator.wikimedia.org/P33138 and previous config saved to /var/cache/conftool/dbconfig/20220825-174946-ladsgroup.json
17:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
17:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
17:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2115 (T312160)', diff saved to https://phabricator.wikimedia.org/P33137 and previous config saved to /var/cache/conftool/dbconfig/20220825-174826-ladsgroup.json
17:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2115.codfw.wmnet with reason: Maintenance
17:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2115.codfw.wmnet with reason: Maintenance
17:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
17:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33136 and previous config saved to /var/cache/conftool/dbconfig/20220825-173731-ladsgroup.json
17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P33135 and previous config saved to /var/cache/conftool/dbconfig/20220825-172225-ladsgroup.json
17:10 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
17:10 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
17:10 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
17:09 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
17:09 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
17:08 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
17:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P33133 and previous config saved to /var/cache/conftool/dbconfig/20220825-170719-ladsgroup.json
16:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33132 and previous config saved to /var/cache/conftool/dbconfig/20220825-165213-ladsgroup.json
16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T316186)', diff saved to https://phabricator.wikimedia.org/P33131 and previous config saved to /var/cache/conftool/dbconfig/20220825-164556-ladsgroup.json
16:40 urandom: shutting down ms-be2067.codfw.wmnet for backplane replacement -- T314049
16:37 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be2067.codfw.wmnet with reason: backplane replacement
16:37 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be2067.codfw.wmnet with reason: backplane replacement
16:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P33130 and previous config saved to /var/cache/conftool/dbconfig/20220825-163050-ladsgroup.json
16:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P33129 and previous config saved to /var/cache/conftool/dbconfig/20220825-161544-ladsgroup.json
16:07 bmansurov@deploy1002: Finished deploy [airflow-dags/research@5712187]: (no justification provided) (duration: 00m 09s)
16:07 bmansurov@deploy1002: Started deploy [airflow-dags/research@5712187]: (no justification provided)
16:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
16:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
16:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1120 (T312160)', diff saved to https://phabricator.wikimedia.org/P33128 and previous config saved to /var/cache/conftool/dbconfig/20220825-160250-ladsgroup.json
16:00 bmansurov@deploy1002: Finished deploy [airflow-dags/research@5712187]: (no justification provided) (duration: 00m 09s)
16:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T316186)', diff saved to https://phabricator.wikimedia.org/P33127 and previous config saved to /var/cache/conftool/dbconfig/20220825-160036-ladsgroup.json
16:00 bmansurov@deploy1002: Started deploy [airflow-dags/research@5712187]: (no justification provided)
15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33126 and previous config saved to /var/cache/conftool/dbconfig/20220825-155529-ladsgroup.json
15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T316186)', diff saved to https://phabricator.wikimedia.org/P33125 and previous config saved to /var/cache/conftool/dbconfig/20220825-155506-ladsgroup.json
15:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
15:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
15:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P33124 and previous config saved to /var/cache/conftool/dbconfig/20220825-155401-ladsgroup.json
15:52 bmansurov@deploy1002: Finished deploy [airflow-dags/research@5712187]: (no justification provided) (duration: 00m 09s)
15:52 bmansurov@deploy1002: Started deploy [airflow-dags/research@5712187]: (no justification provided)
15:50 bmansurov@deploy1002: Finished deploy [airflow-dags/research@5712187]: (no justification provided) (duration: 00m 09s)
15:50 bmansurov@deploy1002: Started deploy [airflow-dags/research@5712187]: (no justification provided)
15:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1120', diff saved to https://phabricator.wikimedia.org/P33123 and previous config saved to /var/cache/conftool/dbconfig/20220825-154743-ladsgroup.json
15:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P33122 and previous config saved to /var/cache/conftool/dbconfig/20220825-154438-ladsgroup.json
15:42 jynus: restart backup1002 (interrupted before), backup1003, backup2003
15:41 bmansurov@deploy1002: Finished deploy [airflow-dags/research@5712187]: (no justification provided) (duration: 00m 09s)
15:41 bmansurov@deploy1002: Started deploy [airflow-dags/research@5712187]: (no justification provided)
15:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1120', diff saved to https://phabricator.wikimedia.org/P33121 and previous config saved to /var/cache/conftool/dbconfig/20220825-153237-ladsgroup.json
15:31 bmansurov@deploy1002: Finished deploy [airflow-dags/research@5712187]: (no justification provided) (duration: 00m 09s)
15:31 bmansurov@deploy1002: Started deploy [airflow-dags/research@5712187]: (no justification provided)
15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T316186)', diff saved to https://phabricator.wikimedia.org/P33120 and previous config saved to /var/cache/conftool/dbconfig/20220825-152932-ladsgroup.json
15:27 bmansurov@deploy1002: Finished deploy [airflow-dags/research@5712187]: (no justification provided) (duration: 00m 20s)
15:26 bmansurov@deploy1002: Started deploy [airflow-dags/research@5712187]: (no justification provided)
15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T316186)', diff saved to https://phabricator.wikimedia.org/P33119 and previous config saved to /var/cache/conftool/dbconfig/20220825-152417-ladsgroup.json
15:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
15:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
15:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
15:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
15:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1120 (T312160)', diff saved to https://phabricator.wikimedia.org/P33118 and previous config saved to /var/cache/conftool/dbconfig/20220825-151731-ladsgroup.json
14:44 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
14:43 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
14:42 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: New Kernel
14:42 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: New Kernel
14:36 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: New Kernel
14:36 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: New Kernel
14:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on ganeti2025.codfw.wmnet with reason: Remove node for eventual reimage, T311686
14:35 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on ganeti2025.codfw.wmnet with reason: Remove node for eventual reimage, T311686
14:32 vgutierrez: enable origin coalescing in ats-be@cp600[78] [expect crashes] - T315911
14:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
14:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
14:28 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1004.eqiad.wmnet
14:20 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1004.eqiad.wmnet
14:17 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet
14:15 claime: finished rebooting people1003 (people.wikimedia.org)
14:13 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
14:13 claime: rebooting people1003 (people.wikimedia.org)
14:11 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2002.codfw.wmnet
14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T316186)', diff saved to https://phabricator.wikimedia.org/P33117 and previous config saved to /var/cache/conftool/dbconfig/20220825-140915-ladsgroup.json
14:07 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host people2002.codfw.wmnet
13:57 hashar@deploy1002: Finished scap: Backport for CX3 Build 0.2.0+20220825 (T309986 T301222) (duration: 24m 56s)
13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P33116 and previous config saved to /var/cache/conftool/dbconfig/20220825-135408-ladsgroup.json
13:45 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1003.eqiad.wmnet
13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1120 (T312160)', diff saved to https://phabricator.wikimedia.org/P33115 and previous config saved to /var/cache/conftool/dbconfig/20220825-134318-ladsgroup.json
13:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1120.eqiad.wmnet with reason: Maintenance
13:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1120.eqiad.wmnet with reason: Maintenance
13:39 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1003.eqiad.wmnet
13:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P33114 and previous config saved to /var/cache/conftool/dbconfig/20220825-133902-ladsgroup.json
13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:32 hashar@deploy1002: Started scap: Backport for CX3 Build 0.2.0+20220825 (T309986 T301222)
13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T316186)', diff saved to https://phabricator.wikimedia.org/P33113 and previous config saved to /var/cache/conftool/dbconfig/20220825-132356-ladsgroup.json
13:19 vgutierrez: disable origin coalescing in ats-be globally - T315911
13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T316186)', diff saved to https://phabricator.wikimedia.org/P33112 and previous config saved to /var/cache/conftool/dbconfig/20220825-131735-ladsgroup.json
13:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
13:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
13:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P33111 and previous config saved to /var/cache/conftool/dbconfig/20220825-130950-ladsgroup.json
13:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T316186)', diff saved to https://phabricator.wikimedia.org/P33110 and previous config saved to /var/cache/conftool/dbconfig/20220825-130235-ladsgroup.json
13:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
13:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
13:00 ladsgroup@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
13:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P33109 and previous config saved to /var/cache/conftool/dbconfig/20220825-125806-ladsgroup.json
12:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1002.eqiad.wmnet
12:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
12:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1002.eqiad.wmnet
12:48 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1001.eqiad.wmnet
12:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
12:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
12:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
12:46 ladsgroup@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db2114.codfw.wmnet
12:45 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.26 refs T314187
12:40 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1001.eqiad.wmnet
12:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.reboot-single for host db2114.codfw.wmnet
12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2114 (T316186)', diff saved to https://phabricator.wikimedia.org/P33108 and previous config saved to /var/cache/conftool/dbconfig/20220825-123448-ladsgroup.json
12:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
12:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
12:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Testing a script
12:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Testing a script
12:06 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 11 days, 0:00:00 on ms-fe1012.eqiad.wmnet with reason: known depooled, left for investigation
12:06 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 11 days, 0:00:00 on ms-fe1012.eqiad.wmnet with reason: known depooled, left for investigation
11:57 godog: roll-restart swift-proxy on thanos-fe* and ms-fe* (not ms-fe1012)
11:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1116.eqiad.wmnet with reason: Maintenance
11:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1116.eqiad.wmnet with reason: Maintenance
11:40 godog: depool ms-fe1012, leave swift-proxy alone for investigation
11:32 godog: restart swift-proxy on ms-fe1010
11:29 marostegui: Failover m1-master
11:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
11:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
11:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1109.eqiad.wmnet with reason: Maintenance
11:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1109.eqiad.wmnet with reason: Maintenance
11:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T314041)', diff saved to https://phabricator.wikimedia.org/P33106 and previous config saved to /var/cache/conftool/dbconfig/20220825-110448-ladsgroup.json
10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P33105 and previous config saved to /var/cache/conftool/dbconfig/20220825-104942-ladsgroup.json
10:42 cgoubert@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-eqiad
10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P33104 and previous config saved to /var/cache/conftool/dbconfig/20220825-103436-ladsgroup.json
10:23 cgoubert@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T314041)', diff saved to https://phabricator.wikimedia.org/P33103 and previous config saved to /var/cache/conftool/dbconfig/20220825-101930-ladsgroup.json
10:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-codfw
10:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
10:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 (T312160)', diff saved to https://phabricator.wikimedia.org/P33102 and previous config saved to /var/cache/conftool/dbconfig/20220825-100915-ladsgroup.json
10:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2001.codfw.wmnet
10:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host build2001.codfw.wmnet
10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P33100 and previous config saved to /var/cache/conftool/dbconfig/20220825-100010-root.json
09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T314041)', diff saved to https://phabricator.wikimedia.org/P33099 and previous config saved to /var/cache/conftool/dbconfig/20220825-095942-ladsgroup.json
09:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
09:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
09:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
09:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P33098 and previous config saved to /var/cache/conftool/dbconfig/20220825-095611-ladsgroup.json
09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137', diff saved to https://phabricator.wikimedia.org/P33097 and previous config saved to /var/cache/conftool/dbconfig/20220825-095408-ladsgroup.json
09:51 moritzm: installing libxslt security updates on bullseye
09:50 cgoubert@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
09:49 jynus: restart backup1002, backup2002
09:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P33096 and previous config saved to /var/cache/conftool/dbconfig/20220825-094646-ladsgroup.json
09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1191 (re)pooling @ 100%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33095 and previous config saved to /var/cache/conftool/dbconfig/20220825-094438-root.json
09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1190 (re)pooling @ 100%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33094 and previous config saved to /var/cache/conftool/dbconfig/20220825-094401-root.json
09:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1188 (re)pooling @ 100%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33093 and previous config saved to /var/cache/conftool/dbconfig/20220825-094353-root.json
09:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1186 (re)pooling @ 100%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33092 and previous config saved to /var/cache/conftool/dbconfig/20220825-094345-root.json
09:39 marostegui: Reboot stand by dbproxy hosts
09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137', diff saved to https://phabricator.wikimedia.org/P33091 and previous config saved to /var/cache/conftool/dbconfig/20220825-093902-ladsgroup.json
09:35 jynus: restart backup2001
09:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P33090 and previous config saved to /var/cache/conftool/dbconfig/20220825-093140-ladsgroup.json
09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1191 (re)pooling @ 75%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33089 and previous config saved to /var/cache/conftool/dbconfig/20220825-092933-root.json
09:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1190 (re)pooling @ 75%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33088 and previous config saved to /var/cache/conftool/dbconfig/20220825-092856-root.json
09:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1188 (re)pooling @ 75%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33087 and previous config saved to /var/cache/conftool/dbconfig/20220825-092848-root.json
09:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1186 (re)pooling @ 75%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33086 and previous config saved to /var/cache/conftool/dbconfig/20220825-092840-root.json
09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 (T312160)', diff saved to https://phabricator.wikimedia.org/P33085 and previous config saved to /var/cache/conftool/dbconfig/20220825-092356-ladsgroup.json
09:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T314041)', diff saved to https://phabricator.wikimedia.org/P33084 and previous config saved to /var/cache/conftool/dbconfig/20220825-091633-ladsgroup.json
09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P33083 and previous config saved to /var/cache/conftool/dbconfig/20220825-091448-root.json
09:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T314041)', diff saved to https://phabricator.wikimedia.org/P33082 and previous config saved to /var/cache/conftool/dbconfig/20220825-091447-ladsgroup.json
09:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1191 (re)pooling @ 50%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33081 and previous config saved to /var/cache/conftool/dbconfig/20220825-091428-root.json
09:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
09:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1190 (re)pooling @ 50%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33080 and previous config saved to /var/cache/conftool/dbconfig/20220825-091351-root.json
09:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1188 (re)pooling @ 50%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33079 and previous config saved to /var/cache/conftool/dbconfig/20220825-091344-root.json
09:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
09:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1186 (re)pooling @ 50%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33078 and previous config saved to /var/cache/conftool/dbconfig/20220825-091336-root.json
09:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
09:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T314041)', diff saved to https://phabricator.wikimedia.org/P33077 and previous config saved to /var/cache/conftool/dbconfig/20220825-091325-ladsgroup.json
09:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
09:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
09:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
09:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
09:05 hashar@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.26 refs T314187 (duration: 03m 30s)
09:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
09:02 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.26 refs T314187
09:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
09:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
09:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P33075 and previous config saved to /var/cache/conftool/dbconfig/20220825-085943-root.json
08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1191 (re)pooling @ 25%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33074 and previous config saved to /var/cache/conftool/dbconfig/20220825-085924-root.json
08:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1190 (re)pooling @ 25%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33073 and previous config saved to /var/cache/conftool/dbconfig/20220825-085847-root.json
08:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1188 (re)pooling @ 25%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33072 and previous config saved to /var/cache/conftool/dbconfig/20220825-085839-root.json
08:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1186 (re)pooling @ 25%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33071 and previous config saved to /var/cache/conftool/dbconfig/20220825-085831-root.json
08:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P33070 and previous config saved to /var/cache/conftool/dbconfig/20220825-085819-ladsgroup.json
08:54 moritzm: installing curl security updates on bullseye
08:50 moritzm: installing gnutls28 security updates on bullseye
08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P33069 and previous config saved to /var/cache/conftool/dbconfig/20220825-084438-root.json
08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1191 (re)pooling @ 10%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33068 and previous config saved to /var/cache/conftool/dbconfig/20220825-084419-root.json
08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1190 (re)pooling @ 10%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33067 and previous config saved to /var/cache/conftool/dbconfig/20220825-084342-root.json
08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1188 (re)pooling @ 10%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33066 and previous config saved to /var/cache/conftool/dbconfig/20220825-084334-root.json
08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1186 (re)pooling @ 10%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33065 and previous config saved to /var/cache/conftool/dbconfig/20220825-084326-root.json
08:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P33064 and previous config saved to /var/cache/conftool/dbconfig/20220825-084313-ladsgroup.json
08:39 jynus: restarting backupmon1001
08:30 marostegui: Failover m1 from db1164 to db1195 - T315864
08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P33063 and previous config saved to /var/cache/conftool/dbconfig/20220825-082933-root.json
08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1191 (re)pooling @ 5%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33062 and previous config saved to /var/cache/conftool/dbconfig/20220825-082915-root.json
08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1190 (re)pooling @ 5%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33061 and previous config saved to /var/cache/conftool/dbconfig/20220825-082837-root.json
08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1188 (re)pooling @ 5%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33060 and previous config saved to /var/cache/conftool/dbconfig/20220825-082830-root.json
08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1186 (re)pooling @ 5%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33059 and previous config saved to /var/cache/conftool/dbconfig/20220825-082821-root.json
08:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T314041)', diff saved to https://phabricator.wikimedia.org/P33058 and previous config saved to /var/cache/conftool/dbconfig/20220825-082807-ladsgroup.json
08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T314041)', diff saved to https://phabricator.wikimedia.org/P33057 and previous config saved to /var/cache/conftool/dbconfig/20220825-082621-ladsgroup.json
08:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
08:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T314041)', diff saved to https://phabricator.wikimedia.org/P33056 and previous config saved to /var/cache/conftool/dbconfig/20220825-082559-ladsgroup.json
08:23 vgutierrez: Increase roll-out of query-sorting to 5% - T314868
08:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P33055 and previous config saved to /var/cache/conftool/dbconfig/20220825-081429-root.json
08:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1191 (re)pooling @ 4%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33054 and previous config saved to /var/cache/conftool/dbconfig/20220825-081410-root.json
08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1190 (re)pooling @ 4%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33053 and previous config saved to /var/cache/conftool/dbconfig/20220825-081333-root.json
08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1188 (re)pooling @ 4%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33052 and previous config saved to /var/cache/conftool/dbconfig/20220825-081325-root.json
08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1186 (re)pooling @ 4%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33051 and previous config saved to /var/cache/conftool/dbconfig/20220825-081316-root.json
08:13 jynus: stopping bacula services on backup1001 T315864
08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P33050 and previous config saved to /var/cache/conftool/dbconfig/20220825-081053-ladsgroup.json
08:09 marostegui: Reboot db1195 for kernel upgrade T315864
07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P33049 and previous config saved to /var/cache/conftool/dbconfig/20220825-075924-root.json
07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1191 (re)pooling @ 3%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33048 and previous config saved to /var/cache/conftool/dbconfig/20220825-075905-root.json
07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1190 (re)pooling @ 3%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33047 and previous config saved to /var/cache/conftool/dbconfig/20220825-075828-root.json
07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1188 (re)pooling @ 3%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33046 and previous config saved to /var/cache/conftool/dbconfig/20220825-075820-root.json
07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1186 (re)pooling @ 3%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33045 and previous config saved to /var/cache/conftool/dbconfig/20220825-075811-root.json
07:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P33044 and previous config saved to /var/cache/conftool/dbconfig/20220825-075547-ladsgroup.json
07:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2132,2160].codfw.wmnet,db[1117,1164,1195].eqiad.wmnet with reason: Switchover m1 T315864
07:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2132,2160].codfw.wmnet,db[1117,1164,1195].eqiad.wmnet with reason: Switchover m1 T315864
07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1191 (re)pooling @ 2%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33042 and previous config saved to /var/cache/conftool/dbconfig/20220825-074400-root.json
07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1190 (re)pooling @ 2%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33041 and previous config saved to /var/cache/conftool/dbconfig/20220825-074323-root.json
07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1188 (re)pooling @ 2%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33040 and previous config saved to /var/cache/conftool/dbconfig/20220825-074315-root.json
07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1186 (re)pooling @ 2%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33039 and previous config saved to /var/cache/conftool/dbconfig/20220825-074307-root.json
07:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1137 (T312160)', diff saved to https://phabricator.wikimedia.org/P33038 and previous config saved to /var/cache/conftool/dbconfig/20220825-074220-ladsgroup.json
07:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1137.eqiad.wmnet with reason: Maintenance
07:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1137.eqiad.wmnet with reason: Maintenance
07:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T314041)', diff saved to https://phabricator.wikimedia.org/P33037 and previous config saved to /var/cache/conftool/dbconfig/20220825-074041-ladsgroup.json
07:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
07:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
07:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
07:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T314041)', diff saved to https://phabricator.wikimedia.org/P33036 and previous config saved to /var/cache/conftool/dbconfig/20220825-073855-ladsgroup.json
07:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
07:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
07:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T314041)', diff saved to https://phabricator.wikimedia.org/P33035 and previous config saved to /var/cache/conftool/dbconfig/20220825-073834-ladsgroup.json
07:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
07:36 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1012 to pc2 master T315526 (duration: 03m 39s)
07:34 marostegui: Promote pc1012 back as pc2 master T315526
07:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repooling after cloning db1185', diff saved to https://phabricator.wikimedia.org/P33034 and previous config saved to /var/cache/conftool/dbconfig/20220825-072340-root.json
07:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P33033 and previous config saved to /var/cache/conftool/dbconfig/20220825-072327-ladsgroup.json
07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repooling after cloning db1185', diff saved to https://phabricator.wikimedia.org/P33032 and previous config saved to /var/cache/conftool/dbconfig/20220825-070835-root.json
07:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P33031 and previous config saved to /var/cache/conftool/dbconfig/20220825-070821-ladsgroup.json
06:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repooling after cloning db1185', diff saved to https://phabricator.wikimedia.org/P33030 and previous config saved to /var/cache/conftool/dbconfig/20220825-065331-root.json
06:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T314041)', diff saved to https://phabricator.wikimedia.org/P33029 and previous config saved to /var/cache/conftool/dbconfig/20220825-065315-ladsgroup.json
06:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T314041)', diff saved to https://phabricator.wikimedia.org/P33028 and previous config saved to /var/cache/conftool/dbconfig/20220825-065128-ladsgroup.json
06:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
06:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
06:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
06:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
06:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repooling after cloning db1185', diff saved to https://phabricator.wikimedia.org/P33027 and previous config saved to /var/cache/conftool/dbconfig/20220825-063826-root.json
06:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
06:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
06:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
06:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
06:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maint on s4 old master
06:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maint on s4 old master
06:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1138 T315419', diff saved to https://phabricator.wikimedia.org/P33026 and previous config saved to /var/cache/conftool/dbconfig/20220825-062852-ladsgroup.json
06:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1160 to s4 primary and set section read-write T315419', diff saved to https://phabricator.wikimedia.org/P33025 and previous config saved to /var/cache/conftool/dbconfig/20220825-062425-ladsgroup.json
06:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s4 eqiad as read-only for maintenance - T315419', diff saved to https://phabricator.wikimedia.org/P33024 and previous config saved to /var/cache/conftool/dbconfig/20220825-062353-ladsgroup.json
06:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 10%: Repooling after cloning db1185', diff saved to https://phabricator.wikimedia.org/P33023 and previous config saved to /var/cache/conftool/dbconfig/20220825-062321-root.json
06:22 Amir1: Starting s4 eqiad failover from db1138 to db1160 - T315419
06:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 5%: Repooling after cloning db1185', diff saved to https://phabricator.wikimedia.org/P33022 and previous config saved to /var/cache/conftool/dbconfig/20220825-060816-root.json
06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114', diff saved to https://phabricator.wikimedia.org/P33020 and previous config saved to /var/cache/conftool/dbconfig/20220825-060601-root.json
05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1191 with minimal weight in s7 T313569', diff saved to https://phabricator.wikimedia.org/P33019 and previous config saved to /var/cache/conftool/dbconfig/20220825-055057-root.json
05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1191 to dbctl T313569', diff saved to https://phabricator.wikimedia.org/P33018 and previous config saved to /var/cache/conftool/dbconfig/20220825-055038-marostegui.json
05:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
05:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
05:46 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.26/includes/page/Article.php: Backport: Display page namespace with spaces instead of underscores when page doesn't exist (T316092) (duration: 03m 32s)
05:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
05:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1190 with minimal weight in s4 T313569', diff saved to https://phabricator.wikimedia.org/P33017 and previous config saved to /var/cache/conftool/dbconfig/20220825-053310-root.json
05:32 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1190 to dbctl T313569', diff saved to https://phabricator.wikimedia.org/P33016 and previous config saved to /var/cache/conftool/dbconfig/20220825-053253-marostegui.json
05:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1160 with weight 0 T315419', diff saved to https://phabricator.wikimedia.org/P33015 and previous config saved to /var/cache/conftool/dbconfig/20220825-052415-ladsgroup.json
05:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 33 hosts with reason: Primary switchover s4 T315419
05:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 33 hosts with reason: Primary switchover s4 T315419
05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1188 with minimal weight in s2 T313569', diff saved to https://phabricator.wikimedia.org/P33013 and previous config saved to /var/cache/conftool/dbconfig/20220825-051754-root.json
05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1188 to dbctl T313569', diff saved to https://phabricator.wikimedia.org/P33012 and previous config saved to /var/cache/conftool/dbconfig/20220825-051737-marostegui.json
05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1186 with minimal weight in s1 T313569', diff saved to https://phabricator.wikimedia.org/P33011 and previous config saved to /var/cache/conftool/dbconfig/20220825-051155-root.json
05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1186 to dbctl', diff saved to https://phabricator.wikimedia.org/P33010 and previous config saved to /var/cache/conftool/dbconfig/20220825-051130-marostegui.json
05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130', diff saved to https://phabricator.wikimedia.org/P33008 and previous config saved to /var/cache/conftool/dbconfig/20220825-050713-root.json
05:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T314041)', diff saved to https://phabricator.wikimedia.org/P33007 and previous config saved to /var/cache/conftool/dbconfig/20220825-050539-ladsgroup.json
04:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P33006 and previous config saved to /var/cache/conftool/dbconfig/20220825-045033-ladsgroup.json
04:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P33005 and previous config saved to /var/cache/conftool/dbconfig/20220825-043527-ladsgroup.json
04:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T314041)', diff saved to https://phabricator.wikimedia.org/P33004 and previous config saved to /var/cache/conftool/dbconfig/20220825-042020-ladsgroup.json
04:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T314041)', diff saved to https://phabricator.wikimedia.org/P33003 and previous config saved to /var/cache/conftool/dbconfig/20220825-041833-ladsgroup.json
04:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
04:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
04:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T314041)', diff saved to https://phabricator.wikimedia.org/P33002 and previous config saved to /var/cache/conftool/dbconfig/20220825-041812-ladsgroup.json
04:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P33001 and previous config saved to /var/cache/conftool/dbconfig/20220825-040306-ladsgroup.json
03:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P33000 and previous config saved to /var/cache/conftool/dbconfig/20220825-034759-ladsgroup.json
03:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T314041)', diff saved to https://phabricator.wikimedia.org/P32999 and previous config saved to /var/cache/conftool/dbconfig/20220825-033253-ladsgroup.json
03:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2165 (T314041)', diff saved to https://phabricator.wikimedia.org/P32998 and previous config saved to /var/cache/conftool/dbconfig/20220825-033107-ladsgroup.json
03:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
03:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
03:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T314041)', diff saved to https://phabricator.wikimedia.org/P32997 and previous config saved to /var/cache/conftool/dbconfig/20220825-033045-ladsgroup.json
03:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P32996 and previous config saved to /var/cache/conftool/dbconfig/20220825-031539-ladsgroup.json
03:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P32995 and previous config saved to /var/cache/conftool/dbconfig/20220825-030033-ladsgroup.json
02:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T314041)', diff saved to https://phabricator.wikimedia.org/P32994 and previous config saved to /var/cache/conftool/dbconfig/20220825-024527-ladsgroup.json
02:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T314041)', diff saved to https://phabricator.wikimedia.org/P32993 and previous config saved to /var/cache/conftool/dbconfig/20220825-024339-ladsgroup.json
02:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
02:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
02:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T314041)', diff saved to https://phabricator.wikimedia.org/P32992 and previous config saved to /var/cache/conftool/dbconfig/20220825-024318-ladsgroup.json
02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P32991 and previous config saved to /var/cache/conftool/dbconfig/20220825-022812-ladsgroup.json
02:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P32990 and previous config saved to /var/cache/conftool/dbconfig/20220825-021306-ladsgroup.json
01:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T314041)', diff saved to https://phabricator.wikimedia.org/P32989 and previous config saved to /var/cache/conftool/dbconfig/20220825-015800-ladsgroup.json
01:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T314041)', diff saved to https://phabricator.wikimedia.org/P32988 and previous config saved to /var/cache/conftool/dbconfig/20220825-015612-ladsgroup.json
01:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
01:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T314041)', diff saved to https://phabricator.wikimedia.org/P32987 and previous config saved to /var/cache/conftool/dbconfig/20220825-015550-ladsgroup.json
01:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P32986 and previous config saved to /var/cache/conftool/dbconfig/20220825-014044-ladsgroup.json
01:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P32985 and previous config saved to /var/cache/conftool/dbconfig/20220825-012538-ladsgroup.json
01:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T314041)', diff saved to https://phabricator.wikimedia.org/P32984 and previous config saved to /var/cache/conftool/dbconfig/20220825-011032-ladsgroup.json
01:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T314041)', diff saved to https://phabricator.wikimedia.org/P32983 and previous config saved to /var/cache/conftool/dbconfig/20220825-010845-ladsgroup.json
01:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
01:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
01:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T314041)', diff saved to https://phabricator.wikimedia.org/P32982 and previous config saved to /var/cache/conftool/dbconfig/20220825-010824-ladsgroup.json
00:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P32981 and previous config saved to /var/cache/conftool/dbconfig/20220825-005318-ladsgroup.json
00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P32980 and previous config saved to /var/cache/conftool/dbconfig/20220825-003812-ladsgroup.json
00:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T314041)', diff saved to https://phabricator.wikimedia.org/P32979 and previous config saved to /var/cache/conftool/dbconfig/20220825-002306-ladsgroup.json
00:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T314041)', diff saved to https://phabricator.wikimedia.org/P32978 and previous config saved to /var/cache/conftool/dbconfig/20220825-002120-ladsgroup.json
00:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
00:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
00:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
00:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
00:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T314041)', diff saved to https://phabricator.wikimedia.org/P32977 and previous config saved to /var/cache/conftool/dbconfig/20220825-001949-ladsgroup.json
00:15 ejegg: fundraising scheduled jobs re-enabled
00:08 eileen: config revision changed from ab95bc89 to 2d10cc5f
00:08 eileen: civicrm upgraded from ff9b377d to a31c7590
00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P32976 and previous config saved to /var/cache/conftool/dbconfig/20220825-000443-ladsgroup.json

2022-08-24

23:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P32975 and previous config saved to /var/cache/conftool/dbconfig/20220824-234937-ladsgroup.json
23:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T314041)', diff saved to https://phabricator.wikimedia.org/P32974 and previous config saved to /var/cache/conftool/dbconfig/20220824-233431-ladsgroup.json
23:33 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restarting to apply OpenJDK 8u342 - eevans@cumin1001
23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T314041)', diff saved to https://phabricator.wikimedia.org/P32973 and previous config saved to /var/cache/conftool/dbconfig/20220824-233046-ladsgroup.json
23:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
23:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T314041)', diff saved to https://phabricator.wikimedia.org/P32972 and previous config saved to /var/cache/conftool/dbconfig/20220824-233025-ladsgroup.json
23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P32971 and previous config saved to /var/cache/conftool/dbconfig/20220824-231519-ladsgroup.json
23:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P32970 and previous config saved to /var/cache/conftool/dbconfig/20220824-230013-ladsgroup.json
22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T314041)', diff saved to https://phabricator.wikimedia.org/P32969 and previous config saved to /var/cache/conftool/dbconfig/20220824-224507-ladsgroup.json
22:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 (T314041)', diff saved to https://phabricator.wikimedia.org/P32968 and previous config saved to /var/cache/conftool/dbconfig/20220824-224214-ladsgroup.json
22:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
22:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
22:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T314041)', diff saved to https://phabricator.wikimedia.org/P32967 and previous config saved to /var/cache/conftool/dbconfig/20220824-224153-ladsgroup.json
22:37 ryankemper: [Elastic] We're back to green in `cloudelastic-chi`, so cloudelastic is back to fully healthy
22:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P32966 and previous config saved to /var/cache/conftool/dbconfig/20220824-222646-ladsgroup.json
22:20 ryankemper: [Elastic] We've got the cloudelastic instances all back up. A bunch of shard recoveries ongoing; currently the cluster is red. It might go all the way back to green; hard to say until the shard recoveries complete.
22:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P32965 and previous config saved to /var/cache/conftool/dbconfig/20220824-221140-ladsgroup.json
21:58 ryankemper: [Elastic] `ryankemper@cloudelastic1003:~$ sudo systemctl restart elasticsearch_6@cloudelastic-chi-eqiad.service`, 1003 was also oom-killed: `[4165984.362182] Out of memory: Killed process 3759 (java) total-vm:2277062348kB, anon-rss:61648756kB, file-rss:0kB, shmem-rss:0kB, UID:113 pgtables:1448136kB oom_score_adj:0`
21:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T314041)', diff saved to https://phabricator.wikimedia.org/P32964 and previous config saved to /var/cache/conftool/dbconfig/20220824-215634-ladsgroup.json
21:54 ryankemper: [Elastic] `ryankemper@cloudelastic1004:~$ sudo systemctl restart elasticsearch_6@cloudelastic-chi-eqiad.service` Restarting 1004's chi eqiad, it died due to `Aug 24 21:43:21 cloudelastic1004 systemd[1]: elasticsearch_6@cloudelastic-chi-eqiad.service: Main process exited, code=killed, status=9/KILL`
21:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1114 (T314041)', diff saved to https://phabricator.wikimedia.org/P32963 and previous config saved to /var/cache/conftool/dbconfig/20220824-215143-ladsgroup.json
21:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1114.eqiad.wmnet with reason: Maintenance
21:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1114.eqiad.wmnet with reason: Maintenance
21:51 eileen: civicrm upgraded from 632d5f5f to ff9b377d
21:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1116.eqiad.wmnet with reason: Maintenance
21:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1116.eqiad.wmnet with reason: Maintenance
21:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T314041)', diff saved to https://phabricator.wikimedia.org/P32962 and previous config saved to /var/cache/conftool/dbconfig/20220824-215025-ladsgroup.json
21:48 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 6 hosts with reason: T316159
21:48 bking@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on 6 hosts with reason: T316159
21:48 eileen: config revision changed from c2aa4158 to ab95bc89
21:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P32961 and previous config saved to /var/cache/conftool/dbconfig/20220824-213519-ladsgroup.json
21:23 dzahn@cumin2002: conftool action : set/weight=25; selector: name=mw134[1-8].eqiad.wmnet,cluster=api_appserver
21:22 dzahn@cumin2002: conftool action : set/weight=25; selector: name=mw133[1-9].eqiad.wmnet,cluster=api_appserver
21:22 dzahn@cumin2002: conftool action : set/weight=25; selector: name=mw133[1-9].eqiad.wmnet,cluster=appserver
21:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P32959 and previous config saved to /var/cache/conftool/dbconfig/20220824-212013-ladsgroup.json
21:20 mutante: setting weight to 25 (from 30) for appservers and API servers in the range mw1307 through mw1348 because they are of an older hardware type (not changing weights of jobrunners/videoscalers even if in this range) (T304800)
21:18 dzahn@cumin2002: conftool action : set/weight=25; selector: name=mw132[1-9].eqiad.wmnet
21:15 dzahn@cumin2002: conftool action : set/weight=25; selector: name=mw131[2-7].eqiad.wmnet
21:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T314041)', diff saved to https://phabricator.wikimedia.org/P32958 and previous config saved to /var/cache/conftool/dbconfig/20220824-210507-ladsgroup.json
21:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1126 (T314041)', diff saved to https://phabricator.wikimedia.org/P32957 and previous config saved to /var/cache/conftool/dbconfig/20220824-210216-ladsgroup.json
21:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
21:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T314041)', diff saved to https://phabricator.wikimedia.org/P32956 and previous config saved to /var/cache/conftool/dbconfig/20220824-210155-ladsgroup.json
20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P32955 and previous config saved to /var/cache/conftool/dbconfig/20220824-204649-ladsgroup.json
20:44 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restarting to apply OpenJDK 8u342 - eevans@cumin1001
20:40 mutante: otrs1001 - systemctl reset failed
20:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P32954 and previous config saved to /var/cache/conftool/dbconfig/20220824-203143-ladsgroup.json
20:21 ejegg: updated standalone SmashPig deploy from 13e9e9cc to 11ba0a1b
20:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T314041)', diff saved to https://phabricator.wikimedia.org/P32953 and previous config saved to /var/cache/conftool/dbconfig/20220824-201637-ladsgroup.json
20:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T314041)', diff saved to https://phabricator.wikimedia.org/P32952 and previous config saved to /var/cache/conftool/dbconfig/20220824-201344-ladsgroup.json
20:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
20:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
20:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
20:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
20:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T314041)', diff saved to https://phabricator.wikimedia.org/P32951 and previous config saved to /var/cache/conftool/dbconfig/20220824-201224-ladsgroup.json
19:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P32950 and previous config saved to /var/cache/conftool/dbconfig/20220824-195717-ladsgroup.json
19:55 ejegg: civicrm upgraded from edfe2f16 to 632d5f5f
19:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P32949 and previous config saved to /var/cache/conftool/dbconfig/20220824-194211-ladsgroup.json
19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T314041)', diff saved to https://phabricator.wikimedia.org/P32948 and previous config saved to /var/cache/conftool/dbconfig/20220824-192705-ladsgroup.json
19:23 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.26/extensions/GeoCrumbs/includes/Hooks.php: Backport: Convert page title to variant properly (T316085) (duration: 02m 50s)
19:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
19:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
19:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T314041)', diff saved to https://phabricator.wikimedia.org/P32947 and previous config saved to /var/cache/conftool/dbconfig/20220824-192119-ladsgroup.json
19:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
19:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
19:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
19:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1111.eqiad.wmnet with reason: Maintenance
19:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1111.eqiad.wmnet with reason: Maintenance
19:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T314041)', diff saved to https://phabricator.wikimedia.org/P32946 and previous config saved to /var/cache/conftool/dbconfig/20220824-191943-ladsgroup.json
19:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P32945 and previous config saved to /var/cache/conftool/dbconfig/20220824-190437-ladsgroup.json
18:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P32944 and previous config saved to /var/cache/conftool/dbconfig/20220824-184931-ladsgroup.json
18:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T314041)', diff saved to https://phabricator.wikimedia.org/P32943 and previous config saved to /var/cache/conftool/dbconfig/20220824-183425-ladsgroup.json
17:46 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restarting to apply OpenJDK 8u342 - eevans@cumin1001
17:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2096.codfw.wmnet with reason: Maintenance
17:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2096.codfw.wmnet with reason: Maintenance
17:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1103.eqiad.wmnet with reason: Maintenance
17:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1103.eqiad.wmnet with reason: Maintenance
17:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T314041)', diff saved to https://phabricator.wikimedia.org/P32942 and previous config saved to /var/cache/conftool/dbconfig/20220824-173409-ladsgroup.json
17:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
17:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
17:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
17:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
17:06 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1009.eqiad.wmnet with OS bullseye
16:51 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.26/skins/Vector/resources/mediawiki.less.legacy/mediawiki.skin.variables.less: Backport: Vector legacy no longer imports variables from Vector modern (T213778) (duration: 02m 52s)
16:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
16:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
16:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
16:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
16:34 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1009.eqiad.wmnet with reason: host reimage
16:30 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1009.eqiad.wmnet with reason: host reimage
16:26 mutante: mwmaint1002 systemctl start mediawiki_job_initsitestats T315121
16:17 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1009.eqiad.wmnet with OS bullseye
16:15 btullis@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1007.eqiad.wmnet with OS bullseye
16:05 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1007.eqiad.wmnet with OS bullseye
16:00 hashar: Restarted CI Jenkins, Release Jenkins, Gerrit replica and Gerrit
15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2115 (T312975)', diff saved to https://phabricator.wikimedia.org/P32941 and previous config saved to /var/cache/conftool/dbconfig/20220824-151445-ladsgroup.json
15:12 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restarting to apply OpenJDK 8u342 - eevans@cumin1001
15:04 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1016.eqiad.wmnet: Restarting to canary OpenJDK 8u342 - eevans@cumin1001
15:01 btullis: restarting pybal on lvs1019
14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2115', diff saved to https://phabricator.wikimedia.org/P32940 and previous config saved to /var/cache/conftool/dbconfig/20220824-145939-ladsgroup.json
14:57 btullis: restarting pybal on lvs1020
14:55 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1016.eqiad.wmnet: Restarting to canary OpenJDK 8u342 - eevans@cumin1001
14:48 moritzm: powercycling krb2002
14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2115', diff saved to https://phabricator.wikimedia.org/P32939 and previous config saved to /var/cache/conftool/dbconfig/20220824-144432-ladsgroup.json
14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T314041)', diff saved to https://phabricator.wikimedia.org/P32938 and previous config saved to /var/cache/conftool/dbconfig/20220824-143923-ladsgroup.json
14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2115 (T312975)', diff saved to https://phabricator.wikimedia.org/P32937 and previous config saved to /var/cache/conftool/dbconfig/20220824-142926-ladsgroup.json
14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2115 (T312975)', diff saved to https://phabricator.wikimedia.org/P32936 and previous config saved to /var/cache/conftool/dbconfig/20220824-142715-ladsgroup.json
14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2115.codfw.wmnet with reason: Maintenance
14:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
14:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2115.codfw.wmnet with reason: Maintenance
14:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
14:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T312975)', diff saved to https://phabricator.wikimedia.org/P32935 and previous config saved to /var/cache/conftool/dbconfig/20220824-142623-ladsgroup.json
14:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1185.eqiad.wmnet with OS bullseye
14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P32934 and previous config saved to /var/cache/conftool/dbconfig/20220824-142416-ladsgroup.json
14:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
14:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1185.eqiad.wmnet with reason: host reimage
14:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P32933 and previous config saved to /var/cache/conftool/dbconfig/20220824-141117-ladsgroup.json
14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P32932 and previous config saved to /var/cache/conftool/dbconfig/20220824-140910-ladsgroup.json
14:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1185.eqiad.wmnet with reason: host reimage
13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P32931 and previous config saved to /var/cache/conftool/dbconfig/20220824-135611-ladsgroup.json
13:55 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1185.eqiad.wmnet with OS bullseye
13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T314041)', diff saved to https://phabricator.wikimedia.org/P32930 and previous config saved to /var/cache/conftool/dbconfig/20220824-135404-ladsgroup.json
13:49 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert "Group 1 wikis to 1.39.0-wmf.26" # T316085 T314187
13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T314041)', diff saved to https://phabricator.wikimedia.org/P32929 and previous config saved to /var/cache/conftool/dbconfig/20220824-134118-ladsgroup.json
13:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T312975)', diff saved to https://phabricator.wikimedia.org/P32928 and previous config saved to /var/cache/conftool/dbconfig/20220824-134104-ladsgroup.json
13:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
13:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T314041)', diff saved to https://phabricator.wikimedia.org/P32927 and previous config saved to /var/cache/conftool/dbconfig/20220824-134057-ladsgroup.json
13:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2131 (T312975)', diff saved to https://phabricator.wikimedia.org/P32926 and previous config saved to /var/cache/conftool/dbconfig/20220824-133953-ladsgroup.json
13:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2131.codfw.wmnet with reason: Maintenance
13:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2131.codfw.wmnet with reason: Maintenance
13:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 (T312975)', diff saved to https://phabricator.wikimedia.org/P32925 and previous config saved to /var/cache/conftool/dbconfig/20220824-133932-ladsgroup.json
13:31 taavi: taavi@mwmaint1002 ~ $ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki mediawikiwiki "Africa Wikimedia Developers Project" "African Wikimedia Technical Community" "Taavi" --reason "per request phab:T316066" # T316066
13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Repooling after cloning db1191', diff saved to https://phabricator.wikimedia.org/P32924 and previous config saved to /var/cache/conftool/dbconfig/20220824-132908-root.json
13:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137', diff saved to https://phabricator.wikimedia.org/P32919 and previous config saved to /var/cache/conftool/dbconfig/20220824-130920-ladsgroup.json
12:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: Repooling after cloning db1191', diff saved to https://phabricator.wikimedia.org/P32918 and previous config saved to /var/cache/conftool/dbconfig/20220824-125858-root.json
12:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T314041)', diff saved to https://phabricator.wikimedia.org/P32917 and previous config saved to /var/cache/conftool/dbconfig/20220824-125537-ladsgroup.json
12:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 (T312975)', diff saved to https://phabricator.wikimedia.org/P32916 and previous config saved to /var/cache/conftool/dbconfig/20220824-125414-ladsgroup.json
12:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1137 (T312975)', diff saved to https://phabricator.wikimedia.org/P32915 and previous config saved to /var/cache/conftool/dbconfig/20220824-125003-ladsgroup.json
12:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1137.eqiad.wmnet with reason: Maintenance
12:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1137.eqiad.wmnet with reason: Maintenance
12:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
12:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
12:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
12:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1120 (T312975)', diff saved to https://phabricator.wikimedia.org/P32914 and previous config saved to /var/cache/conftool/dbconfig/20220824-124905-ladsgroup.json
12:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: Repooling after cloning db1191', diff saved to https://phabricator.wikimedia.org/P32913 and previous config saved to /var/cache/conftool/dbconfig/20220824-124354-root.json
12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (T314041)', diff saved to https://phabricator.wikimedia.org/P32912 and previous config saved to /var/cache/conftool/dbconfig/20220824-124346-ladsgroup.json
12:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
12:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
12:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
12:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
12:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1120', diff saved to https://phabricator.wikimedia.org/P32911 and previous config saved to /var/cache/conftool/dbconfig/20220824-123358-ladsgroup.json
12:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: Repooling after cloning db1191', diff saved to https://phabricator.wikimedia.org/P32910 and previous config saved to /var/cache/conftool/dbconfig/20220824-122848-root.json
12:24 moritzm: installing containerd security updates
12:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1120', diff saved to https://phabricator.wikimedia.org/P32909 and previous config saved to /var/cache/conftool/dbconfig/20220824-121852-ladsgroup.json
12:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 5%: Repooling after cloning db1191', diff saved to https://phabricator.wikimedia.org/P32908 and previous config saved to /var/cache/conftool/dbconfig/20220824-121343-root.json
12:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1120 (T312975)', diff saved to https://phabricator.wikimedia.org/P32907 and previous config saved to /var/cache/conftool/dbconfig/20220824-120346-ladsgroup.json
12:01 Amir1: killed refresh links-recomm scripts in rowiki, cswiki, simplewiki, frwiki (T299021)
11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1120 (T312975)', diff saved to https://phabricator.wikimedia.org/P32906 and previous config saved to /var/cache/conftool/dbconfig/20220824-115935-ladsgroup.json
11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1120.eqiad.wmnet with reason: Maintenance
11:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1120.eqiad.wmnet with reason: Maintenance
11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
11:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
11:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
11:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
11:42 klausman@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching ml-cache*: Rolling restart to activate new JRE - klausman@cumin1001
11:38 slyngs: Migrate mdadm array checks to systemd timers. Gerrit: 819577
11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 100%: Repooling after cloning db1190', diff saved to https://phabricator.wikimedia.org/P32905 and previous config saved to /var/cache/conftool/dbconfig/20220824-112938-root.json
11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 75%: Repooling after cloning db1190', diff saved to https://phabricator.wikimedia.org/P32904 and previous config saved to /var/cache/conftool/dbconfig/20220824-111433-root.json
11:07 klausman@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching ml-cache*: Rolling restart to activate new JRE - klausman@cumin1001
10:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 50%: Repooling after cloning db1190', diff saved to https://phabricator.wikimedia.org/P32903 and previous config saved to /var/cache/conftool/dbconfig/20220824-105928-root.json
10:52 vgutierrez: disable origin coalescing in ats@cp600[78] - T315911
10:46 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
10:46 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
10:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 25%: Repooling after cloning db1190', diff saved to https://phabricator.wikimedia.org/P32902 and previous config saved to /var/cache/conftool/dbconfig/20220824-104424-root.json
10:36 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
10:35 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
10:32 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
10:32 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
10:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 10%: Repooling after cloning db1190', diff saved to https://phabricator.wikimedia.org/P32901 and previous config saved to /var/cache/conftool/dbconfig/20220824-102919-root.json
10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 5%: Repooling after cloning db1190', diff saved to https://phabricator.wikimedia.org/P32900 and previous config saved to /var/cache/conftool/dbconfig/20220824-101414-root.json
09:46 vgutierrez: Restart incremental roll-out of query-sorting at 1% - T314868
08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P32899 and previous config saved to /var/cache/conftool/dbconfig/20220824-085902-root.json
08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 100%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P32898 and previous config saved to /var/cache/conftool/dbconfig/20220824-085639-root.json
08:49 jayme: jayme@builder-envoy-03:~$ sudo apt-get remove --purge linux-image-4.19.0-6-amd64-dbg linux-image-4.19.0-14-amd64-dbg
08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P32897 and previous config saved to /var/cache/conftool/dbconfig/20220824-084357-root.json
08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 75%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P32896 and previous config saved to /var/cache/conftool/dbconfig/20220824-084134-root.json
08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P32895 and previous config saved to /var/cache/conftool/dbconfig/20220824-082852-root.json
08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P32893 and previous config saved to /var/cache/conftool/dbconfig/20220824-082809-root.json
08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 50%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P32892 and previous config saved to /var/cache/conftool/dbconfig/20220824-082630-root.json
08:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
08:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
08:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
08:19 hashar@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.26 refs T314187 (duration: 02m 46s)
08:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
08:16 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.26 refs T314187
08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P32891 and previous config saved to /var/cache/conftool/dbconfig/20220824-081347-root.json
08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 25%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P32890 and previous config saved to /var/cache/conftool/dbconfig/20220824-081125-root.json
07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 100%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32888 and previous config saved to /var/cache/conftool/dbconfig/20220824-075955-root.json
07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 100%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32887 and previous config saved to /var/cache/conftool/dbconfig/20220824-075946-root.json
07:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1147', diff saved to https://phabricator.wikimedia.org/P32886 and previous config saved to /var/cache/conftool/dbconfig/20220824-075927-root.json
07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 10%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P32885 and previous config saved to /var/cache/conftool/dbconfig/20220824-075843-root.json
07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 10%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P32884 and previous config saved to /var/cache/conftool/dbconfig/20220824-075620-root.json
07:47 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1014 to pc2 master T315526 (duration: 02m 48s)
07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 75%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32883 and previous config saved to /var/cache/conftool/dbconfig/20220824-074451-root.json
07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 75%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32882 and previous config saved to /var/cache/conftool/dbconfig/20220824-074441-root.json
07:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
07:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
07:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
07:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
07:41 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1014 to pc2 master T315526 (duration: 03m 03s)
07:40 marostegui: Promote pc1014 to pc2 master T315526
07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 50%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32880 and previous config saved to /var/cache/conftool/dbconfig/20220824-072946-root.json
07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 50%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32879 and previous config saved to /var/cache/conftool/dbconfig/20220824-072937-root.json
07:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
07:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
07:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 25%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32878 and previous config saved to /var/cache/conftool/dbconfig/20220824-071441-root.json
07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 25%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32877 and previous config saved to /var/cache/conftool/dbconfig/20220824-071432-root.json
07:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
07:13 tgr: UTC morning deploys done
07:12 tgr@deploy1002: Synchronized wmf-config: Config: Drop unused wgGECampaignPattern (duration: 02m 57s)
07:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
07:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 10%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32876 and previous config saved to /var/cache/conftool/dbconfig/20220824-065937-root.json
06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 10%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32875 and previous config saved to /var/cache/conftool/dbconfig/20220824-065927-root.json
06:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
06:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
06:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
06:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
06:50 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.25/extensions/FlaggedRevs/frontend/FlaggedRevsUIHooks.php: Backport: Changes list filter: don't add fields that are already in the query (T316026) (duration: 02m 57s)
06:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
06:46 hashar: Restarted Gerrit to enable replication configuration autoloading
06:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
06:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
06:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 5%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32874 and previous config saved to /var/cache/conftool/dbconfig/20220824-064432-root.json
06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 5%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32873 and previous config saved to /var/cache/conftool/dbconfig/20220824-064423-root.json
06:42 marostegui: dbmaint x1 codfw T312574
06:41 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.26/extensions/FlaggedRevs/frontend/FlaggedRevsUIHooks.php: Backport: Changes list filter: don't add fields that are already in the query (T316026) (duration: 03m 07s)
06:37 marostegui: dbmaint s3 T312160
06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 4%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32872 and previous config saved to /var/cache/conftool/dbconfig/20220824-062927-root.json
06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 4%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32871 and previous config saved to /var/cache/conftool/dbconfig/20220824-062918-root.json
06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129', diff saved to https://phabricator.wikimedia.org/P32869 and previous config saved to /var/cache/conftool/dbconfig/20220824-061532-root.json
06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 3%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32868 and previous config saved to /var/cache/conftool/dbconfig/20220824-061422-root.json
06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 3%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32867 and previous config saved to /var/cache/conftool/dbconfig/20220824-061413-root.json
05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 2%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32866 and previous config saved to /var/cache/conftool/dbconfig/20220824-055918-root.json
05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 2%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32865 and previous config saved to /var/cache/conftool/dbconfig/20220824-055909-root.json
05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119', diff saved to https://phabricator.wikimedia.org/P32863 and previous config saved to /var/cache/conftool/dbconfig/20220824-054719-root.json
05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 1%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32862 and previous config saved to /var/cache/conftool/dbconfig/20220824-054404-root.json
05:40 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1187 with minimal weight', diff saved to https://phabricator.wikimedia.org/P32861 and previous config saved to /var/cache/conftool/dbconfig/20220824-054018-root.json
05:34 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1189 with minimal weight', diff saved to https://phabricator.wikimedia.org/P32860 and previous config saved to /var/cache/conftool/dbconfig/20220824-053434-root.json
05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Move db2180 from s4 to s6', diff saved to https://phabricator.wikimedia.org/P32859 and previous config saved to /var/cache/conftool/dbconfig/20220824-053311-root.json
05:31 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1189 with minimal weight', diff saved to https://phabricator.wikimedia.org/P32858 and previous config saved to /var/cache/conftool/dbconfig/20220824-053141-root.json

2022-08-23

22:31 mutante: mwmaint1002 - find /var/lib/puppet/clientbucket -type f -size +100M -delete
22:16 dancy@deploy1002: Testing. Ignore
21:19 wfan: Updateing di-config from e447ff7c to 3c27af23
21:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
21:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
21:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
21:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
21:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
21:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
21:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
21:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
21:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
21:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
21:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:08 urbanecm@deploy1002: Synchronized wmf-config/abusefilter.php: 8fb3575: trwikiquote: Enable block feature of abusefilter (T315736) (duration: 02m 57s)
20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
19:35 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on relforge[1003-1004].eqiad.wmnet with reason: T315604
19:34 bking@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on relforge[1003-1004].eqiad.wmnet with reason: T315604
18:15 hashar: Restarting CI Jenkins
18:07 ejegg: payments-wiki upgraded from fb50c013 to a63b300e
17:41 hashar: Stopping Gerrit
17:39 hashar@deploy1002: Finished deploy [gerrit/gerrit@cb7edfb]: Revert Gerrit from 3.4.5 to 3.4.4 # T315942 (duration: 00m 08s)
17:39 hashar@deploy1002: Started deploy [gerrit/gerrit@cb7edfb]: Revert Gerrit from 3.4.5 to 3.4.4 # T315942
17:39 hashar@deploy1002: Finished deploy [gerrit/gerrit@e11e6a7]: Revert Gerrit from 3.4.5 to 3.4.4 # T315942 (duration: 00m 04s)
17:39 hashar@deploy1002: Started deploy [gerrit/gerrit@e11e6a7]: Revert Gerrit from 3.4.5 to 3.4.4 # T315942
17:37 cwhite: restart tcpircbot T257861
17:33 inflatador: 'bking@cumin starting thanos-swift cleanup for wdqs T316031'
17:21 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.25/includes/specials/SpecialRecentChangesLinked.php: Backport: SpecialRecentChangesLinked: Pass query builder instead of SQL (duration: 03m 32s)
17:17 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.25/includes/libs/rdbms/loadbalancer/LoadBalancer.php: Backport: rdbms: Switch to getConnectionInternal() in getPrimaryPos() (duration: 03m 27s)
17:00 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.26/includes/specials/SpecialRecentChangesLinked.php: Backport: SpecialRecentChangesLinked: Pass query builder instead of SQL (duration: 03m 34s)
16:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
16:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
16:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
16:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
16:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
16:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
16:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
16:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
16:03 krinkle@deploy1002: Synchronized wmf-config/: Ifd90ae (duration: 03m 18s)
16:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
16:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
16:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
16:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
15:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
15:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
15:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
15:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P32854 and previous config saved to /var/cache/conftool/dbconfig/20220823-155013-root.json
15:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
15:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
15:44 krinkle@deploy1002: Synchronized wmf-config/: I1c5b05 (duration: 03m 25s)
15:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
15:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
15:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
15:41 mutante: gerrit - service restart
15:36 mutante: gerrit2002 - service restart
15:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P32853 and previous config saved to /var/cache/conftool/dbconfig/20220823-153508-root.json
15:30 sbassett: Deployed security patch for T307278 to wmf.25
15:25 sbassett: Deployed security patch for T307278 to wmf.26
15:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P32852 and previous config saved to /var/cache/conftool/dbconfig/20220823-152003-root.json
15:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Dave Pifke out of all services on: 774 hosts
15:09 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Dave Pifke out of all services on: 774 hosts
15:08 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Dave Pifke out of all services on: 1238 hosts
15:08 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Dave Pifke out of all services on: 1238 hosts
15:07 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Erin Yener out of all services on: 774 hosts
15:06 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Erin Yener out of all services on: 774 hosts
15:06 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Erin Yener out of all services on: 1238 hosts
15:06 mutante: gerrit - service restart - T315942 - added sshd.enableDeprecatedKexAlgorithms = true
15:05 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Erin Yener out of all services on: 1238 hosts
15:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P32851 and previous config saved to /var/cache/conftool/dbconfig/20220823-150459-root.json
15:04 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Effeietsanders out of all services on: 1238 hosts
15:04 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Effeietsanders out of all services on: 1238 hosts
15:04 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Effeietsanders out of all services on: 774 hosts
15:02 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Effeietsanders out of all services on: 774 hosts
15:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox
14:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
14:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
14:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
14:55 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
14:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P32850 and previous config saved to /var/cache/conftool/dbconfig/20220823-144954-root.json
14:49 pt1979@cumin2002: START - Cookbook sre.dns.netbox
14:48 moritzm: installing libtirpc security updates
14:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
14:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
14:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
14:42 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: bcef1d5: Start writing to cuc_actor everywhere (T233004) (duration: 03m 18s)
14:39 krinkle@deploy1002: Synchronized wmf-config/redis.php: Ib99479 (duration: 03m 47s)
14:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 5%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P32849 and previous config saved to /var/cache/conftool/dbconfig/20220823-143450-root.json
14:21 marostegui: Run schema change on db1160 T303603
14:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160', diff saved to https://phabricator.wikimedia.org/P32848 and previous config saved to /var/cache/conftool/dbconfig/20220823-142011-root.json
14:03 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
14:02 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
14:01 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
14:00 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
13:59 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
13:58 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
13:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:23 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b3b9e0a: Start writing to cuc_actor on s8 (T233004) (duration: 03m 31s)
13:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
12:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
12:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
12:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 15 hosts with reason: Maintenance
12:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 15 hosts with reason: Maintenance
12:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
12:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T312972)', diff saved to https://phabricator.wikimedia.org/P32847 and previous config saved to /var/cache/conftool/dbconfig/20220823-125824-marostegui.json
12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P32846 and previous config saved to /var/cache/conftool/dbconfig/20220823-124317-marostegui.json
12:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2019.codfw.wmnet to cluster codfw and group B
12:39 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2019.codfw.wmnet to cluster codfw and group B
12:33 vgutierrez: Incremental roll-out of query-sorting (15%) - T314868
12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P32845 and previous config saved to /var/cache/conftool/dbconfig/20220823-122811-marostegui.json
12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T312972)', diff saved to https://phabricator.wikimedia.org/P32844 and previous config saved to /var/cache/conftool/dbconfig/20220823-121305-marostegui.json
12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T312972)', diff saved to https://phabricator.wikimedia.org/P32843 and previous config saved to /var/cache/conftool/dbconfig/20220823-121159-marostegui.json
12:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
12:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
12:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
12:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
12:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
12:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T312972)', diff saved to https://phabricator.wikimedia.org/P32842 and previous config saved to /var/cache/conftool/dbconfig/20220823-121055-marostegui.json
11:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P32841 and previous config saved to /var/cache/conftool/dbconfig/20220823-115549-marostegui.json
11:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P32840 and previous config saved to /var/cache/conftool/dbconfig/20220823-114220-root.json
11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P32839 and previous config saved to /var/cache/conftool/dbconfig/20220823-114043-marostegui.json
11:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P32838 and previous config saved to /var/cache/conftool/dbconfig/20220823-112715-root.json
11:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T312972)', diff saved to https://phabricator.wikimedia.org/P32837 and previous config saved to /var/cache/conftool/dbconfig/20220823-112537-marostegui.json
11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T312972)', diff saved to https://phabricator.wikimedia.org/P32836 and previous config saved to /var/cache/conftool/dbconfig/20220823-112430-marostegui.json
11:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
11:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T312972)', diff saved to https://phabricator.wikimedia.org/P32835 and previous config saved to /var/cache/conftool/dbconfig/20220823-112408-marostegui.json
11:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P32834 and previous config saved to /var/cache/conftool/dbconfig/20220823-111210-root.json
11:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32833 and previous config saved to /var/cache/conftool/dbconfig/20220823-111139-root.json
11:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P32832 and previous config saved to /var/cache/conftool/dbconfig/20220823-110902-marostegui.json
10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P32831 and previous config saved to /var/cache/conftool/dbconfig/20220823-105706-root.json
10:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 75%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32830 and previous config saved to /var/cache/conftool/dbconfig/20220823-105634-root.json
10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P32829 and previous config saved to /var/cache/conftool/dbconfig/20220823-105356-marostegui.json
10:49 btullis@puppetmaster1001: conftool action : set/pooled=yes:weight=1; selector: cluster=dse-k8s,service=kubemaster
10:46 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons.
10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P32828 and previous config saved to /var/cache/conftool/dbconfig/20220823-104201-root.json
10:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 60%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32827 and previous config saved to /var/cache/conftool/dbconfig/20220823-104126-root.json
10:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T312972)', diff saved to https://phabricator.wikimedia.org/P32826 and previous config saved to /var/cache/conftool/dbconfig/20220823-103850-marostegui.json
10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T312972)', diff saved to https://phabricator.wikimedia.org/P32825 and previous config saved to /var/cache/conftool/dbconfig/20220823-103742-marostegui.json
10:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
10:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
10:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
10:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T312972)', diff saved to https://phabricator.wikimedia.org/P32824 and previous config saved to /var/cache/conftool/dbconfig/20220823-103704-marostegui.json
10:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
10:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
10:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P32823 and previous config saved to /var/cache/conftool/dbconfig/20220823-102657-root.json
10:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
10:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 50%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32822 and previous config saved to /var/cache/conftool/dbconfig/20220823-102622-root.json
10:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P32821 and previous config saved to /var/cache/conftool/dbconfig/20220823-102158-marostegui.json
10:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 40%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32820 and previous config saved to /var/cache/conftool/dbconfig/20220823-101117-root.json
10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P32819 and previous config saved to /var/cache/conftool/dbconfig/20220823-101048-root.json
10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P32818 and previous config saved to /var/cache/conftool/dbconfig/20220823-100652-marostegui.json
10:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
10:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
10:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
09:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
09:56 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.26 refs T314187
09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 30%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32817 and previous config saved to /var/cache/conftool/dbconfig/20220823-095613-root.json
09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P32816 and previous config saved to /var/cache/conftool/dbconfig/20220823-095543-root.json
09:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
09:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T312972)', diff saved to https://phabricator.wikimedia.org/P32815 and previous config saved to /var/cache/conftool/dbconfig/20220823-095146-marostegui.json
09:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T312972)', diff saved to https://phabricator.wikimedia.org/P32814 and previous config saved to /var/cache/conftool/dbconfig/20220823-095039-marostegui.json
09:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
09:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
09:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T312972)', diff saved to https://phabricator.wikimedia.org/P32813 and previous config saved to /var/cache/conftool/dbconfig/20220823-095018-marostegui.json
09:49 hashar@deploy1002: Pruned MediaWiki: 1.39.0-wmf.23 (duration: 02m 20s)
09:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
09:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
09:42 XioNoX: add NAT rule for frdev1002 on pfw3-eqiad - T315579
09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 20%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32812 and previous config saved to /var/cache/conftool/dbconfig/20220823-094108-root.json
09:41 hashar@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.26 refs T314187 (duration: 35m 32s)
09:40 vgutierrez: Incremental roll-out of query-sorting (5%) - T314868
09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P32811 and previous config saved to /var/cache/conftool/dbconfig/20220823-094039-root.json
09:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P32810 and previous config saved to /var/cache/conftool/dbconfig/20220823-093512-marostegui.json
09:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 10%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32809 and previous config saved to /var/cache/conftool/dbconfig/20220823-092603-root.json
09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P32808 and previous config saved to /var/cache/conftool/dbconfig/20220823-092534-root.json
09:23 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
09:22 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
09:21 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
09:21 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P32807 and previous config saved to /var/cache/conftool/dbconfig/20220823-092006-marostegui.json
09:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
09:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
09:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
09:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 8%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32806 and previous config saved to /var/cache/conftool/dbconfig/20220823-091059-root.json
09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P32805 and previous config saved to /var/cache/conftool/dbconfig/20220823-091029-root.json
09:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
09:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
09:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
09:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
09:05 hashar@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.26 refs T314187
09:05 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons.
09:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T312972)', diff saved to https://phabricator.wikimedia.org/P32804 and previous config saved to /var/cache/conftool/dbconfig/20220823-090500-marostegui.json
09:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T312972)', diff saved to https://phabricator.wikimedia.org/P32803 and previous config saved to /var/cache/conftool/dbconfig/20220823-090353-marostegui.json
09:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1128.eqiad.wmnet with reason: Maintenance
09:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1128.eqiad.wmnet with reason: Maintenance
09:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T312972)', diff saved to https://phabricator.wikimedia.org/P32802 and previous config saved to /var/cache/conftool/dbconfig/20220823-090332-marostegui.json
08:56 hashar@deploy1002: Synchronized php-1.39.0-wmf.25/extensions/Translate/extension.json: Backport: Add declarations for TranslatablePage in extension.json (T315889) (duration: 03m 39s)
08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 5%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32801 and previous config saved to /var/cache/conftool/dbconfig/20220823-085554-root.json
08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P32800 and previous config saved to /var/cache/conftool/dbconfig/20220823-085525-root.json
08:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2019.codfw.wmnet
08:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P32799 and previous config saved to /var/cache/conftool/dbconfig/20220823-084826-marostegui.json
08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2019.codfw.wmnet
08:44 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2019.codfw.wmnet to cluster codfw and group B
08:44 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2019.codfw.wmnet to cluster codfw and group B
08:41 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
08:41 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 2%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32798 and previous config saved to /var/cache/conftool/dbconfig/20220823-084050-root.json
08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P32797 and previous config saved to /var/cache/conftool/dbconfig/20220823-084020-root.json
08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P32796 and previous config saved to /var/cache/conftool/dbconfig/20220823-083319-marostegui.json
08:33 hashar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable message bundle on MetaWiki for WikiLearn (T311587) (duration: 03m 27s)
08:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
08:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
08:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
08:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P32794 and previous config saved to /var/cache/conftool/dbconfig/20220823-082605-root.json
08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 1%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32793 and previous config saved to /var/cache/conftool/dbconfig/20220823-082545-root.json
08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P32792 and previous config saved to /var/cache/conftool/dbconfig/20220823-082515-root.json
08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162', diff saved to https://phabricator.wikimedia.org/P32790 and previous config saved to /var/cache/conftool/dbconfig/20220823-082336-root.json
08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P32789 and previous config saved to /var/cache/conftool/dbconfig/20220823-082215-root.json
08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T312972)', diff saved to https://phabricator.wikimedia.org/P32788 and previous config saved to /var/cache/conftool/dbconfig/20220823-081813-marostegui.json
08:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T312972)', diff saved to https://phabricator.wikimedia.org/P32787 and previous config saved to /var/cache/conftool/dbconfig/20220823-081706-marostegui.json
08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T312972)', diff saved to https://phabricator.wikimedia.org/P32786 and previous config saved to /var/cache/conftool/dbconfig/20220823-081645-marostegui.json
08:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
08:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
08:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2019.codfw.wmnet
08:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
08:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2019.codfw.wmnet
08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P32785 and previous config saved to /var/cache/conftool/dbconfig/20220823-080710-root.json
08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P32784 and previous config saved to /var/cache/conftool/dbconfig/20220823-080139-marostegui.json
07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P32783 and previous config saved to /var/cache/conftool/dbconfig/20220823-074633-marostegui.json
07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T312972)', diff saved to https://phabricator.wikimedia.org/P32781 and previous config saved to /var/cache/conftool/dbconfig/20220823-073127-marostegui.json
07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T312972)', diff saved to https://phabricator.wikimedia.org/P32780 and previous config saved to /var/cache/conftool/dbconfig/20220823-073020-marostegui.json
07:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
07:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
07:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2019.codfw.wmnet with OS bullseye
07:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T312972)', diff saved to https://phabricator.wikimedia.org/P32779 and previous config saved to /var/cache/conftool/dbconfig/20220823-072943-marostegui.json
07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P32778 and previous config saved to /var/cache/conftool/dbconfig/20220823-071437-marostegui.json
07:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2019.codfw.wmnet with reason: host reimage
07:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2019.codfw.wmnet with reason: host reimage
06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P32777 and previous config saved to /var/cache/conftool/dbconfig/20220823-065931-marostegui.json
06:50 kart_: Updated cxserver to 2022-08-22-093815-production (T308248, T308371)
06:49 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2019.codfw.wmnet with OS bullseye
06:49 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
06:48 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
06:45 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
06:45 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T312972)', diff saved to https://phabricator.wikimedia.org/P32776 and previous config saved to /var/cache/conftool/dbconfig/20220823-064425-marostegui.json
06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T312972)', diff saved to https://phabricator.wikimedia.org/P32775 and previous config saved to /var/cache/conftool/dbconfig/20220823-064318-marostegui.json
06:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
06:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T312972)', diff saved to https://phabricator.wikimedia.org/P32774 and previous config saved to /var/cache/conftool/dbconfig/20220823-064257-marostegui.json
06:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2019.codfw.wmnet with reason: Remove node for eventual reimage, T311686
06:41 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2019.codfw.wmnet with reason: Remove node for eventual reimage, T311686
06:39 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
06:38 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
06:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P32773 and previous config saved to /var/cache/conftool/dbconfig/20220823-062751-marostegui.json
06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P32772 and previous config saved to /var/cache/conftool/dbconfig/20220823-061245-marostegui.json
05:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T312972)', diff saved to https://phabricator.wikimedia.org/P32771 and previous config saved to /var/cache/conftool/dbconfig/20220823-055739-marostegui.json
05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T312972)', diff saved to https://phabricator.wikimedia.org/P32770 and previous config saved to /var/cache/conftool/dbconfig/20220823-053929-marostegui.json
05:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
05:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
05:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
05:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T312972)', diff saved to https://phabricator.wikimedia.org/P32769 and previous config saved to /var/cache/conftool/dbconfig/20220823-053852-marostegui.json
05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P32768 and previous config saved to /var/cache/conftool/dbconfig/20220823-052346-marostegui.json
05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P32767 and previous config saved to /var/cache/conftool/dbconfig/20220823-050840-marostegui.json
04:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T312972)', diff saved to https://phabricator.wikimedia.org/P32765 and previous config saved to /var/cache/conftool/dbconfig/20220823-045334-marostegui.json
04:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131', diff saved to https://phabricator.wikimedia.org/P32764 and previous config saved to /var/cache/conftool/dbconfig/20220823-045322-root.json
04:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T312972)', diff saved to https://phabricator.wikimedia.org/P32763 and previous config saved to /var/cache/conftool/dbconfig/20220823-045227-marostegui.json
04:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
04:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
03:00 TimStarling: on wtp1025,wtp1027,wtp1029,wtp1031,wtp1033,wtp1035: set scaling_governor to performance T315398
02:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
02:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
01:41 TimStarling: on mw1411, mw1413, mw1419, mw1429, mw1431, mw1433: set energy_performance_preference to balance_performance T315398
01:11 TimStarling: on mw1411, mw1413, mw1419, mw1429, mw1431, mw1433: set scaling_governor to powersave and energy_performance_preference to performance
00:09 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1187.eqiad.wmnet with OS bullseye

2022-08-22

23:55 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1187.eqiad.wmnet with reason: host reimage
23:52 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1187.eqiad.wmnet with reason: host reimage
23:39 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1187.eqiad.wmnet with OS bullseye
23:10 tstarling@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=(appservers|api)-ro
23:04 TimStarling: Re-enable multi-DC mode on testwiki, test2wiki and mediawiki.org
21:56 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - bking@cumin2002 - T315604
21:55 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - bking@cumin2002 - T315604
21:46 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - bking@cumin2002 - T315604
21:45 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - bking@cumin2002 - T315604
21:26 sbassett: Deployed security fix for T310763
21:17 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - bking@cumin2002 - T315604
21:17 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - bking@cumin2002 - T315604
21:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf2002.codfw.wmnet with OS bullseye
21:06 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host db1185.eqiad.wmnet
21:04 pt1979@cumin1001: START - Cookbook sre.hosts.dhcp for host db1185.eqiad.wmnet
21:02 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1185.eqiad.wmnet with OS bullseye
21:01 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1185.eqiad.wmnet with OS bullseye
20:59 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1185.eqiad.wmnet with OS bullseye
20:59 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1185.eqiad.wmnet with OS bullseye
20:59 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1195.eqiad.wmnet with OS bullseye
20:58 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1195.eqiad.wmnet with OS bullseye
20:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf2002.codfw.wmnet with reason: host reimage
20:53 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf2002.codfw.wmnet with reason: host reimage
20:51 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.25/skins/Vector/: e0ff763: Layout: Restore disabling of max width on certain pages (T315460) (duration: 03m 37s)
20:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mc-wf2002.codfw.wmnet with OS bullseye
20:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf2001.codfw.wmnet with OS bullseye
20:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf2001.codfw.wmnet with reason: host reimage
20:09 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf2001.codfw.wmnet with reason: host reimage
20:04 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@5ac442f]: Use instance specific HDFS cache on analytics (duration: 00m 40s)
20:03 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@5ac442f]: Use instance specific HDFS cache on analytics
19:50 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mc-wf2001.codfw.wmnet with OS bullseye
19:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-wf2002']
19:28 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-wf2002']
19:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-wf2001']
19:20 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-wf2001']
19:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-wf2002.mgmt.codfw.wmnet with reboot policy FORCED
19:11 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics_test@5ac442f]: Use instance specific HDFS cache on analytics_test (duration: 00m 17s)
19:11 xcollazo@deploy1002: Started deploy [airflow-dags/analytics_test@5ac442f]: Use instance specific HDFS cache on analytics_test
19:04 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics_test@9edd1ab]: Use instance specific HDFS cache on analytics_test (duration: 00m 05s)
19:04 xcollazo@deploy1002: Started deploy [airflow-dags/analytics_test@9edd1ab]: Use instance specific HDFS cache on analytics_test
18:59 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@5ac442f]: Use instance specific HDFS cache on platform_eng (duration: 00m 10s)
18:59 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@5ac442f]: Use instance specific HDFS cache on platform_eng
18:54 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc-wf2002.mgmt.codfw.wmnet with reboot policy FORCED
18:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-wf2001.mgmt.codfw.wmnet with reboot policy FORCED
18:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc-wf2001.mgmt.codfw.wmnet with reboot policy FORCED
18:26 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:21 pt1979@cumin2002: START - Cookbook sre.dns.netbox
18:19 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-wf2002
18:19 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-wf2002
18:18 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-wf2001
18:18 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-wf2001
18:12 vgutierrez: disable origin coalescing in ats@cp601[56] - T315911
17:15 damilare: payments-wiki upgraded from f9f91f1f to fb50c013
15:52 XioNoX: un-drain ulsfo-codfw circuit for Lumen hot cut - T300716
15:20 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 100%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32759 and previous config saved to /var/cache/conftool/dbconfig/20220822-152000-root.json
15:04 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 75%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32758 and previous config saved to /var/cache/conftool/dbconfig/20220822-150456-root.json
14:54 XioNoX: drain ulsfo-codfw circuit for Lumen hot cut - T300716
14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2144', diff saved to https://phabricator.wikimedia.org/P32757 and previous config saved to /var/cache/conftool/dbconfig/20220822-145040-marostegui.json
14:49 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 60%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32756 and previous config saved to /var/cache/conftool/dbconfig/20220822-144951-root.json
14:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P32755 and previous config saved to /var/cache/conftool/dbconfig/20220822-144943-root.json
14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Restore x2 weight', diff saved to https://phabricator.wikimedia.org/P32754 and previous config saved to /var/cache/conftool/dbconfig/20220822-144937-marostegui.json
14:38 moritzm: draining ganeti2019 for reimage T311686
14:32 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2144 to x2 primary T315853', diff saved to https://phabricator.wikimedia.org/P32752 and previous config saved to /var/cache/conftool/dbconfig/20220822-143243-root.json
14:32 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 50%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32751 and previous config saved to /var/cache/conftool/dbconfig/20220822-143212-root.json
14:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P32750 and previous config saved to /var/cache/conftool/dbconfig/20220822-143040-root.json
14:24 marostegui: Starting x2 codfw failover from db2144 to db2142 - T315853
14:23 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2142 with weight 0 T313811', diff saved to https://phabricator.wikimedia.org/P32749 and previous config saved to /var/cache/conftool/dbconfig/20220822-142312-marostegui.json
14:22 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover x2 T313811
14:22 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover x2 T313811
14:17 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 40%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32748 and previous config saved to /var/cache/conftool/dbconfig/20220822-141708-root.json
14:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P32747 and previous config saved to /var/cache/conftool/dbconfig/20220822-141535-root.json
14:02 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 30%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32746 and previous config saved to /var/cache/conftool/dbconfig/20220822-140203-root.json
14:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P32745 and previous config saved to /var/cache/conftool/dbconfig/20220822-140030-root.json
13:48 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs[1014-1016].eqiad.wmnet
13:48 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs[1014-1016].eqiad.wmnet
13:46 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 20%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32744 and previous config saved to /var/cache/conftool/dbconfig/20220822-134658-root.json
13:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P32743 and previous config saved to /var/cache/conftool/dbconfig/20220822-134526-root.json
13:44 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
13:39 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
13:39 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster1002.eqiad.wmnet
13:38 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on wdqs[1014-1016].eqiad.wmnet with reason: T314890
13:37 bking@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on wdqs[1014-1016].eqiad.wmnet with reason: T314890
13:37 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
13:31 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 10%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32742 and previous config saved to /var/cache/conftool/dbconfig/20220822-133154-root.json
13:31 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
13:31 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster1002.eqiad.wmnet
13:31 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
13:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P32741 and previous config saved to /var/cache/conftool/dbconfig/20220822-133021-root.json
13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P32740 and previous config saved to /var/cache/conftool/dbconfig/20220822-132808-root.json
13:25 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
13:25 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster1001.eqiad.wmnet
13:17 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster1001.eqiad.wmnet
13:16 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 8%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32738 and previous config saved to /var/cache/conftool/dbconfig/20220822-131649-root.json
13:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:13 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.25/includes: Backport: SiteStats: Make sure initSiteStats.php re-distribute values (T315693) (duration: 03m 32s)
13:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
13:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
13:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
13:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
13:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T312972)', diff saved to https://phabricator.wikimedia.org/P32737 and previous config saved to /var/cache/conftool/dbconfig/20220822-130732-marostegui.json
13:03 jynus: disabled backup scheduling for backup1002, backup2002 T315864
13:01 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 5%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32735 and previous config saved to /var/cache/conftool/dbconfig/20220822-130144-root.json
12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P32734 and previous config saved to /var/cache/conftool/dbconfig/20220822-125226-marostegui.json
12:52 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster2002.codfw.wmnet
12:48 jayme@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
12:46 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 2%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32732 and previous config saved to /var/cache/conftool/dbconfig/20220822-124640-root.json
12:45 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster2002.codfw.wmnet
12:39 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster2001.codfw.wmnet
12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P32731 and previous config saved to /var/cache/conftool/dbconfig/20220822-123720-marostegui.json
12:33 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster2001.codfw.wmnet
12:31 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 1%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32730 and previous config saved to /var/cache/conftool/dbconfig/20220822-123135-root.json
12:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica2006.wikimedia.org
12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T312972)', diff saved to https://phabricator.wikimedia.org/P32729 and previous config saved to /var/cache/conftool/dbconfig/20220822-122214-marostegui.json
12:20 jayme: kubernetes1016:~$ sudo systemctl reset-failed ifup@ens13.service - T273026
12:20 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica2006.wikimedia.org
12:20 moritzm: fix up network config for ldap-replica2006 T273026
12:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
12:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
12:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
12:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
12:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1023 for reboot T315542', diff saved to https://phabricator.wikimedia.org/P32728 and previous config saved to /var/cache/conftool/dbconfig/20220822-121401-root.json
12:13 marostegui@deploy1002: Synchronized wmf-config/db-production.php: Enable writes on es5 T315542 (duration: 03m 18s)
12:06 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1024 to es5 primary T315542', diff saved to https://phabricator.wikimedia.org/P32727 and previous config saved to /var/cache/conftool/dbconfig/20220822-120611-root.json
12:05 marostegui: Starting es5 eqiad failover from es1023 to es1024 - T315542
12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1024 with weight 10 T315542', diff saved to https://phabricator.wikimedia.org/P32726 and previous config saved to /var/cache/conftool/dbconfig/20220822-120141-root.json
12:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
11:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
11:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
11:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
11:51 marostegui@deploy1002: Synchronized wmf-config/db-production.php: Disable writes on es5 T315542 (duration: 03m 08s)
11:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: Switchover es5 T315542
11:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: Switchover es5 T315542
11:36 moritzm: installing libdatetime-timezone-perl updates from SUA update
11:33 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 100%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32725 and previous config saved to /var/cache/conftool/dbconfig/20220822-113352-root.json
11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T312972)', diff saved to https://phabricator.wikimedia.org/P32724 and previous config saved to /var/cache/conftool/dbconfig/20220822-112829-marostegui.json
11:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
11:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T312972)', diff saved to https://phabricator.wikimedia.org/P32723 and previous config saved to /var/cache/conftool/dbconfig/20220822-112808-marostegui.json
11:25 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host dse-k8s-ctrl1001.eqiad.wmnet
11:18 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 75%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32722 and previous config saved to /var/cache/conftool/dbconfig/20220822-111847-root.json
11:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl1001.eqiad.wmnet
11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P32721 and previous config saved to /var/cache/conftool/dbconfig/20220822-111301-marostegui.json
11:03 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 60%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32720 and previous config saved to /var/cache/conftool/dbconfig/20220822-110342-root.json
10:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P32719 and previous config saved to /var/cache/conftool/dbconfig/20220822-105755-marostegui.json
10:48 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 50%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32718 and previous config saved to /var/cache/conftool/dbconfig/20220822-104838-root.json
10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T312972)', diff saved to https://phabricator.wikimedia.org/P32717 and previous config saved to /var/cache/conftool/dbconfig/20220822-104249-marostegui.json
10:33 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 40%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32716 and previous config saved to /var/cache/conftool/dbconfig/20220822-103333-root.json
10:18 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 30%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32715 and previous config saved to /var/cache/conftool/dbconfig/20220822-101828-root.json
10:03 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 20%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32714 and previous config saved to /var/cache/conftool/dbconfig/20220822-100324-root.json
10:00 vgutierrez: Incremental roll-out of query-sorting (1%) - T314868
09:58 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagemaster1001.eqiad.wmnet
09:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
09:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
09:41 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster2001.codfw.wmnet
09:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
09:38 XioNoX: push new policy on pfw3-eqiad - T315578
09:36 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
09:33 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 8%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32709 and previous config saved to /var/cache/conftool/dbconfig/20220822-093314-root.json
09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P32708 and previous config saved to /var/cache/conftool/dbconfig/20220822-092706-marostegui.json
09:18 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 5%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32706 and previous config saved to /var/cache/conftool/dbconfig/20220822-091810-root.json
09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P32705 and previous config saved to /var/cache/conftool/dbconfig/20220822-091200-marostegui.json
09:03 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 2%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32704 and previous config saved to /var/cache/conftool/dbconfig/20220822-090305-root.json
08:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T312972)', diff saved to https://phabricator.wikimedia.org/P32703 and previous config saved to /var/cache/conftool/dbconfig/20220822-085654-marostegui.json
08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T312972)', diff saved to https://phabricator.wikimedia.org/P32702 and previous config saved to /var/cache/conftool/dbconfig/20220822-085014-marostegui.json
08:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
08:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T312972)', diff saved to https://phabricator.wikimedia.org/P32701 and previous config saved to /var/cache/conftool/dbconfig/20220822-084942-marostegui.json
08:48 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 1%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32700 and previous config saved to /var/cache/conftool/dbconfig/20220822-084800-root.json
08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1020 ', diff saved to https://phabricator.wikimedia.org/P32699 and previous config saved to /var/cache/conftool/dbconfig/20220822-084359-root.json
08:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 1%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32698 and previous config saved to /var/cache/conftool/dbconfig/20220822-084335-root.json
08:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
08:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
08:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
08:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
08:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P32697 and previous config saved to /var/cache/conftool/dbconfig/20220822-083436-marostegui.json
08:33 moritzm: powercycling wdqs1014 (unresponsive via botched wdqs-categories process
08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1020 for reboot T310485', diff saved to https://phabricator.wikimedia.org/P32696 and previous config saved to /var/cache/conftool/dbconfig/20220822-083341-root.json
08:32 marostegui@deploy1002: Synchronized wmf-config/db-production.php: Enable writes on es4 T315540 (duration: 03m 17s)
08:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
08:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
08:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P32695 and previous config saved to /var/cache/conftool/dbconfig/20220822-082958-root.json
08:29 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Introducing variables for php 7.4 migration (duration: 03m 39s)
08:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1021 to es4 primary T315540', diff saved to https://phabricator.wikimedia.org/P32694 and previous config saved to /var/cache/conftool/dbconfig/20220822-082208-root.json
08:21 marostegui: Starting es4 eqiad failover from es1020 to es1021 - T315540
08:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P32693 and previous config saved to /var/cache/conftool/dbconfig/20220822-081930-marostegui.json
08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1021 with weight 10 T315540', diff saved to https://phabricator.wikimedia.org/P32692 and previous config saved to /var/cache/conftool/dbconfig/20220822-081817-root.json
08:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
08:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
08:18 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: Switchover es4 T315540
08:17 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: Switchover es4 T315540
08:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P32691 and previous config saved to /var/cache/conftool/dbconfig/20220822-081453-root.json
08:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
08:11 marostegui@deploy1002: Synchronized wmf-config/db-production.php: Disable writes on es4 T315540 (duration: 03m 35s)
08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T312972)', diff saved to https://phabricator.wikimedia.org/P32690 and previous config saved to /var/cache/conftool/dbconfig/20220822-080424-marostegui.json
08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P32689 and previous config saved to /var/cache/conftool/dbconfig/20220822-080020-root.json
08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P32688 and previous config saved to /var/cache/conftool/dbconfig/20220822-080012-root.json
07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P32687 and previous config saved to /var/cache/conftool/dbconfig/20220822-075949-root.json
07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 100%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32686 and previous config saved to /var/cache/conftool/dbconfig/20220822-075941-root.json
07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2182 to dbctl T311494', diff saved to https://phabricator.wikimedia.org/P32685 and previous config saved to /var/cache/conftool/dbconfig/20220822-075359-marostegui.json
07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P32684 and previous config saved to /var/cache/conftool/dbconfig/20220822-074515-root.json
07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P32683 and previous config saved to /var/cache/conftool/dbconfig/20220822-074507-root.json
07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P32682 and previous config saved to /var/cache/conftool/dbconfig/20220822-074443-root.json
07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 75%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32681 and previous config saved to /var/cache/conftool/dbconfig/20220822-074437-root.json
07:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P32677 and previous config saved to /var/cache/conftool/dbconfig/20220822-073010-root.json
07:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P32676 and previous config saved to /var/cache/conftool/dbconfig/20220822-073002-root.json
07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P32675 and previous config saved to /var/cache/conftool/dbconfig/20220822-072938-root.json
07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 50%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32674 and previous config saved to /var/cache/conftool/dbconfig/20220822-072932-root.json
07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T312972)', diff saved to https://phabricator.wikimedia.org/P32673 and previous config saved to /var/cache/conftool/dbconfig/20220822-072339-marostegui.json
07:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
07:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P32672 and previous config saved to /var/cache/conftool/dbconfig/20220822-071506-root.json
07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P32671 and previous config saved to /var/cache/conftool/dbconfig/20220822-071458-root.json
07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P32670 and previous config saved to /var/cache/conftool/dbconfig/20220822-071433-root.json
07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 10%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32669 and previous config saved to /var/cache/conftool/dbconfig/20220822-071427-root.json
07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2181 to dbctl T311494', diff saved to https://phabricator.wikimedia.org/P32668 and previous config saved to /var/cache/conftool/dbconfig/20220822-071153-marostegui.json
07:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 7 hosts with reason: Maintenance
07:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 7 hosts with reason: Maintenance
07:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
07:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T312972)', diff saved to https://phabricator.wikimedia.org/P32667 and previous config saved to /var/cache/conftool/dbconfig/20220822-070804-marostegui.json
07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P32666 and previous config saved to /var/cache/conftool/dbconfig/20220822-070001-root.json
06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P32665 and previous config saved to /var/cache/conftool/dbconfig/20220822-065953-root.json
06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P32664 and previous config saved to /var/cache/conftool/dbconfig/20220822-065929-root.json
06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 5%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32663 and previous config saved to /var/cache/conftool/dbconfig/20220822-065923-root.json
06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P32662 and previous config saved to /var/cache/conftool/dbconfig/20220822-065258-marostegui.json
06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P32661 and previous config saved to /var/cache/conftool/dbconfig/20220822-064457-root.json
06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P32660 and previous config saved to /var/cache/conftool/dbconfig/20220822-064448-root.json
06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P32659 and previous config saved to /var/cache/conftool/dbconfig/20220822-064424-root.json
06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 1%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32658 and previous config saved to /var/cache/conftool/dbconfig/20220822-064418-root.json
06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 db1142 db1096', diff saved to https://phabricator.wikimedia.org/P32657 and previous config saved to /var/cache/conftool/dbconfig/20220822-063857-root.json
06:38 marostegui: Install 10.4.26 on db1119, db1142, db1096 T315411
06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P32656 and previous config saved to /var/cache/conftool/dbconfig/20220822-063752-marostegui.json
06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2180 to dbctl T311494', diff saved to https://phabricator.wikimedia.org/P32655 and previous config saved to /var/cache/conftool/dbconfig/20220822-063533-marostegui.json
06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T312972)', diff saved to https://phabricator.wikimedia.org/P32654 and previous config saved to /var/cache/conftool/dbconfig/20220822-062246-marostegui.json
06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T312972)', diff saved to https://phabricator.wikimedia.org/P32653 and previous config saved to /var/cache/conftool/dbconfig/20220822-061600-marostegui.json
06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2179 to dbctl T311494', diff saved to https://phabricator.wikimedia.org/P32652 and previous config saved to /var/cache/conftool/dbconfig/20220822-061553-marostegui.json
06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
06:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
06:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2178 to dbctl T311494', diff saved to https://phabricator.wikimedia.org/P32651 and previous config saved to /var/cache/conftool/dbconfig/20220822-055446-marostegui.json
00:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
00:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
00:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
00:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
00:25 tstarling@deploy1002: Synchronized php-1.39.0-wmf.25/includes/objectcache/SqlBagOStuff.php: fix modtoken comparison T315271 (duration: 03m 45s)

2022-08-21

14:36 Krinkle: krinkle@mwmaint1002 foreachwikiindblist 'all - small' deleteEqualMessages.php
14:33 Krinkle: krinkle@mwmaint1002 foreachwikiindblist 'small - closed' deleteEqualMessages.php
12:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db[1111,1127,1132].eqiad.wmnet with reason: 10.6 being 10.6
12:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db[1111,1127,1132].eqiad.wmnet with reason: 10.6 being 10.6
12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool 10.6 hosts', diff saved to https://phabricator.wikimedia.org/P32649 and previous config saved to /var/cache/conftool/dbconfig/20220821-123038-ladsgroup.json
12:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P32648 and previous config saved to /var/cache/conftool/dbconfig/20220821-121140-root.json
09:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T314041)', diff saved to https://phabricator.wikimedia.org/P32647 and previous config saved to /var/cache/conftool/dbconfig/20220821-092727-ladsgroup.json
09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P32646 and previous config saved to /var/cache/conftool/dbconfig/20220821-091221-ladsgroup.json
08:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P32645 and previous config saved to /var/cache/conftool/dbconfig/20220821-085716-ladsgroup.json
08:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T314041)', diff saved to https://phabricator.wikimedia.org/P32644 and previous config saved to /var/cache/conftool/dbconfig/20220821-084209-ladsgroup.json
04:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T314041)', diff saved to https://phabricator.wikimedia.org/P32643 and previous config saved to /var/cache/conftool/dbconfig/20220821-042415-ladsgroup.json
04:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
04:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
04:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
04:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
03:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T314041)', diff saved to https://phabricator.wikimedia.org/P32642 and previous config saved to /var/cache/conftool/dbconfig/20220821-033020-ladsgroup.json
03:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P32641 and previous config saved to /var/cache/conftool/dbconfig/20220821-031514-ladsgroup.json
03:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P32640 and previous config saved to /var/cache/conftool/dbconfig/20220821-030008-ladsgroup.json
02:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T314041)', diff saved to https://phabricator.wikimedia.org/P32639 and previous config saved to /var/cache/conftool/dbconfig/20220821-024502-ladsgroup.json
01:35 rzl@cumin2002: dbctl commit (dc=all): 'Depool db1143', diff saved to https://phabricator.wikimedia.org/P32638 and previous config saved to /var/cache/conftool/dbconfig/20220821-013504-rzl.json

2022-08-20

22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T314041)', diff saved to https://phabricator.wikimedia.org/P32637 and previous config saved to /var/cache/conftool/dbconfig/20220820-221826-ladsgroup.json
22:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
22:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
17:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 9 hosts with reason: Maintenance
17:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 9 hosts with reason: Maintenance
17:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
17:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T314041)', diff saved to https://phabricator.wikimedia.org/P32636 and previous config saved to /var/cache/conftool/dbconfig/20220820-173723-ladsgroup.json
17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P32635 and previous config saved to /var/cache/conftool/dbconfig/20220820-172217-ladsgroup.json
17:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P32634 and previous config saved to /var/cache/conftool/dbconfig/20220820-170711-ladsgroup.json
16:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T314041)', diff saved to https://phabricator.wikimedia.org/P32633 and previous config saved to /var/cache/conftool/dbconfig/20220820-165203-ladsgroup.json
11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T314041)', diff saved to https://phabricator.wikimedia.org/P32632 and previous config saved to /var/cache/conftool/dbconfig/20220820-115816-ladsgroup.json
11:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
11:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
11:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T314041)', diff saved to https://phabricator.wikimedia.org/P32631 and previous config saved to /var/cache/conftool/dbconfig/20220820-115755-ladsgroup.json
11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P32630 and previous config saved to /var/cache/conftool/dbconfig/20220820-114249-ladsgroup.json
11:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P32629 and previous config saved to /var/cache/conftool/dbconfig/20220820-112744-ladsgroup.json
11:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T314041)', diff saved to https://phabricator.wikimedia.org/P32628 and previous config saved to /var/cache/conftool/dbconfig/20220820-111238-ladsgroup.json
06:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T314041)', diff saved to https://phabricator.wikimedia.org/P32627 and previous config saved to /var/cache/conftool/dbconfig/20220820-065528-ladsgroup.json
06:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
06:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
06:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T314041)', diff saved to https://phabricator.wikimedia.org/P32626 and previous config saved to /var/cache/conftool/dbconfig/20220820-065507-ladsgroup.json
06:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P32625 and previous config saved to /var/cache/conftool/dbconfig/20220820-064001-ladsgroup.json
06:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P32624 and previous config saved to /var/cache/conftool/dbconfig/20220820-062455-ladsgroup.json
06:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T314041)', diff saved to https://phabricator.wikimedia.org/P32623 and previous config saved to /var/cache/conftool/dbconfig/20220820-060949-ladsgroup.json
01:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T314041)', diff saved to https://phabricator.wikimedia.org/P32622 and previous config saved to /var/cache/conftool/dbconfig/20220820-012602-ladsgroup.json
01:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
01:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance

2022-08-19

23:37 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on phab2002.codfw.wmnet with reason: new host in setup
23:37 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on phab2002.codfw.wmnet with reason: new host in setup
23:35 mutante: phab2002 - service phd: stopped phabricator_logmail: disabled, phabricator dumps: disabled, systemd::sysuser: not used (all via Hiera switches) - T280597
23:33 mutante: phab2002 - re-enabled puppet, sshd config ListenAddress fixed by puppet gerrit:824797 - now has phabricator prod role but without LVS/git-ssh - no more error in puppet run - T280597
23:02 mutante: phab2002 - disable puppet, fix sshd_config, restart sshd
20:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
20:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
18:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
18:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
18:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T312972)', diff saved to https://phabricator.wikimedia.org/P32621 and previous config saved to /var/cache/conftool/dbconfig/20220819-182835-marostegui.json
18:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P32620 and previous config saved to /var/cache/conftool/dbconfig/20220819-181329-marostegui.json
17:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P32619 and previous config saved to /var/cache/conftool/dbconfig/20220819-175823-marostegui.json
17:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T312972)', diff saved to https://phabricator.wikimedia.org/P32618 and previous config saved to /var/cache/conftool/dbconfig/20220819-174317-marostegui.json
17:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1130 (T312972)', diff saved to https://phabricator.wikimedia.org/P32617 and previous config saved to /var/cache/conftool/dbconfig/20220819-171052-marostegui.json
17:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
17:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
17:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T312972)', diff saved to https://phabricator.wikimedia.org/P32616 and previous config saved to /var/cache/conftool/dbconfig/20220819-171031-marostegui.json
16:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P32615 and previous config saved to /var/cache/conftool/dbconfig/20220819-165525-marostegui.json
16:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P32614 and previous config saved to /var/cache/conftool/dbconfig/20220819-164019-marostegui.json
16:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T312972)', diff saved to https://phabricator.wikimedia.org/P32613 and previous config saved to /var/cache/conftool/dbconfig/20220819-162513-marostegui.json
16:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T312972)', diff saved to https://phabricator.wikimedia.org/P32612 and previous config saved to /var/cache/conftool/dbconfig/20220819-162253-marostegui.json
16:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
16:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
16:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T312972)', diff saved to https://phabricator.wikimedia.org/P32611 and previous config saved to /var/cache/conftool/dbconfig/20220819-162232-marostegui.json
16:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P32610 and previous config saved to /var/cache/conftool/dbconfig/20220819-160726-marostegui.json
15:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
15:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
15:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T314041)', diff saved to https://phabricator.wikimedia.org/P32609 and previous config saved to /var/cache/conftool/dbconfig/20220819-155611-ladsgroup.json
15:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P32608 and previous config saved to /var/cache/conftool/dbconfig/20220819-155220-marostegui.json
15:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P32607 and previous config saved to /var/cache/conftool/dbconfig/20220819-154105-ladsgroup.json
15:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T312972)', diff saved to https://phabricator.wikimedia.org/P32606 and previous config saved to /var/cache/conftool/dbconfig/20220819-153714-marostegui.json
15:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T312972)', diff saved to https://phabricator.wikimedia.org/P32605 and previous config saved to /var/cache/conftool/dbconfig/20220819-153554-marostegui.json
15:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
15:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
15:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T312972)', diff saved to https://phabricator.wikimedia.org/P32604 and previous config saved to /var/cache/conftool/dbconfig/20220819-153533-marostegui.json
15:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2024.codfw.wmnet with OS bullseye
15:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P32603 and previous config saved to /var/cache/conftool/dbconfig/20220819-152559-ladsgroup.json
15:25 dancy@deploy1002: Installation of scap version "4.14.0" completed for 556 hosts
15:23 dancy@deploy1002: Installing scap version "4.14.0" for 556 hosts
15:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P32602 and previous config saved to /var/cache/conftool/dbconfig/20220819-152027-marostegui.json
15:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
15:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2024.codfw.wmnet with reason: host reimage
15:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2023.codfw.wmnet with OS bullseye
15:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
15:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
15:12 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2024.codfw.wmnet with reason: host reimage
15:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
15:11 dancy@deploy1002: Finished scap: Backport for Add back fixed width to main content (T315653) (duration: 06m 59s)
15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T314041)', diff saved to https://phabricator.wikimedia.org/P32601 and previous config saved to /var/cache/conftool/dbconfig/20220819-151053-ladsgroup.json
15:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P32600 and previous config saved to /var/cache/conftool/dbconfig/20220819-150521-marostegui.json
15:04 dancy@deploy1002: Started scap: Backport for Add back fixed width to main content (T315653)
14:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2023.codfw.wmnet with reason: host reimage
14:55 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2023.codfw.wmnet with reason: host reimage
14:52 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2024.codfw.wmnet with OS bullseye
14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T312972)', diff saved to https://phabricator.wikimedia.org/P32599 and previous config saved to /var/cache/conftool/dbconfig/20220819-145015-marostegui.json
14:48 dancy@deploy1002: backport aborted: (duration: 03m 01s)
14:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T312972)', diff saved to https://phabricator.wikimedia.org/P32598 and previous config saved to /var/cache/conftool/dbconfig/20220819-144755-marostegui.json
14:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
14:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
14:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T312972)', diff saved to https://phabricator.wikimedia.org/P32597 and previous config saved to /var/cache/conftool/dbconfig/20220819-144734-marostegui.json
14:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2023.codfw.wmnet with OS bullseye
14:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P32596 and previous config saved to /var/cache/conftool/dbconfig/20220819-143228-marostegui.json
14:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2024']
14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P32595 and previous config saved to /var/cache/conftool/dbconfig/20220819-141722-marostegui.json
14:13 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2024']
14:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2024.mgmt.codfw.wmnet with reboot policy FORCED
14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T312972)', diff saved to https://phabricator.wikimedia.org/P32594 and previous config saved to /var/cache/conftool/dbconfig/20220819-140216-marostegui.json
13:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T312972)', diff saved to https://phabricator.wikimedia.org/P32593 and previous config saved to /var/cache/conftool/dbconfig/20220819-135956-marostegui.json
13:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
13:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
13:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
13:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
13:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T312972)', diff saved to https://phabricator.wikimedia.org/P32592 and previous config saved to /var/cache/conftool/dbconfig/20220819-135917-marostegui.json
13:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2024.mgmt.codfw.wmnet with reboot policy FORCED
13:45 marostegui: Install 10.4.26 on db2111 db2148 db2124
13:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P32591 and previous config saved to /var/cache/conftool/dbconfig/20220819-134411-marostegui.json
13:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P32590 and previous config saved to /var/cache/conftool/dbconfig/20220819-132905-marostegui.json
13:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T312972)', diff saved to https://phabricator.wikimedia.org/P32589 and previous config saved to /var/cache/conftool/dbconfig/20220819-131359-marostegui.json
13:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T312972)', diff saved to https://phabricator.wikimedia.org/P32588 and previous config saved to /var/cache/conftool/dbconfig/20220819-131139-marostegui.json
13:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
13:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
13:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
13:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
13:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 9 hosts with reason: Maintenance
13:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 9 hosts with reason: Maintenance
13:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
13:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
12:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
12:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
12:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
12:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
12:44 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I0c45b6 (duration: 03m 24s)
11:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 11 hosts with reason: Maintenance
11:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 11 hosts with reason: Maintenance
11:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
11:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
11:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
11:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T312972)', diff saved to https://phabricator.wikimedia.org/P32587 and previous config saved to /var/cache/conftool/dbconfig/20220819-114703-marostegui.json
11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P32586 and previous config saved to /var/cache/conftool/dbconfig/20220819-113157-marostegui.json
11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P32584 and previous config saved to /var/cache/conftool/dbconfig/20220819-111651-marostegui.json
11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T312972)', diff saved to https://phabricator.wikimedia.org/P32583 and previous config saved to /var/cache/conftool/dbconfig/20220819-110145-marostegui.json
10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T312972)', diff saved to https://phabricator.wikimedia.org/P32582 and previous config saved to /var/cache/conftool/dbconfig/20220819-105934-marostegui.json
10:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
10:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
10:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
10:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T312972)', diff saved to https://phabricator.wikimedia.org/P32581 and previous config saved to /var/cache/conftool/dbconfig/20220819-105906-marostegui.json
10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T314041)', diff saved to https://phabricator.wikimedia.org/P32580 and previous config saved to /var/cache/conftool/dbconfig/20220819-105212-ladsgroup.json
10:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
10:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T314041)', diff saved to https://phabricator.wikimedia.org/P32579 and previous config saved to /var/cache/conftool/dbconfig/20220819-105151-ladsgroup.json
10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P32578 and previous config saved to /var/cache/conftool/dbconfig/20220819-104400-marostegui.json
10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P32577 and previous config saved to /var/cache/conftool/dbconfig/20220819-103645-ladsgroup.json
10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P32576 and previous config saved to /var/cache/conftool/dbconfig/20220819-102854-marostegui.json
10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P32575 and previous config saved to /var/cache/conftool/dbconfig/20220819-102139-ladsgroup.json
10:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T312972)', diff saved to https://phabricator.wikimedia.org/P32574 and previous config saved to /var/cache/conftool/dbconfig/20220819-101348-marostegui.json
10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T314041)', diff saved to https://phabricator.wikimedia.org/P32573 and previous config saved to /var/cache/conftool/dbconfig/20220819-100633-ladsgroup.json
09:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T312972)', diff saved to https://phabricator.wikimedia.org/P32572 and previous config saved to /var/cache/conftool/dbconfig/20220819-095035-marostegui.json
09:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
09:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
09:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T312972)', diff saved to https://phabricator.wikimedia.org/P32571 and previous config saved to /var/cache/conftool/dbconfig/20220819-095014-marostegui.json
09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P32570 and previous config saved to /var/cache/conftool/dbconfig/20220819-093508-marostegui.json
09:23 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
09:21 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
09:21 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P32569 and previous config saved to /var/cache/conftool/dbconfig/20220819-092002-marostegui.json
09:17 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:14 cmooney@cumin1001: START - Cookbook sre.dns.netbox
09:11 topranks: running authdns-update on auth1001 to add new include to 0.0.5.e.2.f.d.0.1.0.0.2.ip6.arpa. zone
09:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T312972)', diff saved to https://phabricator.wikimedia.org/P32568 and previous config saved to /var/cache/conftool/dbconfig/20220819-090456-marostegui.json
09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T312972)', diff saved to https://phabricator.wikimedia.org/P32567 and previous config saved to /var/cache/conftool/dbconfig/20220819-090146-marostegui.json
09:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
09:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T312972)', diff saved to https://phabricator.wikimedia.org/P32566 and previous config saved to /var/cache/conftool/dbconfig/20220819-090124-marostegui.json
08:56 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
08:56 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P32565 and previous config saved to /var/cache/conftool/dbconfig/20220819-084618-marostegui.json
08:44 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P32564 and previous config saved to /var/cache/conftool/dbconfig/20220819-083112-marostegui.json
08:16 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be2067.codfw.wmnet
08:16 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be2067.codfw.wmnet
08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T312972)', diff saved to https://phabricator.wikimedia.org/P32563 and previous config saved to /var/cache/conftool/dbconfig/20220819-081606-marostegui.json
08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T312972)', diff saved to https://phabricator.wikimedia.org/P32562 and previous config saved to /var/cache/conftool/dbconfig/20220819-081356-marostegui.json
08:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
08:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
08:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
08:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T312972)', diff saved to https://phabricator.wikimedia.org/P32561 and previous config saved to /var/cache/conftool/dbconfig/20220819-081317-marostegui.json
07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P32559 and previous config saved to /var/cache/conftool/dbconfig/20220819-075812-marostegui.json
07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P32558 and previous config saved to /var/cache/conftool/dbconfig/20220819-074306-marostegui.json
07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T312972)', diff saved to https://phabricator.wikimedia.org/P32557 and previous config saved to /var/cache/conftool/dbconfig/20220819-072800-marostegui.json
07:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
07:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
07:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T312972)', diff saved to https://phabricator.wikimedia.org/P32556 and previous config saved to /var/cache/conftool/dbconfig/20220819-072422-marostegui.json
07:20 Amir1: killing cswiki's refreshlinksrecom script T299021
07:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T314041)', diff saved to https://phabricator.wikimedia.org/P32555 and previous config saved to /var/cache/conftool/dbconfig/20220819-071934-ladsgroup.json
07:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
07:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
07:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
07:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P32553 and previous config saved to /var/cache/conftool/dbconfig/20220819-070916-marostegui.json
06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P32552 and previous config saved to /var/cache/conftool/dbconfig/20220819-065409-marostegui.json
06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T312972)', diff saved to https://phabricator.wikimedia.org/P32551 and previous config saved to /var/cache/conftool/dbconfig/20220819-063903-marostegui.json
06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T312972)', diff saved to https://phabricator.wikimedia.org/P32550 and previous config saved to /var/cache/conftool/dbconfig/20220819-061649-marostegui.json
06:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
06:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T312972)', diff saved to https://phabricator.wikimedia.org/P32549 and previous config saved to /var/cache/conftool/dbconfig/20220819-061628-marostegui.json
06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127', diff saved to https://phabricator.wikimedia.org/P32548 and previous config saved to /var/cache/conftool/dbconfig/20220819-061515-root.json
06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P32547 and previous config saved to /var/cache/conftool/dbconfig/20220819-060122-marostegui.json
05:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P32546 and previous config saved to /var/cache/conftool/dbconfig/20220819-054616-marostegui.json
05:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T312972)', diff saved to https://phabricator.wikimedia.org/P32544 and previous config saved to /var/cache/conftool/dbconfig/20220819-053110-marostegui.json
05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T312972)', diff saved to https://phabricator.wikimedia.org/P32543 and previous config saved to /var/cache/conftool/dbconfig/20220819-052900-marostegui.json
05:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
05:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
05:20 marostegui: Install 10.6.9 on db2122 and db2146
04:26 hashar@deploy1002: Finished deploy [integration/docroot@09eb565]: zuul: Fix/remove links to non-existent Grafana graphs - T307405 (duration: 00m 13s)
04:26 hashar@deploy1002: Started deploy [integration/docroot@09eb565]: zuul: Fix/remove links to non-existent Grafana graphs - T307405
01:38 tstarling@deploy1002: Synchronized php-1.39.0-wmf.25/includes/objectcache/SqlBagOStuff.php: fix potential mainstash exception file 2 T315274 (duration: 03m 30s)
01:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
01:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
01:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
01:31 tstarling@deploy1002: Synchronized php-1.39.0-wmf.25/includes/libs/rdbms/database/DBConnRef.php: fix potential mainstash exception file 1 T315274 (duration: 03m 21s)
01:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-08-18

23:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
23:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
23:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
23:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
23:19 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.39.0-wmf.25 refs T314186
23:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
23:12 dancy@deploy1002: Finished scap: Backport for gerrit:824573 Revert "Set initial-zoom via JavaScript to avoid font-scaling issue in iPad" (duration: 15m 27s)
23:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
23:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
23:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
22:57 dancy@deploy1002: Started scap: Backport for gerrit:824573 Revert "Set initial-zoom via JavaScript to avoid font-scaling issue in iPad"
22:53 mutante: phab1001, phab2001: sudo rm /usr/local/sbin/phab_deploy_ensure_config_ownership (follow-up gerrit:824547 T313953)
22:43 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox
22:36 dancy@deploy1002: backport aborted: (duration: 00m 12s)
22:35 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
22:32 pt1979@cumin2002: START - Cookbook sre.dns.netbox
22:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
22:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
22:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
22:31 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.39.0-wmf.23 refs T314186
22:25 dancy: Rolling the train back to group1 due to T315620
22:25 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@ff0a0e2]: (no justification provided) (duration: 00m 19s)
22:24 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@ff0a0e2]: (no justification provided)
22:16 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2024.mgmt.codfw.wmnet with reboot policy FORCED
22:09 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2024.mgmt.codfw.wmnet with reboot policy FORCED
22:05 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:02 pt1979@cumin2002: START - Cookbook sre.dns.netbox
22:02 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:50 pt1979@cumin2002: START - Cookbook sre.dns.netbox
21:48 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes2024
21:47 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes2024
21:20 brennen: end of UTC late backport and config window
21:20 brennen@deploy1002: Finished scap: Set initial-zoom via JavaScript to avoid font-scaling issue in iPad (T311795) (duration: 10m 16s)
21:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
21:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
21:14 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on relforge[1003-1004].eqiad.wmnet with reason: elastic 7 upgrade
21:14 bking@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on relforge[1003-1004].eqiad.wmnet with reason: elastic 7 upgrade
21:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
21:09 brennen@deploy1002: Started scap: Set initial-zoom via JavaScript to avoid font-scaling issue in iPad (T311795)
21:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-stretch2002.codfw.wmnet with OS bullseye
20:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kafka-stretch2002.codfw.wmnet with OS bullseye
20:39 brennen@deploy1002: Finished scap: Allow admin to grant/revoke "transwiki" group on zh(wikt|wb|wq|ws) (T313657) (duration: 07m 09s)
20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:37 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-stretch2002.codfw.wmnet with OS bullseye
20:32 brennen@deploy1002: Started scap: Allow admin to grant/revoke "transwiki" group on zh(wikt|wb|wq|ws) (T313657)
20:29 brennen@deploy1002: Finished scap: Deploy partial action blocks to cswiki (T315525) (duration: 19m 16s)
20:20 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1006.eqiad.wmnet with OS bullseye
20:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:10 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kafka-stretch2002.codfw.wmnet with OS bullseye
20:09 brennen@deploy1002: Started scap: Deploy partial action blocks to cswiki (T315525)
20:00 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
19:57 ottomata: renable puppet on an-master*
19:47 ottomata: temporarily disable puppet on an-master100* while applying change in test cluster - T312858
19:34 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dumpsdata1007.eqiad.wmnet with OS bullseye
19:19 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
19:16 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
19:10 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
19:00 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
18:58 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1006.eqiad.wmnet with OS bullseye
18:57 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
18:55 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-stretch2001.codfw.wmnet with OS bullseye
18:52 cmooney@cumin1001: START - Cookbook sre.dns.netbox
18:40 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-stretch2001.codfw.wmnet with reason: host reimage
18:36 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-stretch2001.codfw.wmnet with reason: host reimage
18:17 robh@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-stretch2001.codfw.wmnet with OS bullseye
18:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
18:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
18:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
18:13 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.39.0-wmf.25 refs T314186
18:08 dancy: Testing stashbot behavior #2. T315444, T314613
18:07 dancy: Testing stashbot behavior #1 T315444
17:56 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host kafka-stretch2001.codfw.wmnet with OS bullseye
17:54 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
17:53 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
17:53 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
17:52 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
17:52 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
17:52 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
17:48 robh@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-stretch2001.codfw.wmnet with OS bullseye
17:46 dancy@deploy1002: backport aborted: (duration: 00m 21s)
17:16 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-stretch2001.codfw.wmnet with OS bullseye
17:08 hashar@deploy1002: Finished deploy [integration/docroot@1aca57b]: doc: update links from /mw-tools-scap/ to /scap/ - T315541 (duration: 00m 09s)
17:08 hashar@deploy1002: Started deploy [integration/docroot@1aca57b]: doc: update links from /mw-tools-scap/ to /scap/ - T315541
16:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 9 hosts with reason: Maintenance
16:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 9 hosts with reason: Maintenance
16:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
16:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
16:47 demon@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.25 refs T314186 (duration: 03m 20s)
16:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 13 hosts with reason: Maintenance
16:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 13 hosts with reason: Maintenance
16:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
16:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
16:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T312972)', diff saved to https://phabricator.wikimedia.org/P32541 and previous config saved to /var/cache/conftool/dbconfig/20220818-164456-marostegui.json
16:44 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.25 refs T314186
16:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P32540 and previous config saved to /var/cache/conftool/dbconfig/20220818-162950-marostegui.json
16:26 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be2067.codfw.wmnet with reason: disk fault investigation
16:26 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be2067.codfw.wmnet with reason: disk fault investigation
16:21 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kafka-stretch2001.codfw.wmnet with OS bullseye
16:17 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-stretch2001.codfw.wmnet with OS bullseye
16:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P32539 and previous config saved to /var/cache/conftool/dbconfig/20220818-161444-marostegui.json
15:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T312972)', diff saved to https://phabricator.wikimedia.org/P32538 and previous config saved to /var/cache/conftool/dbconfig/20220818-155938-marostegui.json
15:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T312972)', diff saved to https://phabricator.wikimedia.org/P32537 and previous config saved to /var/cache/conftool/dbconfig/20220818-155410-marostegui.json
15:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
15:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
15:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T312972)', diff saved to https://phabricator.wikimedia.org/P32536 and previous config saved to /var/cache/conftool/dbconfig/20220818-155348-marostegui.json
15:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P32535 and previous config saved to /var/cache/conftool/dbconfig/20220818-153842-marostegui.json
15:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P32534 and previous config saved to /var/cache/conftool/dbconfig/20220818-152335-marostegui.json
15:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T312972)', diff saved to https://phabricator.wikimedia.org/P32533 and previous config saved to /var/cache/conftool/dbconfig/20220818-150829-marostegui.json
15:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T312972)', diff saved to https://phabricator.wikimedia.org/P32532 and previous config saved to /var/cache/conftool/dbconfig/20220818-150621-marostegui.json
15:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
15:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
15:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T312972)', diff saved to https://phabricator.wikimedia.org/P32531 and previous config saved to /var/cache/conftool/dbconfig/20220818-150601-marostegui.json
15:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kafka-stretch2001.codfw.wmnet with OS bullseye
14:58 dancy@deploy1002: Finished deploy [integration/docroot@a43ff3b]: (no justification provided) (duration: 00m 38s)
14:58 dancy@deploy1002: Started deploy [integration/docroot@a43ff3b]: (no justification provided)
14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P32530 and previous config saved to /var/cache/conftool/dbconfig/20220818-145055-marostegui.json
14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P32529 and previous config saved to /var/cache/conftool/dbconfig/20220818-143549-marostegui.json
14:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T312972)', diff saved to https://phabricator.wikimedia.org/P32528 and previous config saved to /var/cache/conftool/dbconfig/20220818-142043-marostegui.json
14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T312972)', diff saved to https://phabricator.wikimedia.org/P32527 and previous config saved to /var/cache/conftool/dbconfig/20220818-141835-marostegui.json
14:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
14:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T312972)', diff saved to https://phabricator.wikimedia.org/P32526 and previous config saved to /var/cache/conftool/dbconfig/20220818-141815-marostegui.json
14:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
14:10 TheresNoTime: UTC afternoon backport window done
14:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
14:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
14:09 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable new Vector skin on select pages (take 2) (T314286) (duration: 03m 07s)
14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
14:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
14:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P32525 and previous config saved to /var/cache/conftool/dbconfig/20220818-140309-marostegui.json
14:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
14:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
14:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
14:01 TheresNoTime: extending deployment window slightly
13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P32524 and previous config saved to /var/cache/conftool/dbconfig/20220818-134803-marostegui.json
13:45 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert: Enable new Vector skin on select pages (T314286) (duration: 03m 35s)
13:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:37 jbond: uploaded spicerack_3.2.0 to apt.wikimedia.org bullseye-wikimedia
13:37 samtar@deploy1002: scap failed: average error rate on 5/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org for details)
13:37 jbond: release spicerack 3.2.0
13:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:33 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
13:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T312972)', diff saved to https://phabricator.wikimedia.org/P32523 and previous config saved to /var/cache/conftool/dbconfig/20220818-133257-marostegui.json
13:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1033.eqiad.wmnet with OS bullseye
13:31 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=toolhub
13:31 samtar@deploy1002: Synchronized wmf-config: Config: Remove unused config for Echo notification emails (T314604) (duration: 03m 25s)
13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:28 awight@deploy1002: Finished deploy [kartotherian/deploy@672af45]: Update kartotherian to 285fc7d (duration: 03m 45s)
13:26 jayme@cumin1001: START - Cookbook sre.discovery.service-route
13:25 jayme@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-codfw
13:24 awight@deploy1002: Started deploy [kartotherian/deploy@672af45]: Update kartotherian to 285fc7d
13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:23 samtar@deploy1002: Synchronized wmf-config: Config: Disable DiscussionTools pageframe everywhere except labs and mediawikiwiki (duration: 03m 26s)
13:17 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1033.eqiad.wmnet with reason: host reimage
13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:15 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1033.eqiad.wmnet with reason: host reimage
13:13 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: InitialiseSettings-labs: Enable Phonos on beta enwiki (T314294) (duration: 03m 30s)
13:01 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1033.eqiad.wmnet with OS bullseye
12:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
12:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
12:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
12:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
12:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
12:33 reedy@deploy1002: Synchronized wmf-config/: SFS config updates (duration: 03m 25s)
12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T312972)', diff saved to https://phabricator.wikimedia.org/P32522 and previous config saved to /var/cache/conftool/dbconfig/20220818-123241-marostegui.json
12:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
12:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T312972)', diff saved to https://phabricator.wikimedia.org/P32521 and previous config saved to /var/cache/conftool/dbconfig/20220818-123220-marostegui.json
12:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
12:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
12:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
12:28 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Set wgSFSReportOnly in here (duration: 03m 27s)
12:25 marostegui: Install 10.6.9 on pc1014
12:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
12:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
12:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
12:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
12:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P32520 and previous config saved to /var/cache/conftool/dbconfig/20220818-121714-marostegui.json
12:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
12:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
12:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
12:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
12:04 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P32519 and previous config saved to /var/cache/conftool/dbconfig/20220818-120208-marostegui.json
11:55 jbond@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T312972)', diff saved to https://phabricator.wikimedia.org/P32518 and previous config saved to /var/cache/conftool/dbconfig/20220818-114702-marostegui.json
11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T312972)', diff saved to https://phabricator.wikimedia.org/P32517 and previous config saved to /var/cache/conftool/dbconfig/20220818-114555-marostegui.json
11:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
11:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
11:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
11:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T312972)', diff saved to https://phabricator.wikimedia.org/P32516 and previous config saved to /var/cache/conftool/dbconfig/20220818-114518-marostegui.json
11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool db1112', diff saved to https://phabricator.wikimedia.org/P32515 and previous config saved to /var/cache/conftool/dbconfig/20220818-113655-ladsgroup.json
11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'depool db1112', diff saved to https://phabricator.wikimedia.org/P32514 and previous config saved to /var/cache/conftool/dbconfig/20220818-113556-ladsgroup.json
11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P32513 and previous config saved to /var/cache/conftool/dbconfig/20220818-113012-marostegui.json
11:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
11:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
11:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
11:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
11:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P32511 and previous config saved to /var/cache/conftool/dbconfig/20220818-111506-marostegui.json
11:00 jayme: kubernetes2015:~$ sudo systemctl reset-failed ifup@ens13.service - T273026
11:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T312972)', diff saved to https://phabricator.wikimedia.org/P32510 and previous config saved to /var/cache/conftool/dbconfig/20220818-110000-marostegui.json
10:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T312972)', diff saved to https://phabricator.wikimedia.org/P32509 and previous config saved to /var/cache/conftool/dbconfig/20220818-105552-marostegui.json
10:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
10:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
10:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
10:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T312972)', diff saved to https://phabricator.wikimedia.org/P32508 and previous config saved to /var/cache/conftool/dbconfig/20220818-105531-marostegui.json
10:55 jayme: kubernetes2016:~$ sudo systemctl reset-failed ifup@ens13.service - T273026
10:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
10:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
10:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
10:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
10:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool db1166', diff saved to https://phabricator.wikimedia.org/P32506 and previous config saved to /var/cache/conftool/dbconfig/20220818-104731-ladsgroup.json
10:45 reedy@deploy1002: Synchronized php-1.39.0-wmf.25/extensions/StopForumSpam/includes/: T315447 (duration: 03m 36s)
10:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P32505 and previous config saved to /var/cache/conftool/dbconfig/20220818-104552-ladsgroup.json
10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P32504 and previous config saved to /var/cache/conftool/dbconfig/20220818-104025-marostegui.json
10:37 jayme: kubernetes2006:~$ sudo systemctl reset-failed ifup@ens13.service - T273026
10:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
10:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
10:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
10:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
10:27 ladsgroup@deploy1002: Synchronized wmf-config/etcd.php: Config: Drop now-unused wmfEtcdApplyDBConfig() (T298485) (duration: 03m 36s)
10:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P32503 and previous config saved to /var/cache/conftool/dbconfig/20220818-102519-marostegui.json
10:22 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
10:22 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
10:22 jayme: kubernetes2005:~$ sudo systemctl status ifup@ens13.service - T273026
10:20 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
10:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
10:19 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
10:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
10:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
10:16 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
10:16 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
10:16 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Call wmfApplyEtcdDBConfig() directly in CS.php (T298485) (duration: 03m 46s)
10:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
10:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T312972)', diff saved to https://phabricator.wikimedia.org/P32501 and previous config saved to /var/cache/conftool/dbconfig/20220818-101013-marostegui.json
10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T312972)', diff saved to https://phabricator.wikimedia.org/P32500 and previous config saved to /var/cache/conftool/dbconfig/20220818-100806-marostegui.json
10:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
10:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T312972)', diff saved to https://phabricator.wikimedia.org/P32499 and previous config saved to /var/cache/conftool/dbconfig/20220818-100744-marostegui.json
10:03 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-codfw
10:00 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
10:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
09:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
09:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
09:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
09:56 ladsgroup@deploy1002: Synchronized wmf-config/etcd.php: Config: Allow passing arguments to wmfEtcdApplyDBConfig() (T298485) (duration: 03m 40s)
09:53 jayme@cumin1001: START - Cookbook sre.discovery.service-route
09:53 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:53 jayme@cumin1001: START - Cookbook sre.discovery.service-route
09:52 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P32498 and previous config saved to /var/cache/conftool/dbconfig/20220818-095238-marostegui.json
09:47 jayme@cumin1001: START - Cookbook sre.discovery.service-route
09:44 jayme: dnsdisc depooling codfw for services running in kubernetes cluster (for 30-60min due to T310483, T260661)
09:43 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner2004.codfw.wmnet
09:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P32497 and previous config saved to /var/cache/conftool/dbconfig/20220818-093732-marostegui.json
09:34 _joe_: updating vopsbot to 0.3.0
09:33 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2004.codfw.wmnet
09:29 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner2003.codfw.wmnet
09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T312972)', diff saved to https://phabricator.wikimedia.org/P32496 and previous config saved to /var/cache/conftool/dbconfig/20220818-092226-marostegui.json
09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T312972)', diff saved to https://phabricator.wikimedia.org/P32495 and previous config saved to /var/cache/conftool/dbconfig/20220818-092219-marostegui.json
09:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
09:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
09:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
09:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
09:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T312972)', diff saved to https://phabricator.wikimedia.org/P32494 and previous config saved to /var/cache/conftool/dbconfig/20220818-092130-marostegui.json
09:19 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2003.codfw.wmnet
09:18 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner2002.codfw.wmnet
09:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
09:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
09:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
09:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
09:10 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Simplify wmfEtcdApplyDBConfig() a bit (T298485), Part II (duration: 03m 11s)
09:09 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2002.codfw.wmnet
09:09 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1004.eqiad.wmnet
09:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P32493 and previous config saved to /var/cache/conftool/dbconfig/20220818-090624-marostegui.json
09:06 ladsgroup@deploy1002: Synchronized wmf-config/etcd.php: Config: Simplify wmfEtcdApplyDBConfig() a bit (T298485), Part I (duration: 03m 02s)
09:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
09:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
09:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
09:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
08:59 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1004.eqiad.wmnet
08:59 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1003.eqiad.wmnet
08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P32492 and previous config saved to /var/cache/conftool/dbconfig/20220818-085118-marostegui.json
08:49 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1003.eqiad.wmnet
08:49 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1002.eqiad.wmnet
08:39 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1002.eqiad.wmnet
08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T312972)', diff saved to https://phabricator.wikimedia.org/P32491 and previous config saved to /var/cache/conftool/dbconfig/20220818-083612-marostegui.json
08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T312972)', diff saved to https://phabricator.wikimedia.org/P32490 and previous config saved to /var/cache/conftool/dbconfig/20220818-083505-marostegui.json
08:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
08:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
08:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
08:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
08:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T312972)', diff saved to https://phabricator.wikimedia.org/P32489 and previous config saved to /var/cache/conftool/dbconfig/20220818-083417-marostegui.json
08:33 vgutierrez: upgrade to ATS 9.1.3 in cp5014 and cp5016 - T309651
08:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
08:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
08:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
08:26 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Stop writing to the old templatelinks fields in wikidata and new wikis (T312865) (duration: 03m 20s)
08:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32488 and previous config saved to /var/cache/conftool/dbconfig/20220818-081911-marostegui.json
08:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P32487 and previous config saved to /var/cache/conftool/dbconfig/20220818-081627-ladsgroup.json
08:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
08:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
08:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
08:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
08:09 marostegui: dbmaint Promote pc1013 as pc3 master T315526
08:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
08:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
08:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
08:07 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1013 to pc3 master T315526 (duration: 03m 11s)
08:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32486 and previous config saved to /var/cache/conftool/dbconfig/20220818-080405-marostegui.json
08:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P32485 and previous config saved to /var/cache/conftool/dbconfig/20220818-080122-ladsgroup.json
07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T312972)', diff saved to https://phabricator.wikimedia.org/P32484 and previous config saved to /var/cache/conftool/dbconfig/20220818-074859-marostegui.json
07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T312972)', diff saved to https://phabricator.wikimedia.org/P32483 and previous config saved to /var/cache/conftool/dbconfig/20220818-074652-marostegui.json
07:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
07:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
07:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P32482 and previous config saved to /var/cache/conftool/dbconfig/20220818-074618-ladsgroup.json
07:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
07:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
07:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
07:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
07:39 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.25/extensions/GrowthExperiments/modules/ext.growthExperiments.HelpPanel/SuggestedEditsGuidance.js: 520cd7b: Fix structured task restriction check (T315516) (duration: 03m 17s)
07:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P32480 and previous config saved to /var/cache/conftool/dbconfig/20220818-073113-ladsgroup.json
07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
07:26 godog: roll-restart swift-proxy to apply bumbed memcached limits T314914
07:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
07:24 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.25/includes/specials/SpecialRecentChangesLinked.php: Backport: Revert "Revert "SpecialRecentChangesLinked: Use rdbms code for building the main query"" (duration: 03m 31s)
07:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
07:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
07:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
07:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
07:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
06:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
06:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
06:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
06:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T314041)', diff saved to https://phabricator.wikimedia.org/P32479 and previous config saved to /var/cache/conftool/dbconfig/20220818-064124-ladsgroup.json
06:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P32478 and previous config saved to /var/cache/conftool/dbconfig/20220818-062618-ladsgroup.json
06:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
06:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maint
06:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maint
06:11 Amir1: dbmaint@s8 eqiad (T314369 T312863 T309311 T60674 T303603 T310485)
06:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P32477 and previous config saved to /var/cache/conftool/dbconfig/20220818-061112-ladsgroup.json
06:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1104 T314369', diff saved to https://phabricator.wikimedia.org/P32476 and previous config saved to /var/cache/conftool/dbconfig/20220818-060707-ladsgroup.json
06:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1109 to s8 primary and set section read-write T314369', diff saved to https://phabricator.wikimedia.org/P32475 and previous config saved to /var/cache/conftool/dbconfig/20220818-060213-ladsgroup.json
06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s8 eqiad as read-only for maintenance - T314369', diff saved to https://phabricator.wikimedia.org/P32474 and previous config saved to /var/cache/conftool/dbconfig/20220818-060137-ladsgroup.json
06:01 Amir1: Starting s8 eqiad failover from db1104 to db1109 - T314369
05:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T314041)', diff saved to https://phabricator.wikimedia.org/P32473 and previous config saved to /var/cache/conftool/dbconfig/20220818-055606-ladsgroup.json
04:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 31 hosts with reason: Primary switchover s8 T314369
04:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 31 hosts with reason: Primary switchover s8 T314369
04:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1109 with weight 0 T314369', diff saved to https://phabricator.wikimedia.org/P32471 and previous config saved to /var/cache/conftool/dbconfig/20220818-045218-ladsgroup.json
04:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 31 hosts with reason: Primary switchover s8 T314369
04:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 31 hosts with reason: Primary switchover s8 T314369
04:30 TimStarling: on mw1411, mw1413, mw1419, mw1429, mw1431, mw1433: set scaling_governor to performance, attempt 2, T315398
02:15 TimStarling: on mw1411, mw1413, mw1419, mw1429, mw1431, mw1433: set scaling_governor to performance T315398
00:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']
00:48 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']
00:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']
00:46 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']
00:41 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']
00:39 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']
00:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']
00:31 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']
00:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2023.mgmt.codfw.wmnet with reboot policy FORCED
00:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2023.mgmt.codfw.wmnet with reboot policy FORCED
00:06 eileen___: civicrm upgraded from 97638e58 to edfe2f16
00:05 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)

2022-08-17

23:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox
23:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
23:57 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes2023
23:57 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes2023
23:51 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
23:42 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
23:42 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
23:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
23:36 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
23:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
23:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
23:35 pt1979@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
23:23 mutante: phab2002 - chmod -R phd /srv/repos | find /srv/repos/ -gid 498 -exec chown phd:phd {} \; T313360
23:17 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
23:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kafka-stretch2001.codfw.wmnet']
23:03 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-stretch2001.codfw.wmnet']
22:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
22:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
22:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
22:34 dancy@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.25 refs T314186 (duration: 03m 17s)
22:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
22:31 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.25 refs T314186
22:16 eileen___: civicrm upgraded from 4be0724d to 97638e58
21:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
21:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
21:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
21:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
21:16 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.25/extensions/FlaggedRevs: Backport: Remove indexExists check for page_name_title index (duration: 03m 12s)
21:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
21:13 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.25/extensions/FlaggedRevs/frontend/FlaggedRevsUIHooks.php: Backport: Do not attempt to create a FlaggableWikiPage when the title can't exist (T315479) (duration: 03m 26s)
21:08 ejegg: updated civicrm from c228e3d7 to 4be0724d
21:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2005.codfw.wmnet with OS bullseye
21:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
21:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2005.codfw.wmnet with reason: host reimage
20:50 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2005.codfw.wmnet with reason: host reimage
20:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:36 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: InitialiseSettings: Add wmgUsePhonos (default => false) (T314294) (duration: 03m 29s)
20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:30 samtar@deploy1002: Synchronized wmf-config/extension-list: Config: extension-list: Add Phonos (T314294) (duration: 03m 17s)
20:28 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kafka-logging2005.codfw.wmnet with OS bullseye
20:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS bullseye
20:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1ddc661: QuickSurveys: Disable extension on JA wiki (T311015) (duration: 03m 19s)
20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 2cf80d1: QuickSurveys: Remove research incentive survey from BN wiki (T314333) (duration: 03m 24s)
20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage
20:09 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage
19:21 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS bullseye
19:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
19:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
19:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
19:11 demon@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.23 refs T314186 (duration: 03m 15s)
19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
19:07 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.23 refs T314186
19:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
19:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
19:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
19:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1073.eqiad.wmnet with OS bullseye
19:01 demon@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.25 refs T314186 (duration: 03m 24s)
19:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
18:58 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.25 refs T314186
18:58 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-logging2004.codfw.wmnet with OS bullseye
18:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1027.eqiad.wmnet with OS bullseye
18:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1073.eqiad.wmnet with reason: host reimage
18:40 urandom: disabling reserved space on codfw nodes (RESTBase), /dev/md2 (aka /srv/cassandra/instance-data) -- T314941
18:40 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1027.eqiad.wmnet with reason: host reimage
18:38 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1073.eqiad.wmnet with reason: host reimage
18:36 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1027.eqiad.wmnet with reason: host reimage
18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T314041)', diff saved to https://phabricator.wikimedia.org/P32469 and previous config saved to /var/cache/conftool/dbconfig/20220817-183223-ladsgroup.json
18:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1110.eqiad.wmnet with reason: Maintenance
18:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1110.eqiad.wmnet with reason: Maintenance
18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T314041)', diff saved to https://phabricator.wikimedia.org/P32468 and previous config saved to /var/cache/conftool/dbconfig/20220817-183202-ladsgroup.json
18:25 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1073.eqiad.wmnet with OS bullseye
18:22 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1027.eqiad.wmnet with OS bullseye
18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P32467 and previous config saved to /var/cache/conftool/dbconfig/20220817-181656-ladsgroup.json
18:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1056.eqiad.wmnet with OS bullseye
18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P32466 and previous config saved to /var/cache/conftool/dbconfig/20220817-180150-ladsgroup.json
18:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS bullseye
17:48 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging2004.codfw.wmnet with OS bullseye
17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T314041)', diff saved to https://phabricator.wikimedia.org/P32465 and previous config saved to /var/cache/conftool/dbconfig/20220817-174644-ladsgroup.json
17:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1056.eqiad.wmnet with reason: host reimage
17:42 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2005
17:41 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2005
17:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1056.eqiad.wmnet with reason: host reimage
17:33 ladsgroup@deploy1002: Synchronized portals: Migrate wikinews.org to the modern portals (duration: 03m 32s)
17:31 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004
17:30 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004
17:29 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Migrate wikinews.org to the modern portals (duration: 03m 29s)
17:24 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1056.eqiad.wmnet with OS bullseye
17:10 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS bullseye
17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite2004.codfw.wmnet with OS bullseye
16:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on graphite2004.codfw.wmnet with reason: host reimage
16:55 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on graphite2004.codfw.wmnet with reason: host reimage
16:54 sbassett@deploy1002: Synchronized wmf-config/CommonSettings.php: Enable StopForumSpam on candidate wikis (CS.php) - T273220 (duration: 03m 26s)
16:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host graphite2004.codfw.wmnet with OS bullseye
16:50 sbassett@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable StopForumSpam on candidate wikis (IS.php) - T273220 (duration: 03m 20s)
16:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
16:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
16:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
16:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
16:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
16:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
16:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
16:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
16:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P32463 and previous config saved to /var/cache/conftool/dbconfig/20220817-162655-root.json
16:24 cwhite: restart logmsgbot T257861
16:17 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
16:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1079.eqiad.wmnet with OS bullseye
16:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P32462 and previous config saved to /var/cache/conftool/dbconfig/20220817-161151-root.json
16:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
16:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
15:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
15:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P32461 and previous config saved to /var/cache/conftool/dbconfig/20220817-155653-root.json
15:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P32460 and previous config saved to /var/cache/conftool/dbconfig/20220817-155646-root.json
15:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
15:54 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: jawiki: Restrict abusefilter log access (2) (T315199) (duration: 03m 47s)
15:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1079.eqiad.wmnet with reason: host reimage
15:52 jbond: push out update for linux-image-amd64 on bullseye
15:51 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1079.eqiad.wmnet with reason: host reimage
15:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
15:50 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: jawiki: Restrict abusefilter log access (1) (T315199) (duration: 03m 25s)
15:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
15:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
15:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
15:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
15:43 TheresNoTime: finished deploying RESTBase is not enabled on closed wikis (T315383)
15:42 jayme@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=1)
15:42 jayme@cumin1001: START - Cookbook sre.discovery.service-route
15:42 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: RESTBase is not enabled on closed wikis (T315383) (duration: 03m 27s)
15:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P32458 and previous config saved to /var/cache/conftool/dbconfig/20220817-154148-root.json
15:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P32457 and previous config saved to /var/cache/conftool/dbconfig/20220817-154142-root.json
15:41 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
15:41 jayme@cumin1001: START - Cookbook sre.discovery.service-route
15:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
15:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
15:38 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1079.eqiad.wmnet with OS bullseye
15:37 jbond: install net-snmp updates
15:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
15:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P32455 and previous config saved to /var/cache/conftool/dbconfig/20220817-152643-root.json
15:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P32454 and previous config saved to /var/cache/conftool/dbconfig/20220817-152637-root.json
15:24 TheresNoTime: deploying RESTBase is not enabled on closed wikis (T315383) out of window
15:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
15:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P32453 and previous config saved to /var/cache/conftool/dbconfig/20220817-151139-root.json
15:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P32452 and previous config saved to /var/cache/conftool/dbconfig/20220817-151132-root.json
15:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
15:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
15:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
14:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P32450 and previous config saved to /var/cache/conftool/dbconfig/20220817-145634-root.json
14:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P32449 and previous config saved to /var/cache/conftool/dbconfig/20220817-145628-root.json
14:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host graphite2004.codfw.wmnet with OS bullseye
14:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-stretch2002.mgmt.codfw.wmnet with reboot policy FORCED
14:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P32447 and previous config saved to /var/cache/conftool/dbconfig/20220817-144129-root.json
14:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P32446 and previous config saved to /var/cache/conftool/dbconfig/20220817-144123-root.json
14:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on graphite2004.codfw.wmnet with reason: host reimage
14:32 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on graphite2004.codfw.wmnet with reason: host reimage
14:18 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kafka-stretch2002.mgmt.codfw.wmnet with reboot policy FORCED
14:18 marostegui: Redact new wikis guwwiktionary pcmwiki bjnwiktionary T312214 T310879 T309056
14:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-stretch2001.mgmt.codfw.wmnet with reboot policy FORCED
14:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host graphite2004.codfw.wmnet with OS bullseye
14:01 taavi: UTC afternoon deploys done
14:00 taavi@deploy1002: Finished scap: Backport for gerrit:823697 Add wgDiscussionToolsEnablePermalinksBackend config (duration: 19m 24s)
13:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kafka-stretch2001.mgmt.codfw.wmnet with reboot policy FORCED
13:51 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox
13:43 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-stretch2002
13:42 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kafka-stretch2002
13:42 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-stretch2001
13:41 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kafka-stretch2001
13:41 taavi@deploy1002: Started scap: Backport for gerrit:823697 Add wgDiscussionToolsEnablePermalinksBackend config
13:38 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Realtime Preview on Group 1 (T314182) (duration: 03m 26s)
13:32 taavi@deploy1002: Synchronized php-1.39.0-wmf.25/extensions/DiscussionTools/includes/Hooks/DataUpdatesHooks.php: Backport: Add try…catch in failing deferred update (T315383) (duration: 03m 18s)
13:27 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: lots of DiscussionTools and other changes (duration: 03m 11s)
13:19 mforns@deploy1002: Finished deploy [airflow-dags/analytics@141f179]: (no justification provided) (duration: 00m 10s)
13:19 mforns@deploy1002: Started deploy [airflow-dags/analytics@141f179]: (no justification provided)
12:39 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (T310776, T312209, T309054) (duration: 03m 30s)
12:30 urbanecm@deploy1002: Synchronized dblists-index.php: Creating bjnwiktionary (T312209) (duration: 03m 32s)
12:26 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating bjnwiktionary (T312209) (duration: 03m 13s)
12:23 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating bjnwiktionary (T312209) (duration: 03m 19s)
12:20 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating bjnwiktionary (T312209) (duration: 03m 27s)
12:17 jbond: remove prometheus-ipmi-exporter from stretch
12:16 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating bjnwiktionary (T312209)
12:15 jbond: copy prometheus-ipmi-exporter package from buster to stretch
12:12 urbanecm@deploy1002: Synchronized dblists: Creating bjnwiktionary (T312209) (duration: 03m 33s)
12:09 urbanecm@deploy1002: Synchronized wmf-config/db-production.php: Creating bjnwiktionary (T312209) (duration: 03m 29s)
12:02 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating guwwiktionary (T309054) (duration: 03m 34s)
12:01 jbond: copy prometheus-ipmi-exporter package from bullseye to buster
11:58 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating guwwiktionary (T309054) (duration: 03m 43s)
11:54 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating guwwiktionary (T309054) (duration: 03m 25s)
11:51 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating guwwiktionary (T309054)
11:47 urbanecm@deploy1002: Synchronized dblists: Creating guwwiktionary (T309054) (duration: 03m 11s)
11:44 urbanecm@deploy1002: Synchronized wmf-config/db-production.php: Creating guwwiktionary (T309054) (duration: 03m 08s)
11:38 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest1001.eqiad.wmnet sretest1002.eqiad.wmnet on all recursors
11:38 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache sretest1001.eqiad.wmnet sretest1002.eqiad.wmnet on all recursors
11:38 urbanecm@deploy1002: Synchronized langlist: Creating pcmwiki (T310776) (duration: 03m 42s)
11:34 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating pcmwiki (T310776) (duration: 03m 18s)
11:31 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating pcmwiki (T310776) (duration: 03m 24s)
11:27 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating pcmwiki (T310776) (duration: 03m 13s)
11:24 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating pcmwiki (T310776)
11:20 urbanecm@deploy1002: Synchronized dblists: Creating pcmwiki (T310776) (duration: 03m 13s)
11:17 urbanecm@deploy1002: Synchronized wmf-config/db-production.php: Creating pcmwiki (T310776) (duration: 03m 22s)
11:11 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
11:11 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32444 and previous config saved to /var/cache/conftool/dbconfig/20220817-092244-root.json
09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32443 and previous config saved to /var/cache/conftool/dbconfig/20220817-092125-root.json
09:10 hashar: Upgraded Gerrit from 3.4.4 to 3.4.5 # T315408
09:09 hashar@deploy1002: Finished deploy [gerrit/gerrit@e11e6a7]: Gerrit to 3.4.5 on gerrit1001 # T315408 (duration: 00m 09s)
09:09 hashar@deploy1002: Started deploy [gerrit/gerrit@e11e6a7]: Gerrit to 3.4.5 on gerrit1001 # T315408
09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32442 and previous config saved to /var/cache/conftool/dbconfig/20220817-090739-root.json
09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32441 and previous config saved to /var/cache/conftool/dbconfig/20220817-090620-root.json
09:04 hashar@deploy1002: Finished deploy [gerrit/gerrit@e11e6a7]: Gerrit to 3.4.5 on gerrit 2002 # T315408 (duration: 00m 11s)
09:03 hashar@deploy1002: Started deploy [gerrit/gerrit@e11e6a7]: Gerrit to 3.4.5 on gerrit 2002 # T315408
08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32440 and previous config saved to /var/cache/conftool/dbconfig/20220817-085235-root.json
08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32439 and previous config saved to /var/cache/conftool/dbconfig/20220817-085224-root.json
08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32438 and previous config saved to /var/cache/conftool/dbconfig/20220817-085136-root.json
08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32437 and previous config saved to /var/cache/conftool/dbconfig/20220817-085115-root.json
08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32436 and previous config saved to /var/cache/conftool/dbconfig/20220817-083730-root.json
08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32435 and previous config saved to /var/cache/conftool/dbconfig/20220817-083719-root.json
08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32434 and previous config saved to /var/cache/conftool/dbconfig/20220817-083631-root.json
08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32433 and previous config saved to /var/cache/conftool/dbconfig/20220817-083611-root.json
08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 10%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32432 and previous config saved to /var/cache/conftool/dbconfig/20220817-082226-root.json
08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32431 and previous config saved to /var/cache/conftool/dbconfig/20220817-082215-root.json
08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32430 and previous config saved to /var/cache/conftool/dbconfig/20220817-082127-root.json
08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32429 and previous config saved to /var/cache/conftool/dbconfig/20220817-082106-root.json
08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 5%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32428 and previous config saved to /var/cache/conftool/dbconfig/20220817-080721-root.json
08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 10%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32427 and previous config saved to /var/cache/conftool/dbconfig/20220817-080710-root.json
08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 10%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32426 and previous config saved to /var/cache/conftool/dbconfig/20220817-080622-root.json
08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32425 and previous config saved to /var/cache/conftool/dbconfig/20220817-080602-root.json
07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 2%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32424 and previous config saved to /var/cache/conftool/dbconfig/20220817-075216-root.json
07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 5%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32423 and previous config saved to /var/cache/conftool/dbconfig/20220817-075206-root.json
07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 5%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32422 and previous config saved to /var/cache/conftool/dbconfig/20220817-075118-root.json
07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 2%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32421 and previous config saved to /var/cache/conftool/dbconfig/20220817-075057-root.json
07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 1%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32420 and previous config saved to /var/cache/conftool/dbconfig/20220817-073712-root.json
07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 1%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32419 and previous config saved to /var/cache/conftool/dbconfig/20220817-073701-root.json
07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 1%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32418 and previous config saved to /var/cache/conftool/dbconfig/20220817-073613-root.json
07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32417 and previous config saved to /var/cache/conftool/dbconfig/20220817-073553-root.json
07:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T314041)', diff saved to https://phabricator.wikimedia.org/P32416 and previous config saved to /var/cache/conftool/dbconfig/20220817-073141-ladsgroup.json
07:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
07:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
07:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
07:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
07:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T314041)', diff saved to https://phabricator.wikimedia.org/P32415 and previous config saved to /var/cache/conftool/dbconfig/20220817-073052-ladsgroup.json
07:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P32414 and previous config saved to /var/cache/conftool/dbconfig/20220817-071546-ladsgroup.json
07:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P32413 and previous config saved to /var/cache/conftool/dbconfig/20220817-070040-ladsgroup.json
06:54 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1034.eqiad.wmnet with OS bullseye
06:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T314041)', diff saved to https://phabricator.wikimedia.org/P32412 and previous config saved to /var/cache/conftool/dbconfig/20220817-064534-ladsgroup.json
06:42 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1028.eqiad.wmnet with OS bullseye
06:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1033.eqiad.wmnet with OS bullseye
06:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1034.eqiad.wmnet with reason: host reimage
06:37 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1031.eqiad.wmnet with OS bullseye
06:36 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1029.eqiad.wmnet with OS bullseye
06:35 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1034.eqiad.wmnet with reason: host reimage
06:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1032.eqiad.wmnet with OS bullseye
06:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1027.eqiad.wmnet with OS bullseye
06:21 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1034.eqiad.wmnet with OS bullseye
06:21 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1034.eqiad.wmnet with OS bullseye
06:20 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1029.eqiad.wmnet with reason: host reimage
06:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1033.eqiad.wmnet with reason: host reimage
06:20 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1028.eqiad.wmnet with reason: host reimage
06:17 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1031.eqiad.wmnet with reason: host reimage
06:15 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1032.eqiad.wmnet with reason: host reimage
06:13 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1027.eqiad.wmnet with reason: host reimage
06:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1028.eqiad.wmnet with reason: host reimage
06:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1033.eqiad.wmnet with reason: host reimage
06:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1032.eqiad.wmnet with reason: host reimage
06:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1029.eqiad.wmnet with reason: host reimage
06:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1031.eqiad.wmnet with reason: host reimage
06:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1027.eqiad.wmnet with reason: host reimage
06:00 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1025.eqiad.wmnet with OS bullseye
05:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1034.eqiad.wmnet with OS bullseye
05:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1033.eqiad.wmnet with OS bullseye
05:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1032.eqiad.wmnet with OS bullseye
05:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1031.eqiad.wmnet with OS bullseye
05:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1029.eqiad.wmnet with OS bullseye
05:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1028.eqiad.wmnet with OS bullseye
05:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1027.eqiad.wmnet with OS bullseye
05:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1027.eqiad.wmnet with OS bullseye
05:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1029.eqiad.wmnet with OS bullseye
05:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1028.eqiad.wmnet with OS bullseye
05:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1031.eqiad.wmnet with OS bullseye
05:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1032.eqiad.wmnet with OS bullseye
05:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1033.eqiad.wmnet with OS bullseye
05:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1034.eqiad.wmnet with OS bullseye
05:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1026.eqiad.wmnet with OS bullseye
05:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1034.eqiad.wmnet with OS bullseye
05:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1033.eqiad.wmnet with OS bullseye
05:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1032.eqiad.wmnet with OS bullseye
05:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1031.eqiad.wmnet with OS bullseye
05:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1029.eqiad.wmnet with OS bullseye
05:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1028.eqiad.wmnet with OS bullseye
05:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1027.eqiad.wmnet with OS bullseye
05:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1027.eqiad.wmnet with OS bullseye
05:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1028.eqiad.wmnet with OS bullseye
05:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1031.eqiad.wmnet with OS bullseye
05:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1029.eqiad.wmnet with OS bullseye
05:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1032.eqiad.wmnet with OS bullseye
05:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1033.eqiad.wmnet with OS bullseye
05:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1034.eqiad.wmnet with OS bullseye
05:19 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1025.eqiad.wmnet with reason: host reimage
05:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1026.eqiad.wmnet with reason: host reimage
05:14 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1030.eqiad.wmnet with OS bullseye
05:13 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1025.eqiad.wmnet with reason: host reimage
05:13 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1026.eqiad.wmnet with reason: host reimage
05:03 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1032.eqiad.wmnet with OS bullseye
05:02 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1032.eqiad.wmnet with OS bullseye
04:59 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bullseye
04:59 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1026.eqiad.wmnet with OS bullseye
04:58 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1030.eqiad.wmnet with reason: host reimage
04:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1034.eqiad.wmnet with OS bullseye
04:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1033.eqiad.wmnet with OS bullseye
04:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1031.eqiad.wmnet with OS bullseye
04:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1029.eqiad.wmnet with OS bullseye
04:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1028.eqiad.wmnet with OS bullseye
04:56 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1027.eqiad.wmnet with OS bullseye
04:55 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1030.eqiad.wmnet with reason: host reimage
04:48 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1034.eqiad.wmnet with OS bullseye
04:48 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1033.eqiad.wmnet with OS bullseye
04:48 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1031.eqiad.wmnet with OS bullseye
04:48 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1029.eqiad.wmnet with OS bullseye
04:47 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1028.eqiad.wmnet with OS bullseye
04:47 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1027.eqiad.wmnet with OS bullseye
04:42 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1030.eqiad.wmnet with OS bullseye
04:31 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1030.eqiad.wmnet with OS bullseye
04:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1030.eqiad.wmnet with reason: host reimage
04:23 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1034.eqiad.wmnet with OS bullseye
04:23 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1033.eqiad.wmnet with OS bullseye
04:23 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1032.eqiad.wmnet with OS bullseye
04:23 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1031.eqiad.wmnet with OS bullseye
04:23 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1029.eqiad.wmnet with OS bullseye
04:23 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1028.eqiad.wmnet with OS bullseye
04:23 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1027.eqiad.wmnet with OS bullseye
04:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1030.eqiad.wmnet with reason: host reimage
04:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1030.eqiad.wmnet with OS bullseye
04:08 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1025.eqiad.wmnet with OS bullseye
02:58 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts elastic[1051-1052].eqiad.wmnet
02:58 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
02:45 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
02:32 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts elastic[1051-1052].eqiad.wmnet
02:16 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts elastic[1051-1052].eqiad.wmnet
02:16 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts elastic[1051-1052].eqiad.wmnet
02:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts elastic[1049-1050].eqiad.wmnet
02:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:59 sbassett: Re-deployed security fix for T309894 to wmf.25
01:54 sbassett: Re-deployed security fix for T309894 to wmf.23
01:49 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
01:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kafka-logging2005']
01:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging2005']
01:12 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts elastic[1049-1050].eqiad.wmnet
01:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging2005.mgmt.codfw.wmnet with reboot policy FORCED
00:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kafka-logging2004']
00:26 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging2004']
00:25 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kafka-logging2004']
00:25 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging2004']
00:03 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kafka-logging2005.mgmt.codfw.wmnet with reboot policy FORCED
00:02 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging2005.mgmt.codfw.wmnet with reboot policy FORCED

2022-08-16

23:56 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kafka-logging2005.mgmt.codfw.wmnet with reboot policy FORCED
23:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging2004.mgmt.codfw.wmnet with reboot policy FORCED
23:44 mutante: phab1001 - repeated rsync of /srv/repos to phab2002, then chown -R phd /srv/repos/ (without setting the group) - this way UID is fixed and privs match exactly phab1001 - T313360
23:37 mutante: phab2002 - chown -R phd:www-data /srv/repos/ (because of UID mismatch) T313360
23:32 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kafka-logging2004.mgmt.codfw.wmnet with reboot policy FORCED
23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['graphite2004']
23:31 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['graphite2004']
23:28 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2005
23:27 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2005
23:27 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004
23:26 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004
23:24 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['graphite2004']
23:22 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['graphite2004']
23:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['graphite2004']
23:20 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['graphite2004']
23:20 pt1979@cumin2002: START - Cookbook sre.dns.netbox
23:19 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1026.eqiad.wmnet with OS bullseye
23:04 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1026.eqiad.wmnet with reason: host reimage
23:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1025.eqiad.wmnet with reason: host reimage
23:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host graphite2004.mgmt.codfw.wmnet with reboot policy FORCED
22:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1026.eqiad.wmnet with reason: host reimage
22:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1025.eqiad.wmnet with reason: host reimage
22:47 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1026.eqiad.wmnet with OS bullseye
22:47 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bullseye
22:42 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1025.eqiad.wmnet with OS bullseye
22:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1025.eqiad.wmnet with OS bullseye
22:30 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bullseye
22:29 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1025.eqiad.wmnet with OS bullseye
22:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Large deletions affecting this replica
22:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Large deletions affecting this replica
22:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1025.eqiad.wmnet with reason: host reimage
22:07 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1025.eqiad.wmnet with reason: host reimage
21:56 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bullseye
21:56 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1025.eqiad.wmnet with OS bullseye
21:54 demon@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.25 refs T314186
21:53 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
21:53 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
21:53 bking@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts elastic1048.eqiad.wmnet
21:53 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:47 bking@cumin1001: START - Cookbook sre.dns.netbox
21:45 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1075.eqiad.wmnet with OS bullseye
21:45 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bullseye
21:45 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1025.eqiad.wmnet with OS bullseye
21:44 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bullseye
21:44 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1025.eqiad.wmnet with OS bullseye
21:42 bking@cumin1001: START - Cookbook sre.hosts.decommission for hosts elastic1048.eqiad.wmnet
21:41 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts elastic1048.eqiad.wmnet
21:31 bking@cumin1001: START - Cookbook sre.hosts.decommission for hosts elastic1048.eqiad.wmnet
21:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host graphite2004.mgmt.codfw.wmnet with reboot policy FORCED
21:29 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:28 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1075.eqiad.wmnet with reason: host reimage
21:27 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host graphite2004
21:26 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host graphite2004
21:26 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bullseye
21:25 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1075.eqiad.wmnet with reason: host reimage
21:25 pt1979@cumin2002: START - Cookbook sre.dns.netbox
21:13 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1075.eqiad.wmnet with OS bullseye
21:11 cstone: civicrm upgraded from 92467234 to c228e3d7
21:05 otto@deploy1002: Finished deploy [airflow-dags/platform_eng@33afb85]: initial scap deploy to an-airflow1004, take 3 - T312858 (duration: 00m 18s)
21:05 otto@deploy1002: Started deploy [airflow-dags/platform_eng@33afb85]: initial scap deploy to an-airflow1004, take 3 - T312858
21:01 dancy@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.25 refs T314186 (duration: 08m 02s)
20:54 otto@deploy1002: Finished deploy [airflow-dags/platform_eng@da511ee]: initial scap deploy to an-airflow1004, take 2 - T312858 (duration: 01m 05s)
20:53 dancy@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.25 refs T314186
20:53 otto@deploy1002: Started deploy [airflow-dags/platform_eng@da511ee]: initial scap deploy to an-airflow1004, take 2 - T312858
20:47 cjming: end of UTC late backport window
20:45 cjming@deploy1002: Finished scap: Backport for gerrit:823268 Update sticky header config for idwiki, viwiki A/B experiment (duration: 06m 44s)
20:42 otto@deploy1002: Finished deploy [airflow-dags/platform_eng@eba3ff8]: initial scap deploy to an-airflow1004 - T312858 (duration: 02m 30s)
20:39 otto@deploy1002: Started deploy [airflow-dags/platform_eng@eba3ff8]: initial scap deploy to an-airflow1004 - T312858
20:39 cjming@deploy1002: Started scap: Backport for gerrit:823268 Update sticky header config for idwiki, viwiki A/B experiment
20:35 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1055.eqiad.wmnet with OS bullseye
20:28 cjming@deploy1002: Finished scap: Backport for gerrit:823658 mediawikiwiki: set $wgCdnMatchParameterOrder to false (duration: 08m 54s)
20:26 mutante: mw1406 - sudo systemctl start php7.2-fpm_check_restart
20:19 cjming@deploy1002: Started scap: Backport for gerrit:823658 mediawikiwiki: set $wgCdnMatchParameterOrder to false
20:18 ori: removed /var/lock/scap.operations_mediawiki-config.lock on deploy1002
20:16 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1055.eqiad.wmnet with reason: host reimage
20:14 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1025.eqiad.wmnet with OS bullseye
20:14 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1055.eqiad.wmnet with reason: host reimage
20:13 cjming@deploy1002: scap failed: LockFailedError Failed to acquire lock "/var/lock/scap.operations_mediawiki-config.lock"; owner is "demon"; reason is "all wikis to 1.39.0-wmf.23 refs T314186" (duration: 00m 00s)
19:58 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1055.eqiad.wmnet with OS bullseye
19:53 dancy@deploy1002: backport aborted: (duration: 00m 36s)
19:53 demon@deploy1002: stage-train aborted: (duration: 07m 00s)
19:53 demon@deploy1002: deploy-promote aborted: (duration: 05m 35s)
19:53 demon@deploy1002: sync-world aborted: testwikis wikis to 1.39.0-wmf.25 refs T314186 (duration: 03m 39s)
19:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T314041)', diff saved to https://phabricator.wikimedia.org/P32408 and previous config saved to /var/cache/conftool/dbconfig/20220816-195115-ladsgroup.json
19:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
19:50 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
19:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
19:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
19:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T314041)', diff saved to https://phabricator.wikimedia.org/P32407 and previous config saved to /var/cache/conftool/dbconfig/20220816-195043-ladsgroup.json
19:49 demon@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.25 refs T314186
19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P32406 and previous config saved to /var/cache/conftool/dbconfig/20220816-193537-ladsgroup.json
19:25 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
19:21 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
19:20 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bullseye
19:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P32405 and previous config saved to /var/cache/conftool/dbconfig/20220816-192031-ladsgroup.json
19:19 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
19:18 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
19:13 otto@deploy1002: Finished deploy [analytics/refinery@6e47e0e]: Full deploy after last week's interrupted deployment. This syncs the latest refinery to all targets. an-launcher1002 already has these files. (duration: 24m 46s)
19:07 demon@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.23 refs T314186
19:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T314041)', diff saved to https://phabricator.wikimedia.org/P32404 and previous config saved to /var/cache/conftool/dbconfig/20220816-190525-ladsgroup.json
19:05 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
19:04 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1025.eqiad.wmnet with OS bullseye
19:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1076.eqiad.wmnet with OS bullseye
18:58 demon@deploy1002: Pruned MediaWiki: 1.39.0-wmf.22 (duration: 02m 02s)
18:56 demon@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.24 refs T314186 (duration: 35m 39s)
18:53 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
18:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
18:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
18:48 otto@deploy1002: Started deploy [analytics/refinery@6e47e0e]: Full deploy after last week's interrupted deployment. This syncs the latest refinery to all targets. an-launcher1002 already has these files.
18:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1076.eqiad.wmnet with reason: host reimage
18:40 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
18:40 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
18:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1076.eqiad.wmnet with reason: host reimage
18:37 jynus: restore x2 codfw replication T315271
18:26 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1076.eqiad.wmnet with OS bullseye
18:20 demon@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.24 refs T314186
18:08 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
18:08 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
18:00 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1078.eqiad.wmnet with OS bullseye
17:51 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1025.eqiad.wmnet with OS bullseye
17:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1078.eqiad.wmnet with reason: host reimage
17:40 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1078.eqiad.wmnet with reason: host reimage
17:40 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
17:27 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1078.eqiad.wmnet with OS bullseye
17:00 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage - ryankemper@cumin1001 - T289135
16:57 ryankemper: [WDQS] `ryankemper@wdqs1007:~$ sudo systemctl restart wdqs-blazegraph`
16:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1049.eqiad.wmnet with OS bullseye
16:33 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1049.eqiad.wmnet with reason: host reimage
16:28 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1049.eqiad.wmnet with reason: host reimage
16:17 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1049.eqiad.wmnet with OS bullseye
16:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
16:02 btullis@deploy1002: Finished deploy [airflow-dags/analytics@3c998da]: (no justification provided) (duration: 00m 12s)
16:02 btullis@deploy1002: Started deploy [airflow-dags/analytics@3c998da]: (no justification provided)
15:48 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be2032.codfw.wmnet
15:48 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be2032.codfw.wmnet
15:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1074.eqiad.wmnet with OS bullseye
15:29 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be2032.codfw.wmnet with reason: RAID battery failure
15:29 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be2032.codfw.wmnet with reason: RAID battery failure
15:25 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1074.eqiad.wmnet with reason: host reimage
15:23 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1074.eqiad.wmnet with reason: host reimage
15:12 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route-jayme (exit_code=0)
15:10 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1074.eqiad.wmnet with OS bullseye
15:07 jayme@cumin1001: START - Cookbook sre.discovery.service-route-jayme
15:07 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route-jayme (exit_code=0)
15:07 jayme@cumin1001: START - Cookbook sre.discovery.service-route-jayme
14:31 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route-jayme (exit_code=0)
14:30 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1077.eqiad.wmnet with OS bullseye
14:26 jayme@cumin1001: START - Cookbook sre.discovery.service-route-jayme
14:13 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1077.eqiad.wmnet with reason: host reimage
14:10 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1077.eqiad.wmnet with reason: host reimage
14:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
14:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
14:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:57 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1077.eqiad.wmnet with OS bullseye
13:55 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1057.eqiad.wmnet with OS bullseye
13:55 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: revert: Config: jawiki: Restrict abusefilter log view to "abusefilter-modify" user (T315199) (duration: 03m 12s)
13:41 taavi: UTC afternoon deploys done
13:40 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: jawiki: Restrict abusefilter log view to "abusefilter-modify" user (T315199) (duration: 03m 21s)
13:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:38 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
13:38 jayme@cumin1001: START - Cookbook sre.discovery.service-route
13:38 jayme@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=1)
13:38 jayme@cumin1001: START - Cookbook sre.discovery.service-route
13:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:36 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1057.eqiad.wmnet with reason: host reimage
13:33 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1057.eqiad.wmnet with reason: host reimage
13:24 jayme@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-eqiad
13:24 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
13:24 jayme@cumin1001: START - Cookbook sre.discovery.service-route
13:24 taavi@deploy1002: Synchronized wmf-config: Config: kowiki: Change logo for 600k articles (T315127) (duration: 03m 11s)
13:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:20 taavi@deploy1002: Synchronized static/images: Config: kowiki: Add logo (legacy vector and vector-2022) for 600k articles (T315127) (duration: 03m 29s)
13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:17 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1057.eqiad.wmnet with OS bullseye
13:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
13:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
13:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
13:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
13:04 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
12:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
12:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
12:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1163.eqiad.wmnet with reason: Maintenance
12:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1163.eqiad.wmnet with reason: Maintenance
11:24 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
11:24 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
11:08 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2003.wikimedia.org with OS bullseye
11:03 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
11:02 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
10:53 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
10:50 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
10:49 jayme@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-eqiad
10:49 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
10:40 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
10:38 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
10:34 jelto@cumin1001: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
10:30 jelto: reimaging gitlab2003 (insetup) to test partman recipe from gerrit:823115 - T274463
09:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on db1169.eqiad.wmnet with reason: Maintenance
09:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on db1169.eqiad.wmnet with reason: Maintenance
09:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1163.eqiad.wmnet with reason: Maintenance
09:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1163.eqiad.wmnet with reason: Maintenance
09:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on db1169.eqiad.wmnet with reason: Maintenance
09:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on db1169.eqiad.wmnet with reason: Maintenance
09:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1163.eqiad.wmnet with reason: Maintenance
09:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1163.eqiad.wmnet with reason: Maintenance
08:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
08:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
08:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
08:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
07:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T314041)', diff saved to https://phabricator.wikimedia.org/P32402 and previous config saved to /var/cache/conftool/dbconfig/20220816-074259-ladsgroup.json
07:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
07:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
07:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T314041)', diff saved to https://phabricator.wikimedia.org/P32401 and previous config saved to /var/cache/conftool/dbconfig/20220816-074239-ladsgroup.json
07:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P32400 and previous config saved to /var/cache/conftool/dbconfig/20220816-072733-ladsgroup.json
07:26 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be2067.codfw.wmnet
07:26 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be2067.codfw.wmnet
07:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
07:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
07:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
07:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
07:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1169.eqiad.wmnet with reason: Maint
07:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1169.eqiad.wmnet with reason: Maint
07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P32399 and previous config saved to /var/cache/conftool/dbconfig/20220816-071227-ladsgroup.json
06:58 hashar@deploy1002: Finished deploy [integration/docroot@c142ba7]: Drop archived wikibase-vuejs-components storybook - T309872 (duration: 00m 10s)
06:58 hashar@deploy1002: Started deploy [integration/docroot@c142ba7]: Drop archived wikibase-vuejs-components storybook - T309872
06:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T314041)', diff saved to https://phabricator.wikimedia.org/P32398 and previous config saved to /var/cache/conftool/dbconfig/20220816-065721-ladsgroup.json
06:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maint
06:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maint
06:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P32397 and previous config saved to /var/cache/conftool/dbconfig/20220816-062955-ladsgroup.json
06:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maint work on old s1 master (T312984 T312863 T310011 T309311 T60674 T298560 T298555 T310485 T301312)
06:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maint work on old s1 master (T312984 T312863 T310011 T309311 T60674 T298560 T298555 T310485 T301312)
06:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1163 T314380', diff saved to https://phabricator.wikimedia.org/P32396 and previous config saved to /var/cache/conftool/dbconfig/20220816-061413-ladsgroup.json
06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1118 to s1 primary and set section read-write T314380', diff saved to https://phabricator.wikimedia.org/P32395 and previous config saved to /var/cache/conftool/dbconfig/20220816-060530-ladsgroup.json
06:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s1 eqiad as read-only for maintenance - T314380', diff saved to https://phabricator.wikimedia.org/P32394 and previous config saved to /var/cache/conftool/dbconfig/20220816-060455-ladsgroup.json
06:04 Amir1: Starting s1 eqiad failover from db1163 to db1118 - T314380
05:43 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=(appservers|api)-ro
05:43 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=(appservers|api)-ro
05:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1118 with weight 0 T314380', diff saved to https://phabricator.wikimedia.org/P32393 and previous config saved to /var/cache/conftool/dbconfig/20220816-053534-ladsgroup.json
05:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s1 T314380
05:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s1 T314380
05:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db[2142-2143].codfw.wmnet with reason: After-canary
05:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db[2142-2143].codfw.wmnet with reason: After-canary
04:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
04:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
04:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
04:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
04:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
04:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
04:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
04:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
04:38 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1059.eqiad.wmnet with OS bullseye
04:14 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1059.eqiad.wmnet with reason: host reimage
04:11 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1059.eqiad.wmnet with reason: host reimage
03:57 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1059.eqiad.wmnet with OS bullseye
03:56 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage - ryankemper@cumin1001 - T289135
03:55 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage - ryankemper@cumin1001 - T289135
03:53 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage - ryankemper@cumin1001 - T289135
01:39 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddumps1001.wikimedia.org with OS bullseye
00:18 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: replaceableSettings g 820247 (duration: 03m 18s)
00:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
00:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
00:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
00:13 tstarling@deploy1002: Synchronized tests: config tests, for consistency g 820247 (duration: 03m 22s)
00:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-08-15

23:20 mutante: phab2002 - manually removing service IP addresses for git-ssh.codfw.wikimedia.org which were added by puppet even after gerrit:823220 (!) T280597
22:59 mutante: search-loader1001 - killed puppet process that had been running since May
22:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddumps1001.wikimedia.org with reason: host reimage
22:49 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddumps1001.wikimedia.org with reason: host reimage
22:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
22:33 mutante: rsyncing /srv/repos and /srv/dumps from phab1001 to phab2002 before applying prod puppet role (T313360)
22:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1083.eqiad.wmnet with OS bullseye
21:54 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "Revert "Enable sticky header edit A/B test for idwiki + viwiki"" (duration: 03m 37s)
21:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
21:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
21:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
21:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
21:45 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1083.eqiad.wmnet with reason: host reimage
21:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
21:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
21:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
21:42 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1083.eqiad.wmnet with reason: host reimage
21:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
21:42 cjming@deploy1002: Synchronized php-1.39.0-wmf.23/skins/Vector/resources/skins.vector.es6: Backport: Sticky header AB test bucketing for 2 treatment buckets (T312573) (duration: 03m 05s)
21:34 ejegg: payments-wiki upgraded from 41709763 to f9f91f1f
afk: payments-wiki rolled back to 41709763
21:29 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1083.eqiad.wmnet with OS bullseye
21:22 ejegg: payments-wiki upgraded from 41709763 to f9f91f1f
21:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1080.eqiad.wmnet with OS bullseye
20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:55 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "Enable sticky header edit A/B test for idwiki + viwiki" (duration: 03m 15s)
20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:50 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1080.eqiad.wmnet with reason: host reimage
20:48 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1080.eqiad.wmnet with reason: host reimage
20:35 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1080.eqiad.wmnet with OS bullseye
20:33 cjming: end of UTC late backport window
20:31 cjming@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/GrowthExperiments: Backport: WelcomeSurvey/VariantHooks: Change hook used for redirection (T313064) (duration: 04m 37s)
20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:12 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable sticky header edit A/B test for idwiki + viwiki (T312295) (duration: 03m 30s)
20:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T314041)', diff saved to https://phabricator.wikimedia.org/P32391 and previous config saved to /var/cache/conftool/dbconfig/20220815-193541-ladsgroup.json
19:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
19:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T314041)', diff saved to https://phabricator.wikimedia.org/P32390 and previous config saved to /var/cache/conftool/dbconfig/20220815-193520-ladsgroup.json
19:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P32389 and previous config saved to /var/cache/conftool/dbconfig/20220815-192014-ladsgroup.json
19:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P32388 and previous config saved to /var/cache/conftool/dbconfig/20220815-190508-ladsgroup.json
18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T314041)', diff saved to https://phabricator.wikimedia.org/P32387 and previous config saved to /var/cache/conftool/dbconfig/20220815-185002-ladsgroup.json
18:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1081.eqiad.wmnet with OS bullseye
18:40 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@230a820]: include additional deubgging information in HivePartitionRangeSensor logs (duration: 02m 08s)
18:38 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@230a820]: include additional deubgging information in HivePartitionRangeSensor logs
18:33 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1081.eqiad.wmnet with reason: host reimage
18:31 pt1979@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ms-be2067.codfw.wmnet
18:29 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1081.eqiad.wmnet with reason: host reimage
18:24 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be2067.codfw.wmnet
18:16 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1081.eqiad.wmnet with OS bullseye
18:07 herron: thanos compact process was hung, forced thanos-compact restart on thanos-fe2001
17:48 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1052.eqiad.wmnet with OS bullseye
17:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1052.eqiad.wmnet with reason: host reimage
17:29 pt1979@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
17:28 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ms-be2067.codfw.wmnet
17:28 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1052.eqiad.wmnet with reason: host reimage
17:28 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ms-be2067.codfw.wmnet
17:28 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ms-be2067.codfw.wmnet
17:24 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@d4137b5]: increase subgraph query SLA and remove same from drop_old_data (duration: 02m 17s)
17:22 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@d4137b5]: increase subgraph query SLA and remove same from drop_old_data
17:17 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1052.eqiad.wmnet with OS bullseye
17:00 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1082.eqiad.wmnet with OS bullseye
16:39 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1082.eqiad.wmnet with reason: host reimage
16:35 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1082.eqiad.wmnet with reason: host reimage
16:32 damilare: payments-wiki upgraded from 0894d75a to 41709763
16:27 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
16:25 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
16:23 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1082.eqiad.wmnet with OS bullseye
16:17 dancy@deploy1002: Installation of scap version "4.13.0" completed for 553 hosts
16:17 dancy@deploy1002: Installing scap version "4.13.0" for 553 hosts
16:14 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:09 pt1979@cumin2002: START - Cookbook sre.dns.netbox
16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:56 pt1979@cumin2002: START - Cookbook sre.dns.netbox
15:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts logstash2003.codfw.wmnet
15:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts logstash2003.codfw.wmnet
15:32 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be2067.codfw.wmnet with reason: disk fault investigation
15:32 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be2067.codfw.wmnet with reason: disk fault investigation
15:31 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be2032.codfw.wmnet
15:31 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be2032.codfw.wmnet
15:31 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be2032.codfw.wmnet with reason: RAID battery failure
15:31 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be2032.codfw.wmnet with reason: RAID battery failure
15:31 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be2032.codfw.wmnet
15:31 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be2032.codfw.wmnet
15:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1068.eqiad.wmnet with OS bullseye
14:39 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1068.eqiad.wmnet with reason: host reimage
14:36 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1068.eqiad.wmnet with reason: host reimage
14:26 hnowlan@deploy1002: Finished deploy [restbase/deploy@a571f9a]: Add blwiki T310874 (duration: 15m 42s)
14:23 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1068.eqiad.wmnet with OS bullseye
14:10 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be2032.codfw.wmnet with reason: RAID battery failure
14:10 hnowlan@deploy1002: Started deploy [restbase/deploy@a571f9a]: Add blwiki T310874
14:10 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be2032.codfw.wmnet with reason: RAID battery failure
14:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1070.eqiad.wmnet with OS bullseye
13:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1070.eqiad.wmnet with reason: host reimage
13:46 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1070.eqiad.wmnet with reason: host reimage
13:34 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1070.eqiad.wmnet with OS bullseye
13:29 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: de81bcb: testwikidatawiki: Add wikidata as import source (T315211) (duration: 03m 26s)
13:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e277223: Revert "Revert "Remove WikibaseTermboxInteraction $wgEventLoggingSchemas entry"" (T290303) (duration: 03m 29s)
13:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
10:03 Emperor: pd 1I:1:1 modify disablepd forced on ms-be2028 T315213
07:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
07:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
07:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
07:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
07:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
07:17 urbanecm: UTC morning B&C window done
07:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: a454d3b: Pin wgCheckUserLogReasonMigrationStage to read and write old (T233004) (duration: 03m 16s)
07:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
07:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
07:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 43cd5ef: Add bnwiki in wgImportSources to bnwikibooks (T314820) (duration: 03m 05s)
07:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
07:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
07:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1130 (T314041)', diff saved to https://phabricator.wikimedia.org/P32386 and previous config saved to /var/cache/conftool/dbconfig/20220815-070955-ladsgroup.json
07:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
07:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
07:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
07:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
07:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
07:08 urbanecm: mwscript resetAuthenticationThrottle.php --wiki=cswiki --signup --ip='194.31.191.20' # T315141
07:06 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: 7c2a393ee: dc0d62a3: 6f687bcfc: Update throttle rules (T315182, T315141) (duration: 03m 21s)
02:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T312863)', diff saved to https://phabricator.wikimedia.org/P32385 and previous config saved to /var/cache/conftool/dbconfig/20220815-023538-ladsgroup.json
02:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32384 and previous config saved to /var/cache/conftool/dbconfig/20220815-022032-ladsgroup.json
02:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32383 and previous config saved to /var/cache/conftool/dbconfig/20220815-020526-ladsgroup.json
01:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T312863)', diff saved to https://phabricator.wikimedia.org/P32382 and previous config saved to /var/cache/conftool/dbconfig/20220815-015020-ladsgroup.json

2022-08-14

08:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T312863)', diff saved to https://phabricator.wikimedia.org/P32380 and previous config saved to /var/cache/conftool/dbconfig/20220814-085443-ladsgroup.json
08:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
08:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance

2022-08-13

13:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
13:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
13:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T312863)', diff saved to https://phabricator.wikimedia.org/P32379 and previous config saved to /var/cache/conftool/dbconfig/20220813-133713-ladsgroup.json
13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P32378 and previous config saved to /var/cache/conftool/dbconfig/20220813-132207-ladsgroup.json
13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P32377 and previous config saved to /var/cache/conftool/dbconfig/20220813-130701-ladsgroup.json
12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T312863)', diff saved to https://phabricator.wikimedia.org/P32376 and previous config saved to /var/cache/conftool/dbconfig/20220813-125156-ladsgroup.json

2022-08-12

23:41 mutante: wikistats-bullseye:~$ /usr/lib/wikistats/update.php wp prefix blk ; /usr/lib/wikistats/update.php wp prefix kcg T315121
23:38 mutante: [mwmaint1002:~] $ sudo systemctl start mediawiki_job_initsitestats.timer T315121
22:14 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
21:48 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1071.eqiad.wmnet with OS bullseye
21:45 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb2002-dev.codfw.wmnet with OS bullseye
21:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1071.eqiad.wmnet with reason: host reimage
21:25 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1071.eqiad.wmnet with reason: host reimage
21:12 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1071.eqiad.wmnet with OS bullseye
21:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb2002-dev.codfw.wmnet with reason: host reimage
21:06 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb2002-dev.codfw.wmnet with reason: host reimage
21:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1053.eqiad.wmnet with OS bullseye
20:50 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddb2002-dev.codfw.wmnet with OS bullseye
20:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1053.eqiad.wmnet with reason: host reimage
20:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1053.eqiad.wmnet with reason: host reimage
20:24 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1053.eqiad.wmnet with OS bullseye
20:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1048.eqiad.wmnet with OS bullseye
19:55 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1048.eqiad.wmnet with reason: host reimage
19:53 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1048.eqiad.wmnet with reason: host reimage
19:42 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1048.eqiad.wmnet with OS bullseye
19:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T312863)', diff saved to https://phabricator.wikimedia.org/P32375 and previous config saved to /var/cache/conftool/dbconfig/20220812-193822-ladsgroup.json
19:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
19:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
19:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T312863)', diff saved to https://phabricator.wikimedia.org/P32374 and previous config saved to /var/cache/conftool/dbconfig/20220812-193801-ladsgroup.json
19:33 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1054.eqiad.wmnet with OS bullseye
19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P32373 and previous config saved to /var/cache/conftool/dbconfig/20220812-192255-ladsgroup.json
19:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1054.eqiad.wmnet with reason: host reimage
19:09 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1054.eqiad.wmnet with reason: host reimage
19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P32372 and previous config saved to /var/cache/conftool/dbconfig/20220812-190749-ladsgroup.json
18:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet with reason: Maint
18:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet with reason: Maint
18:54 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1054.eqiad.wmnet with OS bullseye
18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T312863)', diff saved to https://phabricator.wikimedia.org/P32371 and previous config saved to /var/cache/conftool/dbconfig/20220812-185243-ladsgroup.json
18:48 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1066.eqiad.wmnet with OS bullseye
18:25 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1066.eqiad.wmnet with reason: host reimage
18:22 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1066.eqiad.wmnet with reason: host reimage
18:08 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1066.eqiad.wmnet with OS bullseye
18:00 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1064.eqiad.wmnet with OS bullseye
17:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1064.eqiad.wmnet with reason: host reimage
17:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1064.eqiad.wmnet with reason: host reimage
17:24 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1064.eqiad.wmnet with OS bullseye
17:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts netmon2002.wikimedia.org
17:21 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts netmon2002.wikimedia.org
17:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netmon2002.wikimedia.org with OS bullseye
17:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
17:01 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
16:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host netmon2002.wikimedia.org with OS bullseye
16:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1067.eqiad.wmnet with OS bullseye
16:21 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol2003-dev.wikimedia.org
16:21 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:16 andrew@cumin1001: START - Cookbook sre.dns.netbox
16:11 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcontrol2003-dev.wikimedia.org
16:08 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['netmon2002.wikimedia.org']
16:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1067.eqiad.wmnet with reason: host reimage
15:58 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1067.eqiad.wmnet with reason: host reimage
15:43 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1067.eqiad.wmnet with OS bullseye
15:37 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['netmon2002.wikimedia.org']
15:31 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['netmon2002.wikimedia.org']
15:31 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['netmon2002.wikimedia.org']
15:07 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts netmon1002.wikimedia.org
15:07 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts netmon1002.wikimedia.org
15:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1061.eqiad.wmnet with OS bullseye
14:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1061.eqiad.wmnet with reason: host reimage
14:46 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=varnish-fe
14:46 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=ats-be
14:46 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=ats-tls
14:43 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1061.eqiad.wmnet with reason: host reimage
14:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maint
14:28 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1061.eqiad.wmnet with OS bullseye
14:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maint
14:24 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1063.eqiad.wmnet with OS bullseye
14:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1063.eqiad.wmnet with reason: host reimage
14:02 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1063.eqiad.wmnet with reason: host reimage
13:47 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1063.eqiad.wmnet with OS bullseye
13:41 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
06:01 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=elastic10[8-9][0-9].*
05:54 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=elastic110.*
01:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T312863)', diff saved to https://phabricator.wikimedia.org/P32369 and previous config saved to /var/cache/conftool/dbconfig/20220812-010312-ladsgroup.json
01:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
01:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
01:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
01:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
01:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T312863)', diff saved to https://phabricator.wikimedia.org/P32368 and previous config saved to /var/cache/conftool/dbconfig/20220812-010233-ladsgroup.json
00:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P32367 and previous config saved to /var/cache/conftool/dbconfig/20220812-004727-ladsgroup.json
00:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P32366 and previous config saved to /var/cache/conftool/dbconfig/20220812-003221-ladsgroup.json
00:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T312863)', diff saved to https://phabricator.wikimedia.org/P32365 and previous config saved to /var/cache/conftool/dbconfig/20220812-001715-ladsgroup.json

2022-08-11

21:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
21:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
21:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
21:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
21:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
21:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
21:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
21:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
21:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
21:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
21:04 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: revert Define default value for "wmgSiteLogoVariants" (T305692 T308620) (duration: 03m 15s)
20:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:47 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Define default value for "wmgSiteLogoVariants" (T305692 T308620) (duration: 03m 07s)
20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:29 thcipriani@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/VisualEditor/modules/ve-mw/preinit/ve.init.mw.DesktopArticleTarget.init.js: Backport: Do not show incompatible skin warning when page is not editable (T314952) (duration: 03m 16s)
20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:23 mutante: merging change on prod phabricator host to allow scap deployment, part 1
19:42 damilare: payments-wiki upgraded from cf5e1848 to 0894d75a
19:41 mutante: disabling puppet on C:profile::phabricator::main
19:20 mvernon@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: upgrade to 3.11.13 T309896 - mvernon@cumin2002
17:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
17:58 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Fix labtestwiki database name servers (T310795) (duration: 03m 39s)
17:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
17:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
17:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
17:52 sukhe: testing ATS 9.1.3-1wm1 on cp3064: T309651
17:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host netmon2002.mgmt.codfw.wmnet with reboot policy FORCED
17:46 sukhe: testing ATS 9.1.3-1wm1 on cp3064: T3096515
17:41 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host netmon2002.mgmt.codfw.wmnet with reboot policy FORCED
17:40 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:38 sukhe: testing ATS 9.1.3-1wm1 on cp1090: T309651
17:36 pt1979@cumin2002: START - Cookbook sre.dns.netbox
17:35 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host netmon2002
17:34 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host netmon2002
17:33 sukhe: testing ATS 9.1.3-1wm1 on cp3065: T309651
17:28 sukhe: testing ATS 9.1.3-1wm1 on cp1089: T309651
17:19 bking@cumin1001: conftool action : set/weight=10:pooled=no; selector: service=elasticsearch-omega-ssl,name=elastic1100.eqiad.wmnet
17:18 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: service=elasticsearch-omega-ssl,name=elastic1100.eqiad.wmnet
17:15 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: service=search-omega-https,name=elastic1100.eqiad.wmnet
16:35 mvernon@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: upgrade to 3.11.13 T309896 - mvernon@cumin2002
16:30 mvernon@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: upgrade to 3.11.13 T309896 - mvernon@cumin2002
16:29 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic[1100-1102].eqiad.wmnet with reason: T309810
16:29 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic[1100-1102].eqiad.wmnet with reason: T309810
16:26 inflatador: bking@elastic1054 attempting to ban elastic1100-1102 from cluster due to firewall issues
16:13 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: service=search-omega-https,name=elastic1100.eqiad.wmnet
16:12 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=elastic1100
15:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:09 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P32364 and previous config saved to /var/cache/conftool/dbconfig/20220811-145823-ladsgroup.json
14:55 inflatador: bking@cumin1001 running puppet agent across eqiad elastic hosts
14:48 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P32362 and previous config saved to /var/cache/conftool/dbconfig/20220811-144318-ladsgroup.json
14:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P32361 and previous config saved to /var/cache/conftool/dbconfig/20220811-142813-ladsgroup.json
14:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol1003.wikimedia.org
14:28 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:24 andrew@cumin1001: START - Cookbook sre.dns.netbox
14:19 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcontrol1003.wikimedia.org
14:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
14:18 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol1004.wikimedia.org
14:18 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
14:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
14:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
14:17 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Stop writing to the old templatelinks fields in s2 (T312865) (duration: 03m 25s)
14:16 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
14:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
14:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
14:15 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
14:13 andrew@cumin1001: START - Cookbook sre.dns.netbox
14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P32360 and previous config saved to /var/cache/conftool/dbconfig/20220811-141309-ladsgroup.json
14:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
14:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
14:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
14:11 awight: EU backport window complete
14:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
14:10 awight@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/DiscussionTools/includes/CommentFormatter.php: Backport: CommentFormatter: Set 'data-mw-comment' even when reply tool disabled (T314707) (duration: 03m 31s)
14:09 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcontrol1004.wikimedia.org
14:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
14:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
14:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:52 mvernon@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: upgrade to 3.11.13 T309896 - mvernon@cumin2002
13:50 awight@deploy1002: Synchronized wmf-config: Config: Revert "Revert "testwiki: Add mediawiki.web_ui.interactions stream"" (duration: 03m 10s)
13:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:36 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1060.eqiad.wmnet with OS bullseye
13:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:36 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: trwikiquote: Install WikiLove extension (T314895) (duration: 03m 30s)
13:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:33 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host logstash2003.codfw.wmnet
13:25 awight@deploy1002: Synchronized static/images: Config: Revert "trwiki: Change old and new vector logos for 500k articles" (part 3) (duration: 03m 09s)
13:21 awight@deploy1002: Synchronized logos/: Config: Revert "trwiki: Change old and new vector logos for 500k articles" (part 2) (duration: 03m 09s)
13:19 topranks: merging CR821781 to expose additional network info in puppet facts
13:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:18 awight@deploy1002: Synchronized wmf-config/: Config: Revert "trwiki: Change old and new vector logos for 500k articles" (part 1) (duration: 03m 13s)
13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:14 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1060.eqiad.wmnet with reason: host reimage
13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:11 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1060.eqiad.wmnet with reason: host reimage
13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:08 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable editor line numbering on all namespaces, for twwiki (T302852) (duration: 03m 42s)
12:56 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1060.eqiad.wmnet with OS bullseye
12:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
12:49 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
12:46 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
12:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2018.codfw.wmnet
12:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase202[367].codfw.wmnet
12:17 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
12:17 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
12:17 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
12:16 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
12:13 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash2003.codfw.wmnet
12:11 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
12:10 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
12:09 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
11:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
11:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
09:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
09:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
09:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
09:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
09:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
09:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
09:32 godog: arm keyholder on netmon2001
09:09 jbond: update gnutls28 on bullseye systems
09:00 jbond: update unzip
08:21 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
08:13 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
08:12 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
08:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox interface ID cr3-ulsfo:xe-0/1/1
08:06 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr3-ulsfo:xe-0/1/1
07:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox interface ID cr3-ulsfo:xe-0/1/1
07:57 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr3-ulsfo:xe-0/1/1
07:55 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=k8s-ingress-wikikube-rw,name=codfw
07:51 vgutierrez: rolling restart of pybal in eqsin and ulsfo
07:24 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=eqiad
07:24 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=shellbox-timeline
07:23 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=inference
07:19 _joe_: pooling all services in codfw
07:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T312863)', diff saved to https://phabricator.wikimedia.org/P32357 and previous config saved to /var/cache/conftool/dbconfig/20220811-070312-ladsgroup.json
07:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
07:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T312863)', diff saved to https://phabricator.wikimedia.org/P32356 and previous config saved to /var/cache/conftool/dbconfig/20220811-070252-ladsgroup.json
06:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P32355 and previous config saved to /var/cache/conftool/dbconfig/20220811-064746-ladsgroup.json
06:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P32354 and previous config saved to /var/cache/conftool/dbconfig/20220811-063240-ladsgroup.json
06:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
06:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
06:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T312863)', diff saved to https://phabricator.wikimedia.org/P32353 and previous config saved to /var/cache/conftool/dbconfig/20220811-061734-ladsgroup.json
06:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maint
06:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maint
06:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1162 (T314368 T298555 T312863 T310011 T309311 T60674 T298560 T303603 T310485)', diff saved to https://phabricator.wikimedia.org/P32352 and previous config saved to /var/cache/conftool/dbconfig/20220811-060625-ladsgroup.json
06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1122 to s2 primary and set section read-write T314368', diff saved to https://phabricator.wikimedia.org/P32351 and previous config saved to /var/cache/conftool/dbconfig/20220811-060113-ladsgroup.json
06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s2 eqiad as read-only for maintenance - T314368', diff saved to https://phabricator.wikimedia.org/P32350 and previous config saved to /var/cache/conftool/dbconfig/20220811-060042-ladsgroup.json
06:00 Amir1: Starting s2 eqiad failover from db1162 to db1122 - T314368
05:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1122 with weight 0 T314368', diff saved to https://phabricator.wikimedia.org/P32349 and previous config saved to /var/cache/conftool/dbconfig/20220811-051913-ladsgroup.json
05:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 T314368
05:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s2 T314368
m: chown -R librenms /srv/librenms/rrd/ on netmon1003 T314972
03:51 cwhite: chown librenms /srv/librenms/rrd/* on netmon1003 T314972
02:55 ejegg: civicrm upgraded from 1f91ac2d to 92467234
02:46 ejegg: updated process-control yaml files with @wmff alias
02:08 ejegg: civicrm rolled back from 92467234 to 1f91ac2d
02:05 ejegg: civicrm upgraded from 1f91ac2d to 92467234
01:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
01:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
01:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
01:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
01:38 tstarling@deploy1002: Synchronized wmf-config/logging.php: (no justification provided) (duration: 03m 25s)
01:19 tstarling@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=(appservers|api)-ro,name=codfw
01:19 tstarling@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=codfw
00:58 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet,service=varnish-fe
00:58 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet,service=ats-be
00:58 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet,service=ats-tls
00:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cp2042.codfw.wmnet with reason: host down; depooled and will debug tomorrow
00:57 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cp2042.codfw.wmnet with reason: host down; depooled and will debug tomorrow

2022-08-10

21:25 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1016.eqiad.wmnet
21:23 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1014.eqiad.wmnet
21:10 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 16 hosts with reason: T309810
21:10 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 16 hosts with reason: T309810
21:09 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on elastic[1101-1102].eqiad.wmnet with reason: T309810
21:09 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on elastic[1101-1102].eqiad.wmnet with reason: T309810
21:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
21:00 cjming: end of UTC late backport window
20:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:59 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove unused $wgEnableMWSuggest (duration: 03m 04s)
20:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:56 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable new topic tool on dewiki (T313699) (duration: 03m 01s)
20:34 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: testwiki: set $wgCdnMatchParameterOrder to false (T314868) (duration: 03m 20s)
20:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:09 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:08 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Start writing to cuc_actor everywhere except s4 and s8 (T233004) (duration: 03m 15s)
20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
19:51 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc[2053-2054].codfw.wmnet
19:51 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc[2053-2054].codfw.wmnet
19:35 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse[2019-2020].codfw.wmnet
19:35 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse[2019-2020].codfw.wmnet
19:35 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse[2016-2018].codfw.wmnet
19:35 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse[2016-2018].codfw.wmnet
19:34 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc2036.codfw.wmnet
19:34 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc2036.codfw.wmnet
19:28 sukhe: testing ATS 9.1.3-1wm1 on cp4026: T309651
19:09 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1087.eqiad.wmnet with OS bullseye
19:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1086.eqiad.wmnet with OS bullseye
18:55 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1087.eqiad.wmnet with reason: host reimage
18:51 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1086.eqiad.wmnet with reason: host reimage
18:50 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1087.eqiad.wmnet with reason: host reimage
18:49 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1086.eqiad.wmnet with reason: host reimage
18:47 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
18:38 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1087.eqiad.wmnet with OS bullseye
18:36 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1086.eqiad.wmnet with OS bullseye
18:22 urandom: truncating Cassandra hints (eqiad datacenter) -- T314941
18:13 urandom: truncating codfw Cassandra hints (eqiad datacenter) -- T314941
18:07 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kafka-main2005.codfw.wmnet
18:07 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for kafka-main2005.codfw.wmnet
18:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool D8 DBs after PDU maint (T310146)', diff saved to https://phabricator.wikimedia.org/P32346 and previous config saved to /var/cache/conftool/dbconfig/20220810-180529-ladsgroup.json
17:42 otto@deploy1002: Finished deploy [analytics/refinery@6e47e0e]: Add missing changes to the deletion script - T270433 - [analytics/refinery@6e47e0e] (duration: 05m 28s)
17:39 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts labweb1002.wikimedia.org
17:39 fnegri@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:36 otto@deploy1002: Started deploy [analytics/refinery@6e47e0e]: Add missing changes to the deletion script - T270433 - [analytics/refinery@6e47e0e]
17:35 fnegri@cumin1001: START - Cookbook sre.dns.netbox
17:34 otto@deploy1002: Finished deploy [analytics/refinery@6e47e0e] (hadoop-test): Add missing changes to the deletion script - T270433 - TEST [analytics/refinery@6e47e0e] (duration: 04m 19s)
17:30 fnegri@cumin1001: START - Cookbook sre.hosts.decommission for hosts labweb1002.wikimedia.org
17:30 otto@deploy1002: Started deploy [analytics/refinery@6e47e0e] (hadoop-test): Add missing changes to the deletion script - T270433 - TEST [analytics/refinery@6e47e0e]
17:09 dzahn@cumin2002: START - Cookbook sre.dns.netbox
17:08 otto@deploy1002: Started deploy [analytics/refinery@d4dd7e4] (hadoop-test): Add safety limits to refinery-drop-older-than - T270433 - TEST [analytics/refinery@d4dd7e4]
17:06 sukhe: testing ATS 9.1.3-1wm1 on cp4032: T309651
17:06 urandom: flushing RESTBase Cassandra tables -row B- to (temporarily) free instance-data space -- T314941
17:05 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on krb2001.codfw.wmnet with reason: btullis codfw maintenance
17:05 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on krb2001.codfw.wmnet with reason: btullis codfw maintenance
17:04 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gerrit2001.wikimedia.org
17:02 sukhe: testing ATS 9.1.3-1wm1 on cp6008: T309651
16:56 sukhe: testing ATS 9.1.3-1wm1 on cp6016: T309651
16:55 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts labweb1001.wikimedia.org
16:55 fnegri@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:32 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gerrit2001.wikimedia.org
16:32 dzahn@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
16:32 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes[2013-2014].codfw.wmnet
16:31 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes[2013-2014].codfw.wmnet
16:31 jelto: kubectl uncordon kubernetes2014.codfw.wmnet
16:31 fnegri@cumin1001: START - Cookbook sre.dns.netbox
16:30 jelto: kubectl uncordon kubernetes2013.codfw.wmnet
16:29 urandom: restarting Cassandra (RESTBase) -row A- to apply r822110 -- T314941
16:27 dzahn@cumin2002: START - Cookbook sre.dns.netbox
16:25 fnegri@cumin1001: START - Cookbook sre.hosts.decommission for hosts labweb1001.wikimedia.org
16:23 mutante: shutting down gerrit2001
16:23 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc[2034-2035].codfw.wmnet
16:23 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc[2034-2035].codfw.wmnet
16:22 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse[2016-2018].codfw.wmnet
16:22 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse[2016-2018].codfw.wmnet
16:16 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
16:16 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=sessionstore2003.codfw.wmnet
16:13 sukhe: reprepro -C component/trafficserver9 include buster-wikimedia trafficserver_9.1.3-1wm1_amd64.changes: T309651
16:13 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gerrit2001.wikimedia.org
16:11 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be[2039,2050,2056,2059].codfw.wmnet,thanos-be2004.codfw.wmnet with reason: PDU work
16:10 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be[2039,2050,2056,2059].codfw.wmnet,thanos-be2004.codfw.wmnet with reason: PDU work
16:09 urandom: flushing tables in row D (RESTBase Cassandra cluster) -- T314941
15:54 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for gitlab-runner2004.codfw.wmnet
15:54 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for gitlab-runner2004.codfw.wmnet
15:53 sukhe: poweroff cp2041, 42 for PDU ugprade: rack D7
15:51 urandom: flushing tables in row B (RESTBase Cassandra cluster) -- T314941
15:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2004.codfw.wmnet with reason: PDU maintenance
15:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2004.codfw.wmnet with reason: PDU maintenance
15:46 urandom: flushing tables in row A (RESTBase Cassandra cluster) -- T314941
15:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on aqs2012.codfw.wmnet with reason: btullis codfw maintenance
15:46 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cp[2041-2042].codfw.wmnet with reason: shutdown for PDU upgrade: rack D4
15:46 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on aqs2012.codfw.wmnet with reason: btullis codfw maintenance
15:46 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on cp[2041-2042].codfw.wmnet with reason: shutdown for PDU upgrade: rack D4
15:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on aqs2011.codfw.wmnet with reason: btullis codfw maintenance
15:46 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on aqs2011.codfw.wmnet with reason: btullis codfw maintenance
15:45 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on aqs2010.codfw.wmnet with reason: btullis codfw maintenance
15:45 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on aqs2010.codfw.wmnet with reason: btullis codfw maintenance
15:45 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on aqs2009.codfw.wmnet with reason: btullis codfw maintenance
15:45 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on aqs2009.codfw.wmnet with reason: btullis codfw maintenance
15:37 urandom: (ephemerally) increasing hinted hand-off delivery rate limit to 16KB, RESTBase eqiad nodes -- T314941
15:34 jbond: remove puppetmaster[12]002 from production
15:30 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kafka-main2004.codfw.wmnet
15:30 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for kafka-main2004.codfw.wmnet
15:20 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc[2051-2052].codfw.wmnet
15:20 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc[2051-2052].codfw.wmnet
15:17 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc-gp2003.codfw.wmnet
15:17 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc-gp2003.codfw.wmnet
15:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc2033.codfw.wmnet
15:16 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc2033.codfw.wmnet
15:14 _joe_: power off krb2002
15:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on krb2002.codfw.wmnet with reason: PDU maintenance
15:13 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on krb2002.codfw.wmnet with reason: PDU maintenance
15:13 _joe_: shutting down rdb2010,puppetmaster2002 for d5 maintenance
15:02 jelto: power off mc2035
15:01 jelto: power off mc2034
15:01 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mc2035.codfw.wmnet with reason: PDU swap
15:01 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mc2035.codfw.wmnet with reason: PDU swap
15:01 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mc2034.codfw.wmnet with reason: PDU swap
15:01 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mc2034.codfw.wmnet with reason: PDU swap
14:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: PDU Maint (T310146)
14:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: PDU Maint (T310146)
14:38 urandom: disabling reserved space on eqiad nodes (RESTBase), /dev/md2 (aka /srv/cassandra/instance-data) -- T314941
14:28 jelto: power off kafka-main2004 gracefully
14:28 hnowlan: shutting down sessionstore2003
14:27 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=sessionstore2003.codfw.wmnet
14:27 sukhe: power off cp2039, cp2040 for PDU upgrade: rack D
14:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2003.codfw.wmnet with reason: PDU maintenance
14:27 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2003.codfw.wmnet with reason: PDU maintenance
14:25 jelto: power off mc-gp2003
14:25 jelto: power off mc2033
14:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:30:00 on kafka-main2004.codfw.wmnet with reason: PDU swap
14:23 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 4:30:00 on kafka-main2004.codfw.wmnet with reason: PDU swap
14:23 sukhe: depool codfw for PDU upgrade: rack D
14:23 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:30:00 on mc-gp2003.codfw.wmnet with reason: PDU swap
14:23 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 4:30:00 on mc-gp2003.codfw.wmnet with reason: PDU swap
14:23 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:30:00 on mc2033.codfw.wmnet with reason: PDU swap
14:23 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 4:30:00 on mc2033.codfw.wmnet with reason: PDU swap
14:15 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp20[39|40]\.codfw\.wmnet,service=ats-tls
14:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cp[2039-2040].codfw.wmnet with reason: shutdown for PDU upgrade: rack D4
14:13 urandom: flushing Cassandra tables, restbase1030
14:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on cp[2039-2040].codfw.wmnet with reason: shutdown for PDU upgrade: rack D4
14:13 urandom: flushing Cassandra tables, restbase1019
14:12 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on dns2002.wikimedia.org with reason: shutdown for PDU upgrade: rack D4
14:12 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on dns2002.wikimedia.org with reason: shutdown for PDU upgrade: rack D4
14:11 urandom: flushing Cassandra tables, restbase1017 1018 1021 1024 1025 1026 1028 1029
14:05 urandom: flushing tables, restbase1016
13:52 hnowlan: powered up restbase2018
13:32 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on netmon1003.wikimedia.org with reason: pdu
13:32 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on netmon1003.wikimedia.org with reason: pdu
13:32 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on logstash2003.codfw.wmnet with reason: pdu
13:31 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on logstash2003.codfw.wmnet with reason: pdu
13:31 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on logstash2029.codfw.wmnet with reason: pdu
13:31 root@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on logstash2029.codfw.wmnet with reason: pdu
13:30 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: T310146
13:30 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: T310146
13:17 elukey: powering on restbase2027
13:12 elukey: powering on restbase2026
13:12 _joe_: powering on restbase2023
13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T312863)', diff saved to https://phabricator.wikimedia.org/P32343 and previous config saved to /var/cache/conftool/dbconfig/20220810-130108-ladsgroup.json
13:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
13:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
12:37 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic[2072,2084-2085].codfw.wmnet with reason: T310146
12:37 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic[2072,2084-2085].codfw.wmnet with reason: T310146
12:27 jbond: remove confd from serveres that shouldn;t have it
12:05 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/Echo/maintenance/removeOrphanedEvents.php: Backport: Run clean ups with removeOrphanedEvents in major batches (T310428) (duration: 03m 32s)
11:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
11:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
11:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
11:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
11:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS buster
10:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
10:51 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
10:37 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
10:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2181-2182].codfw.wmnet with reason: D6 PDU maint (T310146)
10:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db[2181-2182].codfw.wmnet with reason: D6 PDU maint (T310146)
10:26 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on maps2010.codfw.wmnet with reason: PDU maintenance
10:26 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on maps2010.codfw.wmnet with reason: PDU maintenance
10:26 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2010.codfw.wmnet
10:25 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
10:25 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2008.codfw.wmnet
10:24 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on restbase2018.codfw.wmnet with reason: PDU maintenance
10:24 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2018.codfw.wmnet
10:24 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on restbase2018.codfw.wmnet with reason: PDU maintenance
10:23 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ml-serve2008.codfw.wmnet with reason: PDU maintenance
10:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on ml-serve2008.codfw.wmnet with reason: PDU maintenance
10:20 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ores2009.codfw.wmnet with reason: PDU maintenance
10:20 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on ores2009.codfw.wmnet with reason: PDU maintenance
10:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ores2008.codfw.wmnet with reason: PDU maintenance
10:19 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on ores2008.codfw.wmnet with reason: PDU maintenance
10:03 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase202[367].codfw.wmnet
10:02 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on restbase[2023,2026-2027].codfw.wmnet with reason: PDU maintenance
10:02 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on restbase[2023,2026-2027].codfw.wmnet with reason: PDU maintenance
09:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: D8 PDU Maint (T310146)
09:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: D8 PDU Maint (T310146)
09:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool D8 DBs for PDU maint (T310146)', diff saved to https://phabricator.wikimedia.org/P32341 and previous config saved to /var/cache/conftool/dbconfig/20220810-095059-ladsgroup.json
09:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2101,2130,2140].codfw.wmnet,dbproxy2004.codfw.wmnet with reason: D6 PDU maint (T310146)
09:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db[2101,2130,2140].codfw.wmnet,dbproxy2004.codfw.wmnet with reason: D6 PDU maint (T310146)
09:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool D6 dbs (T310146)', diff saved to https://phabricator.wikimedia.org/P32340 and previous config saved to /var/cache/conftool/dbconfig/20220810-093433-ladsgroup.json
09:31 jelto: depool services in codfw for upcoming PDU replacement - T309956
09:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
09:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
09:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
09:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
09:28 jynus: shutdown backup2007 before pdu upgrade T310146
09:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
09:15 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.23/maintenance/namespaceDupes.php: Backport: maintenance: Add support for links migration to namespaceDupes.php (T314711) (duration: 03m 18s)
09:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2093,2120,2129,2172].codfw.wmnet with reason: D5 PDU maint (T310146)
09:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db[2093,2120,2129,2172].codfw.wmnet with reason: D5 PDU maint (T310146)
09:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
09:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
09:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
09:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool D5 dbs (T310146)', diff saved to https://phabricator.wikimedia.org/P32339 and previous config saved to /var/cache/conftool/dbconfig/20220810-091038-ladsgroup.json
08:49 jynus: shutdown dbprov2003 before pdu upgrade T310146
08:49 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.debug (exit_code=99) for Netbox interface ID cr1-drmrs:xe-0/1/2
08:48 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr1-drmrs:xe-0/1/2
08:48 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be2028.codfw.wmnet
08:48 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be2028.codfw.wmnet
08:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P32337 and previous config saved to /var/cache/conftool/dbconfig/20220810-084222-ladsgroup.json
08:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
08:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
08:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
08:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
08:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
08:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
08:35 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Stop writing to the old templatelinks fields in s5 (T312865) (duration: 03m 29s)
08:32 jelto: power off gitlab-runner2004
08:31 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:30:00 on gitlab-runner2004.codfw.wmnet with reason: PDU swap
08:31 root@cumin1001: START - Cookbook sre.hosts.downtime for 8:30:00 on gitlab-runner2004.codfw.wmnet with reason: PDU swap
08:29 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be2028.codfw.wmnet with reason: Trying to fix full /
08:28 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be2028.codfw.wmnet with reason: Trying to fix full /
08:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox interface ID cr1-drmrs:xe-0/1/2
08:27 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr1-drmrs:xe-0/1/2
08:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P32336 and previous config saved to /var/cache/conftool/dbconfig/20220810-082718-ladsgroup.json
08:25 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.debug (exit_code=99) for Netbox interface ID cr1-drmrs:xe-0/1/2
08:25 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr1-drmrs:xe-0/1/2
08:24 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.debug (exit_code=99) for Netbox interface ID cr1-drmrs:xe-0/1/2
08:24 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr1-drmrs:xe-0/1/2
08:23 kart_: Run: mwscript namespaceDupes.php arywiki --fix (T291737)
08:13 jynus: restart replication on db1117:m1 T309074
08:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P32335 and previous config saved to /var/cache/conftool/dbconfig/20220810-081213-ladsgroup.json
08:09 kartik@deploy1002: Finished scap: Backport: arywiki: change namespace translations, add unchanged namespaces and add old translations as aliases (T291737) (duration: 10m 37s)
07:59 kartik@deploy1002: Started scap: Backport: arywiki: change namespace translations, add unchanged namespaces and add old translations as aliases (T291737)
07:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P32334 and previous config saved to /var/cache/conftool/dbconfig/20220810-075708-ladsgroup.json
07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P32333 and previous config saved to /var/cache/conftool/dbconfig/20220810-075636-ladsgroup.json
07:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
07:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
07:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
07:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
07:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
07:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
07:51 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:46 dcaro@cumin1001: START - Cookbook sre.dns.netbox
07:39 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:34 dcaro@cumin1001: START - Cookbook sre.dns.netbox
07:33 godog: depool thanos-fe2001 for debugging
07:11 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable SectionTranslation on testwiki with new MT support from Google (T313296) (duration: 05m 44s)
07:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
07:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
05:24 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on kubernetes[2013-2014].codfw.wmnet with reason: PDU maintenance
05:24 oblivian@cumin1001: START - Cookbook sre.hosts.downtime for 18:00:00 on kubernetes[2013-2014].codfw.wmnet with reason: PDU maintenance
05:19 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on parse[2016-2020].codfw.wmnet with reason: PDU maintenance
05:19 oblivian@cumin1001: START - Cookbook sre.hosts.downtime for 18:00:00 on parse[2016-2020].codfw.wmnet with reason: PDU maintenance
05:12 _joe_: starting to shut down servers in codfw for the PDU maintenance
05:09 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on 10 hosts with reason: PDU maintenance
05:09 oblivian@cumin1001: START - Cookbook sre.hosts.downtime for 18:00:00 on 10 hosts with reason: PDU maintenance
05:09 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on mc-gp2003.codfw.wmnet with reason: PDU maintenance
05:09 oblivian@cumin1001: START - Cookbook sre.hosts.downtime for 18:00:00 on mc-gp2003.codfw.wmnet with reason: PDU maintenance
05:06 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on mc2033.codfw.wmnet with reason: PDU maintenance
05:06 oblivian@cumin1001: START - Cookbook sre.hosts.downtime for 18:00:00 on mc2033.codfw.wmnet with reason: PDU maintenance
05:05 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on 7 hosts with reason: PDU maintenance
05:05 oblivian@cumin1001: START - Cookbook sre.hosts.downtime for 18:00:00 on 7 hosts with reason: PDU maintenance
02:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
02:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
02:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
02:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
02:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-08-09

23:17 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1011.eqiad.wmnet
23:07 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
23:06 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
22:51 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
22:51 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
22:49 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
22:49 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
22:46 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1015.eqiad.wmnet
22:31 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
22:31 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
22:28 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
22:02 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
22:02 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
21:57 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs2006.codfw.wmnet with reason: T310146
21:57 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs2006.codfw.wmnet with reason: T310146
21:53 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
21:52 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
21:50 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
21:49 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
21:43 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
21:43 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
21:43 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
21:43 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
21:43 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
21:43 bking@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
21:08 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
21:00 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1014.eqiad.wmnet
20:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
20:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T312863)', diff saved to https://phabricator.wikimedia.org/P32332 and previous config saved to /var/cache/conftool/dbconfig/20220809-205548-ladsgroup.json
20:51 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs1014.eqiad.wmnet
20:51 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs1014.eqiad.wmnet
20:46 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P32331 and previous config saved to /var/cache/conftool/dbconfig/20220809-204042-ladsgroup.json
20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P32330 and previous config saved to /var/cache/conftool/dbconfig/20220809-202536-ladsgroup.json
20:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T312863)', diff saved to https://phabricator.wikimedia.org/P32329 and previous config saved to /var/cache/conftool/dbconfig/20220809-201030-ladsgroup.json
19:57 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on wdqs1014.eqiad.wmnet with reason: T314890
19:57 bking@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on wdqs1014.eqiad.wmnet with reason: T314890
19:56 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on wdqs1016.eqiad.wmnet with reason: T314890
19:56 bking@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on wdqs1016.eqiad.wmnet with reason: T314890
19:55 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on wdqs1015.eqiad.wmnet with reason: T314890
19:55 bking@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on wdqs1015.eqiad.wmnet with reason: T314890
19:38 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
19:36 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
19:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1130.eqiad.wmnet with reason: Maintenance
19:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1130.eqiad.wmnet with reason: Maintenance
19:25 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
18:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:06 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
17:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:47 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
17:38 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1072.eqiad.wmnet with OS bullseye
17:29 vgutierrez: test trafficserver 9.1.2-1wm2 in cp6016 - T309651
17:15 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1072.eqiad.wmnet with reason: host reimage
17:13 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1072.eqiad.wmnet with reason: host reimage
17:00 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1072.eqiad.wmnet with OS bullseye
16:54 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
16:54 bking@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
16:53 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
16:53 bking@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
16:26 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
16:26 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
16:01 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1069.eqiad.wmnet with OS bullseye
15:45 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1069.eqiad.wmnet with reason: host reimage
15:42 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1069.eqiad.wmnet with reason: host reimage
15:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
15:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
15:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
15:30 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1069.eqiad.wmnet with OS bullseye
15:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
15:27 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1058.eqiad.wmnet with OS bullseye
15:08 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1058.eqiad.wmnet with reason: host reimage
15:05 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1058.eqiad.wmnet with reason: host reimage
14:59 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
14:59 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
m: finished running 'homer "status:active" commit "netmon: Add the netmon1003 host as a syslog destination"' in the cumin1001 host. Homer reported no errors.
14:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
14:50 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1058.eqiad.wmnet with OS bullseye
14:28 bking@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=codfw
13:57 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
13:57 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
m: Add the new netmon1003 host as a syslog destination in homer templates/common/system.conf https://gerrit.wikimedia.org/r/c/operations/homer/public/+/819124
m: Successfully ran '# run-puppet-merge' in the netmon1002 and netmon1003 hosts.
m: Running '# run-puppet-agent' in the netmon1003 host
m: Running '# run-puppet-agent' in the netmon1002 host
13:47 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
13:46 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
m: puppet-merge on puppetmaster2004.codfw.wmnet for patch 819179 succeeded
m: Set netmon1003 as netmon_server and netmon1002 as a netmon_servers_failover in the Puppet repository https://gerrit.wikimedia.org/r/c/operations/puppet/+/819179
m: authdns updated successfully
m: Had to revert https://gerrit.wikimedia.org/r/c/operations/dns/+/819177 because I rebased my changes incorrectly, sent the new patch in https://gerrit.wikimedia.org/r/c/operations/dns/+/821746
m: running '# authdns-update' in ns0.wikimedia.org
m: Flip DNS for LibreNMS and Smokeping from netmon1002 to netmon1003 https://gerrit.wikimedia.org/r/c/operations/dns/+/819177
13:23 jynus: stop replication on db1117:m1 T309074
m: netmon1002 to netmon1003 failover
13:17 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
13:16 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
10:58 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
09:53 vgutierrez: rolling restart of pybal in eqsin - T310070
09:25 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
09:24 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
09:24 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
09:12 vgutierrez: rolling restart of pybal in codfw - T310070
08:47 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
08:30 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
08:28 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
08:27 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
08:27 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
08:27 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
08:26 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
08:24 jynus: starting data check using es1021 and es2021, expect increased read traffic T314559
08:21 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
06:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
06:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
06:19 Amir1: dbmaint s5@eqiad (T312863 T312984 T310011 T310485)
06:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1130.eqiad.wmnet with reason: Maint
06:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1130.eqiad.wmnet with reason: Maint
06:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1130 T314370', diff saved to https://phabricator.wikimedia.org/P32323 and previous config saved to /var/cache/conftool/dbconfig/20220809-060836-ladsgroup.json
06:07 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
06:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1100 to s5 primary and set section read-write T314370', diff saved to https://phabricator.wikimedia.org/P32322 and previous config saved to /var/cache/conftool/dbconfig/20220809-060159-ladsgroup.json
06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s5 eqiad as read-only for maintenance - T314370', diff saved to https://phabricator.wikimedia.org/P32321 and previous config saved to /var/cache/conftool/dbconfig/20220809-060105-ladsgroup.json
06:00 Amir1: Starting s5 eqiad failover from db1130 to db1100 - T314370
05:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1100 with weight 0 T314370', diff saved to https://phabricator.wikimedia.org/P32320 and previous config saved to /var/cache/conftool/dbconfig/20220809-051251-ladsgroup.json
05:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 22 hosts with reason: Primary switchover s5 T314370
05:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 22 hosts with reason: Primary switchover s5 T314370
02:42 ejegg: SmashPig upgraded from 9b97ea15 to 13e9e9cc
02:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T312863)', diff saved to https://phabricator.wikimedia.org/P32318 and previous config saved to /var/cache/conftool/dbconfig/20220809-023113-ladsgroup.json
02:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
02:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
02:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T312863)', diff saved to https://phabricator.wikimedia.org/P32317 and previous config saved to /var/cache/conftool/dbconfig/20220809-023052-ladsgroup.json
02:28 ejegg: payments-wiki upgraded from 6880236d to cf5e1848
02:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P32316 and previous config saved to /var/cache/conftool/dbconfig/20220809-021546-ladsgroup.json
02:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P32315 and previous config saved to /var/cache/conftool/dbconfig/20220809-020040-ladsgroup.json
01:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T312863)', diff saved to https://phabricator.wikimedia.org/P32314 and previous config saved to /var/cache/conftool/dbconfig/20220809-014534-ladsgroup.json

2022-08-08

23:52 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: clean up testwiki experiments T314750 (duration: 03m 19s)
23:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
23:46 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: clean up testwiki experiments T314750 (duration: 03m 27s)
23:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
23:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
23:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
23:32 eileen___: config revision changed from f5668044 to 787cd0e0<eileen___> eileen
23:32 eileen___: civicrm upgraded from 497bddf7 to 1f91ac2d
22:16 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
22:16 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host elastic1065.eqiad.wmnet with OS bullseye
21:53 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1065.eqiad.wmnet with reason: host reimage
21:50 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1065.eqiad.wmnet with reason: host reimage
21:36 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1065.eqiad.wmnet with OS bullseye
21:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1062.eqiad.wmnet with OS bullseye
20:53 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1062.eqiad.wmnet with reason: host reimage
20:50 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1062.eqiad.wmnet with reason: host reimage
20:36 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1062.eqiad.wmnet with OS bullseye
20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:29 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
20:28 cjming: end of UTC late backport window
20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:27 cjming@deploy1002: Synchronized php-1.39.0-wmf.23/skins/Vector/resources/skins.vector.styles/layouts/grid.less: Backport: Fix grid blowout bug (T314756) (duration: 03m 26s)
20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:11 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable sticky header edit A/B test for pilot wikis (T312296) (duration: 03m 35s)
20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
17:34 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1088.eqiad.wmnet with OS bullseye
17:15 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1088.eqiad.wmnet with reason: host reimage
17:12 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1088.eqiad.wmnet with reason: host reimage
17:00 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1088.eqiad.wmnet with OS bullseye
16:54 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1085.eqiad.wmnet with OS bullseye
16:49 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
16:43 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:41 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1085.eqiad.wmnet with reason: host reimage
16:39 pt1979@cumin2002: START - Cookbook sre.dns.netbox
16:38 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1085.eqiad.wmnet with reason: host reimage
16:26 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1085.eqiad.wmnet with OS bullseye
16:24 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic1085.eqiad.wmnet with OS bullseye
16:19 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1085.eqiad.wmnet with reason: host reimage
16:16 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1085.eqiad.wmnet with reason: host reimage
16:16 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
16:14 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
16:12 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
16:10 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
16:09 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
16:04 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1085.eqiad.wmnet with OS bullseye
16:00 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1084.eqiad.wmnet with OS bullseye
15:58 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
15:47 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1084.eqiad.wmnet with reason: host reimage
15:46 sukhe: upload reprepro -C main include bullseye-wikimedia python-pynetbox_6.6.0-1+wmf11u1_amd64.changes
15:45 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1084.eqiad.wmnet with reason: host reimage
15:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2021.codfw.wmnet with reason: Maint
15:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2021.codfw.wmnet with reason: Maint
15:32 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1084.eqiad.wmnet with OS bullseye
14:59 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
14:55 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
14:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on cp5001.eqsin.wmnet with reason: depooled: faulty DIMM: T314256
14:46 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on cp5001.eqsin.wmnet with reason: depooled: faulty DIMM: T314256
14:34 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
14:11 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
13:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
12:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
12:56 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 77fd5ab: Growth: Add new rights to wgAvailableRights (duration: 03m 24s)
12:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1102.eqiad.wmnet
12:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
12:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
12:06 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/GrowthExperiments/: 3eaf155: MentorTools: Do not use MentorWeightManager (T314362) (duration: 03m 31s)
12:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
11:43 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1102.eqiad.wmnet
11:21 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes2022.codfw.wmnet
11:21 jelto: kubectl uncordon kubernetes2022.codfw.wmnet
10:43 Amir1: Removing db2079 from orchestrator (T313885)
10:39 Amir1: Removing db2079 from zarcillo (T313885)
10:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2079.codfw.wmnet
10:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:30 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
10:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2079.codfw.wmnet
10:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db2079.codfw.wmnet with reason: Decom
10:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db2079.codfw.wmnet with reason: Decom
08:41 jbond: deploy libtirpc update
07:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T312863)', diff saved to https://phabricator.wikimedia.org/P32310 and previous config saved to /var/cache/conftool/dbconfig/20220808-075723-ladsgroup.json
07:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
07:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
07:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T312863)', diff saved to https://phabricator.wikimedia.org/P32309 and previous config saved to /var/cache/conftool/dbconfig/20220808-075702-ladsgroup.json
07:53 godog: grow sda/sdb 3 by 100G on thanos-be2001 - T314275
07:50 godog: grow sda/sdb 3 by 100G on thanos-be1004 - T314275
07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P32308 and previous config saved to /var/cache/conftool/dbconfig/20220808-074156-ladsgroup.json
07:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
07:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
07:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P32307 and previous config saved to /var/cache/conftool/dbconfig/20220808-072650-ladsgroup.json
07:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
07:22 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: trwikivoyage: Create rollbacker user group (T314678) (duration: 03m 17s)
07:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
07:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
07:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
07:11 elukey: restart rsyslog on ml-serve2007
07:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T312863)', diff saved to https://phabricator.wikimedia.org/P32306 and previous config saved to /var/cache/conftool/dbconfig/20220808-071144-ladsgroup.json
07:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
07:09 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable SectionTranslation on 10 Wikipedias where ContentTranslation is default (T308829) (duration: 03m 15s)
07:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
07:06 XioNoX: add CSP headers to Netbox - T296356
07:05 elukey: restart rsyslog on ml-serve-ctrl2001

2022-08-07

19:58 taavi: taavi@mwmaint1002 ~ $ echo "https://upload.wikimedia.org/wikipedia/commons/1/15/Keep_tidy_ask.svg" | mwscript purgeList.php --wiki enwiki # T314712
13:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T312863)', diff saved to https://phabricator.wikimedia.org/P32305 and previous config saved to /var/cache/conftool/dbconfig/20220807-135204-ladsgroup.json
13:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
13:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
13:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T312863)', diff saved to https://phabricator.wikimedia.org/P32304 and previous config saved to /var/cache/conftool/dbconfig/20220807-135143-ladsgroup.json
13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32303 and previous config saved to /var/cache/conftool/dbconfig/20220807-133637-ladsgroup.json
13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32302 and previous config saved to /var/cache/conftool/dbconfig/20220807-132131-ladsgroup.json
13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T312863)', diff saved to https://phabricator.wikimedia.org/P32301 and previous config saved to /var/cache/conftool/dbconfig/20220807-130625-ladsgroup.json
12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T312863)', diff saved to https://phabricator.wikimedia.org/P32300 and previous config saved to /var/cache/conftool/dbconfig/20220807-120610-ladsgroup.json
12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
12:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T312863)', diff saved to https://phabricator.wikimedia.org/P32299 and previous config saved to /var/cache/conftool/dbconfig/20220807-120549-ladsgroup.json
11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P32298 and previous config saved to /var/cache/conftool/dbconfig/20220807-115043-ladsgroup.json
11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P32297 and previous config saved to /var/cache/conftool/dbconfig/20220807-113537-ladsgroup.json
11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T312863)', diff saved to https://phabricator.wikimedia.org/P32296 and previous config saved to /var/cache/conftool/dbconfig/20220807-112031-ladsgroup.json

2022-08-06

17:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T312863)', diff saved to https://phabricator.wikimedia.org/P32295 and previous config saved to /var/cache/conftool/dbconfig/20220806-175916-ladsgroup.json
17:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
17:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
03:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
03:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
03:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
03:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
03:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
03:02 krinkle@deploy1002: Synchronized w/: I9067d4 (duration: 03m 25s)
03:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
03:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
03:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
02:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
02:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
02:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
02:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
02:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
02:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
02:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
02:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-08-05

22:20 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@71fe016]: Fix schedule_interval for image_recommendation_weekly (duration: 02m 01s)
22:18 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@71fe016]: Fix schedule_interval for image_recommendation_weekly
17:08 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1195.eqiad.wmnet with OS bullseye
16:54 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS bullseye
16:53 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1195.eqiad.wmnet with reason: host reimage
16:49 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1195.eqiad.wmnet with reason: host reimage
16:41 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage
16:37 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage
16:34 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1195.eqiad.wmnet with OS bullseye
16:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[56]\.codfw\.wmnet,service=varnish-fe
16:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[56]\.codfw\.wmnet,service=ats-be
16:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[56]\.codfw\.wmnet,service=ats-tls
16:26 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS bullseye
16:25 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1193.eqiad.wmnet with OS bullseye
16:21 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host db1192.eqiad.wmnet with OS bullseye
16:12 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@8489923]: T304954: Automate imagesuggestion imports (duration: 02m 03s)
16:11 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1193.eqiad.wmnet with reason: host reimage
16:11 milimetric@deploy1002: Finished deploy [analytics/refinery@fe7bf9e]: Hotfix for webrequest load refine, now with FORCE :) (duration: 06m 09s)
16:10 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@8489923]: T304954: Automate imagesuggestion imports
16:07 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1193.eqiad.wmnet with reason: host reimage
16:07 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1192.eqiad.wmnet with reason: host reimage
16:05 milimetric@deploy1002: Started deploy [analytics/refinery@fe7bf9e]: Hotfix for webrequest load refine, now with FORCE :)
16:04 milimetric@deploy1002: Finished deploy [analytics/refinery@fe7bf9e]: Hotfix for webrequest load refine (duration: 34m 38s)
16:03 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1192.eqiad.wmnet with reason: host reimage
15:55 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1193.eqiad.wmnet with OS bullseye
15:52 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1191.eqiad.wmnet with OS bullseye
15:51 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1192.eqiad.wmnet with OS bullseye
15:42 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1190.eqiad.wmnet with OS bullseye
15:38 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage
15:34 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage
15:30 milimetric@deploy1002: Started deploy [analytics/refinery@fe7bf9e]: Hotfix for webrequest load refine
15:28 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1190.eqiad.wmnet with reason: host reimage
15:25 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1190.eqiad.wmnet with reason: host reimage
15:24 jbond: upload trapperkeeper-metrics-clojure to puppet7 component
15:22 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS bullseye
15:19 jbond: upload puppetlabs-http-client-clojur to puppet7 component
15:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
15:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
15:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
15:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
15:14 dancy@deploy1002: Finished scap: Backport for gerrit:820653 scap gitignore: ignore all files under the `scap` directory (duration: 04m 41s)
15:11 jbond: upload jolokia to puppet7 component
15:10 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1185.eqiad.wmnet with OS bullseye
15:09 dancy@deploy1002: Started scap: Backport for gerrit:820653 scap gitignore: ignore all files under the `scap` directory
15:09 jbond: upload test-chuck-clojure to puppet7 component
15:05 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1190.eqiad.wmnet with OS bullseye
15:04 jbond: upload test-check-clojure to puppet7 component
14:57 jbond: upload nippy-clojure to puppet7 component
14:56 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1185.eqiad.wmnet with reason: host reimage
14:52 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1185.eqiad.wmnet with reason: host reimage
14:43 jbond: upload fressian to puppet7 component
14:40 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1185.eqiad.wmnet with OS bullseye
14:40 jbond: upload test-generative-clojure to puppet7 component
14:35 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:34 jbond: upload data-generators-clojure to puppet7 component
14:31 pt1979@cumin2002: START - Cookbook sre.dns.netbox
14:23 jbond: upload encore-clojure to puppet7 component
14:17 jbond: upload truss-clojure to puppet7 component
14:13 jbond: upload structured-logging-clojure to puppet7 component
14:06 jbond: upload murphy-clojure to puppet7 component
13:57 jbond: upload logstash-logback-encoder-7.2 to puppet7 component
13:49 jbond: upload kitchensink-clojure to puppet7 component
13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool hosts with fragile power supply (T314559 T314628)', diff saved to https://phabricator.wikimedia.org/P32292 and previous config saved to /var/cache/conftool/dbconfig/20220805-132709-ladsgroup.json
13:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
13:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
13:09 sukhe: repool codfw
13:02 jbond: upload honeysql-clojure to puppet7 component
12:53 _joe_: progressive repool of services in codfw
12:24 moritzm: installing nano bugfix updates from bullseye point release
11:50 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
11:40 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after PDU maint on D3 (T310146)', diff saved to https://phabricator.wikimedia.org/P32291 and previous config saved to /var/cache/conftool/dbconfig/20220805-113729-ladsgroup.json
11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after PDU maint on C6 (T310145)', diff saved to https://phabricator.wikimedia.org/P32290 and previous config saved to /var/cache/conftool/dbconfig/20220805-113555-ladsgroup.json
11:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after PDU maint on C5 (T310145)', diff saved to https://phabricator.wikimedia.org/P32289 and previous config saved to /var/cache/conftool/dbconfig/20220805-113436-ladsgroup.json
10:46 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
10:36 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
10:17 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
10:12 Amir1: dbmaint at s4@codfw (T312863)
10:07 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
09:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
09:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
09:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
09:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
00:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 0:00:00 on gerrit2001.wikimedia.org with reason: decom, replaced by gerrit2002
00:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 8 days, 0:00:00 on gerrit2001.wikimedia.org with reason: decom, replaced by gerrit2002
00:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for gerrit2002.wikimedia.org
00:53 dzahn@cumin1001: START - Cookbook sre.hosts.remove-downtime for gerrit2002.wikimedia.org
00:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 0:00:00 on gerrit2002.wikimedia.org with reason: decom, replaced by gerrit2002
00:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 8 days, 0:00:00 on gerrit2002.wikimedia.org with reason: decom, replaced by gerrit2002
00:18 mutante: restarting gerrit for config change - removing old replica T313250

2022-08-04

23:07 mutante: switching gerrit-replica.wikimedia.org to new machine gerrit2002, dropping gerrit-replica-new.wikimedia.org T313250
21:07 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
20:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:56 thcipriani@deploy1002: Finished scap: Backport for gerrit:819774 tkwiki: Update wordmark (duration: 06m 12s)
20:51 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
20:51 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
20:51 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
20:50 thcipriani@deploy1002: Started scap: Backport for gerrit:819774 tkwiki: Update wordmark
20:48 thcipriani@deploy1002: Finished scap: Backport for gerrit:812391 [config]: Add click event logging for mobile and desktop (duration: 39m 16s)
20:45 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
20:24 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
20:23 ryankemper@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
20:22 ryankemper@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:13 ryankemper@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
20:13 ryankemper@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
20:10 ryankemper@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
20:09 ryankemper@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
20:08 thcipriani@deploy1002: Started scap: Backport for gerrit:812391 [config]: Add click event logging for mobile and desktop
19:59 ryankemper@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
19:55 dancy@deploy1002: rebuilt and synchronized wikiversions files: resync
19:49 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for thanos-be2001.codfw.wmnet
19:49 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for thanos-be2001.codfw.wmnet
19:44 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 8 hosts
19:44 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for 8 hosts
19:42 Emperor: rebooting thanos-be2001 to fix drive ordering
19:37 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for elastic2071.codfw.wmnet
19:37 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for elastic2071.codfw.wmnet
19:31 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2071.codfw.wmnet with reason: T310146
19:31 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2071.codfw.wmnet with reason: T310146
19:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
19:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
19:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
19:12 ryankemper@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
19:11 ryankemper@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
19:11 dancy: There were many errors during php-fpm restart due to failure to contact http://lvs2009:9090/pools/appservers-https_443/mw2361.codfw.wmnet and the like.
19:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
19:10 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.39.0-wmf.23 refs T308076
19:09 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
19:09 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
19:05 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: sync
19:04 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: sync
19:04 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: sync
19:03 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: sync
19:03 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: sync
19:02 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: sync
19:02 ottomata: roll-restarting eventgate-analytics-external to pick up backwards incompatible schema change - T314151
18:47 ryankemper@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
18:46 ryankemper@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
18:41 cwhite: poweroff kafka-logging2003 - T310145
18:39 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw237[0-6].codfw.wmnet
18:39 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts
18:39 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for 7 hosts
18:35 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2369.codfw.wmnet
18:35 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2369.codfw.wmnet
18:35 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2368.codfw.wmnet
18:35 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2368.codfw.wmnet
18:35 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2367.codfw.wmnet
18:35 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2367.codfw.wmnet
18:35 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2369.codfw.wmnet
18:35 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2368.codfw.wmnet
18:35 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2367.codfw.wmnet
18:35 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2366.codfw.wmnet
18:35 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2366.codfw.wmnet
18:34 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2366.codfw.wmnet
18:30 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2279.codfw.wmnet
18:30 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2278.codfw.wmnet
18:29 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2277.codfw.wmnet
18:29 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2276.codfw.wmnet
18:29 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2276.codfw.wmnet
18:29 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2275.codfw.wmnet
18:29 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2275.codfw.wmnet
18:29 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2274.codfw.wmnet
18:29 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2274.codfw.wmnet
18:29 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2273.codfw.wmnet
18:29 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2273.codfw.wmnet
18:26 milimetric@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 02m 39s)
18:24 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2272.codfw.wmnet
18:24 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2272.codfw.wmnet
18:24 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2271.codfw.wmnet
18:24 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2271.codfw.wmnet
18:23 milimetric@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
18:23 milimetric@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 01m 32s)
18:23 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2276.codfw.wmnet
18:23 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2275.codfw.wmnet
18:23 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2274.codfw.wmnet
18:22 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2273.codfw.wmnet
18:22 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2272.codfw.wmnet
18:22 Emperor: shutdown moss-fe2001.codfw.wmnet,ms-fe2011.codfw.wmnet,ms-be20[34,35,42,48,68].codfw.wmnet PDU work T310145
18:22 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 8 hosts with reason: PDU work
18:21 milimetric@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
18:21 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 8 hosts with reason: PDU work
18:21 milimetric@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 00m 03s)
18:21 milimetric@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
18:21 milimetric@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 00m 03s)
18:21 milimetric@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
18:20 milimetric@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 01m 49s)
18:20 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 9 hosts
18:20 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for 9 hosts
18:19 milimetric@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
18:14 mutante: mw2272 and upwards: scap pull, checking monitoring, repooling.. one by one
18:13 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2271.codfw.wmnet
18:12 btullis@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 00m 51s)
18:11 btullis@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
18:06 btullis@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 01m 54s)
18:04 btullis@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
17:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2009.codfw.wmnet with reason: shutdown for PDU upgrade
17:55 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2009.codfw.wmnet with reason: shutdown for PDU upgrade
17:43 mutante: maps2008 - downtime and shutdown for D3 maintenance
17:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on maps2008.codfw.wmnet with reason: codfw reboots
17:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on maps2008.codfw.wmnet with reason: codfw reboots
17:42 mutante: thunmbor2006 - downtime and shutdown for D3 maintenance
17:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on thumbor2006.codfw.wmnet with reason: codfw reboots
17:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on thumbor2006.codfw.wmnet with reason: codfw reboots
17:39 mutante: mw2386 - systemctl reset-failed
17:31 mutante: phab2001 - systemctl restart ssh-phab, attempting to clear Icinga pybal alerts, related to reboots
17:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on dns2001.wikimedia.org with reason: shutdown for PDU upgrade
17:30 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on dns2001.wikimedia.org with reason: shutdown for PDU upgrade
17:29 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns2001.wikimedia.org with reason: shutdown for PDU upgrade
17:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on dns2001.wikimedia.org with reason: shutdown for PDU upgrade
17:28 Amir1: dbmaint at s4@eqiad (T312863)
17:26 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
17:26 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
17:24 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
17:23 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
17:23 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
17:23 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
17:20 mutante: [an-launcher1002:~] $ sudo systemctl reset-failed
17:20 mvernon@cumin1001: conftool action : set/pooled=no; selector: name=ms-fe2012.codfw.wmnet
17:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
17:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
17:18 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2038.codfw.wmnet,service=varnish-fe
17:18 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2038.codfw.wmnet,service=ats-be
17:18 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2038.codfw.wmnet,service=ats-tls
17:16 Emperor: shutdown of moss-fe2002.codfw.wmnet,ms-be20[37,38,43,61,65,69].codfw.wmnet,ms-fe2012.codfw.wmnet,thanos-fe2003.codfw.wmnet for power work T310146
17:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp[2035-2036].codfw.wmnet with reason: shutdown for PDU upgrade
17:15 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp[2035-2036].codfw.wmnet with reason: shutdown for PDU upgrade
17:15 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 9 hosts with reason: PDU work
17:15 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 9 hosts with reason: PDU work
17:15 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[56]\.codfw\.wmnet,service=varnish-fe
17:15 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[56]\.codfw\.wmnet,service=ats-be
17:15 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[56]\.codfw\.wmnet,service=ats-tls
17:13 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be[2036,2049,2054].codfw.wmnet,thanos-be2003.codfw.wmnet
17:13 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be[2036,2049,2054].codfw.wmnet,thanos-be2003.codfw.wmnet
17:12 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2037.codfw.wmnet,service=varnish-fe
17:12 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2037.codfw.wmnet,service=ats-be
17:12 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2037.codfw.wmnet,service=ats-tls
17:12 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2050.codfw.wmnet with reason: T310146
17:12 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2050.codfw.wmnet with reason: T310146
17:11 ebysans@deploy1002: Finished deploy [analytics/refinery@2553288] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2553288] (duration: 00m 04s)
17:11 ebysans@deploy1002: Started deploy [analytics/refinery@2553288] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2553288]
17:11 ebysans@deploy1002: Finished deploy [analytics/refinery@2553288] (thin): Regular analytics weekly train THIN [analytics/refinery@2553288] (duration: 00m 07s)
17:10 ebysans@deploy1002: Started deploy [analytics/refinery@2553288] (thin): Regular analytics weekly train THIN [analytics/refinery@2553288]
17:10 ebysans@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 00m 15s)
17:09 ebysans@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
17:07 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2010.codfw.wmnet with reason: shutdown for PDU upgrade
17:07 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2010.codfw.wmnet with reason: shutdown for PDU upgrade
16:55 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2008.codfw.wmnet
16:51 ebysans@deploy1002: Finished deploy [analytics/refinery@2553288] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2553288] (duration: 07m 14s)
16:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2016.codfw.wmnet
16:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase202[05].codfw.wmnet
16:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase202[05].codfw.wmnet
16:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2007.codfw.wmnet
16:43 ebysans@deploy1002: Started deploy [analytics/refinery@2553288] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2553288]
16:43 ebysans@deploy1002: Finished deploy [analytics/refinery@2553288] (thin): Regular analytics weekly train THIN [analytics/refinery@2553288] (duration: 00m 07s)
16:43 ebysans@deploy1002: Started deploy [analytics/refinery@2553288] (thin): Regular analytics weekly train THIN [analytics/refinery@2553288]
16:37 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 18 hosts
16:37 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for 18 hosts
16:35 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2059.codfw.wmnet with reason: T310145
16:35 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2059.codfw.wmnet with reason: T310145
16:34 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2003.codfw.wmnet with reason: PDU swap
16:34 ebysans@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 00m 20s)
16:34 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2003.codfw.wmnet with reason: PDU swap
16:34 ebysans@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
16:32 ebysans@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 29m 59s)
16:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool D3 for PDU maint', diff saved to https://phabricator.wikimedia.org/P32286 and previous config saved to /var/cache/conftool/dbconfig/20220804-163037-ladsgroup.json
16:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
16:28 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Start reading from new templatelinks columns in commons (T306673) (duration: 03m 00s)
16:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
16:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
16:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
16:17 brett: deploying authdns - geodns: Map out African countries by DC latency (T311472)
16:12 cwhite: poweroff logstash2028 - T310145
16:06 Emperor: shutdown ms-be20[39,49,54].codfw.wmnet,thanos-be2003 for PDU swap T310145
16:03 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be[2036,2049,2054].codfw.wmnet,thanos-be2003.codfw.wmnet with reason: PDU work
16:02 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be[2036,2049,2054].codfw.wmnet,thanos-be2003.codfw.wmnet with reason: PDU work
16:02 ebysans@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
15:50 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2048.codfw.wmnet with reason: T310145
15:50 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2048.codfw.wmnet with reason: T310145
15:43 damilare: payments-wiki upgraded from 0e4a5b3b to 6880236d
15:37 _joe_: uncordoning ml-serve200{1,6}
15:27 sukhe: power off cp2037,cp2038: PDU upgrade
15:25 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:30:00 on phab2001.codfw.wmnet with reason: PDU swap
15:25 jelto: power off phab2001
15:25 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:30:00 on phab2001.codfw.wmnet with reason: PDU swap
15:25 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp[2037-2038].codfw.wmnet with reason: shutdown for PDU upgrade
15:24 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp[2037-2038].codfw.wmnet with reason: shutdown for PDU upgrade
15:24 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[78]\.codfw\.wmnet,service=varnish-fe
15:23 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[78]\.codfw\.wmnet,service=ats-be
15:23 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[78]\.codfw\.wmnet,service=ats-tls
15:21 XioNoX: un-drain codfw-ulsfo link - T310310
15:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db[2116,2127,2167-2168].codfw.wmnet,es2022.codfw.wmnet with reason: Maintenance (T310145)
15:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db[2116,2127,2167-2168].codfw.wmnet,es2022.codfw.wmnet with reason: Maintenance (T310145)
15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool C6 for PDU maint (T310145)', diff saved to https://phabricator.wikimedia.org/P32285 and previous config saved to /var/cache/conftool/dbconfig/20220804-151958-ladsgroup.json
15:16 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
15:16 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on restbase[2016,2020,2025].codfw.wmnet with reason: PDU maintenance
15:16 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on restbase[2016,2020,2025].codfw.wmnet with reason: PDU maintenance
15:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db[2114,2126,2166].codfw.wmnet with reason: Maintenance (T310145)
15:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db[2114,2126,2166].codfw.wmnet with reason: Maintenance (T310145)
15:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[12]\.codfw\.wmnet,service=varnish-fe
15:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[12]\.codfw\.wmnet,service=ats-be
15:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[12]\.codfw\.wmnet,service=ats-tls
15:12 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be[2058,2064].codfw.wmnet
15:12 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be[2058,2064].codfw.wmnet
15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool hosts for PDU maint (T310145)', diff saved to https://phabricator.wikimedia.org/P32284 and previous config saved to /var/cache/conftool/dbconfig/20220804-151121-ladsgroup.json
15:09 godog: poweroff logstash2002 - T310145
15:07 _joe_: pwoering down mc203{0,1}
15:07 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on logstash2002.codfw.wmnet with reason: pdu
15:06 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on logstash2002.codfw.wmnet with reason: pdu
15:05 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
14:58 jelto: power off mc20[30-31]
14:56 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mc[2030-2031].codfw.wmnet with reason: PDU swap
14:56 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mc[2030-2031].codfw.wmnet with reason: PDU swap
14:56 XioNoX: draining codfw-ulsfo link - T310310
14:36 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2009.codfw.wmnet
14:35 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
14:35 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2025.codfw.wmnet
14:35 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2020.codfw.wmnet
14:35 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2016.codfw.wmnet
14:32 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs2011.codfw.wmnet with reason: T310145
14:31 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs2011.codfw.wmnet with reason: T310145
14:25 jelto: power off gitlab-runner2003
14:25 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:30:00 on gitlab-runner2003.codfw.wmnet with reason: PDU swap
14:25 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs2001.codfw.wmnet with reason: T310145
14:24 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs2001.codfw.wmnet with reason: T310145
14:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 4:30:00 on gitlab-runner2003.codfw.wmnet with reason: PDU swap
14:23 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2032.codfw.wmnet with reason: T310145
14:22 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2032.codfw.wmnet with reason: T310145
14:22 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on logstash2035.codfw.wmnet with reason: pdu
14:22 godog: poweroff logstash2035 - T310145
14:22 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on logstash2035.codfw.wmnet with reason: pdu
14:21 Emperor: shutdown ms-be20[58,64].codfw.wmnet for PDU swap T310145
14:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
14:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
14:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
14:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
14:14 Lucas_WMDE: UTC afternoon backport+config window done
14:13 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: Remove unused $wgMathUseRestBase (T274436) (duration: 03m 01s)
14:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
14:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
14:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
14:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
14:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
14:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: CommonSettings-labs: Fix usage of $wgSFSValidateIPListLocationMD5 (duration: 02m 51s)
14:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
14:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
14:05 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2033.codfw.wmnet with reason: T310145
14:04 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2033.codfw.wmnet with reason: T310145
14:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:59 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/wikitech.php: Config: wikitech: Remove old LDAP config vars (duration: 02m 54s)
13:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be[2058,2064].codfw.wmnet with reason: PDU work
13:58 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be[2058,2064].codfw.wmnet with reason: PDU work
13:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:48 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove unused $wgIncludejQueryMigrate (T280944) (2/2) (duration: 03m 03s)
13:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:45 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove unused $wgIncludejQueryMigrate (T280944) (1/2) (duration: 02m 58s)
13:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:40 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2066.codfw.wmnet with reason: T310145
13:39 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2066.codfw.wmnet with reason: T310145
13:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:37 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove unused $wgLegacyJavaScriptGlobals (T72470) (2/2) (duration: 02m 59s)
13:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:34 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove unused $wgLegacyJavaScriptGlobals (T72470) (1/2) (duration: 02m 58s)
13:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:26 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/SearchSettingsForSDC.php: Config: Remove unused $wgWBCSEnableDispatchingQueryBuilder (duration: 03m 01s)
13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:17 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Remove unused CA P3P config (duration: 03m 09s)
13:14 jbond: intorudce new puppetmaster backends puppetmaster[12]004
13:14 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2065.codfw.wmnet with reason: T310145
13:14 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2065.codfw.wmnet with reason: T310145
13:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:11 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: QuickSurveys: Deploy research incentive survey to Bengali wiki (T314333) (duration: 03m 26s)
13:07 moritzm: installing jetty9 security updates
12:48 moritzm: installing Linux 4.19.249 kernels on Buster hosts
12:03 jbond: send sretest100[12] and idp-test2001 to the new puppetmaster[12]004 servers to test
11:46 moritzm: installing Linux 5.10.127-2 kernels on Bullseye hosts
11:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2017.codfw.wmnet to cluster codfw and group D
11:41 moritzm: installing libpgjava security updates
11:37 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2017.codfw.wmnet to cluster codfw and group D
11:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2017.codfw.wmnet
11:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2017.codfw.wmnet
11:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2017.codfw.wmnet with OS bullseye
10:53 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2015.codfw.wmnet
10:53 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2022.codfw.wmnet
10:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2017.codfw.wmnet with reason: host reimage
10:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2017.codfw.wmnet with reason: host reimage
10:30 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2017.codfw.wmnet with OS bullseye
10:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2017.codfw.wmnet with reason: Remove node for eventual reimage, T311686
10:27 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2017.codfw.wmnet with reason: Remove node for eventual reimage, T311686
10:19 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 9:00:00 on 32 hosts with reason: PDU swap
10:19 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 9:00:00 on 32 hosts with reason: PDU swap
10:03 Lucas_WMDE: stashbot temporarily parted and lost several logs between 9:42 UTC and 9:49 UTC; mainly mwdebug helmfil start/done, also ayounsi sre.deploy.python-code cookbook to cumin1001, cumin2002; see IRC logs
10:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update requirements + wmf-netbox - ayounsi@cumin1001
10:00 jynus: stop db2099 T310145
10:00 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update requirements + wmf-netbox - ayounsi@cumin1001
09:39 jelto: power off mw22[71-79].codfw.wmnet
09:38 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/GrowthExperiments/includes/EventLogging/SpecialEditGrowthConfigLogger.php: ba67dd9: SpecialEditGrowthConfigLogger: Update schema version (T314173, T312148) (duration: 03m 18s)
09:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
09:37 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2177 to s3 T311494', diff saved to https://phabricator.wikimedia.org/P32282 and previous config saved to /var/cache/conftool/dbconfig/20220804-093704-marostegui.json
09:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
09:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
09:35 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ddcd333: testwiki: Growth: Assign enrollasmentor to * (T310905) (duration: 03m 41s)
09:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
09:32 jelto: set/pooled=inactive mw22[71-79].codfw.wmnet
09:31 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 9:30:00 on 9 hosts with reason: PDU swap
09:31 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 9:30:00 on 9 hosts with reason: PDU swap
09:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
09:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
09:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
09:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
09:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: wmf-netbox.py update - ayounsi@cumin1001
09:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2089.codfw.wmnet
09:26 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:26 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0614a39: testwiki: Growth: Switch to structured mentor list (T310905) (duration: 03m 38s)
09:25 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: wmf-netbox.py update - ayounsi@cumin1001
09:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
09:23 marostegui@cumin1001: START - Cookbook sre.dns.netbox
09:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
09:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
09:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
09:18 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2089.codfw.wmnet
09:12 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=kubernetes2022.codfw.wmnet
09:03 oblivian@mwmaint1002: pull aborted: (duration: 00m 06s)
08:58 moritzm: installing gsasl security updates
08:57 oblivian@mwmaint1002: pull aborted: (duration: 00m 18s)
08:48 moritzm: draining ganeti2017 T311686
08:45 jelto: power off kubernetes2022
08:43 oblivian@deploy1002: Synchronized README: testing new scap configuration (duration: 03m 18s)
08:39 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:22:00 on kubernetes2022.codfw.wmnet with reason: PDU swap
08:38 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 10:22:00 on kubernetes2022.codfw.wmnet with reason: PDU swap
08:37 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2022.codfw.wmnet
08:35 jelto: kubectl drain kubernetes2022.codfw.wmnet
08:32 jelto: kubectl cordon kubernetes2022.codfw.wmnet
08:28 moritzm: imported gsasl 1.8.0-8+wmf1 to stretch-wikimedia
08:26 jelto: power off mc2049 and mc2050
08:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:36:00 on mc[2049-2050].codfw.wmnet with reason: PDU swap
08:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 10:36:00 on mc[2049-2050].codfw.wmnet with reason: PDU swap
08:22 oblivian@mwmaint1002: pull aborted: (duration: 00m 11s)
08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132, db111, db1127, db1143', diff saved to https://phabricator.wikimedia.org/P32281 and previous config saved to /var/cache/conftool/dbconfig/20220804-081958-root.json
08:19 jelto: power off mc2047 and mc2048
08:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:45:00 on mc[2047-2048].codfw.wmnet with reason: PDU swap
08:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 10:45:00 on mc[2047-2048].codfw.wmnet with reason: PDU swap
08:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2002.codfw.wmnet with reason: Switch instance to plain disks, T311686
08:04 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2002.codfw.wmnet with reason: Switch instance to plain disks, T311686
07:55 marostegui: Remove grants for 208.80.154.160/208.80.155.109 T314528
07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2089 from dbctl T313799', diff saved to https://phabricator.wikimedia.org/P32280 and previous config saved to /var/cache/conftool/dbconfig/20220804-074957-marostegui.json
07:47 godog: grow sda/sdb 3 by 100G on thanos-be2002 - T314275
07:46 godog: grow sda/sdb 3 by 100G on thanos-be1003 - T314275
07:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2030.codfw.wmnet to cluster codfw and group A
07:29 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2030.codfw.wmnet to cluster codfw and group A
07:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2030.codfw.wmnet
07:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2135,2160].codfw.wmnet with reason: codfw pdu maintenance
07:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db[2135,2160].codfw.wmnet with reason: codfw pdu maintenance
07:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2030.codfw.wmnet
07:09 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2030.codfw.wmnet to cluster codfw and group A
07:09 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2030.codfw.wmnet to cluster codfw and group A
07:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es[2023-2025].codfw.wmnet with reason: codfw pdu maintenance
07:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es[2023-2025].codfw.wmnet with reason: codfw pdu maintenance
07:05 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
07:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2030.codfw.wmnet
07:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
06:58 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
06:58 ayounsi@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
06:58 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
06:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2030.codfw.wmnet
06:06 _joe_: restarted memcached on mc2038 to pick up the actual production configuration
05:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2030.codfw.wmnet with OS bullseye
05:49 kart_: Updated cxserver to 2022-08-04-022612-production (T313296, T308248)
05:44 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
05:43 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
05:42 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
05:41 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
05:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2030.codfw.wmnet with reason: host reimage
05:39 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
05:38 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
05:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2030.codfw.wmnet with reason: host reimage
05:22 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2030.codfw.wmnet with OS bullseye
05:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2030.codfw.wmnet with reason: Remove node for eventual reimage, T311686
05:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2030.codfw.wmnet with reason: Remove node for eventual reimage, T311686
04:38 ejegg: payments-wiki upgraded from 712df4ce to 0e4a5b3b
04:29 TimStarling: on mw2377 fiddling with CPU frequency control and doing benchmarks
04:09 krinkle@mwmaint1002: pull aborted: (duration: 00m 05s)
01:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T312972)', diff saved to https://phabricator.wikimedia.org/P32278 and previous config saved to /var/cache/conftool/dbconfig/20220804-012341-marostegui.json
01:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P32277 and previous config saved to /var/cache/conftool/dbconfig/20220804-010834-marostegui.json
00:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P32276 and previous config saved to /var/cache/conftool/dbconfig/20220804-005328-marostegui.json
00:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T312972)', diff saved to https://phabricator.wikimedia.org/P32275 and previous config saved to /var/cache/conftool/dbconfig/20220804-003822-marostegui.json
00:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T312972)', diff saved to https://phabricator.wikimedia.org/P32274 and previous config saved to /var/cache/conftool/dbconfig/20220804-003611-marostegui.json
00:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
00:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
00:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T312972)', diff saved to https://phabricator.wikimedia.org/P32273 and previous config saved to /var/cache/conftool/dbconfig/20220804-003549-marostegui.json
00:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P32272 and previous config saved to /var/cache/conftool/dbconfig/20220804-002043-marostegui.json
00:06 mutante: gerrit - [2022-08-04 00:05:33,173] Replication to gerrit2@gerrit2002.wikimedia.org:/srv/gerrit/git/analytics/geowiki.git started.. T313250
00:06 mutante: gerrit - [2022-08-04 00:05:33,173] Replication to gerrit2@gerrit2002.wikimedia.org:/srv/gerrit/git/analytics/geowiki.git started... [CONTEXT pushOneId="83ad5008" ]
00:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P32271 and previous config saved to /var/cache/conftool/dbconfig/20220804-000536-marostegui.json
00:03 mutante: gerrit - service restart to deploy config change to add second replica T313250
00:01 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit.wikimedia.org with reason: service restart
00:00 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit.wikimedia.org with reason: service restart
00:00 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: service restart

2022-08-03

23:59 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: service restart
23:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T312972)', diff saved to https://phabricator.wikimedia.org/P32270 and previous config saved to /var/cache/conftool/dbconfig/20220803-235030-marostegui.json
22:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T312972)', diff saved to https://phabricator.wikimedia.org/P32269 and previous config saved to /var/cache/conftool/dbconfig/20220803-225015-marostegui.json
22:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
22:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
22:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 9 hosts with reason: Maintenance
22:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 9 hosts with reason: Maintenance
22:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
22:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
22:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
22:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T312972)', diff saved to https://phabricator.wikimedia.org/P32268 and previous config saved to /var/cache/conftool/dbconfig/20220803-224827-marostegui.json
22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P32267 and previous config saved to /var/cache/conftool/dbconfig/20220803-223321-marostegui.json
22:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P32266 and previous config saved to /var/cache/conftool/dbconfig/20220803-221815-marostegui.json
22:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T312972)', diff saved to https://phabricator.wikimedia.org/P32265 and previous config saved to /var/cache/conftool/dbconfig/20220803-220309-marostegui.json
22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T312972)', diff saved to https://phabricator.wikimedia.org/P32264 and previous config saved to /var/cache/conftool/dbconfig/20220803-220057-marostegui.json
22:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
22:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
22:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
22:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T312972)', diff saved to https://phabricator.wikimedia.org/P32263 and previous config saved to /var/cache/conftool/dbconfig/20220803-220007-marostegui.json
21:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P32262 and previous config saved to /var/cache/conftool/dbconfig/20220803-214501-marostegui.json
21:44 damilare: payments-wiki updated from e1b6036a to 712df4ce
21:37 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster plugin upgrade - ryankemper@cumin1001 - T314078
21:35 ryankemper@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
21:35 ryankemper@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
21:30 ryankemper@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
21:30 ryankemper@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
21:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P32261 and previous config saved to /var/cache/conftool/dbconfig/20220803-212955-marostegui.json
21:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T312972)', diff saved to https://phabricator.wikimedia.org/P32260 and previous config saved to /var/cache/conftool/dbconfig/20220803-211449-marostegui.json
21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T312972)', diff saved to https://phabricator.wikimedia.org/P32259 and previous config saved to /var/cache/conftool/dbconfig/20220803-211237-marostegui.json
21:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
21:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T312972)', diff saved to https://phabricator.wikimedia.org/P32258 and previous config saved to /var/cache/conftool/dbconfig/20220803-211216-marostegui.json
21:03 ejegg: updated standalone SmashPig deployment from 8e8f0017 to 9b97ea15
21:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
21:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
21:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P32257 and previous config saved to /var/cache/conftool/dbconfig/20220803-205710-marostegui.json
20:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:55 ebernhardson@deploy1002: Synchronized wmf-config/CirrusSearch-production.php: Config: cirrus: Set ElasticaWrite partition count for cloudelastic to 3 (duration: 03m 29s)
20:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:43 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/VisualEditor/includes/VisualEditorParsoidClient.php: a804fe1: Update call to PageConfigFactory::create to use new signature (T314523) (duration: 03m 25s)
20:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P32256 and previous config saved to /var/cache/conftool/dbconfig/20220803-204204-marostegui.json
20:39 urbanecm@deploy1002: sync-file aborted: a804fe1: Update call to PageConfigFactory::create to use new signature (T314523ú (duration: 00m 00s)
20:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:36 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/DiscussionTools/: b840eef: Fix ReplyLinksController#teardown (duration: 03m 27s)
20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:31 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/CirrusSearch/: 70a18f5: Add explicit partitioning key to ElasticaWrite (T314426) (duration: 03m 13s)
20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:28 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.22/extensions/CirrusSearch/: 9961e9b: Add explicit partitioning key to ElasticaWrite (T314426) (duration: 03m 23s)
20:28 cwhite@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host logstash2032.codfw.wmnet
20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T312972)', diff saved to https://phabricator.wikimedia.org/P32255 and previous config saved to /var/cache/conftool/dbconfig/20220803-202658-marostegui.json
20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1122 (T312972)', diff saved to https://phabricator.wikimedia.org/P32254 and previous config saved to /var/cache/conftool/dbconfig/20220803-202146-marostegui.json
20:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1122.eqiad.wmnet with reason: Maintenance
20:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1122.eqiad.wmnet with reason: Maintenance
20:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T312972)', diff saved to https://phabricator.wikimedia.org/P32253 and previous config saved to /var/cache/conftool/dbconfig/20220803-202125-marostegui.json
20:14 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
20:13 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
20:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 195f809: Start writing to cuc_actor on test wikis (T233004) (duration: 03m 27s)
20:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:08 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash2032.codfw.wmnet on all recursors
20:08 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash2032.codfw.wmnet on all recursors
20:08 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:07 mutante: gerrit - adding second replica T313250
20:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P32252 and previous config saved to /var/cache/conftool/dbconfig/20220803-200619-marostegui.json
20:04 cwhite@cumin2002: START - Cookbook sre.dns.netbox
20:03 cwhite@cumin2002: START - Cookbook sre.ganeti.makevm for new host logstash2032.codfw.wmnet
20:00 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes2012.codfw.wmnet
20:00 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes2012.codfw.wmnet
20:00 rzl@deploy1002: conftool action : set/pooled=yes; selector: name=kubernetes2012.codfw.wmnet
19:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P32251 and previous config saved to /var/cache/conftool/dbconfig/20220803-195113-marostegui.json
19:40 ryankemper: T314078 Forgot to mention, restart is at `ryankemper@cumin1001` tmux session `codfw_restarts`
19:39 ryankemper: T314078 Rolling upgrade of codfw hosts; after this all of eqiad/codfw will have the new plugin version and we can resume the `search-loader` instances: `sudo -E cookbook sre.elasticsearch.rolling-operation search_codfw "codfw cluster plugin upgrade" --upgrade --nodes-per-run 3 --start-datetime 2022-08-03T19:38:10 --task-id T314078`
19:38 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster plugin upgrade - ryankemper@cumin1001 - T314078
19:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T312972)', diff saved to https://phabricator.wikimedia.org/P32250 and previous config saved to /var/cache/conftool/dbconfig/20220803-193607-marostegui.json
19:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T312972)', diff saved to https://phabricator.wikimedia.org/P32249 and previous config saved to /var/cache/conftool/dbconfig/20220803-193354-marostegui.json
19:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
19:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
19:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T312972)', diff saved to https://phabricator.wikimedia.org/P32248 and previous config saved to /var/cache/conftool/dbconfig/20220803-193334-marostegui.json
19:25 mutante: gerrit1001 - rsyncing /var/lib/gerrit/review_site/ over to gerrit2002 815401
19:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P32247 and previous config saved to /var/cache/conftool/dbconfig/20220803-191828-marostegui.json
19:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P32246 and previous config saved to /var/cache/conftool/dbconfig/20220803-190321-marostegui.json
18:56 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes2011.codfw.wmnet
18:56 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes2011.codfw.wmnet
18:56 rzl@deploy1002: conftool action : set/pooled=yes; selector: name=kubernetes2011.codfw.wmnet
18:33 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc[2027,2037].codfw.wmnet
18:33 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc[2027,2037].codfw.wmnet
18:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
18:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
18:16 dancy@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.23 refs T308076 (duration: 03m 37s)
18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
18:12 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.23 refs T308076
17:58 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubestage2002.codfw.wmnet
17:58 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubestage2002.codfw.wmnet
17:57 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc[2025-2026].codfw.wmnet
17:57 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc[2025-2026].codfw.wmnet
17:57 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for elastic2044.codfw.wmnet
17:57 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for elastic2044.codfw.wmnet
17:56 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for elastic2043.codfw.wmnet
17:56 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for elastic2043.codfw.wmnet
17:55 ottomata: increasing partitions from 5 to 6 for *.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite topics in Kafka main-eqiad and main-codfw - T314426
17:55 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be2055.codfw.wmnet
17:55 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be2055.codfw.wmnet
17:50 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=kubestage2002.codfw.wmnet
17:38 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse[2008-2010].codfw.wmnet
17:38 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse[2008-2010].codfw.wmnet
17:23 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase20[12]4.codfw.wmnet
17:14 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 6 hosts
17:14 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for 6 hosts
17:08 ryankemper: T310145 `elastic2031` and `wcqs2002` powered off in preparation for C1 maintenance
17:06 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=(kubernetes2020.codfw.wmnet|kubernetes2009.codfw.wmnet|kubernetes2010.codfw.wmnet)
17:00 btullis@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
16:48 Emperor: shutdown moss-fe2001.codfw.wmnet,ms-fe2011.codfw.wmnet,ms-be20[34,35,42,48,55,68].codfw.wmnet PDU work T310145
16:47 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 8 hosts with reason: PDU work
16:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on gerrit2002.wikimedia.org with reason: in setup / flapping
16:47 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 8 hosts with reason: PDU work
16:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on gerrit2002.wikimedia.org with reason: in setup / flapping
16:46 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be[2033,2047].codfw.wmnet,thanos-be2002.codfw.wmnet
16:46 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be[2033,2047].codfw.wmnet,thanos-be2002.codfw.wmnet
16:40 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc2046.codfw.wmnet
16:40 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc2046.codfw.wmnet
16:39 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 10 hosts
16:39 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for 10 hosts
16:38 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc2023.codfw.wmnet
16:38 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc2023.codfw.wmnet
16:37 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on gitlab-runner2002.codfw.wmnet with reason: PDU swap
16:37 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on gitlab-runner2002.codfw.wmnet with reason: PDU swap
16:35 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on mc[2025-2026].codfw.wmnet with reason: PDU swap
16:35 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on mc[2025-2026].codfw.wmnet with reason: PDU swap
16:32 jelto: power off mc2025-2026
16:31 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for rdb2008.codfw.wmnet
16:30 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for rdb2008.codfw.wmnet
16:28 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
16:28 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes[2009-2010,2020].codfw.wmnet
16:27 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes[2009-2010,2020].codfw.wmnet
16:11 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 12 hosts
16:11 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for 12 hosts
16:08 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 15 hosts
16:08 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for 15 hosts
16:08 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs[2005-2008].codfw.wmnet
16:08 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs[2005-2008].codfw.wmnet
15:59 Emperor: shutdown ms-be20[33,47],thanos-be2002 prior to PDU work T310070
15:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be[2033,2047].codfw.wmnet,thanos-be2002.codfw.wmnet with reason: PDU work
15:58 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be[2033,2047].codfw.wmnet,thanos-be2002.codfw.wmnet with reason: PDU work
15:52 jelto: pooling mw2259-2270 again
15:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T312972)', diff saved to https://phabricator.wikimedia.org/P32242 and previous config saved to /var/cache/conftool/dbconfig/20220803-154515-marostegui.json
15:38 vgutierrez: clearing ats-be cache on cp6008 - T309651
15:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
15:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
15:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
15:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
15:36 elukey: powercycle kafka-logging2003 - not responsive to serial console
15:36 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.22/extensions/GrowthExperiments/includes/NewcomerTasks/AddImage/ServiceImageRecommendationProvider.php: 4438957: ServiceImageRecommendationProvider: Add extra logging when no JSON response received (T313973) (duration: 03m 04s)
15:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: PDU maintenance
15:35 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: PDU maintenance
15:34 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2009.codfw.wmnet
15:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on restbase2024.codfw.wmnet with reason: PDU maintenance
15:32 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on restbase2024.codfw.wmnet with reason: PDU maintenance
15:32 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2024.codfw.wmnet
15:30 vgutierrez: clearing ats-be cache on cp6016 - T309651
15:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P32241 and previous config saved to /var/cache/conftool/dbconfig/20220803-153009-marostegui.json
15:24 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd._tcp.eqsin.wmnet on all recursors
15:24 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd._tcp.eqsin.wmnet on all recursors
15:24 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd._tcp.ulsfo.wmnet on all recursors
15:24 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd._tcp.ulsfo.wmnet on all recursors
15:24 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd._tcp.codfw.wmnet on all recursors
15:24 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd._tcp.codfw.wmnet on all recursors
15:21 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2021.codfw.wmnet
15:19 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2030.codfw.wmnet with reason: T310070
15:19 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2030.codfw.wmnet with reason: T310070
15:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P32240 and previous config saved to /var/cache/conftool/dbconfig/20220803-151502-marostegui.json
15:10 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for conf2004.codfw.wmnet
15:10 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for conf2004.codfw.wmnet
15:04 jelto: power off mc2023
14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T312972)', diff saved to https://phabricator.wikimedia.org/P32239 and previous config saved to /var/cache/conftool/dbconfig/20220803-145956-marostegui.json
14:59 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on mc2023.codfw.wmnet with reason: PDU swap
14:59 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on mc2023.codfw.wmnet with reason: PDU swap
14:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T312972)', diff saved to https://phabricator.wikimedia.org/P32238 and previous config saved to /var/cache/conftool/dbconfig/20220803-145849-marostegui.json
14:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
14:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
14:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 (T312972)', diff saved to https://phabricator.wikimedia.org/P32237 and previous config saved to /var/cache/conftool/dbconfig/20220803-145828-marostegui.json
14:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
14:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
14:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
14:53 dancy@deploy1002: Pruned MediaWiki: 1.39.0-wmf.19 (duration: 05m 37s)
14:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
14:47 dancy@deploy1002: Pruned MediaWiki: 1.39.0-wmf.21 (duration: 06m 13s)
14:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
14:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
14:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
14:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
14:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P32236 and previous config saved to /var/cache/conftool/dbconfig/20220803-144322-marostegui.json
14:34 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2029.codfw.wmnet with reason: T310070
14:33 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2029.codfw.wmnet with reason: T310070
14:32 Emperor: shutdown aqs200[5-8] prior to PDU work T310070
14:31 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs[2005-2008].codfw.wmnet with reason: PDU work
14:31 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on thumbor[2003-2004].codfw.wmnet with reason: PDU swap
14:31 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs[2005-2008].codfw.wmnet with reason: PDU work
14:31 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on thumbor[2003-2004].codfw.wmnet with reason: PDU swap
14:28 jelto: power off thumbor2003 and thumbor2004
14:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P32235 and previous config saved to /var/cache/conftool/dbconfig/20220803-142816-marostegui.json
14:27 moritzm: upgrading ganeti/esams to Ganeti 3.0.2 T312637
14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 (T312972)', diff saved to https://phabricator.wikimedia.org/P32234 and previous config saved to /var/cache/conftool/dbconfig/20220803-141310-marostegui.json
14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1109 (T312972)', diff saved to https://phabricator.wikimedia.org/P32233 and previous config saved to /var/cache/conftool/dbconfig/20220803-141103-marostegui.json
14:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1109.eqiad.wmnet with reason: Maintenance
14:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1109.eqiad.wmnet with reason: Maintenance
14:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T312972)', diff saved to https://phabricator.wikimedia.org/P32232 and previous config saved to /var/cache/conftool/dbconfig/20220803-141042-marostegui.json
14:06 moritzm: installing freetype security updates on bullseye
13:57 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕙☕ sudo cumin 'P{R:Class = Confd}' 'systemctl restart confd'
13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P32231 and previous config saved to /var/cache/conftool/dbconfig/20220803-135536-marostegui.json
13:46 cdanis: ✔️ cdanis@deploy1002.eqiad.wmnet ~ 🕙☕ sudo systemctl restart confd
13:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P32230 and previous config saved to /var/cache/conftool/dbconfig/20220803-134030-marostegui.json
13:30 moritzm: installing Java 8 security updates for Buster
13:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T312972)', diff saved to https://phabricator.wikimedia.org/P32229 and previous config saved to /var/cache/conftool/dbconfig/20220803-132524-marostegui.json
13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 (T312972)', diff saved to https://phabricator.wikimedia.org/P32228 and previous config saved to /var/cache/conftool/dbconfig/20220803-131916-marostegui.json
13:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
13:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T312972)', diff saved to https://phabricator.wikimedia.org/P32227 and previous config saved to /var/cache/conftool/dbconfig/20220803-131855-marostegui.json
13:18 sukhe: depool codfw for PDU upgrade: CR 819798
13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:16 urbanecm@deploy1002: Synchronized wmf-config/MetaContactPages.php: f89f02e: Amend license request contact form per Legal (T303359) (duration: 09m 27s)
13:12 jbond: introduce puppetmaster[12]004 for now as offline
13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:09 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on kafka-logging2003.codfw.wmnet with reason: pdu
13:09 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on kafka-logging2003.codfw.wmnet with reason: pdu
13:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:05 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2044.codfw.wmnet with reason: T310070
13:05 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2044.codfw.wmnet with reason: T310070
13:04 pt1979@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P32226 and previous config saved to /var/cache/conftool/dbconfig/20220803-130348-marostegui.json
12:59 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2043.codfw.wmnet with reason: T310070
12:59 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2043.codfw.wmnet with reason: T310070
12:56 pt1979@cumin1001: START - Cookbook sre.dns.netbox
12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P32224 and previous config saved to /var/cache/conftool/dbconfig/20220803-124842-marostegui.json
12:40 moritzm: uploaded openjdk-8 8u342-b07-1~deb10u1 to component/jdk8 for buster-wikimedia (rebuild of latest Java 8 security update)
12:36 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
12:36 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T312972)', diff saved to https://phabricator.wikimedia.org/P32223 and previous config saved to /var/cache/conftool/dbconfig/20220803-123336-marostegui.json
12:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1114 (T312972)', diff saved to https://phabricator.wikimedia.org/P32222 and previous config saved to /var/cache/conftool/dbconfig/20220803-122929-marostegui.json
12:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1114.eqiad.wmnet with reason: Maintenance
12:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1114.eqiad.wmnet with reason: Maintenance
12:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
12:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T312972)', diff saved to https://phabricator.wikimedia.org/P32221 and previous config saved to /var/cache/conftool/dbconfig/20220803-122819-marostegui.json
12:16 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@614f7b2]: (no justification provided) (duration: 00m 11s)
12:16 ebysans@deploy1002: Started deploy [airflow-dags/analytics@614f7b2]: (no justification provided)
12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P32220 and previous config saved to /var/cache/conftool/dbconfig/20220803-121313-marostegui.json
11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P32219 and previous config saved to /var/cache/conftool/dbconfig/20220803-115807-marostegui.json
11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2176 to s1 T311494', diff saved to https://phabricator.wikimedia.org/P32218 and previous config saved to /var/cache/conftool/dbconfig/20220803-115706-marostegui.json
11:49 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cumin2002.codfw.wmnet with reason: PDU maintenance, T310145
11:49 root@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cumin2002.codfw.wmnet with reason: PDU maintenance, T310145
11:46 jayme@cumin1001: conftool action : set/weight=10; selector: name=(kubernetes2019.codfw.wmnet|kubernetes2021.codfw.wmnet|kubernetes2022.codfw.wmnet|kubernetes2018.codfw.wmnet|kubernetes2020.codfw.wmnet)
11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T312972)', diff saved to https://phabricator.wikimedia.org/P32217 and previous config saved to /var/cache/conftool/dbconfig/20220803-114301-marostegui.json
11:41 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=(kubernetes2020.codfw.wmnet|kubernetes2009.codfw.wmnet|kubernetes2010.codfw.wmnet|kubernetes2011.codfw.wmnet|kubernetes2012.codfw.wmnet|kubestage2002.codfw.wmnet)
11:38 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host restbase2022.codfw.wmnet
11:37 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2022.codfw.wmnet
11:35 jbond@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:32 jbond@cumin2002: START - Cookbook sre.dns.netbox
11:26 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=wdqs
11:22 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=kartotherian
11:22 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=restbase-backend
11:21 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=restbase-async
11:17 _joe_: depooling codfw services from all traffic
10:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2011.codfw.wmnet to cluster codfw and group C
10:53 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2011.codfw.wmnet to cluster codfw and group C
10:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2011.codfw.wmnet
10:47 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kubestage2002.codfw.wmnet with reason: PDU swap
10:46 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kubestage2002.codfw.wmnet with reason: PDU swap
10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T312972)', diff saved to https://phabricator.wikimedia.org/P32216 and previous config saved to /var/cache/conftool/dbconfig/20220803-104246-marostegui.json
10:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
10:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T312972)', diff saved to https://phabricator.wikimedia.org/P32215 and previous config saved to /var/cache/conftool/dbconfig/20220803-104224-marostegui.json
10:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2011.codfw.wmnet
10:40 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase201[45].codfw.wmnet
10:38 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2022.codfw.wmnet
10:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on restbase[2014-2015,2021-2022].codfw.wmnet with reason: PDU maintenance
10:38 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on restbase[2014-2015,2021-2022].codfw.wmnet with reason: PDU maintenance
10:37 jelto: shutdown kubestage2002 kubernetes2020 kubernetes2009 kubernetes2010 kubernetes2011 kubernetes2012
10:30 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) proton.discovery.wmnet on all recursors
10:30 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache proton.discovery.wmnet on all recursors
10:29 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mathoid.discovery.wmnet on all recursors
10:29 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache mathoid.discovery.wmnet on all recursors
10:27 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) proton.discovery.wmnet on all recursors
10:27 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache proton.discovery.wmnet on all recursors
10:27 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mathoid.discovery.wmnet on all recursors
10:27 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache mathoid.discovery.wmnet on all recursors
10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P32213 and previous config saved to /var/cache/conftool/dbconfig/20220803-102718-marostegui.json
10:23 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2012.codfw.wmnet
10:23 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2011.codfw.wmnet
10:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2010.codfw.wmnet
10:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2009.codfw.wmnet
10:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2020.codfw.wmnet
10:20 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubestage2002.codfw.wmnet
10:14 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) proton.discovery.wmnet on all recursors
10:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2011.codfw.wmnet with OS bullseye
10:14 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache proton.discovery.wmnet on all recursors
10:14 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mathoid.discovery.wmnet on all recursors
10:14 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache mathoid.discovery.wmnet on all recursors
10:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P32212 and previous config saved to /var/cache/conftool/dbconfig/20220803-101212-marostegui.json
09:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T312972)', diff saved to https://phabricator.wikimedia.org/P32211 and previous config saved to /var/cache/conftool/dbconfig/20220803-095706-marostegui.json
09:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2011.codfw.wmnet with reason: host reimage
09:56 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2021.codfw.wmnet
09:56 jelto: kubectl drain --ignore-daemonsets --delete-local-data kubernetes2012.codfw.wmnet
09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 (T312972)', diff saved to https://phabricator.wikimedia.org/P32210 and previous config saved to /var/cache/conftool/dbconfig/20220803-095559-marostegui.json
09:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1126.eqiad.wmnet with reason: Maintenance
09:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1126.eqiad.wmnet with reason: Maintenance
09:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T312972)', diff saved to https://phabricator.wikimedia.org/P32209 and previous config saved to /var/cache/conftool/dbconfig/20220803-095538-marostegui.json
09:55 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=restbase2027.codfw.wmnet
09:54 jelto: kubectl drain --ignore-daemonsets --delete-local-data kubernetes2011.codfw.wmnet
09:54 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:54 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
09:54 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2011.codfw.wmnet with reason: host reimage
09:52 jelto: kubectl drain --ignore-daemonsets --delete-local-data kubernetes2010.codfw.wmnet
09:50 jelto: kubectl drain --ignore-daemonsets --delete-local-data kubernetes2009.codfw.wmnet
09:49 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 49 hosts with reason: PDU swap
09:48 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 49 hosts with reason: PDU swap
09:47 jelto: kubectl drain --ignore-daemonsets kubernetes2020.codfw.wmnet
09:46 jelto: kubectl cordon kubernetes2020.codfw.wmnet kubernetes2009.codfw.wmnet kubernetes2010.codfw.wmnet kubernetes2011.codfw.wmnet kubernetes2012.codfw.wmnet
09:43 jelto: kubectl drain --ignore-daemonsets kubestage2002.codfw.wmnet
09:43 vgutierrez: rolling restart of pybal in codfw lvs instances - T310070
09:42 jelto: kubectl cordon kubestage2002
09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P32208 and previous config saved to /var/cache/conftool/dbconfig/20220803-094032-marostegui.json
09:35 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2011.codfw.wmnet with OS bullseye
09:34 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@674bb8b]: (no justification provided) (duration: 00m 10s)
09:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2090.codfw.wmnet
09:33 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:33 ebysans@deploy1002: Started deploy [airflow-dags/analytics@674bb8b]: (no justification provided)
09:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2011.codfw.wmnet with reason: Remove node for eventual reimage, T311686
09:32 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2011.codfw.wmnet with reason: Remove node for eventual reimage, T311686
09:29 marostegui@cumin1001: START - Cookbook sre.dns.netbox
09:25 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2090.codfw.wmnet
09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P32207 and previous config saved to /var/cache/conftool/dbconfig/20220803-092525-marostegui.json
09:24 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:24 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
09:24 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:24 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
09:23 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
09:23 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
09:22 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
09:22 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2090 from dbctl T314109', diff saved to https://phabricator.wikimedia.org/P32206 and previous config saved to /var/cache/conftool/dbconfig/20220803-092053-marostegui.json
09:20 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2024.codfw.wmnet
09:15 jelto: power on mc2024
09:10 XioNoX: configure BGP on the esams-drmrs link - T307221
09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T312972)', diff saved to https://phabricator.wikimedia.org/P32205 and previous config saved to /var/cache/conftool/dbconfig/20220803-091019-marostegui.json
09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T312972)', diff saved to https://phabricator.wikimedia.org/P32204 and previous config saved to /var/cache/conftool/dbconfig/20220803-090912-marostegui.json
09:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
09:08 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2031.codfw.wmnet
09:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
09:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
09:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T312972)', diff saved to https://phabricator.wikimedia.org/P32203 and previous config saved to /var/cache/conftool/dbconfig/20220803-090836-marostegui.json
09:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2032.codfw.wmnet
09:06 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2042.codfw.wmnet
09:05 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2043.codfw.wmnet
09:04 jynus: stop backup2006 backup2009 for T310070
09:00 jelto@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mc2024.codfw.wmnet
09:00 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2024.codfw.wmnet
08:59 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2031.codfw.wmnet
08:59 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2032.codfw.wmnet
08:58 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2042.codfw.wmnet
08:58 jelto@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mc2024.codfw.wmnet
08:58 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2024.codfw.wmnet
08:57 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2043.codfw.wmnet
08:57 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2041.codfw.wmnet
08:54 XioNoX: put the esams-drmrs link in service - T307221
08:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P32202 and previous config saved to /var/cache/conftool/dbconfig/20220803-085330-marostegui.json
08:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:51 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2041.codfw.wmnet
08:49 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
08:47 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
08:41 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P32201 and previous config saved to /var/cache/conftool/dbconfig/20220803-083824-marostegui.json
08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T312972)', diff saved to https://phabricator.wikimedia.org/P32200 and previous config saved to /var/cache/conftool/dbconfig/20220803-082318-marostegui.json
08:19 jynus: stop db2098 for T310070
08:17 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=(appservers|api)-ro,name=codfw
08:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2072.codfw.wmnet
08:15 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:54 marostegui@cumin1001: START - Cookbook sre.dns.netbox
07:49 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2072.codfw.wmnet
07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2072 from dbctl T313911', diff saved to https://phabricator.wikimedia.org/P32199 and previous config saved to /var/cache/conftool/dbconfig/20220803-074806-marostegui.json
07:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T312972)', diff saved to https://phabricator.wikimedia.org/P32197 and previous config saved to /var/cache/conftool/dbconfig/20220803-072253-marostegui.json
07:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
07:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
07:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
07:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T312972)', diff saved to https://phabricator.wikimedia.org/P32196 and previous config saved to /var/cache/conftool/dbconfig/20220803-072214-marostegui.json
07:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: codfw pdu maintenance
07:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: codfw pdu maintenance
07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2134,2160].codfw.wmnet with reason: codfw pdu maintenance
07:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db[2134,2160].codfw.wmnet with reason: codfw pdu maintenance
07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: codfw pdu maintenance
07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: codfw pdu maintenance
07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es[2020-2022].codfw.wmnet with reason: codfw pdu maintenance
07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es[2020-2022].codfw.wmnet with reason: codfw pdu maintenance
07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 10 hosts with reason: codfw pdu maintenance
07:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 10 hosts with reason: codfw pdu maintenance
07:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
07:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: codfw pdu maintenance
07:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
07:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: codfw pdu maintenance
07:11 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: CX: Set MT threshold for publishing in Armenian WP to 80% (T313208) (duration: 03m 49s)
07:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P32195 and previous config saved to /var/cache/conftool/dbconfig/20220803-070708-marostegui.json
07:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2002.codfw.wmnet with reason: Switch instance to plain disks, T311686
07:05 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2002.codfw.wmnet with reason: Switch instance to plain disks, T311686
07:00 moritzm: draining ganeti2011 T311686
06:56 godog: grow sda/sdb 3 by 100G on thanos-be2003 - T314275
06:56 godog: grow sda/sdb 3 by 100G on thanos-be1002 - T314275
06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P32194 and previous config saved to /var/cache/conftool/dbconfig/20220803-065202-marostegui.json
06:46 godog: power up centrallog2002 and prometheus2005 - T310070
06:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2013.codfw.wmnet to cluster codfw and group C
06:37 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2013.codfw.wmnet to cluster codfw and group C
06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T312972)', diff saved to https://phabricator.wikimedia.org/P32193 and previous config saved to /var/cache/conftool/dbconfig/20220803-063656-marostegui.json
06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T312972)', diff saved to https://phabricator.wikimedia.org/P32192 and previous config saved to /var/cache/conftool/dbconfig/20220803-063148-marostegui.json
06:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
06:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
06:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 13 hosts with reason: Maintenance
06:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 13 hosts with reason: Maintenance
06:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2161.codfw.wmnet with reason: Maintenance
06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2161.codfw.wmnet with reason: Maintenance
06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T312972)', diff saved to https://phabricator.wikimedia.org/P32191 and previous config saved to /var/cache/conftool/dbconfig/20220803-063045-marostegui.json
06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P32190 and previous config saved to /var/cache/conftool/dbconfig/20220803-061538-marostegui.json
06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P32189 and previous config saved to /var/cache/conftool/dbconfig/20220803-060032-marostegui.json
05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T312972)', diff saved to https://phabricator.wikimedia.org/P32188 and previous config saved to /var/cache/conftool/dbconfig/20220803-054526-marostegui.json
05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1111 (T312972)', diff saved to https://phabricator.wikimedia.org/P32187 and previous config saved to /var/cache/conftool/dbconfig/20220803-054106-marostegui.json
05:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1111.eqiad.wmnet with reason: Maintenance
05:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1111.eqiad.wmnet with reason: Maintenance
05:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
05:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance

2022-08-02

22:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
22:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
22:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
22:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
22:15 mutante: gerrit - syncing data (/srv/gerrit /var/lib/gerrit2/review_site /home) again after gerrit2002 was reimaged with buster T313250 T313972
22:04 dancy@deploy1002: Finished deploy [gerrit/gerrit@94c5028]: (no justification provided) (duration: 00m 06s)
22:04 dancy@deploy1002: Started deploy [gerrit/gerrit@94c5028]: (no justification provided)
22:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
21:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
21:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
21:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
21:58 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.23 refs T308076
21:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
21:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
21:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
21:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
21:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
21:29 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/CirrusSearch/includes/Sanity/Checker.php: Backport: Fix appending of join conds (T312421 T314439) (duration: 03m 15s)
21:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
21:27 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: deploy wmf-elasticsearch-search-plugins pkg - bking@cumin1001 - T314078
21:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
21:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit2002.wikimedia.org with OS buster
21:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
21:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:58 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.22 refs T308076
20:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:53 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
20:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:51 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
20:50 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.23 refs T308076
20:38 mutante: re-imaging gerrit2002 with buster - because it's on bullseye, needs git-fat and that has not been ported to python3 yet which blocks upgrading gerrit machines otherwise T313250 T243027 T279509
20:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:36 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host gerrit2002.wikimedia.org with OS buster
20:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:36 urbanecm: UTC evening B&C window done
20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:33 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/includes/Rest/Handler/HTMLTransformInput.php: 69e9152: ParsoidHandler: fix page bundle input with no orig HTML (duration: 03m 22s)
20:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:29 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/includes/Rest/Handler/ParsoidHandler.php: 322a960: ParsoidHandler: pass metrics object to HTMLTransformInput (duration: 03m 19s)
20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5fac0aa: GrowthExperiments: Remove wgGEHomepageTutorialTitle (duration: 03m 26s)
20:06 dancy@deploy1002: Finished scap: Backport for gerrit:819612 Revert "Bump wikimedia/parsoid to 0.16.0-a18" (duration: 11m 30s)
20:01 dancy@deploy1002: Finished deploy [gerrit/gerrit@94c5028]: (no justification provided) (duration: 00m 05s)
20:01 dancy@deploy1002: Started deploy [gerrit/gerrit@94c5028]: (no justification provided)
19:59 dancy@deploy1002: Finished deploy [gerrit/gerrit@94c5028]: (no justification provided) (duration: 00m 01s)
19:59 dancy@deploy1002: Started deploy [gerrit/gerrit@94c5028]: (no justification provided)
19:55 dancy@deploy1002: Started scap: Backport for gerrit:819612 Revert "Bump wikimedia/parsoid to 0.16.0-a18"
19:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
19:37 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=ats-tls
19:37 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=varnish-fe
19:37 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=ats-be
19:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
19:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
19:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2033.codfw.wmnet,service=ats-tls
19:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2033.codfw.wmnet,service=varnish-fe
19:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2033.codfw.wmnet,service=ats-be
19:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be[2041,2046].codfw.wmnet
19:35 mvernon@cumin2002: START - Cookbook sre.hosts.remove-downtime for ms-be[2041,2046].codfw.wmnet
19:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
19:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for thanos-fe2002.codfw.wmnet
19:28 mvernon@cumin2002: START - Cookbook sre.hosts.remove-downtime for thanos-fe2002.codfw.wmnet
19:26 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-fe2010.codfw.wmnet
19:26 mvernon@cumin2002: START - Cookbook sre.hosts.remove-downtime for ms-fe2010.codfw.wmnet
19:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2032.codfw.wmnet,service=ats-tls
19:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2032.codfw.wmnet,service=varnish-fe
19:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2032.codfw.wmnet,service=ats-be
19:17 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mc2038.codfw.wmnet with reason: install
19:17 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mc2038.codfw.wmnet with reason: install
19:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=ats-tls
19:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=varnish-fe
19:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=ats-be
19:11 mutante: gerrit1001 - rsyncing /home/ to gerrit2002:/srv/home-gerrit1001.wikimedia.org T313250
19:01 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on gerrit2002.wikimedia.org with reason: new machine
19:01 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on gerrit2002.wikimedia.org with reason: new machine
18:55 dancy@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.23 refs T308076 (duration: 50m 39s)
18:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
18:52 ejegg: updated payments-wiki from 589bb64e to e1b6036a (just i18n changes in extensions)
18:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
18:46 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: deploy wmf-elasticsearch-search-plugins pkg - bking@cumin1001 - T314078
18:46 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mc2038.codfw.wmnet with reason: install
18:45 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on mc2038.codfw.wmnet with reason: install
18:41 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc2038.codfw.wmnet
18:41 rzl@cumin2002: START - Cookbook sre.hosts.remove-downtime for mc2038.codfw.wmnet
18:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
18:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
18:18 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2038.codfw.wmnet with reason: install
18:18 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2038.codfw.wmnet with reason: install
18:17 rzl@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2038.codfw.wmnet with reason: install
18:17 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2038.codfw.wmnet with reason: install
18:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
18:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
18:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2008.codfw.wmnet with reason: shutdown for PDU upgrade
18:16 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2008.codfw.wmnet with reason: shutdown for PDU upgrade
18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
18:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
18:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
18:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
18:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
18:04 dancy@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.23 refs T308076
17:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T312972)', diff saved to https://phabricator.wikimedia.org/P32185 and previous config saved to /var/cache/conftool/dbconfig/20220802-175233-marostegui.json
17:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2159', diff saved to https://phabricator.wikimedia.org/P32184 and previous config saved to /var/cache/conftool/dbconfig/20220802-174311-ladsgroup.json
17:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P32183 and previous config saved to /var/cache/conftool/dbconfig/20220802-173723-marostegui.json
17:35 moritzm: installing node-moment security updates
17:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic[2041-2042,2057].codfw.wmnet with reason: T310070
17:32 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic[2041-2042,2057].codfw.wmnet with reason: T310070
17:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2013.codfw.wmnet
17:25 moritzm: installing fribidi security updates
17:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P32182 and previous config saved to /var/cache/conftool/dbconfig/20220802-172217-marostegui.json
17:20 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2030.codfw.wmnet,service=ats-tls
17:20 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2030.codfw.wmnet,service=varnish-fe
17:20 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2030.codfw.wmnet,service=ats-be
17:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2013.codfw.wmnet
17:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T312972)', diff saved to https://phabricator.wikimedia.org/P32181 and previous config saved to /var/cache/conftool/dbconfig/20220802-170711-marostegui.json
17:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc[2042-2043].codfw.wmnet with reason: shutdown for PDU upgrade
17:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc[2042-2043].codfw.wmnet with reason: shutdown for PDU upgrade
17:05 Emperor: ms-be20[31,32,41,46].codfw.wmnet,ms-fe2010.codfw.wmnet,thanos-fe2002.codfw.wmnet downtime for PDU work T309957
17:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T312972)', diff saved to https://phabricator.wikimedia.org/P32180 and previous config saved to /var/cache/conftool/dbconfig/20220802-170503-marostegui.json
17:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
17:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
17:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
17:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: shutdown for PDU replacement
17:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
17:04 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: shutdown for PDU replacement
17:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
17:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
17:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
17:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
17:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T312972)', diff saved to https://phabricator.wikimedia.org/P32179 and previous config saved to /var/cache/conftool/dbconfig/20220802-170333-marostegui.json
17:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2029.codfw.wmnet,service=ats-tls
17:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2029.codfw.wmnet,service=varnish-fe
17:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2029.codfw.wmnet,service=ats-be
17:00 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be[2030,2045,2052].codfw.wmnet
17:00 mvernon@cumin2002: START - Cookbook sre.hosts.remove-downtime for ms-be[2030,2045,2052].codfw.wmnet
16:57 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-airflow1004.eqiad.wmnet
16:54 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
16:53 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
16:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
16:49 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
16:48 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
16:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P32178 and previous config saved to /var/cache/conftool/dbconfig/20220802-164827-marostegui.json
16:38 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
16:35 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
16:35 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
16:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P32177 and previous config saved to /var/cache/conftool/dbconfig/20220802-163321-marostegui.json
16:29 dancy@mwmaint1002: pull aborted: (duration: 00m 07s)
16:25 rzl: rzl@stat1007:~$ sudo systemctl stop wmde-analytics-daily-early # wedged, timer will restart it now with max_runtime_seconds
16:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T312972)', diff saved to https://phabricator.wikimedia.org/P32176 and previous config saved to /var/cache/conftool/dbconfig/20220802-161815-marostegui.json
16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T312972)', diff saved to https://phabricator.wikimedia.org/P32175 and previous config saved to /var/cache/conftool/dbconfig/20220802-161607-marostegui.json
16:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
16:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
16:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T312972)', diff saved to https://phabricator.wikimedia.org/P32174 and previous config saved to /var/cache/conftool/dbconfig/20220802-161545-marostegui.json
16:10 btullis@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-airflow1004.eqiad.wmnet on all recursors
16:10 btullis@cumin1001: START - Cookbook sre.dns.wipe-cache an-airflow1004.eqiad.wmnet on all recursors
16:10 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:05 btullis@cumin1001: START - Cookbook sre.dns.netbox
16:05 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-airflow1004.eqiad.wmnet
16:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P32173 and previous config saved to /var/cache/conftool/dbconfig/20220802-160039-marostegui.json
15:51 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2056.codfw.wmnet with reason: T309957
15:50 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2056.codfw.wmnet with reason: T309957
15:49 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2040.codfw.wmnet with reason: T309957
15:49 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2040.codfw.wmnet with reason: T309957
15:46 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2039.codfw.wmnet with reason: T309957
15:45 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2039.codfw.wmnet with reason: T309957
15:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P32172 and previous config saved to /var/cache/conftool/dbconfig/20220802-154533-marostegui.json
15:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc[2040-2041].codfw.wmnet with reason: shutdown for PDU upgrade
15:37 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc[2040-2041].codfw.wmnet with reason: shutdown for PDU upgrade
15:36 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host elastic2037.codfw.wmnet
15:36 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic2037.codfw.wmnet
15:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T312972)', diff saved to https://phabricator.wikimedia.org/P32171 and previous config saved to /var/cache/conftool/dbconfig/20220802-153027-marostegui.json
15:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T312972)', diff saved to https://phabricator.wikimedia.org/P32170 and previous config saved to /var/cache/conftool/dbconfig/20220802-152818-marostegui.json
15:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
15:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
15:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
15:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
15:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T312972)', diff saved to https://phabricator.wikimedia.org/P32169 and previous config saved to /var/cache/conftool/dbconfig/20220802-152740-marostegui.json
15:24 moritzm: installing gnupg2 security updates
15:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2024.codfw.wmnet with reason: shutdown for PDU upgrade
15:15 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2024.codfw.wmnet with reason: shutdown for PDU upgrade
15:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetmaster1004.eqiad.wmnet with OS buster
15:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P32167 and previous config saved to /var/cache/conftool/dbconfig/20220802-151234-marostegui.json
15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ganeti-test[2001-2003].codfw.wmnet with reason: Power down for PDU maintenance, T310070
15:10 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on ganeti-test[2001-2003].codfw.wmnet with reason: Power down for PDU maintenance, T310070
15:08 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on thanos-be2001.codfw.wmnet with reason: pdu
15:08 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on thanos-be2001.codfw.wmnet with reason: pdu
15:07 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be[2030,2045,2052].codfw.wmnet with reason: shutdown for PDU replacement
15:07 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on ms-be[2030,2045,2052].codfw.wmnet with reason: shutdown for PDU replacement
15:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mc-gp2002.codfw.wmnet with reason: Power down for PDU maintenance, T310070
15:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on mc-gp2002.codfw.wmnet with reason: Power down for PDU maintenance, T310070
15:04 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2037.codfw.wmnet with reason: T309957
15:04 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2037.codfw.wmnet with reason: T309957
15:01 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: shutdown for PDU upgrade
15:00 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: shutdown for PDU upgrade
14:59 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2025.codfw.wmnet with reason: T309957
14:59 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2025.codfw.wmnet with reason: T309957
14:58 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=(appservers|api)-ro,name=codfw
14:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P32166 and previous config saved to /var/cache/conftool/dbconfig/20220802-145728-marostegui.json
14:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2060.codfw.wmnet with OS bullseye
14:53 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetmaster1004.eqiad.wmnet with reason: host reimage
14:50 moritzm: uploaded gnupg2 2.1.18-8~deb9u4+wmf1 to stretch-wikimedia
14:50 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster1004.eqiad.wmnet with reason: host reimage
14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T312972)', diff saved to https://phabricator.wikimedia.org/P32164 and previous config saved to /var/cache/conftool/dbconfig/20220802-144222-marostegui.json
14:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T312972)', diff saved to https://phabricator.wikimedia.org/P32163 and previous config saved to /var/cache/conftool/dbconfig/20220802-144013-marostegui.json
14:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
14:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T312972)', diff saved to https://phabricator.wikimedia.org/P32162 and previous config saved to /var/cache/conftool/dbconfig/20220802-143952-marostegui.json
14:37 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetmaster1004.eqiad.wmnet with OS buster
14:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2060.codfw.wmnet with reason: host reimage
14:28 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2060.codfw.wmnet with reason: host reimage
14:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P32161 and previous config saved to /var/cache/conftool/dbconfig/20220802-142446-marostegui.json
14:23 Emperor: shutdown ms-be20[30,45,52] for PDU work T309957
14:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be[2030,2045,2052].codfw.wmnet with reason: shutdown for PDU replacement
14:21 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be[2030,2045,2052].codfw.wmnet with reason: shutdown for PDU replacement
14:12 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2060.codfw.wmnet with OS bullseye
14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P32160 and previous config saved to /var/cache/conftool/dbconfig/20220802-140940-marostegui.json
14:05 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetmaster2004.codfw.wmnet with OS buster
14:04 godog: grow sda/sdb 3 by 100G on thanos-be1001 - T314275
14:03 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on centrallog2002.codfw.wmnet with reason: pdu
14:03 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on centrallog2002.codfw.wmnet with reason: pdu
14:01 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on prometheus2005.codfw.wmnet with reason: pdu
14:01 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on prometheus2005.codfw.wmnet with reason: pdu
13:57 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2030.codfw.wmnet,service=ats-tls
13:57 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2032.codfw.wmnet,service=ats-be
13:57 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=ats-be
13:56 godog: schedule poweroff for centrallog2002 at 16 utc - T310070
13:54 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[3-4].codfw.wmnet,service=ats-be
13:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T312972)', diff saved to https://phabricator.wikimedia.org/P32159 and previous config saved to /var/cache/conftool/dbconfig/20220802-135435-marostegui.json
13:53 godog: depool and poweroff prometheus2005 - T310070
13:53 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[3-4].codfw.wmnet,service=ats-tls
13:53 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[3-4].codfw.wmnet,service=ats-tls
13:53 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[3-4].codfw.wmnet,service=varnish-fe
13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T312972)', diff saved to https://phabricator.wikimedia.org/P32158 and previous config saved to /var/cache/conftool/dbconfig/20220802-135226-marostegui.json
13:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
13:52 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[1-2].codfw.wmnet,service=ats-tls
13:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T312972)', diff saved to https://phabricator.wikimedia.org/P32157 and previous config saved to /var/cache/conftool/dbconfig/20220802-135155-marostegui.json
13:51 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[1-2].codfw.wmnet,service=ats-tls
13:51 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[1-2].codfw.wmnet,service=varnish-fe
13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2030.codfw.wmnet,service=ats-be
13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2030.codfw.wmnet,service=varnish-fe
13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2030.codfw.wmnet,service=ats-be
13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2029.codfw.wmnet,service=ats-tls
13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2029.codfw.wmnet,service=varnish-fe
13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2029.codfw.wmnet,service=ats-be
13:45 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetmaster2004.codfw.wmnet with reason: host reimage
13:42 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster2004.codfw.wmnet with reason: host reimage
13:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:42 Lucas_WMDE: UTC afternoon backport+config window done
13:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2013.codfw.wmnet with OS bullseye
13:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:40 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable usage tracking for statement for cebwiki (T296384) – expected to gradually increase number of wbc_entity_usage and probably recentchanges rows on cebwiki, but not too much, see task for details (duration: 03m 06s)
13:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:39 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2028.codfw.wmnet with OS bullseye
13:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P32156 and previous config saved to /var/cache/conftool/dbconfig/20220802-133648-marostegui.json
13:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:34 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Introduce $wmgEntityUsageModifierLimitsStatement (T296384) (2/2) (duration: 03m 21s)
13:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:31 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Introduce $wmgEntityUsageModifierLimitsStatement (T296384) (1/2) (duration: 03m 16s)
13:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ganeti2028.codfw.wmnet with reason: Power down for PDU maintenance, T309957
13:30 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on ganeti2028.codfw.wmnet with reason: Power down for PDU maintenance, T309957
13:27 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetmaster2004.codfw.wmnet with OS buster
13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2013.codfw.wmnet with reason: host reimage
13:24 vgutierrez: restarting ATS 9.x instances to apply https://gerrit.wikimedia.org/r/819585 - T309651
13:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2028.codfw.wmnet with reason: host reimage
13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P32155 and previous config saved to /var/cache/conftool/dbconfig/20220802-132142-marostegui.json
13:19 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2013.codfw.wmnet with reason: host reimage
13:19 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2028.codfw.wmnet with reason: host reimage
13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: a4499e5: Revert "testwiki: Add mediawiki.web_ui.interactions stream" (T314151, T311268) (duration: 03m 19s)
13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: c2fb8a5: Enable RealtimePreview on Group 0 wikis (T314150) (duration: 03m 21s)
13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T312972)', diff saved to https://phabricator.wikimedia.org/P32154 and previous config saved to /var/cache/conftool/dbconfig/20220802-130636-marostegui.json
13:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T312972)', diff saved to https://phabricator.wikimedia.org/P32153 and previous config saved to /var/cache/conftool/dbconfig/20220802-130428-marostegui.json
13:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
13:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
13:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
13:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T312972)', diff saved to https://phabricator.wikimedia.org/P32152 and previous config saved to /var/cache/conftool/dbconfig/20220802-130351-marostegui.json
13:02 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2013.codfw.wmnet with OS bullseye
13:00 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2028.codfw.wmnet with OS bullseye
13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2013.codfw.wmnet with reason: Remove node for eventual reimage, T311686
12:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2013.codfw.wmnet with reason: Remove node for eventual reimage, T311686
12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P32151 and previous config saved to /var/cache/conftool/dbconfig/20220802-124845-marostegui.json
12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P32150 and previous config saved to /var/cache/conftool/dbconfig/20220802-123338-marostegui.json
12:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T312972)', diff saved to https://phabricator.wikimedia.org/P32149 and previous config saved to /var/cache/conftool/dbconfig/20220802-121832-marostegui.json
12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T312972)', diff saved to https://phabricator.wikimedia.org/P32148 and previous config saved to /var/cache/conftool/dbconfig/20220802-121624-marostegui.json
12:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
12:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
12:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
12:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
12:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
12:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
12:01 marostegui: dbmaint x1@eqiad T314087
11:57 marostegui: dbmaint s7@eqiad T314377
11:57 marostegui: dbmaint s3@eqiad T314377
11:57 marostegui: dbmaint s8@eqiad T314377
11:55 marostegui: dbmait s8@eqiad T314377
11:54 marostegui: dbmait s3@eqiad T314377
11:50 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
11:48 marostegui: dbmait s7@eqiad T314377
11:46 marostegui: dbmait s4@eqiad T314377
11:35 elukey: restart rsyslog on ml-serve1006
10:50 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-worker1082.eqiad.wmnet with reason: T312626 btullis
10:50 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-worker1082.eqiad.wmnet with reason: T312626 btullis
10:49 godog: grow sda3 by 100G on thanos-be2004 - T314275
10:42 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
10:42 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
10:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
10:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
10:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
10:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
10:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: After restart', diff saved to https://phabricator.wikimedia.org/P32147 and previous config saved to /var/cache/conftool/dbconfig/20220802-103318-root.json
10:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: After restart', diff saved to https://phabricator.wikimedia.org/P32146 and previous config saved to /var/cache/conftool/dbconfig/20220802-101813-root.json
10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2175 to s2 T311494', diff saved to https://phabricator.wikimedia.org/P32145 and previous config saved to /var/cache/conftool/dbconfig/20220802-101522-marostegui.json
10:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1019.eqiad.wmnet with OS bullseye
10:05 jynus: shutdown dbprov2002 backup2005 backup2008 T310070
10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: After restart', diff saved to https://phabricator.wikimedia.org/P32144 and previous config saved to /var/cache/conftool/dbconfig/20220802-100308-root.json
10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32143 and previous config saved to /var/cache/conftool/dbconfig/20220802-100304-root.json
09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2079 from dbctl T313885', diff saved to https://phabricator.wikimedia.org/P32141 and previous config saved to /var/cache/conftool/dbconfig/20220802-095455-marostegui.json
09:52 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1019.eqiad.wmnet with reason: host reimage
09:49 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1019.eqiad.wmnet with reason: host reimage
09:49 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
09:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: After restart', diff saved to https://phabricator.wikimedia.org/P32140 and previous config saved to /var/cache/conftool/dbconfig/20220802-094804-root.json
09:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32139 and previous config saved to /var/cache/conftool/dbconfig/20220802-094759-root.json
09:44 godog: grow sdb3 by 100G on thanos-be2004 - T314275
09:43 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
09:42 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
09:37 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1019.eqiad.wmnet with OS bullseye
09:36 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 10%: After restart', diff saved to https://phabricator.wikimedia.org/P32138 and previous config saved to /var/cache/conftool/dbconfig/20220802-093259-root.json
09:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32137 and previous config saved to /var/cache/conftool/dbconfig/20220802-093254-root.json
09:30 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
09:30 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
09:28 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
09:26 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
09:25 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
09:22 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 5%: After restart', diff saved to https://phabricator.wikimedia.org/P32136 and previous config saved to /var/cache/conftool/dbconfig/20220802-091754-root.json
09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32135 and previous config saved to /var/cache/conftool/dbconfig/20220802-091749-root.json
09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2143', diff saved to https://phabricator.wikimedia.org/P32134 and previous config saved to /var/cache/conftool/dbconfig/20220802-091518-root.json
09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 2%: After restart', diff saved to https://phabricator.wikimedia.org/P32133 and previous config saved to /var/cache/conftool/dbconfig/20220802-090250-root.json
09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32132 and previous config saved to /var/cache/conftool/dbconfig/20220802-090245-root.json
08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 1%: After restart', diff saved to https://phabricator.wikimedia.org/P32131 and previous config saved to /var/cache/conftool/dbconfig/20220802-084745-root.json
08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32130 and previous config saved to /var/cache/conftool/dbconfig/20220802-084740-root.json
08:46 marostegui: stop mysql on db2095 db2107 db2109 db2137 db2147 db2159 db2160 pc2012 for pdu maintenance on codfw b5 T310070
07:49 moritzm: upgrading drmrs ganeti clusters to 3.0.2 T312637
07:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2005.codfw.wmnet with reason: Switch instance to plain disks, T311686
07:33 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2005.codfw.wmnet with reason: Switch instance to plain disks, T311686
07:22 godog: bounce icinga on alert2001 - T314353
07:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2005.codfw.wmnet with reason: Switch instance to DRBD, T311686
07:18 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2005.codfw.wmnet with reason: Switch instance to DRBD, T311686
06:58 elukey: restart rsyslog on ml-serve2006
06:56 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.22/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: pruneRevData: Make cleaning in larger batches (T296380) (duration: 03m 26s)
06:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
06:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
06:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
06:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
06:46 godog: bounce icinga on alert1001 - T314353
05:48 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts db2088.codfw.wmnet
05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
05:44 marostegui@cumin1001: START - Cookbook sre.dns.netbox
05:35 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2088.codfw.wmnet
05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1181', diff saved to https://phabricator.wikimedia.org/P32127 and previous config saved to /var/cache/conftool/dbconfig/20220802-052923-root.json
05:24 marostegui: dbmait x1@eqiad T314087
04:17 ryankemper: [Elastic] Small amendment to my earlier statement; based off epoch time `be_x_oldwiki_titlesuggest_1659407912` was not an old index hanging around after a reindex operation, but rather the new one that the reindex operation was trying to create, but had not yet finished (therefore didn't switch over the aliases). It presumably got interrupted by the reimage of `elastic2059`.
04:15 ryankemper: [Elastic] Blew away red index like so: `ryankemper@cumin1001:~$ curl -XDELETE https://search.svc.codfw.wmnet:9243/be_x_oldwiki_titlesuggest_1659407912`. Cluster is back to `green` status.
04:07 ryankemper: [Elastic] Per `curl -s https://search.svc.codfw.wmnet:9243/_cat/aliases | grep -i be_x` I see `be_x_oldwiki_titlesuggest ` alias points to `be_x_oldwiki_titlesuggest_1658396688`. I think this means the red index is an old index from an in-progress reindex operation. I likely just need to delete `be_x_oldwiki_titlesuggest_1659407912` but doing some quick digging first
04:04 ryankemper: [Elastic] Red cluster status in main codfw elasticsearch cluster (`https://search.svc.codfw.wmnet:9243`); culprit appears to be index `be_x_oldwiki_titlesuggest_1659407912`. Confusingly it has 2 replicas set so it's not clear to me how we got into this state starting from green (in the past we've gone into red status from indices that erroneously had 0 replicas in production)
03:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
03:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
03:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
03:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
03:40 krinkle@deploy1002: Synchronized multiversion/: I0802db272695 (duration: 03m 10s)
03:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
03:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
03:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
03:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
03:34 krinkle@deploy1002: Synchronized wmf-config/: I9b89c0ff5c2 (duration: 03m 32s)
03:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
03:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
03:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
03:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
03:27 krinkle@deploy1002: Synchronized multiversion/: I6e97d39a3, Ib843ebced31 (duration: 03m 30s)
03:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
03:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
03:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
03:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
03:22 krinkle@mwmaint1002: pull aborted: (duration: 00m 11s)
03:21 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: I39a2b86065 (duration: 03m 19s)
03:20 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host elastic2059.codfw.wmnet with OS bullseye
03:15 krinkle@deploy1002: Synchronized multiversion/: Ieaea60 (duration: 03m 03s)
03:14 krinkle@mwmaint2002: pull aborted: (duration: 01m 36s)
03:14 krinkle@mwmaint1002: pull aborted: (duration: 01m 31s)
03:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
03:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
03:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
03:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
02:58 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2059.codfw.wmnet with reason: host reimage
02:54 ryankemper: [WDQS] `ryankemper@wdqs1012:~$ sudo systemctl restart wdqs-blazegraph.service` to clear `Query Service HTTP Port` && `WDQS SPARQL` alerts
02:53 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2059.codfw.wmnet with reason: host reimage
02:36 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2059.codfw.wmnet with OS bullseye
02:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
02:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
00:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
00:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
00:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
00:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
00:35 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: Ieaea60a991e5 (duration: 03m 10s)
00:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
00:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
00:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
00:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
00:23 krinkle@deploy1002: Synchronized multiversion/: Ia3406e (duration: 03m 22s)
00:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
00:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
00:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
00:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
00:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
00:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
00:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
00:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-08-01

23:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Id1ce285631f5, I194d419fbfe (duration: 03m 09s)
23:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
23:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
23:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
23:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
21:08 moritzm: drain ganeti2028 T309957
21:03 mutante: gerrit2002 - mkdir /var/lib/gerrit2/review_site | gerrit1001 - rsyncing /var/lib/gerrit2/review_site/ to gerrit2002 T313250 T313972
21:01 urbanecm: UTC late backport window done
21:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 461e070: itwiki: Change robot policy on NS2 and NS3 (T314165) (duration: 03m 18s)
20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:57 mutante: phab1001 - rsyncing repo data /srv/repos/ to phab2002 (in addition to phab1004 previously) T313360
20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:55 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=mnwwiktionary --fix # T314023
20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ba8c177: mnwwiktionary: Create Appendix namespace (T314023) (duration: 03m 09s)
20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:48 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript updateArticleCount.php --wiki=viwikibooks --update # T314239
20:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: c19c3e36ab: DiscussionTools: Make new reply buttons available at mediawiki.org (T314076); 24db016c4: viwikibooks: Change wgArticleCountMethod to any (T314239) (duration: 03m 10s)
20:35 daniel@deploy1002: Synchronized php-1.39.0-wmf.22/includes/Rest/Handler: Fix: Parsoid REST handler: allow pagebundle input without original HTML. (duration: 03m 15s)
20:25 urbanecm: Purge https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-wordmark-ne.svg (T311700)
20:21 daniel@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-wordmark-ne.svg: Config: newiki: Update wordmark (T311700) (duration: 03m 17s)
20:17 daniel@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: newiki: Update wordmark (T311700) (duration: 03m 32s)
20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
20:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
20:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
20:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2054.codfw.wmnet with OS bullseye
19:41 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2054.codfw.wmnet with reason: host reimage
19:35 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2054.codfw.wmnet with reason: host reimage
19:12 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2054.codfw.wmnet with OS bullseye
18:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2031.codfw.wmnet with OS bullseye
18:44 mutante: gitlab - moved data_persistence group to new parent, under /repos/
18:34 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2031.codfw.wmnet with reason: host reimage
18:32 mutante: gitlab - created group 'data_persistence' - added Ladsgroup and upgraded from member to maintainer
18:27 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2031.codfw.wmnet with reason: host reimage
18:12 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2031.codfw.wmnet with OS bullseye
17:58 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2025.codfw.wmnet with OS bullseye
17:37 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2025.codfw.wmnet with reason: host reimage
17:31 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2025.codfw.wmnet with reason: host reimage
17:18 ryankemper: T289135 T314078 Manually reimaging remaining codfw stretch hosts (`elastic[2025,2031,2054,2059-2060]`) to bullseye, one host at a time, waiting for green cluster status to return between each run. `ryankemper@cumin1001` tmux session `codfw_reimage`
17:16 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2025.codfw.wmnet with OS bullseye
17:08 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
17:08 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
17:06 mutante: alert1001 - systemctl restart nsca - pinged by fundraising tech because fundraising hosts have the "passive check is awol" issue again (T196336)
16:25 moritzm: installing tcpdump updates from bullseye point release
16:23 cwhite@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kibana7,name=logstash2023.codfw.wmnet
16:16 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1018.eqiad.wmnet with OS bullseye
16:10 cwhite@puppetmaster1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=kibana7,name=logstash2023.codfw.wmnet
15:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1018.eqiad.wmnet with reason: host reimage
15:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1018.eqiad.wmnet with reason: host reimage
15:41 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1018.eqiad.wmnet with OS bullseye
15:39 mvernon@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1016.eqiad.wmnet: Canary testing of 3.11.13 on Restbase T309896 - mvernon@cumin1001
15:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:29 pt1979@cumin2002: START - Cookbook sre.dns.netbox
15:29 mvernon@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1016.eqiad.wmnet: Canary testing of 3.11.13 on Restbase T309896 - mvernon@cumin1001
15:14 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Beta: add configuration for redirect badges (T313896) (2/2, should be a no-op) (duration: 03m 30s)
15:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
15:11 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Beta: add configuration for redirect badges (T313896) (1/2, should be a no-op) (duration: 03m 15s)
15:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
15:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
15:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
14:54 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
14:53 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
14:42 moritzm: installing openjdk-11 security updates
14:39 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
14:39 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
14:38 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
14:34 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
14:30 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
14:30 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
14:29 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
14:29 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
14:29 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
14:29 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
14:29 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
14:29 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
14:29 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
14:28 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
14:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
14:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
14:13 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.22/skins/Vector/: b5007c5: Revert "styles: Unify on standard external link icon"" (duration: 03m 16s)
14:12 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
14:12 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
14:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
14:05 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
14:04 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2044.codfw.wmnet with OS bullseye
14:04 urbanecm@deploy1002: Synchronized wmf-config/logos.php: bcb7b0d: Adjust width-height ratio of logo to fix display issue (T310961; 2/2) (duration: 03m 17s)
14:04 urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/srwikisource{.png;-1.5x.png;-2x.png} (T310961)
14:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
14:01 urbanecm@deploy1002: Synchronized static/images/project-logos/: bcb7b0d: srwikisource: Adjust width-height ratio of logo to fix display issue (T310961; 1/2) (duration: 03m 41s)
14:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
14:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
13:58 urbanecm: UTC afternoon backport window is going to overflow by a couple of minutes
13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
13:48 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2044.codfw.wmnet with reason: host reimage
13:44 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2044.codfw.wmnet with reason: host reimage
13:24 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2044.codfw.wmnet with OS bullseye
13:22 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
11:50 moritzm: installing openjdk-8 security updates for stretch
11:43 moritzm: uploaded openjdk-8 8u342-b07-1~deb9u1 for stretch-wikimedia
10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T314041)', diff saved to https://phabricator.wikimedia.org/P32124 and previous config saved to /var/cache/conftool/dbconfig/20220801-102714-ladsgroup.json
10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P32123 and previous config saved to /var/cache/conftool/dbconfig/20220801-101208-ladsgroup.json
10:09 vgutierrez: test ATS 9.1.2 on cp6016 - T309651
10:05 vgutierrez: test ATS 9.1.2 on cp6008 - T309651
10:00 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@4da9195]: (no justification provided) (duration: 00m 19s)
10:00 ebysans@deploy1002: Started deploy [airflow-dags/analytics@4da9195]: (no justification provided)
09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P32122 and previous config saved to /var/cache/conftool/dbconfig/20220801-095702-ladsgroup.json
09:56 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@85585b0]: (no justification provided) (duration: 00m 05s)
09:56 ebysans@deploy1002: Started deploy [airflow-dags/analytics@85585b0]: (no justification provided)
09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T314041)', diff saved to https://phabricator.wikimedia.org/P32121 and previous config saved to /var/cache/conftool/dbconfig/20220801-094156-ladsgroup.json
09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T314041)', diff saved to https://phabricator.wikimedia.org/P32120 and previous config saved to /var/cache/conftool/dbconfig/20220801-093845-ladsgroup.json
09:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
09:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
09:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1112.eqiad.wmnet with reason: Maintenance
09:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1112.eqiad.wmnet with reason: Maintenance
09:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: Maintenance
09:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: Maintenance
09:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2105.codfw.wmnet with reason: Maintenance
09:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2105.codfw.wmnet with reason: Maintenance
09:21 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner2004.codfw.wmnet
09:10 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2004.codfw.wmnet
09:10 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner2003.codfw.wmnet
09:01 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2003.codfw.wmnet
09:00 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner2002.codfw.wmnet
08:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
08:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
08:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
08:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
08:53 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.22/includes/api: Backport: api: Support for links migration in ApiQueryBacklinks (T312865 T314112) (duration: 03m 01s)
08:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
08:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
08:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
08:50 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2002.codfw.wmnet
08:50 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1004.eqiad.wmnet
08:48 godog: thanos-be2004: copy quarantined and tmp off sdb3 and into sdb4 for analysis and to free space - T314275
08:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
08:47 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Stop writing to the old templatelinks columns in itwikisource (T312865) (duration: 03m 12s)
08:43 vgutierrez: rolling upgrade of HAProxy to version 2.4.18
08:43 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
08:41 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
08:39 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1004.eqiad.wmnet
08:39 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1003.eqiad.wmnet
08:28 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1003.eqiad.wmnet
08:25 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1002.eqiad.wmnet
08:14 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1002.eqiad.wmnet
06:19 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=(appservers|api)-ro,name=codfw
06:14 oblivian@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=appservers-ro
06:13 oblivian@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=appserver-ro
06:13 oblivian@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=(appserver|api)-ro
05:43 moritzm: installing Linux 5.10.127-2 on Gitlab runners
01:00 krinkle@deploy1002: Synchronized multiversion/: Ic0dbcb (duration: 03m 31s)
00:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
00:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
00:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
00:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
00:45 krinkle@deploy1002: Synchronized multiversion/MWMultiVersion.php: I9d363abd7cfef (duration: 03m 17s)
00:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
00:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
00:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
00:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: appl

Other archives

2000s

Archive 1: 2004 Jun - 2004 Sep
Archive 2: 2004 Oct - 2004 Nov
Archive 3: 2004 Dec - 2005 Mar
Archive 4: 2005 Apr - 2005 Jul
Archive 5: 2005 Aug - 2005 Oct, with revision history 2004-06-23 to 2005-11-25
Archive 6: 2005 Nov - 2006 Feb
Archive 7: 2006 Mar - 2006 Jun
Archive 8: 2006 Jul - 2006 Sep
Archive 9: 2006 Oct - 2007 Jan, with revision history 2005-11-25 to 2007-02-21
Archive 10: 2007 Feb - 2007 Jun
Archive 11: 2007 Jul - 2007 Dec
Archive 12: 2008 Jan - 2008 Jul
Archive 12a: 2008 Aug
Archive 12b: 2008 Sept
Archive 13: 2008 Oct - 2009 Jun
Archive 14: 2009 Jun - 2009 Dec

2010s

2020s