Server Admin Log/Archive 62

2023-01-31

23:51 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3055.esams.wmnet with OS bullseye
23:45 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3055.esams.wmnet with OS bullseye
23:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3054.esams.wmnet with reason: host reimage
23:35 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3055.esams.wmnet with OS bullseye
23:34 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3054.esams.wmnet with reason: host reimage
23:13 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3054.esams.wmnet with OS bullseye
22:54 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp2040.codfw.wmnet
22:53 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2040.codfw.wmnet with OS bullseye
22:35 zabe@deploy1002: Finished scap: Backport for Stop writing to cuc_user and cuc_user_text in group0 wikis (T233004), Stop writing to cuc_comment in testwiki (T233004) (duration: 07m 34s)
22:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2040.codfw.wmnet with reason: host reimage
22:32 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2040.codfw.wmnet with reason: host reimage
22:30 zabe@deploy1002: zabe: Backport for Stop writing to cuc_user and cuc_user_text in group0 wikis (T233004), Stop writing to cuc_comment in testwiki (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
22:28 zabe@deploy1002: Started scap: Backport for Stop writing to cuc_user and cuc_user_text in group0 wikis (T233004), Stop writing to cuc_comment in testwiki (T233004)
22:26 zabe@deploy1002: Finished scap: Backport for Restrict flow-edit-title to autoconfirmed on mediawikiwiki (T328097) (duration: 08m 43s)
22:19 zabe@deploy1002: zabe and bawolff: Backport for Restrict flow-edit-title to autoconfirmed on mediawikiwiki (T328097) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
22:17 zabe@deploy1002: Started scap: Backport for Restrict flow-edit-title to autoconfirmed on mediawikiwiki (T328097)
22:13 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2040.codfw.wmnet with OS bullseye
22:13 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp2038.codfw.wmnet
22:07 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2038.codfw.wmnet with OS bullseye
22:07 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5020.eqsin.wmnet,service=ats-be
22:07 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5020.eqsin.wmnet,service=cdn
22:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5020.eqsin.wmnet with OS bullseye
21:47 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2038.codfw.wmnet with reason: host reimage
21:44 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cassandra-dev2002.codfw.wmnet
21:44 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2038.codfw.wmnet with reason: host reimage
21:39 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host cassandra-dev2002.codfw.wmnet
21:36 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev2002.codfw.wmnet: Trying to induce errors - eevans@cumin1001
21:35 kindrobot: close UTC late backport window. Did not deploy bawolff 884142 as I ran out of time. zabe may reopen the window in around 30 minutes to finish it out
21:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5020.eqsin.wmnet with reason: host reimage
21:33 kindrobot@deploy1002: Finished scap: Backport for Enable ClientPreferences for group0 (T327979) (duration: 10m 17s)
21:31 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5020.eqsin.wmnet with reason: host reimage
21:29 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev2002.codfw.wmnet: Trying to induce errors - eevans@cumin1001
21:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2038.codfw.wmnet with OS bullseye
21:25 kindrobot@deploy1002: kindrobot and nray: Backport for Enable ClientPreferences for group0 (T327979) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
21:24 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp2039.codfw.wmnet
21:24 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp2036.codfw.wmnet
21:23 kindrobot@deploy1002: Started scap: Backport for Enable ClientPreferences for group0 (T327979)
21:17 kindrobot@deploy1002: Finished scap: Backport for Enable Linter write namespace, tag and template for group0 and group1 (T299612) (duration: 13m 20s)
21:06 kindrobot@deploy1002: sbailey and kindrobot: Backport for Enable Linter write namespace, tag and template for group0 and group1 (T299612) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
21:04 kindrobot@deploy1002: Started scap: Backport for Enable Linter write namespace, tag and template for group0 and group1 (T299612)
21:04 jgleeson: smashpig updated from d1434aeb to 683df497
21:03 kindrobot: start UTC late backport window
20:58 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
20:57 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp5020.eqsin.wmnet with OS bullseye
20:52 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2036.codfw.wmnet with OS bullseye
20:47 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2039.codfw.wmnet with OS bullseye
20:45 zabe: start running "foreachwikiindblist s5.dblist migrateRevisionCommentTemp.php --sleep 2" in screen # T275246
20:33 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2036.codfw.wmnet with reason: host reimage
20:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2036.codfw.wmnet with reason: host reimage
20:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage
20:25 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage
20:11 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2036.codfw.wmnet with OS bullseye
20:09 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5029.eqsin.wmnet,service=ats-be
20:09 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5029.eqsin.wmnet,service=cdn
20:06 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2039.codfw.wmnet with OS bullseye
20:05 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp2037.codfw.wmnet
20:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5029.eqsin.wmnet with OS bullseye
20:03 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2037.codfw.wmnet with OS bullseye
20:00 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
19:59 sukhe: sudo rm /etc/dhcp/automation/ttyS1-115200/cp5020.conf
19:58 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp5020.eqsin.wmnet with OS bullseye
19:58 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
19:43 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2037.codfw.wmnet with reason: host reimage
19:40 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2037.codfw.wmnet with reason: host reimage
19:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5029.eqsin.wmnet with reason: host reimage
19:30 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5029.eqsin.wmnet with reason: host reimage
19:21 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2037.codfw.wmnet with OS bullseye
19:16 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
19:16 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5020.eqsin.wmnet with OS bullseye
19:12 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.21 refs T325584
18:53 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=ats-be
18:53 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=cdn
18:44 mutante: gitlab-prod-1001.devtools (cloud) - rebooted VM ; ip addr del 172.16.7.146/32 dev eth0 - T318521
18:42 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
18:42 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5020.eqsin.wmnet with OS bullseye
18:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2034.codfw.wmnet with OS bullseye
18:26 mutante: gitlab-prod-1001.devtools (cloud) - ip addr del 172.16.7.146/21 dev eth0 - T318521
18:25 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
18:25 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
18:24 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1075']
18:24 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1075']
18:22 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1075.eqiad.wmnet']
18:22 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1075.eqiad.wmnet']
18:21 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp1075.eqiad.wmnet
18:21 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp1075.eqiad.wmnet
18:20 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
18:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=ats-be
18:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=cdn
18:19 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5020.eqsin.wmnet with OS bullseye
18:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2034.codfw.wmnet with reason: host reimage
18:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2034.codfw.wmnet with reason: host reimage
18:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
17:55 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cp5029.eqsin.wmnet with OS bullseye
17:53 sukhe: depool cp1075.eqiad.wmnet for iDRAC firmware testing: T321309
17:52 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,service=ats-be
17:52 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,service=cdn
17:50 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2034.codfw.wmnet with OS bullseye
17:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp5019.eqsin.wmnet
17:47 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for cp5019.eqsin.wmnet
17:39 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp1090.eqiad.wmnet
17:38 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp1090.eqiad.wmnet
17:38 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp1076.eqiad.wmnet
17:37 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp1076.eqiad.wmnet
17:35 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5018.eqsin.wmnet,service=ats-be
17:35 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5018.eqsin.wmnet,service=cdn
17:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5018.eqsin.wmnet with OS bullseye
17:33 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp2032.codfw.wmnet
17:31 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5028.eqsin.wmnet,service=ats-be
17:31 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5028.eqsin.wmnet,service=cdn
17:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5028.eqsin.wmnet with OS bullseye
17:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp5029.eqsin.wmnet with reason: testing reimaging cookbook stalling failure
17:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on cp5029.eqsin.wmnet with reason: testing reimaging cookbook stalling failure
17:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5019.eqsin.wmnet,service=ats-be
17:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5019.eqsin.wmnet,service=cdn
17:29 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5029.eqsin.wmnet,service=ats-be
17:29 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5029.eqsin.wmnet,service=cdn
17:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2032.codfw.wmnet with OS bullseye
17:14 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cp5019.eqsin.wmnet
17:08 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2032.codfw.wmnet with reason: host reimage
17:05 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2032.codfw.wmnet with reason: host reimage
17:03 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cp5019.eqsin.wmnet
16:59 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5028.eqsin.wmnet with reason: host reimage
16:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5018.eqsin.wmnet with reason: host reimage
16:54 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5028.eqsin.wmnet with reason: host reimage
16:54 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5018.eqsin.wmnet with reason: host reimage
16:52 cwhite@deploy1002: Finished deploy [releng/phatality@e0bb573]: (no justification provided) (duration: 00m 11s)
16:52 cwhite@deploy1002: Started deploy [releng/phatality@e0bb573]: (no justification provided)
16:52 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
16:52 cwhite@deploy1002: Finished deploy [releng/phatality@e0bb573]: (no justification provided) (duration: 00m 10s)
16:52 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
16:52 cwhite@deploy1002: Started deploy [releng/phatality@e0bb573]: (no justification provided)
16:49 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2032.codfw.wmnet with OS bullseye
16:49 mutante: mw2271 - re-enabling disabled puppet
16:49 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2032.codfw.wmnet with OS bullseye
16:46 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
16:46 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
16:45 cwhite@cumin2002: conftool action : set/weight=10; selector: name=logstash2032.codfw.wmnet
16:44 cwhite@cumin2002: conftool action : set/weight=10; selector: name=logstash1032.eqiad.wmnet
16:43 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
16:41 jayme@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
16:40 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
16:38 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
16:37 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
16:37 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
16:36 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on cp5019.eqsin.wmnet with reason: testing reimaging cookbook stalling failure
16:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 5:00:00 on cp5019.eqsin.wmnet with reason: testing reimaging cookbook stalling failure
16:29 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Grants:Programs/Wikimedia Community Fund" "Grants:Programs/Wikimedia Community Fund/General Support Fund" "Zabe" --reason "per request T328456" --skip-subpages # T328456
16:29 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5019.eqsin.wmnet,service=ats-be
16:29 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5019.eqsin.wmnet,service=cdn
16:28 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1004.eqiad.wmnet with OS bullseye
16:20 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
16:19 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
16:18 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5018.eqsin.wmnet with OS bullseye
16:18 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5028.eqsin.wmnet with OS bullseye
16:18 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5028.eqsin.wmnet with OS bullseye
16:18 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5018.eqsin.wmnet with OS bullseye
16:14 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1003.eqiad.wmnet with OS bullseye
16:09 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1004.eqiad.wmnet with reason: host reimage
16:06 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1004.eqiad.wmnet with reason: host reimage
16:01 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2032.codfw.wmnet with OS bullseye
16:01 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
16:00 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
16:00 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: sync
16:00 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: sync
15:56 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5028.eqsin.wmnet with OS bullseye
15:56 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5018.eqsin.wmnet with OS bullseye
15:55 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet,service=ats-be
15:55 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet,service=cdn
15:54 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubestage1004.eqiad.wmnet with OS bullseye
15:49 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubestagemaster1001.eqiad.wmnet with OS bullseye
15:40 ladsgroup@deploy1002: Finished scap: Backport for Set 'groupLoadsBySection' for s11 (T326980) (duration: 09m 49s)
15:38 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
15:35 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
15:34 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster1001.eqiad.wmnet with reason: host reimage
15:32 ladsgroup@deploy1002: ladsgroup and zabe: Backport for Set 'groupLoadsBySection' for s11 (T326980) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
15:31 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster1001.eqiad.wmnet with reason: host reimage
15:30 ladsgroup@deploy1002: Started scap: Backport for Set 'groupLoadsBySection' for s11 (T326980)
15:24 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2035.codfw.wmnet with OS bullseye
15:23 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubestage1003.eqiad.wmnet with OS bullseye
15:20 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagemaster1001.eqiad.wmnet with OS bullseye
15:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2035.codfw.wmnet with reason: host reimage
15:01 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubestagetcd1005.eqiad.wmnet with OS bullseye
15:01 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2035.codfw.wmnet with reason: host reimage
14:56 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubestagetcd1004.eqiad.wmnet with OS bullseye
14:56 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubestagetcd1006.eqiad.wmnet with OS bullseye
14:48 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd1005.eqiad.wmnet with reason: host reimage
14:46 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:46 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:46 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd1004.eqiad.wmnet with reason: host reimage
14:44 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:44 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
14:43 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd1006.eqiad.wmnet with reason: host reimage
14:42 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2035.codfw.wmnet with OS bullseye
14:41 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd1005.eqiad.wmnet with reason: host reimage
14:41 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd1004.eqiad.wmnet with reason: host reimage
14:41 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd1006.eqiad.wmnet with reason: host reimage
14:34 urbanecm@deploy1002: Finished scap: Backport for Disable write old for CheckUserLog reason field for testwiki (T233004), Remove redundant definition of wgCheckUserEnableSpecialInvestigate, Bump parsoid parser cache writes to 25%. (T320534) (duration: 07m 23s)
14:33 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd1006.eqiad.wmnet with OS bullseye
14:33 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd1005.eqiad.wmnet with OS bullseye
14:32 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd1004.eqiad.wmnet with OS bullseye
14:28 urbanecm@deploy1002: dreamyjazz and urbanecm and daniel: Backport for Disable write old for CheckUserLog reason field for testwiki (T233004), Remove redundant definition of wgCheckUserEnableSpecialInvestigate, Bump parsoid parser cache writes to 25%. (T320534) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwde
14:26 urbanecm@deploy1002: Started scap: Backport for Disable write old for CheckUserLog reason field for testwiki (T233004), Remove redundant definition of wgCheckUserEnableSpecialInvestigate, Bump parsoid parser cache writes to 25%. (T320534)
14:20 urbanecm@deploy1002: Finished scap: Backport for Disable write old for CheckUserLog reason field for testwiki (T233004), Remove redundant definition of wgCheckUserEnableSpecialInvestigate, Bump parsoid parser cache writes to 25%. (T320534) (duration: 16m 33s)
14:08 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
14:07 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
14:06 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
14:05 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
14:05 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
14:05 urbanecm@deploy1002: urbanecm and dreamyjazz and daniel: Backport for Disable write old for CheckUserLog reason field for testwiki (T233004), Remove redundant definition of wgCheckUserEnableSpecialInvestigate, Bump parsoid parser cache writes to 25%. (T320534) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwde
14:05 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
14:04 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: Reinitialize staging-eqiad with k8s 1.23
14:04 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: Reinitialize staging-eqiad with k8s 1.23
14:03 urbanecm@deploy1002: Started scap: Backport for Disable write old for CheckUserLog reason field for testwiki (T233004), Remove redundant definition of wgCheckUserEnableSpecialInvestigate, Bump parsoid parser cache writes to 25%. (T320534)
14:01 urbanecm@deploy1002: Backport cancelled.
12:36 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (eqiad): Disable traffic mirroring from codfw to eqiad (duration: 00m 35s)
12:36 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (eqiad): Disable traffic mirroring from codfw to eqiad
11:51 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@42a07d3] (eqiad): Disable traffic mirroring from codfw to eqiad (duration: 00m 35s)
11:50 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@42a07d3] (eqiad): Disable traffic mirroring from codfw to eqiad
11:21 moritzm: installing bind9 security updates (client-side tools/libs only)
10:57 jayme@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=k8s-ingress-staging
10:57 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=k8s-ingress-staging
10:18 jayme: switching active kubernetes staging cluster from eqiad to codfw - T327664
09:20 marostegui: dbmaint Install MariaDB 10.6 on db2093 (db_inventory) T328408
09:05 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
09:00 zabe@deploy1002: Finished scap: Backport for Stop writing to cuc_user and cuc_user_text in testwiki (T233004) (duration: 08m 11s)
09:00 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
08:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
08:54 elukey: roll restart kafka on kafka-logging1001 to pick up new pki certs
08:53 zabe@deploy1002: zabe: Backport for Stop writing to cuc_user and cuc_user_text in testwiki (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
08:51 zabe@deploy1002: Started scap: Backport for Stop writing to cuc_user and cuc_user_text in testwiki (T233004)
08:49 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
08:45 elukey: restore previously removed password for keystore to kafka-logging clusters
08:39 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
08:36 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
07:56 moritzm: installing bash bugfix updates from Bullseye point release
07:22 marostegui: dbmaint Schema change on s3 eqiad T328373
07:22 marostegui: dbmaint Schema change on s1 eqiad T328373
07:10 marostegui: Failover m2 from db1164 to db1195 - T328253
07:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2133,2160].codfw.wmnet,db[1117,1164,1195].eqiad.wmnet with reason: Primary switchover m2 T328253
07:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2133,2160].codfw.wmnet,db[1117,1164,1195].eqiad.wmnet with reason: Primary switchover m2 T328253
07:03 marostegui: dbmaint Schema change on s5 eqiad T328373
06:59 marostegui: dbmaint Schema change on s7 eqiad T328373
06:57 marostegui: dbmaint Schema change on s4 eqiad T328373
06:52 marostegui: dbmaint Schema change on s8 eqiad T328373
05:02 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.19 (duration: 02m 15s)
05:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
05:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
04:55 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.21 refs T325584 (duration: 52m 56s)
04:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.21 refs T325584
02:44 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3053.esams.wmnet,service=ats-be
02:43 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3053.esams.wmnet,service=cdn
02:28 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3053.esams.wmnet with OS bullseye
02:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3053.esams.wmnet with reason: host reimage
01:59 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3053.esams.wmnet with reason: host reimage
01:37 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3053.esams.wmnet with OS bullseye
01:33 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp3053.esams.wmnet']
01:31 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3053.esams.wmnet']
00:50 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp5027.eqsin.wmnet
00:42 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5027.eqsin.wmnet with OS bullseye
00:14 mutante: etherpad - maintenance downtime for about 5 minutes to test monitoring
00:09 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5027.eqsin.wmnet with reason: host reimage
00:06 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5027.eqsin.wmnet with reason: host reimage

2023-01-30

23:30 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5027.eqsin.wmnet with OS bullseye
23:29 brett@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp3053.esams.wmnet with OS bullseye
23:07 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp3053.esams.wmnet with OS bullseye
22:58 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4052.ulsfo.wmnet
22:50 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3053.esams.wmnet with OS bullseye
22:38 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp3053.esams.wmnet with OS bullseye
22:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2030.codfw.wmnet,service=ats-be
22:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2030.codfw.wmnet,service=cdn
22:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2030.codfw.wmnet with OS bullseye
21:56 urbanecm@deploy1002: Finished scap: Backport for Try to determine what's adding to Parsoid init times (T328201), Update interwiki cache (duration: 12m 24s)
21:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2030.codfw.wmnet with reason: host reimage
21:46 urbanecm@deploy1002: arlolra and urbanecm: Backport for Try to determine what's adding to Parsoid init times (T328201), Update interwiki cache synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
21:44 urbanecm@deploy1002: Started scap: Backport for Try to determine what's adding to Parsoid init times (T328201), Update interwiki cache
21:43 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2030.codfw.wmnet with reason: host reimage
21:42 urbanecm@deploy1002: Finished scap: Backport for GrowthExperiments: Update campaign configuration (T321370) (duration: 08m 47s)
21:35 urbanecm@deploy1002: tgr and urbanecm: Backport for GrowthExperiments: Update campaign configuration (T321370) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
21:34 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase2020.codfw.wmnet: Replace Cassandra keys & certs - eevans@cumin1001
21:34 urbanecm@deploy1002: Started scap: Backport for GrowthExperiments: Update campaign configuration (T321370)
21:33 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4052.ulsfo.wmnet
21:31 urbanecm@deploy1002: Finished scap: Backport for Enable WelcomeSurvey at viwiki (T325376), Fix grid blowout with limited width turned off (T327423), Support new style of table of contents (T327942) (duration: 09m 52s)
21:26 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS bullseye
21:25 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2030.codfw.wmnet with OS bullseye
21:24 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2020.codfw.wmnet: Replace Cassandra keys & certs - eevans@cumin1001
21:23 urbanecm@deploy1002: tgr and urbanecm and jdlrobson and legoktm: Backport for Enable WelcomeSurvey at viwiki (T325376), Fix grid blowout with limited width turned off (T327423), Support new style of table of contents (T327942) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
21:21 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase2019.codfw.wmnet: Replace Cassandra keys & certs - eevans@cumin1001
21:21 urbanecm@deploy1002: Started scap: Backport for Enable WelcomeSurvey at viwiki (T325376), Fix grid blowout with limited width turned off (T327423), Support new style of table of contents (T327942)
21:21 urbanecm@deploy1002: Finished scap: Backport for InitialiseSettings: add zhwiki to wgPageAssessmentsSubprojects (T326387) (duration: 19m 51s)
21:11 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2019.codfw.wmnet: Replace Cassandra keys & certs - eevans@cumin1001
21:03 urbanecm@deploy1002: urbanecm and musikanimal: Backport for InitialiseSettings: add zhwiki to wgPageAssessmentsSubprojects (T326387) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
21:02 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2033.codfw.wmnet,service=ats-be
21:02 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2033.codfw.wmnet,service=cdn
21:01 urbanecm@deploy1002: Started scap: Backport for InitialiseSettings: add zhwiki to wgPageAssessmentsSubprojects (T326387)
20:59 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
20:56 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
20:51 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
20:50 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
20:35 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bullseye
20:35 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bullseye
20:26 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2033.codfw.wmnet with OS bullseye
20:23 zabe@deploy1002: Finished scap: Backport for slwiki: Raise AF emergency disable treshold+count (T328366) (duration: 07m 32s)
20:17 zabe@deploy1002: zabe: Backport for slwiki: Raise AF emergency disable treshold+count (T328366) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
20:16 zabe@deploy1002: Started scap: Backport for slwiki: Raise AF emergency disable treshold+count (T328366)
20:15 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bullseye
20:14 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4044.ulsfo.wmnet
20:12 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS bullseye
20:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2033.codfw.wmnet with reason: host reimage
20:03 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2033.codfw.wmnet with reason: host reimage
19:57 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3052.esams.wmnet,service=ats-be
19:57 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3052.esams.wmnet,service=cdn
19:50 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
19:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3052.esams.wmnet with OS bullseye
19:47 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
19:44 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2033.codfw.wmnet with OS bullseye
19:36 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
19:35 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
19:26 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS bullseye
19:26 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4044.ulsfo.wmnet with OS bullseye
19:25 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3052.esams.wmnet with reason: host reimage
19:22 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3052.esams.wmnet with reason: host reimage
19:21 cstone: payments-wiki upgraded from 653c7cc8 to f20a2208
19:16 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS bullseye
19:15 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4051.ulsfo.wmnet
19:01 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3052.esams.wmnet with OS bullseye
18:46 sukhe@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cp3052.esams.wmnet']
18:46 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3052.esams.wmnet']
18:46 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp3052.esams.wmnet']
18:45 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3052.esams.wmnet']
18:45 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3052.esams.wmnet with OS bullseye
18:38 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3052.esams.wmnet with OS bullseye
18:37 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp3052.esams.wmnet with OS bullseye
18:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3052.esams.wmnet']
18:37 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3052.esams.wmnet']
18:34 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4051.ulsfo.wmnet with OS bullseye
18:29 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
18:29 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
18:19 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3052.esams.wmnet with OS bullseye
18:19 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3052.esams.wmnet with OS bullseye
18:10 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3052.esams.wmnet with OS bullseye
18:08 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp3052.esams.wmnet
18:07 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4051.ulsfo.wmnet with reason: host reimage
18:04 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4051.ulsfo.wmnet with reason: host reimage
18:01 urbanecm@deploy1002: Finished scap: Backport for [Growth] Remove wgGERecentChangesUnstarredMenteesFilterEnabled (duration: 07m 59s)
17:53 urbanecm@deploy1002: Started scap: Backport for [Growth] Remove wgGERecentChangesUnstarredMenteesFilterEnabled
17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43517 and previous config saved to /var/cache/conftool/dbconfig/20230130-174957-ladsgroup.json
17:49 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp3052.esams.wmnet
17:43 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4051.ulsfo.wmnet with OS bullseye
17:43 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4051.ulsfo.wmnet with OS bullseye
17:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5026.eqsin.wmnet,service=ats-be
17:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5026.eqsin.wmnet,service=cdn
17:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P43516 and previous config saved to /var/cache/conftool/dbconfig/20230130-173450-ladsgroup.json
17:34 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
17:34 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
17:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3050.esams.wmnet,service=ats-be
17:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3050.esams.wmnet,service=cdn
17:31 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4051.ulsfo.wmnet with OS bullseye
17:31 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4043.ulsfo.wmnet
17:27 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4043.ulsfo.wmnet with OS bullseye
17:24 inflatador: bking@build2001 rebuilding docker images for 884351 complete
17:22 inflatador: bking@build2001 rebuilding docker images for 884351
17:21 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5026.eqsin.wmnet with OS bullseye
17:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P43515 and previous config saved to /var/cache/conftool/dbconfig/20230130-171944-ladsgroup.json
17:12 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3050.esams.wmnet with OS bullseye
17:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43514 and previous config saved to /var/cache/conftool/dbconfig/20230130-170437-ladsgroup.json
16:59 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4043.ulsfo.wmnet with reason: host reimage
16:56 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4043.ulsfo.wmnet with reason: host reimage
16:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43513 and previous config saved to /var/cache/conftool/dbconfig/20230130-165359-ladsgroup.json
16:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
16:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
16:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43512 and previous config saved to /var/cache/conftool/dbconfig/20230130-165348-ladsgroup.json
16:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5026.eqsin.wmnet with reason: host reimage
16:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3050.esams.wmnet with reason: host reimage
16:44 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5026.eqsin.wmnet with reason: host reimage
16:44 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3050.esams.wmnet with reason: host reimage
16:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P43511 and previous config saved to /var/cache/conftool/dbconfig/20230130-163842-ladsgroup.json
16:35 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4043.ulsfo.wmnet with OS bullseye
16:35 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4043.ulsfo.wmnet with OS bullseye
16:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1084.eqiad.wmnet
16:25 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4043.ulsfo.wmnet with OS bullseye
16:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1084.eqiad.wmnet
16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P43510 and previous config saved to /var/cache/conftool/dbconfig/20230130-162336-ladsgroup.json
16:22 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3051.esams.wmnet,service=ats-be
16:22 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3051.esams.wmnet,service=cdn
16:22 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2029.codfw.wmnet,service=ats-be
16:22 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2029.codfw.wmnet,service=cdn
16:21 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3050.esams.wmnet with OS bullseye
16:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp3050.esams.wmnet
16:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2029.codfw.wmnet with OS bullseye
16:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43509 and previous config saved to /var/cache/conftool/dbconfig/20230130-161324-root.json
16:11 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp3050.esams.wmnet
16:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp3050.esams.wmnet
16:10 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp3050.esams.wmnet
16:10 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3050.esams.wmnet,service=ats-be
16:10 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3050.esams.wmnet,service=cdn
16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43508 and previous config saved to /var/cache/conftool/dbconfig/20230130-160829-ladsgroup.json
16:06 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5026.eqsin.wmnet with OS bullseye
16:05 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5026.eqsin.wmnet with OS bullseye
16:03 sukhe: racreset cp3050.esams.wmnet: firmware cookbook iDRAC upgrade test
16:03 moritzm: upgrading idp-test to latest Java security update
15:59 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp3050.esams.wmnet
15:59 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp3050.esams.wmnet
15:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43507 and previous config saved to /var/cache/conftool/dbconfig/20230130-155819-root.json
15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43506 and previous config saved to /var/cache/conftool/dbconfig/20230130-155802-ladsgroup.json
15:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
15:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
15:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
15:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
15:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43505 and previous config saved to /var/cache/conftool/dbconfig/20230130-155747-ladsgroup.json
15:54 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5026.eqsin.wmnet with OS bullseye
15:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2029.codfw.wmnet with reason: host reimage
15:48 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2029.codfw.wmnet with reason: host reimage
15:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43504 and previous config saved to /var/cache/conftool/dbconfig/20230130-154314-root.json
15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P43503 and previous config saved to /var/cache/conftool/dbconfig/20230130-154241-ladsgroup.json
15:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3051.esams.wmnet with OS bullseye
15:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2029.codfw.wmnet with OS bullseye
15:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43502 and previous config saved to /var/cache/conftool/dbconfig/20230130-152809-root.json
15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P43501 and previous config saved to /var/cache/conftool/dbconfig/20230130-152734-ladsgroup.json
15:14 marostegui: Retrospective: Starting s4 codfw failover from db2110 to db2140 - T328022
15:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43500 and previous config saved to /var/cache/conftool/dbconfig/20230130-151304-root.json
15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43499 and previous config saved to /var/cache/conftool/dbconfig/20230130-151228-ladsgroup.json
15:07 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3051.esams.wmnet with reason: host reimage
15:04 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3051.esams.wmnet with reason: host reimage
15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43498 and previous config saved to /var/cache/conftool/dbconfig/20230130-150132-ladsgroup.json
15:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
15:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
14:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43497 and previous config saved to /var/cache/conftool/dbconfig/20230130-145759-root.json
14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2110 T328022', diff saved to https://phabricator.wikimedia.org/P43496 and previous config saved to /var/cache/conftool/dbconfig/20230130-145508-root.json
14:54 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2140 to s4 primary T328022', diff saved to https://phabricator.wikimedia.org/P43495 and previous config saved to /var/cache/conftool/dbconfig/20230130-145421-root.json
14:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
14:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43494 and previous config saved to /var/cache/conftool/dbconfig/20230130-145229-ladsgroup.json
14:47 moritzm: updating puppetdb 7 hosts to 7.12.1 T321783
14:46 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Enable Linter write namespace, tag and template from core, group0 (T299612) (duration: 11m 11s)
14:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3051.esams.wmnet with OS bullseye
14:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P43493 and previous config saved to /var/cache/conftool/dbconfig/20230130-144213-ladsgroup.json
14:38 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P43492 and previous config saved to /var/cache/conftool/dbconfig/20230130-143723-ladsgroup.json
14:36 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and sbailey: Backport for Enable Linter write namespace, tag and template from core, group0 (T299612) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
14:35 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Enable Linter write namespace, tag and template from core, group0 (T299612)
14:33 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Revert "Remove references to mediawiki.Uri" (T328143), Revert "Rewrite mw.libs.ve.getTargetDataFromHref with URL API" (T328143) (duration: 12m 07s)
14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P43491 and previous config saved to /var/cache/conftool/dbconfig/20230130-142708-ladsgroup.json
14:22 lucaswerkmeister-wmde@deploy1002: matmarex and lucaswerkmeister-wmde: Backport for Revert "Remove references to mediawiki.Uri" (T328143), Revert "Rewrite mw.libs.ve.getTargetDataFromHref with URL API" (T328143) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P43490 and previous config saved to /var/cache/conftool/dbconfig/20230130-142216-ladsgroup.json
14:21 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Revert "Remove references to mediawiki.Uri" (T328143), Revert "Rewrite mw.libs.ve.getTargetDataFromHref with URL API" (T328143)
14:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
14:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2140 with weight 0 T328022', diff saved to https://phabricator.wikimedia.org/P43489 and previous config saved to /var/cache/conftool/dbconfig/20230130-141822-root.json
14:18 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
14:17 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
14:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P43488 and previous config saved to /var/cache/conftool/dbconfig/20230130-141203-ladsgroup.json
14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43487 and previous config saved to /var/cache/conftool/dbconfig/20230130-140710-ladsgroup.json
13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P43486 and previous config saved to /var/cache/conftool/dbconfig/20230130-135659-ladsgroup.json
13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43485 and previous config saved to /var/cache/conftool/dbconfig/20230130-135632-ladsgroup.json
13:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
13:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
13:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
13:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P43484 and previous config saved to /var/cache/conftool/dbconfig/20230130-134406-ladsgroup.json
13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
13:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
13:31 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad (duration: 01m 23s)
13:29 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad
13:29 godog: bounce logstash on logstash1025 -- GC unhappy causing kafka lag
13:29 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (eqiad): Disable traffic mirroring from codfw to eqiad (duration: 01m 13s)
13:28 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (eqiad): Disable traffic mirroring from codfw to eqiad
13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43483 and previous config saved to /var/cache/conftool/dbconfig/20230130-132701-ladsgroup.json
13:23 awight@deploy1002: Finished scap: Backport for Revert "Enable kartographer external data parse time fetch for all wikis" (T323113) (duration: 08m 34s)
13:21 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (eqiad): Disable traffic mirroring from codfw to eqiad (duration: 00m 11s)
13:21 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (eqiad): Disable traffic mirroring from codfw to eqiad
13:21 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad (duration: 00m 22s)
13:20 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad
13:16 awight@deploy1002: awight: Backport for Revert "Enable kartographer external data parse time fetch for all wikis" (T323113) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
13:14 awight@deploy1002: Started scap: Backport for Revert "Enable kartographer external data parse time fetch for all wikis" (T323113)
13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P43482 and previous config saved to /var/cache/conftool/dbconfig/20230130-131155-ladsgroup.json
13:00 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
12:59 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
12:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast3004.wikimedia.org
12:58 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:58 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
12:57 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P43481 and previous config saved to /var/cache/conftool/dbconfig/20230130-125648-ladsgroup.json
12:56 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
12:55 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
12:55 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
12:55 awight@deploy1002: scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki=aawiki --force-version "1.40.0-wmf.20" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.2oaGSEpQR1"' returned non-zero exit status 255. (duration: 00m 00s)
12:55 awight@deploy1002: Started scap: Backport for Revert "Enable kartographer external data parse time fetch for all wikis" (T323113)
12:46 awight@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f]: Roll back kartotherian (duration: 01m 27s)
12:45 awight@deploy1002: Started deploy [kartotherian/deploy@5c58f8f]: Roll back kartotherian
12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43479 and previous config saved to /var/cache/conftool/dbconfig/20230130-124142-ladsgroup.json
12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43478 and previous config saved to /var/cache/conftool/dbconfig/20230130-123004-ladsgroup.json
12:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
12:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43477 and previous config saved to /var/cache/conftool/dbconfig/20230130-122943-ladsgroup.json
12:25 awight@deploy1002: Finished deploy [kartotherian/deploy@42a07d3]: Disable traffic mirroring from codfw to eqiad (duration: 02m 44s)
12:25 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
12:23 awight@deploy1002: Started deploy [kartotherian/deploy@42a07d3]: Disable traffic mirroring from codfw to eqiad
12:22 jmm@cumin2002: START - Cookbook sre.dns.netbox
12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P43476 and previous config saved to /var/cache/conftool/dbconfig/20230130-121437-ladsgroup.json
12:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast3004.wikimedia.org
11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P43475 and previous config saved to /var/cache/conftool/dbconfig/20230130-115930-ladsgroup.json
11:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast6001.wikimedia.org
11:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast6001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
11:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast6001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
11:54 jmm@cumin2002: START - Cookbook sre.dns.netbox
11:49 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast6001.wikimedia.org
11:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 42473
11:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 42473
11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43474 and previous config saved to /var/cache/conftool/dbconfig/20230130-114424-ladsgroup.json
11:42 moritzm: installing install4002 T327867
11:42 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1005.eqiad.wmnet
11:41 Amir1: dropping old wikiadmin user (T326802)
11:35 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1005.eqiad.wmnet
11:35 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1004.eqiad.wmnet
11:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43473 and previous config saved to /var/cache/conftool/dbconfig/20230130-113319-ladsgroup.json
11:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
11:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
11:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
11:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43472 and previous config saved to /var/cache/conftool/dbconfig/20230130-113254-ladsgroup.json
11:28 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1004.eqiad.wmnet
11:24 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1003.eqiad.wmnet
11:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install4002.wikimedia.org
11:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P43471 and previous config saved to /var/cache/conftool/dbconfig/20230130-111748-ladsgroup.json
11:17 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1003.eqiad.wmnet
11:11 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host htmldumper1001.eqiad.wmnet
11:09 phedenskog@deploy1002: Finished deploy [performance/navtiming@4e5ff3f]: (no justification provided) (duration: 00m 05s)
11:09 phedenskog@deploy1002: Started deploy [performance/navtiming@4e5ff3f]: (no justification provided)
11:05 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host htmldumper1001.eqiad.wmnet
11:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install4002.wikimedia.org on all recursors
11:04 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install4002.wikimedia.org on all recursors
11:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install4002.wikimedia.org - jmm@cumin2002"
11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install4002.wikimedia.org - jmm@cumin2002"
11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P43470 and previous config saved to /var/cache/conftool/dbconfig/20230130-110241-ladsgroup.json
11:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
11:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install4002.wikimedia.org
10:49 ladsgroup@deploy1002: Finished scap: Backport for Enable write both for externallinks except s4, s7, s8 (T321662) (duration: 13m 10s)
10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43468 and previous config saved to /var/cache/conftool/dbconfig/20230130-104735-ladsgroup.json
10:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast4003.wikimedia.org
10:46 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:46 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast4003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
10:40 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast4003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
10:37 ladsgroup@deploy1002: ladsgroup: Backport for Enable write both for externallinks except s4, s7, s8 (T321662) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
10:36 ladsgroup@deploy1002: Started scap: Backport for Enable write both for externallinks except s4, s7, s8 (T321662)
10:36 jmm@cumin2002: START - Cookbook sre.dns.netbox
10:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43467 and previous config saved to /var/cache/conftool/dbconfig/20230130-103540-ladsgroup.json
10:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
10:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
10:30 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast4003.wikimedia.org
10:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
10:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
10:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43466 and previous config saved to /var/cache/conftool/dbconfig/20230130-102500-ladsgroup.json
10:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593
10:17 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts bast4003.wikimedia.org
10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast4003.wikimedia.org
10:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 14593
10:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 49544
10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P43465 and previous config saved to /var/cache/conftool/dbconfig/20230130-100954-ladsgroup.json
10:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 49544
10:00 awight@deploy1002: Finished scap: Backport for Enable kartographer external data parse time fetch for all wikis (T326317) (duration: 07m 53s)
09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P43464 and previous config saved to /var/cache/conftool/dbconfig/20230130-095447-ladsgroup.json
09:54 awight@deploy1002: lilients and awight: Backport for Enable kartographer external data parse time fetch for all wikis (T326317) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
09:52 awight@deploy1002: Started scap: Backport for Enable kartographer external data parse time fetch for all wikis (T326317)
09:52 XioNoX: push pfw policies - T328085
09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43463 and previous config saved to /var/cache/conftool/dbconfig/20230130-093941-ladsgroup.json
09:29 jynus: disabling puppet on dbprov2004 to reorganize partitions T327155
09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43462 and previous config saved to /var/cache/conftool/dbconfig/20230130-092804-ladsgroup.json
09:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
09:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
09:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P43461 and previous config saved to /var/cache/conftool/dbconfig/20230130-092732-ladsgroup.json
09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P43460 and previous config saved to /var/cache/conftool/dbconfig/20230130-091225-ladsgroup.json
08:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P43459 and previous config saved to /var/cache/conftool/dbconfig/20230130-085719-ladsgroup.json
08:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T318605)', diff saved to https://phabricator.wikimedia.org/P43458 and previous config saved to /var/cache/conftool/dbconfig/20230130-085530-ladsgroup.json
08:48 moritzm: installing install1004 T327867
08:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P43457 and previous config saved to /var/cache/conftool/dbconfig/20230130-084213-ladsgroup.json
08:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P43456 and previous config saved to /var/cache/conftool/dbconfig/20230130-084024-ladsgroup.json
08:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P43455 and previous config saved to /var/cache/conftool/dbconfig/20230130-083034-ladsgroup.json
08:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
08:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P43454 and previous config saved to /var/cache/conftool/dbconfig/20230130-082517-ladsgroup.json
08:19 zabe: Deployed security patch for T278365
08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T318605)', diff saved to https://phabricator.wikimedia.org/P43452 and previous config saved to /var/cache/conftool/dbconfig/20230130-081011-ladsgroup.json
07:54 phedenskog@deploy1002: Finished deploy [performance/navtiming@bfbd6d7]: (no justification provided) (duration: 00m 05s)
07:54 phedenskog@deploy1002: Started deploy [performance/navtiming@bfbd6d7]: (no justification provided)
07:50 moritzm: installing install2004 T327867
07:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T318605)', diff saved to https://phabricator.wikimedia.org/P43451 and previous config saved to /var/cache/conftool/dbconfig/20230130-074502-ladsgroup.json
07:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T318605)', diff saved to https://phabricator.wikimedia.org/P43450 and previous config saved to /var/cache/conftool/dbconfig/20230130-073827-ladsgroup.json
07:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
07:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
07:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T318605)', diff saved to https://phabricator.wikimedia.org/P43449 and previous config saved to /var/cache/conftool/dbconfig/20230130-073806-ladsgroup.json
07:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P43448 and previous config saved to /var/cache/conftool/dbconfig/20230130-072956-ladsgroup.json
07:26 marostegui: dbmaint Schema change on s7 eqiad T328236
07:25 marostegui: dbmaint Schema change on s2 eqiad T328236
07:25 marostegui: dbmaint Schema change on s1 eqiad T328236
07:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P43447 and previous config saved to /var/cache/conftool/dbconfig/20230130-072300-ladsgroup.json
07:21 marostegui: dbmaint Schema change on s1 eqiad T328236
07:17 marostegui: dbmaint Schema change on s4 eqiad T328236
07:16 marostegui: dbmaint Schema change on s6 eqiad T328236
07:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P43446 and previous config saved to /var/cache/conftool/dbconfig/20230130-071450-ladsgroup.json
07:11 marostegui: dbmaint Schema change on s5 eqiad T328236
07:10 marostegui: dbmaint Schema change on s8 eqiad T328236
07:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P43445 and previous config saved to /var/cache/conftool/dbconfig/20230130-070753-ladsgroup.json
07:05 marostegui: dbmaint Schema change on s3 eqiad T328086
07:02 marostegui: dbmaint Schema change on s1 eqiad T328086
07:01 marostegui: dbmaint Schema change on s4 eqiad T328086
06:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T318605)', diff saved to https://phabricator.wikimedia.org/P43444 and previous config saved to /var/cache/conftool/dbconfig/20230130-065943-ladsgroup.json
06:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T318605)', diff saved to https://phabricator.wikimedia.org/P43443 and previous config saved to /var/cache/conftool/dbconfig/20230130-065247-ladsgroup.json
06:51 marostegui: dbmaint Schema change on s5 eqiad T328086
06:45 marostegui: dbmaint Schema change on s2 eqiad T328086
06:43 marostegui: dbmaint Schema change on s7 eqiad T328086
06:41 marostegui: dbmaint Schema change on s8 eqiad T328086
06:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
06:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
06:34 marostegui: dbmaint Schema change on s6 eqiad T328086
06:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2140 (T318605)', diff saved to https://phabricator.wikimedia.org/P43441 and previous config saved to /var/cache/conftool/dbconfig/20230130-061534-ladsgroup.json
06:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
06:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
06:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T318605)', diff saved to https://phabricator.wikimedia.org/P43440 and previous config saved to /var/cache/conftool/dbconfig/20230130-061401-ladsgroup.json
06:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
06:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
05:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T318605)', diff saved to https://phabricator.wikimedia.org/P43439 and previous config saved to /var/cache/conftool/dbconfig/20230130-053033-ladsgroup.json
05:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
05:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
05:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
05:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance

2023-01-29

14:46 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1002.eqiad.wmnet
14:40 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1002.eqiad.wmnet
14:39 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1008.eqiad.wmnet
14:33 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1008.eqiad.wmnet

2023-01-28

00:36 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet
00:35 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet
00:17 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS bullseye

2023-01-27

23:55 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
23:52 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
23:31 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS bullseye
23:31 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4050.ulsfo.wmnet with OS bullseye
23:22 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS bullseye
23:21 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4042.ulsfo.wmnet
22:46 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS bullseye
22:24 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
22:20 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
22:11 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include bullseye-wikimedia /home/rzl/httpbb/bullseye/httpbb_0.0.2-1+deb11u1_amd64.changes # T328162
22:11 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include buster-wikimedia /home/rzl/httpbb/buster/httpbb_0.0.2-1_amd64.changes # T328162
22:00 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS bullseye
21:59 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4042.ulsfo.wmnet with OS bullseye
21:51 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS bullseye
21:49 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4049.ulsfo.wmnet
20:56 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4049.ulsfo.wmnet with OS bullseye
20:29 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4049.ulsfo.wmnet with reason: host reimage
20:26 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4049.ulsfo.wmnet with reason: host reimage
20:05 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4049.ulsfo.wmnet with OS bullseye
20:02 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4049.ulsfo.wmnet with OS bullseye
19:38 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4049.ulsfo.wmnet with OS bullseye
19:38 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4049.ulsfo.wmnet with OS bullseye
19:32 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4049.ulsfo.wmnet with OS bullseye
19:31 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4041.ulsfo.wmnet
19:31 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp404.ulsfo.wmnet
19:28 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4041.ulsfo.wmnet with OS bullseye
19:02 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4041.ulsfo.wmnet with reason: host reimage
18:57 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4041.ulsfo.wmnet with reason: host reimage
18:37 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4041.ulsfo.wmnet with OS bullseye
18:37 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4041.ulsfo.wmnet with OS bullseye
18:25 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4041.ulsfo.wmnet with OS bullseye
18:24 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4048.ulsfo.wmnet
18:14 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS bullseye
17:52 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage
17:49 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage
17:38 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@907fe2a]: (no justification provided) (duration: 00m 14s)
17:38 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@907fe2a]: (no justification provided)
17:28 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS bullseye
17:28 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4048.ulsfo.wmnet with OS bullseye
17:15 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS bullseye
15:50 dancy@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 00m 04s)
15:50 dancy@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
15:42 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4040.ulsfo.wmnet,service=ats-be
15:42 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4040.ulsfo.wmnet,service=cdn
15:39 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS bullseye
15:31 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2027.codfw.wmnet,service=ats-be
15:31 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2027.codfw.wmnet,service=cdn
15:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2027.codfw.wmnet with OS bullseye
15:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage
15:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage
15:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2027.codfw.wmnet with reason: host reimage
14:58 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2027.codfw.wmnet with reason: host reimage
14:55 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
14:55 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
14:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS bullseye
14:46 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
14:45 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
14:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
14:41 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
14:40 moritzm: installing install3002 T327867
14:39 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
14:34 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
14:34 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
14:27 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts clouddb2001-dev.codfw.wmnet
14:27 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:26 andrew@cumin1001: START - Cookbook sre.dns.netbox
14:22 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts clouddb2001-dev.codfw.wmnet
14:20 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts clouddb2001-dev.codfw.wmnet
14:20 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:20 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: clouddb2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
14:17 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: clouddb2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
14:13 andrew@cumin1001: START - Cookbook sre.dns.netbox
14:10 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts clouddb2001-dev.codfw.wmnet
13:46 moritzm: installing install5002 T327867
13:08 moritzm: installing install6002 T327867
12:47 hashar: gerrit1001 running Puppet to deploy https://gerrit.wikimedia.org/r/883965 and restarting Apache 2 to change the `Listen` statements # T326125
12:42 hashar: Rebooting gerrit2002
12:38 hashar: Stopped Puppet on gerrit1001 to prevent auto deployment of https://gerrit.wikimedia.org/r/883965
12:25 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2001-dev.codfw.wmnet with OS bullseye
12:25 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
12:23 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
12:03 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@9690bf9]: (no justification provided) (duration: 00m 15s)
12:03 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@9690bf9]: (no justification provided)
12:01 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
12:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 138915
12:00 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
11:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 138915
11:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9318
11:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9318
11:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 55821
11:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 55821
11:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 398143
11:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 398143
11:57 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2001-dev.codfw.wmnet with reason: host reimage
11:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 26077
11:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 26077
11:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 50266
11:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 50266
11:54 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2001-dev.codfw.wmnet with reason: host reimage
11:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 14593
11:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 14593
11:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 56898
11:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 56898
11:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8368
11:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8368
11:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8560
11:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8560
11:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34309
11:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 34309
11:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12033
11:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12033
11:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 62537
11:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 62537
11:41 XioNoX: restart keyholder on deploy1002
11:41 stevemunene@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
11:40 stevemunene@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
11:38 stevemunene@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
11:36 stevemunene@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
11:27 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
11:26 ayounsi@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 00m 56s)
11:25 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2001-dev.codfw.wmnet with OS bullseye
11:25 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
11:24 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb2001-dev.codfw.wmnet with OS bullseye
11:24 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
11:15 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2001-dev.codfw.wmnet with OS bullseye
11:15 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudlb2001-dev.mgmt.codfw.wmnet on all recursors
11:15 aborrero@cumin2002: START - Cookbook sre.dns.wipe-cache cloudlb2001-dev.mgmt.codfw.wmnet on all recursors
11:15 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on an-worker1087.eqiad.wmnet with reason: Shutting down an-worker1087 to allow for RAID BBU replacement
11:14 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on an-worker1087.eqiad.wmnet with reason: Shutting down an-worker1087 to allow for RAID BBU replacement
11:13 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2001-dev.codfw.wmnet with OS bullseye
11:12 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2001-dev.codfw.wmnet with OS bullseye
11:12 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:12 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Sync for cloudlb2001-dev - aborrero@cumin2002"
11:11 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Sync for cloudlb2001-dev - aborrero@cumin2002"
11:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-corp1001.wikimedia.org
11:10 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:09 jmm@cumin2002: START - Cookbook sre.dns.netbox
11:08 aborrero@cumin2002: START - Cookbook sre.dns.netbox
11:08 aborrero@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
11:05 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-corp1001.wikimedia.org
11:04 stevemunene@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: apply on main
11:04 stevemunene@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
11:03 aborrero@cumin2002: START - Cookbook sre.dns.netbox
11:01 stevemunene@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: apply on main
11:01 stevemunene@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
10:53 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts ldap-corp1001.wikimedia.org
10:52 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-corp1001.wikimedia.org
10:45 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:45 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Sync for cloudlb2001-dev - aborrero@cumin2002"
10:38 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Sync for cloudlb2001-dev - aborrero@cumin2002"
10:37 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: apply on main
10:37 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
10:26 aborrero@cumin2002: START - Cookbook sre.dns.netbox
10:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-corp2001.wikimedia.org
10:23 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
10:19 jmm@cumin2002: START - Cookbook sre.dns.netbox
10:15 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-corp2001.wikimedia.org
09:40 moritzm: disabling old bastions bast3005/bast4003/bast5002/bast6001, use bast3006/bast4004/bast5003/bast6002 instead
08:23 marostegui: Apply schema change on labtestwiki (clouddb2002-dev) T328086
08:22 marostegui: Apply schema change on db1106 (s1 enwiki) T328086
08:06 elukey: restart kube-apiserver on ml-staging-ctrl2* nodes as an attempt to mitigate some LIST API high latency
07:41 elukey: restart kube-apiserver on ml-serve-ctrl2* nodes as an attempt to mitigate some 504 API response errors
01:15 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev2*: Applying configuration change to cassandra-dev cluster - eevans@cumin1001
01:11 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4047.ulsfo.wmnet
01:10 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4047.ulsfo.wmnet with OS bullseye
00:56 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev2*: Applying configuration change to cassandra-dev cluster - eevans@cumin1001
00:49 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4047.ulsfo.wmnet with reason: host reimage
00:45 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4047.ulsfo.wmnet with reason: host reimage
00:33 zabe@deploy1002: Finished scap: Backport for Stop setting cul_actor migration var (T233004) (duration: 07m 36s)
00:27 zabe@deploy1002: zabe: Backport for Stop setting cul_actor migration var (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
00:26 zabe@deploy1002: Started scap: Backport for Stop setting cul_actor migration var (T233004)
00:25 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS bullseye
00:24 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4047.ulsfo.wmnet with OS bullseye
00:16 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS bullseye
00:15 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4047.ulsfo.wmnet with OS bullseye
00:11 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS bullseye
00:10 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4047.ulsfo.wmnet with OS bullseye

2023-01-26

23:59 zabe@deploy1002: Finished scap: Backport for Add a project logo on gorwiktionary (T327987) (duration: 34m 42s)
23:54 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS bullseye
23:52 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4039.ulsfo.wmnet
23:51 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4039.ulsfo.wmnet with OS bullseye
23:28 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4039.ulsfo.wmnet with reason: host reimage
23:26 zabe@deploy1002: zabe and superpes: Backport for Add a project logo on gorwiktionary (T327987) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
23:25 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4039.ulsfo.wmnet with reason: host reimage
23:24 zabe@deploy1002: Started scap: Backport for Add a project logo on gorwiktionary (T327987)
23:13 sbassett@deploy1002: Synchronized private/PrivateSettings.php: T326691 - remove mitigation and monitor (duration: 06m 52s)
23:04 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4039.ulsfo.wmnet with OS bullseye
23:04 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4039.ulsfo.wmnet with OS bullseye
23:03 zabe@deploy1002: Finished scap: Backport for Pin CheckUserEventTablesMigrationStage to read and write old (T324907) (duration: 08m 36s)
22:56 zabe@deploy1002: dreamyjazz and zabe: Backport for Pin CheckUserEventTablesMigrationStage to read and write old (T324907) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
22:54 zabe@deploy1002: Started scap: Backport for Pin CheckUserEventTablesMigrationStage to read and write old (T324907)
22:45 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4039.ulsfo.wmnet with OS bullseye
22:44 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4046.ulsfo.wmnet
22:44 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS bullseye
22:23 zabe: running migrateRevisionCommentTemp.php in cebwiki in screen with --sleep 2 # T275246
22:22 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage
22:18 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage
21:58 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS bullseye
21:47 thcipriani@deploy1002: Finished scap: Backport for Increase threshold for table of contents collapsing (T328045), Remove redundant block for search descriptions (T324859) (duration: 08m 49s)
21:40 thcipriani@deploy1002: thcipriani and jdlrobson: Backport for Increase threshold for table of contents collapsing (T328045), Remove redundant block for search descriptions (T324859) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
21:39 thcipriani@deploy1002: Started scap: Backport for Increase threshold for table of contents collapsing (T328045), Remove redundant block for search descriptions (T324859)
21:36 thcipriani@deploy1002: Finished scap: Backport for ApiDiscussionToolsEdit: Unwrap Parsoid sections before parsing (T327704) (duration: 08m 43s)
21:35 brett@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4046.ulsfo.wmnet with OS bullseye
21:34 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS bullseye
21:33 brett@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4046.ulsfo.wmnet with OS bullseye
21:33 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS bullseye
21:33 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4046.ulsfo.wmnet with OS bullseye
21:29 thcipriani@deploy1002: matmarex and thcipriani: Backport for ApiDiscussionToolsEdit: Unwrap Parsoid sections before parsing (T327704) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
21:27 thcipriani@deploy1002: Started scap: Backport for ApiDiscussionToolsEdit: Unwrap Parsoid sections before parsing (T327704)
21:25 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS bullseye
21:25 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet
21:24 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS bullseye
21:20 thcipriani@deploy1002: Finished scap: Backport for Enable write new for CheckUserLog comment fields everywhere (T233004) (duration: 11m 18s)
21:11 thcipriani@deploy1002: thcipriani and dreamyjazz: Backport for Enable write new for CheckUserLog comment fields everywhere (T233004) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
21:09 thcipriani@deploy1002: Started scap: Backport for Enable write new for CheckUserLog comment fields everywhere (T233004)
21:01 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
20:56 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
20:36 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
20:13 ryankemper: `ryankemper@thanos-fe1001:~$ sudo run-puppet-agent` following merge of wdqs recording rule patch: https://gerrit.wikimedia.org/r/c/operations/puppet/+/883610
20:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on cp2027.codfw.wmnet with reason: reimaging
20:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on cp2027.codfw.wmnet with reason: reimaging
20:05 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2027.codfw.wmnet with OS bullseye
19:56 brett@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4038.ulsfo.wmnet with OS bullseye
19:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cp2027.codfw.wmnet with reason: reimaging
19:10 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on cp2027.codfw.wmnet with reason: reimaging
19:09 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.20 refs T325583
19:00 brennen: 1.40.0-wmf.20 train (T325583): no current blockers, rolling to all wikis.
18:59 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
18:57 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6008.drmrs.wmnet
18:46 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6008.drmrs.wmnet with OS bullseye
18:20 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6008.drmrs.wmnet with reason: host reimage
18:17 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6008.drmrs.wmnet with reason: host reimage
18:17 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
18:16 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
18:16 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
18:15 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
18:15 cgoubert@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
18:15 cgoubert@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
18:15 cgoubert@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
18:15 cgoubert@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
18:15 cgoubert@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
18:15 cgoubert@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
18:15 cgoubert@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
18:15 cgoubert@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
18:14 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
18:14 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
18:14 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
18:13 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
18:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
18:12 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
18:12 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
18:11 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
18:11 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
18:10 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
18:10 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
18:09 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
17:59 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6008.drmrs.wmnet with OS bullseye
17:55 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6016.drmrs.wmnet
17:49 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6016.drmrs.wmnet with OS bullseye
17:30 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1015.eqiad.wmnet
17:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43427 and previous config saved to /var/cache/conftool/dbconfig/20230126-172806-root.json
17:27 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
17:24 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1015.eqiad.wmnet
17:24 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
17:22 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1014.eqiad.wmnet
17:19 dancy@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 00m 11s)
17:19 dancy@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
17:16 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1014.eqiad.wmnet
17:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43426 and previous config saved to /var/cache/conftool/dbconfig/20230126-171302-root.json
17:12 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1013.eqiad.wmnet
17:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2027.codfw.wmnet with reason: host reimage
17:07 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2027.codfw.wmnet with reason: host reimage
17:06 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1013.eqiad.wmnet
17:06 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6016.drmrs.wmnet with OS bullseye
17:05 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6016.drmrs.wmnet
17:05 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
17:05 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
17:04 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6007.drmrs.wmnet
17:03 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6007.drmrs.wmnet with OS bullseye
17:02 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1012.eqiad.wmnet
16:59 cgoubert@deploy1002: Synchronized tox.ini: Rebuilding mediawiki-webserver (duration: 07m 19s)
16:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43425 and previous config saved to /var/cache/conftool/dbconfig/20230126-165757-root.json
16:56 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1012.eqiad.wmnet
16:53 claime: Running scap sync-file -D php_fpm_restart_script:/bin/true tox.ini "Rebuilding mediawiki-webserver image" - T326794
16:51 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
16:49 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['cp2027']
16:48 sukhe: correcting earlier log: pooling lvs2007 after T326564
16:48 sukhe: pooling lvs2009 after T326564
16:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43424 and previous config saved to /var/cache/conftool/dbconfig/20230126-164252-root.json
16:41 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6007.drmrs.wmnet with reason: host reimage
16:41 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2027']
16:38 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2027.codfw.wmnet with OS bullseye
16:38 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6007.drmrs.wmnet with reason: host reimage
16:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1084.eqiad.wmnet
16:31 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1011.eqiad.wmnet
16:28 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
16:27 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1084.eqiad.wmnet
16:27 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2027.codfw.wmnet with OS bullseye
16:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43423 and previous config saved to /var/cache/conftool/dbconfig/20230126-162747-root.json
16:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
16:26 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1080.eqiad.wmnet
16:24 aborrero@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudlb1001-dev
16:23 aborrero@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb1001-dev
16:23 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1011.eqiad.wmnet
16:21 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1010.eqiad.wmnet
16:21 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
16:20 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
16:20 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:20 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
16:19 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1080.eqiad.wmnet
16:19 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
16:19 aborrero@cumin2002: START - Cookbook sre.dns.netbox
16:18 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6007.drmrs.wmnet with OS bullseye
16:14 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1010.eqiad.wmnet
16:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp3051.esams.wmnet with reason: extending downtime: T323717
16:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp3051.esams.wmnet with reason: extending downtime: T323717
16:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43422 and previous config saved to /var/cache/conftool/dbconfig/20230126-161242-root.json
16:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2161 T328024', diff saved to https://phabricator.wikimedia.org/P43421 and previous config saved to /var/cache/conftool/dbconfig/20230126-161137-root.json
16:11 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2165 to s8 primary T328024', diff saved to https://phabricator.wikimedia.org/P43420 and previous config saved to /var/cache/conftool/dbconfig/20230126-161058-marostegui.json
16:10 marostegui: Starting s8 codfw failover from db2161 to db2165 - T328024
16:09 moritzm: installing distro-info-data updates from Bullseye point release
16:08 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudgw2001-dev.codfw.wmnet
16:08 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:08 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
16:06 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
16:05 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1009.eqiad.wmnet
15:55 jbond: enable-puppet post deploy requestctl ferm change gerrit:883935
15:55 aborrero@cumin2002: START - Cookbook sre.dns.netbox
15:51 hashar: Restarting CI Jenkins for upgrade
15:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 35 hosts with reason: Primary switchover s8 T328024
15:50 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2165 with weight 0 T328024', diff saved to https://phabricator.wikimedia.org/P43419 and previous config saved to /var/cache/conftool/dbconfig/20230126-155000-root.json
15:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 35 hosts with reason: Primary switchover s8 T328024
15:49 aborrero@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudgw2001-dev.codfw.wmnet
15:46 hashar: Restart Jenkins for upgrade
15:39 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1009.eqiad.wmnet
15:30 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2027.codfw.wmnet with OS bullseye
15:30 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
15:30 sukhe: install2003: rm /etc/dhcp/automation/ttyS1-115200/cp2027.conf
15:29 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2027.codfw.wmnet with OS bullseye
15:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
15:27 sukhe: poweroff lvs2007: T326564
15:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43418 and previous config saved to /var/cache/conftool/dbconfig/20230126-152329-root.json
15:12 jbond: disable-puppet deploy requestctl ferm change gerrit:883935
15:09 sukhe: stop pybal on lvs2007: T326564
15:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on lvs2007.codfw.wmnet with reason: powering off for T326564
15:09 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on lvs2007.codfw.wmnet with reason: powering off for T326564
15:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43417 and previous config saved to /var/cache/conftool/dbconfig/20230126-150824-root.json
15:04 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2027.codfw.wmnet with OS bullseye
15:04 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
15:02 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2027.codfw.wmnet with OS bullseye
15:02 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
14:55 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
14:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43415 and previous config saved to /var/cache/conftool/dbconfig/20230126-145319-root.json
14:40 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:40 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-run DNS cookbook after updating zone files - remove esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
14:40 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
14:39 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-run DNS cookbook after updating zone files - remove esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
14:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43414 and previous config saved to /var/cache/conftool/dbconfig/20230126-143814-root.json
14:37 cmooney@cumin1001: START - Cookbook sre.dns.netbox
14:37 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:37 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-run DNS cookbook after updating zone files - remove esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
14:37 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
14:36 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-run DNS cookbook after updating zone files - remove esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
14:34 cmooney@cumin1001: START - Cookbook sre.dns.netbox
14:32 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:31 cmooney@cumin1001: START - Cookbook sre.dns.netbox
14:31 ladsgroup@deploy1002: Synchronized private/PrivateSettings.php: Rotating wikiadmin password (T326802) (duration: 07m 04s)
14:27 moritzm: installing containerd security updates
14:23 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
14:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43413 and previous config saved to /var/cache/conftool/dbconfig/20230126-142309-root.json
14:16 Lucas_WMDE: UTC afternoon backport+config window done
14:15 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Enable write new for CheckUserLog comment fields on group 0 and 1 (T233004) (duration: 09m 16s)
14:11 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
14:11 jbond: disable puppet fleet-wide to roll out etcd ferm change gerrit:883888
14:11 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
14:09 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43412 and previous config saved to /var/cache/conftool/dbconfig/20230126-140804-root.json
14:07 lucaswerkmeister-wmde@deploy1002: dreamyjazz and lucaswerkmeister-wmde: Backport for Enable write new for CheckUserLog comment fields on group 0 and 1 (T233004) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2123 T328023', diff saved to https://phabricator.wikimedia.org/P43411 and previous config saved to /var/cache/conftool/dbconfig/20230126-140716-root.json
14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2113 to s5 primary T328023', diff saved to https://phabricator.wikimedia.org/P43410 and previous config saved to /var/cache/conftool/dbconfig/20230126-140630-root.json
14:06 marostegui: Starting s5 codfw failover from db2123 to db2113 - T328023
14:06 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Enable write new for CheckUserLog comment fields on group 0 and 1 (T233004)
14:00 moritzm: restarting etherpad-lite to pick up nodejs security update
13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Remove vslow from db2113, future s5 codfw master T328023', diff saved to https://phabricator.wikimedia.org/P43409 and previous config saved to /var/cache/conftool/dbconfig/20230126-135509-marostegui.json
13:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T328023
13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2113 with weight 0 T328023', diff saved to https://phabricator.wikimedia.org/P43408 and previous config saved to /var/cache/conftool/dbconfig/20230126-135215-root.json
13:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T328023
13:45 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
13:45 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
13:44 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
13:38 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:38 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove DNS records for removed esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
13:37 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove DNS records for removed esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
13:32 ladsgroup@deploy1002: Finished scap: Backport for Change time zone setting on gorwiktionary (T327986) (duration: 12m 02s)
13:32 cmooney@cumin1001: START - Cookbook sre.dns.netbox
13:25 moritzm: restarting turnilo for nodejs security update
13:22 ladsgroup@deploy1002: superpes and ladsgroup: Backport for Change time zone setting on gorwiktionary (T327986) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
13:20 ladsgroup@deploy1002: Started scap: Backport for Change time zone setting on gorwiktionary (T327986)
13:10 moritzm: installing nodejs security updates on bullseye
13:09 hashar: Rebooting gerrit2002.wikimedia.org host to validate that the Apache 2 service starts AFTER the network comes online | T326125
13:06 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
13:04 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop analytics cluster
12:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flerovium.eqiad.wmnet
12:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cp3051.esams.wmnet with reason: T323717
12:42 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on cp3051.esams.wmnet with reason: T323717
12:42 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3051.esams.wmnet,service=ats-be
12:42 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3051.esams.wmnet,service=cdn
12:41 sukhe: depool cp3051.esams.wmnet for firmware update testing: T323717
12:41 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host flerovium.eqiad.wmnet
12:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host furud.codfw.wmnet
12:29 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-proxies (exit_code=0) rolling restart_daemons on A:eqiad and not A:thanos-fe and A:swift-fe or A:thanos-fe
12:15 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host furud.codfw.wmnet
12:10 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
12:10 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-proxies rolling restart_daemons on A:eqiad and not A:thanos-fe and A:swift-fe or A:thanos-fe
12:03 jbond: enable profile::base::firewall::defs_from_etcd: true globally
11:56 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd-client-ssl._tcp.wikimedia.org on all recursors
11:56 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd-client-ssl._tcp.wikimedia.org on all recursors
11:49 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
11:49 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes1010.eqiad.wmnet
11:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts flowspec1001
11:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flowspec1001 decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1001"
11:46 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flowspec1001 decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1001"
11:44 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
11:40 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts flowspec1001
11:36 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-aux
11:29 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
11:29 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
11:28 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43405 and previous config saved to /var/cache/conftool/dbconfig/20230126-110822-root.json
11:03 hashar: Restarted Apache 2 on gerrit.wikimedia.org
10:55 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
10:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename aux-k8s-ingress service to k8s-ingress-aux - cgoubert@cumin1001"
10:54 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
10:54 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43404 and previous config saved to /var/cache/conftool/dbconfig/20230126-105317-root.json
10:53 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
10:46 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename aux-k8s-ingress service to k8s-ingress-aux - cgoubert@cumin1001"
10:45 moritzm: installing postgresql-13 security updates
10:43 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
10:42 joal@deploy1002: Finished deploy [airflow-dags/analytics@e52205b]: (no justification provided) (duration: 00m 11s)
10:42 joal@deploy1002: Started deploy [airflow-dags/analytics@e52205b]: (no justification provided)
10:41 claime: cgoubert@authdns1001:~$ sudo -i authdns-update
10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43403 and previous config saved to /var/cache/conftool/dbconfig/20230126-103812-root.json
10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43402 and previous config saved to /var/cache/conftool/dbconfig/20230126-103448-root.json
10:32 joal@deploy1002: Finished deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - third after failure [analytics/refinery@8ed8435] (duration: 01m 16s)
10:31 joal@deploy1002: Started deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - third after failure [analytics/refinery@8ed8435]
10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43401 and previous config saved to /var/cache/conftool/dbconfig/20230126-102307-root.json
10:21 joal@deploy1002: Finished deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - Second after failure [analytics/refinery@8ed8435] (duration: 00m 04s)
10:21 joal@deploy1002: Started deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - Second after failure [analytics/refinery@8ed8435]
10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43400 and previous config saved to /var/cache/conftool/dbconfig/20230126-101943-root.json
10:08 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts sretest1002.eqiad.wmnet
10:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43399 and previous config saved to /var/cache/conftool/dbconfig/20230126-100802-root.json
10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43398 and previous config saved to /var/cache/conftool/dbconfig/20230126-100438-root.json
09:59 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
09:58 joal@deploy1002: Finished deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@8ed8435] (duration: 01m 08s)
09:57 joal@deploy1002: Started deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@8ed8435]
09:57 joal@deploy1002: Finished deploy [analytics/refinery@8ed8435] (thin): Regular analytics weekly train THIN [analytics/refinery@8ed8435] (duration: 00m 05s)
09:57 joal@deploy1002: Started deploy [analytics/refinery@8ed8435] (thin): Regular analytics weekly train THIN [analytics/refinery@8ed8435]
09:56 joal@deploy1002: Finished deploy [analytics/refinery@8ed8435]: Regular analytics weekly train [analytics/refinery@8ed8435] (duration: 07m 00s)
09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43397 and previous config saved to /var/cache/conftool/dbconfig/20230126-095257-root.json
09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43396 and previous config saved to /var/cache/conftool/dbconfig/20230126-095205-root.json
09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43395 and previous config saved to /var/cache/conftool/dbconfig/20230126-094933-root.json
09:49 joal@deploy1002: Started deploy [analytics/refinery@8ed8435]: Regular analytics weekly train [analytics/refinery@8ed8435]
09:48 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
09:48 jbond@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
09:47 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
09:47 jbond@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
09:47 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
09:47 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
09:46 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43394 and previous config saved to /var/cache/conftool/dbconfig/20230126-093700-root.json
09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43393 and previous config saved to /var/cache/conftool/dbconfig/20230126-093620-root.json
09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43392 and previous config saved to /var/cache/conftool/dbconfig/20230126-093428-root.json
09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43391 and previous config saved to /var/cache/conftool/dbconfig/20230126-093303-root.json
09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2144 to x2 primary T313811', diff saved to https://phabricator.wikimedia.org/P43390 and previous config saved to /var/cache/conftool/dbconfig/20230126-092512-root.json
09:24 marostegui: Starting x2 codfw failover from db2142 to db2144 - T328001
09:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover x2 T328001
09:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover x2 T328001
09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43389 and previous config saved to /var/cache/conftool/dbconfig/20230126-092155-root.json
09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43388 and previous config saved to /var/cache/conftool/dbconfig/20230126-092115-root.json
09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43387 and previous config saved to /var/cache/conftool/dbconfig/20230126-091923-root.json
09:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover x2 T328001
09:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover x2 T328001
09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43386 and previous config saved to /var/cache/conftool/dbconfig/20230126-091758-root.json
09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43385 and previous config saved to /var/cache/conftool/dbconfig/20230126-090650-root.json
09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43384 and previous config saved to /var/cache/conftool/dbconfig/20230126-090610-root.json
09:05 phedenskog@deploy1002: Finished deploy [performance/navtiming@e5fdd6e]: (no justification provided) (duration: 00m 06s)
09:05 phedenskog@deploy1002: Started deploy [performance/navtiming@e5fdd6e]: (no justification provided)
09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 1%: After switchover', diff saved to https://phabricator.wikimedia.org/P43383 and previous config saved to /var/cache/conftool/dbconfig/20230126-090418-root.json
09:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2121 T328000', diff saved to https://phabricator.wikimedia.org/P43382 and previous config saved to /var/cache/conftool/dbconfig/20230126-090302-root.json
09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43381 and previous config saved to /var/cache/conftool/dbconfig/20230126-090253-root.json
09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2118 to s7 primary T328000', diff saved to https://phabricator.wikimedia.org/P43380 and previous config saved to /var/cache/conftool/dbconfig/20230126-090212-root.json
09:02 marostegui: Starting s7 codfw failover from db2121 to db2118 - T328000
08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43379 and previous config saved to /var/cache/conftool/dbconfig/20230126-085145-root.json
08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43378 and previous config saved to /var/cache/conftool/dbconfig/20230126-085105-root.json
08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43377 and previous config saved to /var/cache/conftool/dbconfig/20230126-084748-root.json
08:44 moritzm: added Eoghan to pwstore
08:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s7 T328000
08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2118 with weight 0 T328000', diff saved to https://phabricator.wikimedia.org/P43376 and previous config saved to /var/cache/conftool/dbconfig/20230126-084112-root.json
08:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s7 T328000
08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43375 and previous config saved to /var/cache/conftool/dbconfig/20230126-083640-root.json
08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43374 and previous config saved to /var/cache/conftool/dbconfig/20230126-083600-root.json
08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2105 T327999', diff saved to https://phabricator.wikimedia.org/P43373 and previous config saved to /var/cache/conftool/dbconfig/20230126-083543-root.json
08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2127 to s3 primary T327999', diff saved to https://phabricator.wikimedia.org/P43372 and previous config saved to /var/cache/conftool/dbconfig/20230126-083459-root.json
08:34 marostegui: Starting s3 codfw failover from db2105 to db2127 - T327999
08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43371 and previous config saved to /var/cache/conftool/dbconfig/20230126-083243-root.json
08:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s3 T327999
08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2127 with weight 0 T327999', diff saved to https://phabricator.wikimedia.org/P43370 and previous config saved to /var/cache/conftool/dbconfig/20230126-082432-root.json
08:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 23 hosts with reason: Primary switchover s3 T327999
08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43369 and previous config saved to /var/cache/conftool/dbconfig/20230126-082055-root.json
08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 100%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43368 and previous config saved to /var/cache/conftool/dbconfig/20230126-082038-root.json
08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 T327998', diff saved to https://phabricator.wikimedia.org/P43367 and previous config saved to /var/cache/conftool/dbconfig/20230126-081916-root.json
08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2107 to s2 primary T327998', diff saved to https://phabricator.wikimedia.org/P43366 and previous config saved to /var/cache/conftool/dbconfig/20230126-081818-root.json
08:17 marostegui: Starting s2 codfw failover from db2104 to db2107 - T327998
08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43365 and previous config saved to /var/cache/conftool/dbconfig/20230126-081738-root.json
08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 75%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43364 and previous config saved to /var/cache/conftool/dbconfig/20230126-080533-root.json
08:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s2 T327998
08:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s2 T327998
08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2107 with weight 0 T327998', diff saved to https://phabricator.wikimedia.org/P43363 and previous config saved to /var/cache/conftool/dbconfig/20230126-080427-root.json
08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 1%: After switchover', diff saved to https://phabricator.wikimedia.org/P43362 and previous config saved to /var/cache/conftool/dbconfig/20230126-080233-root.json
08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2103 T327997', diff saved to https://phabricator.wikimedia.org/P43361 and previous config saved to /var/cache/conftool/dbconfig/20230126-080159-root.json
08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2112 to s1 primary T327997', diff saved to https://phabricator.wikimedia.org/P43360 and previous config saved to /var/cache/conftool/dbconfig/20230126-080033-root.json
08:00 marostegui: Starting s1 codfw failover from db2103 to db2112 - T327997
07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 50%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43359 and previous config saved to /var/cache/conftool/dbconfig/20230126-075028-root.json
07:49 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2012.*
07:49 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2011.*
07:49 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2010.*
07:48 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2009.*
07:36 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2112 with weight 0 T327997', diff saved to https://phabricator.wikimedia.org/P43358 and previous config saved to /var/cache/conftool/dbconfig/20230126-073616-root.json
07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 38 hosts with reason: Primary switchover s1 T327997
07:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 38 hosts with reason: Primary switchover s1 T327997
07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 25%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43357 and previous config saved to /var/cache/conftool/dbconfig/20230126-073523-root.json
07:25 marostegui@deploy1002: Finished scap: Backport for ProductionServices.php: Depool pc2011 (T327925) (duration: 11m 19s)
07:25 dcausse: T322869: depooling wdqs2009, wdqs2010, wdqs2011, wdqs2012; these hosts should not serve user traffic yet, as they don't have the database loaded
07:23 marostegui: Failover m1 from db1195 to db1176 - T327800
07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 10%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43356 and previous config saved to /var/cache/conftool/dbconfig/20230126-072017-root.json
07:18 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1001.eqiad.wmnet with reason: m1 switchover
07:17 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1001.eqiad.wmnet with reason: m1 switchover
07:17 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backupmon1001.eqiad.wmnet with reason: m1 switchover
07:17 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backupmon1001.eqiad.wmnet with reason: m1 switchover
07:16 marostegui@deploy1002: marostegui: Backport for ProductionServices.php: Depool pc2011 (T327925) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
07:14 marostegui@deploy1002: Started scap: Backport for ProductionServices.php: Depool pc2011 (T327925)
07:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2132,2160].codfw.wmnet,db[1117,1176,1195].eqiad.wmnet with reason: Primary switchover m1 T327800
07:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2132,2160].codfw.wmnet,db[1117,1176,1195].eqiad.wmnet with reason: Primary switchover m1 T327800
07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 5%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43354 and previous config saved to /var/cache/conftool/dbconfig/20230126-070512-root.json
07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Add some weight to db1103', diff saved to https://phabricator.wikimedia.org/P43353 and previous config saved to /var/cache/conftool/dbconfig/20230126-070220-marostegui.json
07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 T327861', diff saved to https://phabricator.wikimedia.org/P43352 and previous config saved to /var/cache/conftool/dbconfig/20230126-070158-root.json
07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1103 to x1 primary and set section read-write T327861', diff saved to https://phabricator.wikimedia.org/P43351 and previous config saved to /var/cache/conftool/dbconfig/20230126-070035-marostegui.json
07:00 marostegui: Starting x1 eqiad failover from db1120 to db1103 - T327861
06:48 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6015.drmrs.wmnet
06:48 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6015.drmrs.wmnet with OS bullseye
06:32 ladsgroup@deploy1002: Synchronized private/PrivateSettings.php: Rotating wikiuser password (T326802) (duration: 07m 23s)
06:20 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
06:18 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1103 with weight 0 T327861', diff saved to https://phabricator.wikimedia.org/P43350 and previous config saved to /var/cache/conftool/dbconfig/20230126-061751-root.json
06:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Primary switchover x1 T327861
06:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Primary switchover x1 T327861
05:57 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6015.drmrs.wmnet with OS bullseye
05:53 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6006.drmrs.wmnet
05:53 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6006.drmrs.wmnet with OS bullseye
05:32 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6006.drmrs.wmnet with reason: host reimage
05:28 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6006.drmrs.wmnet with reason: host reimage
05:10 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6006.drmrs.wmnet with OS bullseye
05:09 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6014.drmrs.wmnet
05:07 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6014.drmrs.wmnet with OS bullseye
04:45 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6014.drmrs.wmnet with reason: host reimage
04:42 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6014.drmrs.wmnet with reason: host reimage
04:24 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6014.drmrs.wmnet with OS bullseye
04:22 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6005.drmrs.wmnet
04:17 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6005.drmrs.wmnet with OS bullseye
03:52 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6005.drmrs.wmnet with reason: host reimage
03:49 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6005.drmrs.wmnet with reason: host reimage
03:29 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6005.drmrs.wmnet with OS bullseye
03:27 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6013.drmrs.wmnet
03:27 ejegg: payments-wiki upgraded from 08b8c3bc to 82d89841
03:26 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6013.drmrs.wmnet with OS bullseye
03:04 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6013.drmrs.wmnet with reason: host reimage
03:01 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6013.drmrs.wmnet with reason: host reimage
02:41 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6013.drmrs.wmnet with OS bullseye
02:30 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2027.codfw.wmnet with OS bullseye
02:17 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
02:17 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2027.codfw.wmnet with OS bullseye
01:58 ejegg: restarted fundraising scheduled jobs after queue server reboot
01:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
01:49 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2028.codfw.wmnet,service=ats-be
01:49 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2028.codfw.wmnet,service=cdn
01:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp2027.codfw.wmnet with reason: firmware test
01:48 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp2027.codfw.wmnet with reason: firmware test
01:46 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2027.codfw.wmnet,service=ats-be
01:46 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2027.codfw.wmnet,service=cdn
01:46 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2028.codfw.wmnet with OS bullseye
01:24 ejegg: payments-wiki upgraded from 15395d05 to 08b8c3bc (upgraded from MW 1.35 to MW 1.39)
01:23 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2028.codfw.wmnet with reason: host reimage
01:20 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2028.codfw.wmnet with reason: host reimage
01:19 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev2*: Enable internode encryption - eevans@cumin1001
01:14 ejegg: disabled fundraising scheduled jobs for queue server reboot
01:05 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2028.codfw.wmnet with OS bullseye
01:03 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
01:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2028.codfw.wmnet
01:00 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev2*: Enable internode encryption - eevans@cumin1001
01:00 ejegg: turned pending transaction resolvers back on after civi deploy
00:51 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2028.codfw.wmnet
00:50 ejegg: civicrm upgraded from 3e6b21b6 to b5d6a790
00:50 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
00:49 sukhe: depool cp2028 for testing firmware update cookbook: T321309
00:49 ejegg: disabled pending transaction resolvers for civi deploy
00:48 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2028.codfw.wmnet,service=ats-be
00:48 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2028.codfw.wmnet,service=cdn

2023-01-25

23:57 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6004.drmrs.wmnet
23:57 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6004.drmrs.wmnet with OS bullseye
23:36 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6004.drmrs.wmnet with reason: host reimage
23:33 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6004.drmrs.wmnet with reason: host reimage
23:29 zabe@deploy1002: Finished scap: (no justification provided) (duration: 07m 34s)
23:21 zabe@deploy1002: Started scap: (no justification provided)
23:20 zabe@deploy1002: Backport cancelled.
23:14 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6004.drmrs.wmnet with OS bullseye
23:13 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6012.drmrs.wmnet
23:07 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6012.drmrs.wmnet with OS bullseye
22:43 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6012.drmrs.wmnet with reason: host reimage
22:40 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6012.drmrs.wmnet with reason: host reimage
22:21 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6012.drmrs.wmnet with OS bullseye
22:14 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6003.drmrs.wmnet
21:49 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
21:49 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
21:44 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
21:44 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
21:34 samtar@deploy1002: Finished scap: Backport for Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714), Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714) (duration: 09m 27s)
21:26 samtar@deploy1002: jdrewniak and samtar: Backport for Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714), Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714) synced to the testservers: mwdebug2002.cod
21:25 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
21:24 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
21:24 samtar@deploy1002: Started scap: Backport for Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714), Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714)
21:06 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
20:59 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
20:59 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6003.drmrs.wmnet with OS bullseye
20:59 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts cp2028.codfw.wmnet
20:58 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
20:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
20:49 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
20:49 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
20:49 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
20:49 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
20:49 ejegg: updated employers.csv on paymentswiki
20:49 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
20:33 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6003.drmrs.wmnet with reason: host reimage
20:32 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka jumbo-eqiad cluster: Reboot kafka nodes
20:30 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6003.drmrs.wmnet with reason: host reimage
20:10 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6003.drmrs.wmnet with OS bullseye
20:00 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6011.drmrs.wmnet
19:58 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6011.drmrs.wmnet with OS bullseye
19:52 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host centrallog1002.eqiad.wmnet with OS bullseye
19:38 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
19:36 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
19:33 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
19:33 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
19:21 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye
19:17 brennen@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.20 refs T325583 (duration: 07m 04s)
19:12 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6011.drmrs.wmnet with OS bullseye
19:10 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.20 refs T325583
19:06 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6002.drmrs.wmnet
19:01 brennen: 1.40.0-wmf.20 train (T325583): no blockers, rolling to group1.
19:00 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host centrallog1002.eqiad.wmnet with OS bullseye
19:00 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye
18:59 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6002.drmrs.wmnet with OS bullseye
18:37 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6002.drmrs.wmnet with reason: host reimage
18:35 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
18:34 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6002.drmrs.wmnet with reason: host reimage
18:33 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
18:33 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
18:32 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
18:14 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6002.drmrs.wmnet with OS bullseye
18:11 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
18:11 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
18:11 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
18:10 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
18:05 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6010.drmrs.wmnet
17:58 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6010.drmrs.wmnet with OS bullseye
17:32 mutante: removing racktables.wikimedia.org from DNS - that's it for this ancient service T327405
16:57 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=ats-be
16:57 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=cdn
16:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2031.codfw.wmnet with OS bullseye
16:50 btullis@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka jumbo-eqiad cluster: Reboot kafka nodes
16:46 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6010.drmrs.wmnet with reason: host reimage
16:43 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6010.drmrs.wmnet with reason: host reimage
16:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet,service=ats-be
16:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet,service=cdn
16:33 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS bullseye
16:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2031.codfw.wmnet with reason: host reimage
16:28 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2031.codfw.wmnet with reason: host reimage
16:24 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6010.drmrs.wmnet with OS bullseye
16:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1002.eqiad.wmnet
16:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
16:09 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2031.codfw.wmnet with OS bullseye
16:08 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1002.eqiad.wmnet
16:08 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
16:04 btullis@cumin1001: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
16:03 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
15:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2031']
15:56 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031']
15:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['cp2031']
15:53 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
15:50 robh: db1139 ilom wins/netbios disabled and ilom reset T327877
15:48 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
15:47 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS bullseye
15:46 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031']
15:45 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2031']
15:45 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031']
15:44 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2031.codfw.wmnet']
15:44 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031.codfw.wmnet']
15:43 robh: netbios wins disabled on db1140 ilom and ilom reset T327877
15:43 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2031.codfw.wmnet with OS bullseye
15:38 papaul: ongoing maintenance on fasw-c-eqiad
15:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1001.eqiad.wmnet
15:33 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2031.codfw.wmnet with OS bullseye
15:33 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2031.codfw.wmnet with OS bullseye
15:29 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1001.eqiad.wmnet
15:23 btullis@cumin1001: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
15:21 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
15:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1003.eqiad.wmnet
15:17 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet,service=ats-be
15:17 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet,service=cdn
15:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4045.ulsfo.wmnet with OS bullseye
15:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1003.eqiad.wmnet
15:13 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
15:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-cluster
15:12 urbanecm@deploy1002: Finished scap: triggering i18n refresh for T327824 (duration: 07m 57s)
15:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2031.codfw.wmnet with OS bullseye
15:04 urbanecm@deploy1002: Started scap: triggering i18n refresh for T327824
15:04 urbanecm@deploy1002: Finished scap: Backport for Enable the Wikibase REST API on Wikidata (T324999) (duration: 08m 43s)
15:02 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet,service=ats-be
15:02 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet,service=cdn
15:01 urbanecm: Overrunning B&C window
14:57 urbanecm@deploy1002: urbanecm and migr: Backport for Enable the Wikibase REST API on Wikidata (T324999) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
14:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4037.ulsfo.wmnet with OS bullseye
14:55 urbanecm@deploy1002: Started scap: Backport for Enable the Wikibase REST API on Wikidata (T324999)
14:53 btullis@cumin1001: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
14:53 urbanecm@deploy1002: Finished scap: Backport for REST: Use error log level for unexpected errors (T327490), User impact: amend incorrect parameter for the single day streak text (T327824) (duration: 32m 21s)
14:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
14:50 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
14:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install6002.wikimedia.org
14:40 urbanecm@deploy1002: jakob and sgimeno and urbanecm: Backport for REST: Use error log level for unexpected errors (T327490), User impact: amend incorrect parameter for the single day streak text (T327824) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
14:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
14:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install6002.wikimedia.org on all recursors
14:30 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install6002.wikimedia.org on all recursors
14:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install6002.wikimedia.org - jmm@cumin2002"
14:30 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
14:29 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install6002.wikimedia.org - jmm@cumin2002"
14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
14:28 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
14:25 jmm@cumin2002: START - Cookbook sre.dns.netbox
14:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install6002.wikimedia.org
14:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5002.wikimedia.org
14:21 urbanecm@deploy1002: Started scap: Backport for REST: Use error log level for unexpected errors (T327490), User impact: amend incorrect parameter for the single day streak text (T327824)
14:16 urbanecm@deploy1002: Finished scap: Backport for Enable Draft namespace on Serbo-Croatian Wikipedia (T327864) (duration: 12m 59s)
14:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5002.wikimedia.org on all recursors
14:09 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5002.wikimedia.org on all recursors
14:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5002.wikimedia.org - jmm@cumin2002"
14:08 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5002.wikimedia.org - jmm@cumin2002"
14:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
14:05 urbanecm@deploy1002: aleksandar and urbanecm: Backport for Enable Draft namespace on Serbo-Croatian Wikipedia (T327864) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
14:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
14:04 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5002.wikimedia.org
14:03 urbanecm@deploy1002: Started scap: Backport for Enable Draft namespace on Serbo-Croatian Wikipedia (T327864)
13:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install4002.wikimedia.org
13:51 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install4002.wikimedia.org
13:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install4002.wikimedia.org
13:50 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install4002.wikimedia.org
13:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install3002.wikimedia.org
13:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install3002.wikimedia.org on all recursors
13:31 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install3002.wikimedia.org on all recursors
13:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install3002.wikimedia.org - jmm@cumin2002"
13:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install3002.wikimedia.org - jmm@cumin2002"
13:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
13:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install3002.wikimedia.org
13:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install2004.wikimedia.org
13:11 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4037.ulsfo.wmnet with OS bullseye
13:04 jbond: puppet now using vendored version of augeas-core https://gerrit.wikimedia.org/r/c/operations/puppet/+/883233
13:04 jbond: enable puppet fleet wide post deploy of gerrit:883233
13:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install2004.wikimedia.org on all recursors
13:00 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install2004.wikimedia.org on all recursors
13:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install2004.wikimedia.org - jmm@cumin2002"
12:59 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install2004.wikimedia.org - jmm@cumin2002"
12:54 jbond: disable puppet fleet wide to deploy gerrit:883233
12:54 jnuche@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 00m 21s)
12:54 jnuche@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
12:45 moritzm: restarting Exim on MXes to pick up new libtasn
12:43 filippo@cumin1001: conftool action : set/pooled=no; selector: service=thanos-web,name=thanos-fe2003.codfw.wmnet
12:43 filippo@cumin1001: conftool action : set/pooled=no; selector: service=thanos-web,name=thanos-fe2002.codfw.wmnet
12:43 filippo@cumin1001: conftool action : set/pooled=no; selector: service=thanos-web,name=thanos-fe1003.eqiad.wmnet
12:42 filippo@cumin1001: conftool action : set/pooled=no; selector: service=thanos-web,name=thanos-fe1002.eqiad.wmnet
12:41 moritzm: restarting slapd on r/w servers to pick up new libtasn
12:37 jmm@cumin2002: START - Cookbook sre.dns.netbox
12:37 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install2004.wikimedia.org
12:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install1004.wikimedia.org
12:15 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
12:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install1004.wikimedia.org on all recursors
12:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install1004.wikimedia.org on all recursors
12:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:13 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install1004.wikimedia.org - jmm@cumin2002"
12:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install1004.wikimedia.org - jmm@cumin2002"
12:12 moritzm: installing libtasn security updates on buster
11:58 jmm@cumin2002: START - Cookbook sre.dns.netbox
11:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install1004.wikimedia.org
11:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testreduce1001.eqiad.wmnet
11:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testreduce1001.eqiad.wmnet
11:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host scandium.eqiad.wmnet
11:34 Lucas_WMDE: Updated the Wikidata property suggester with data from 20230102's JSON dump (T325942)
11:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host scandium.eqiad.wmnet
11:27 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
11:16 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
11:12 hnowlan: restarting lvs on lvs1019 for thumbor healthcheck change
11:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43344 and previous config saved to /var/cache/conftool/dbconfig/20230125-111059-root.json
11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: After recloning', diff saved to https://phabricator.wikimedia.org/P43343 and previous config saved to /var/cache/conftool/dbconfig/20230125-110924-root.json
11:08 hnowlan: restarting lvs on lvs2009 for thumbor healthcheck change
11:00 hnowlan: restarting lvs on lvs1020 for thumbor healthcheck change
11:00 hnowlan: restarting lvs on lvs1010 for thumbor healthcheck change
10:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43342 and previous config saved to /var/cache/conftool/dbconfig/20230125-105554-root.json
10:54 hnowlan: restarting lvs on lvs2010 for thumbor healthcheck change
10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: After recloning', diff saved to https://phabricator.wikimedia.org/P43341 and previous config saved to /var/cache/conftool/dbconfig/20230125-105443-root.json
10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: After recloning', diff saved to https://phabricator.wikimedia.org/P43340 and previous config saved to /var/cache/conftool/dbconfig/20230125-105419-root.json
10:49 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
10:48 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
10:48 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
10:43 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
10:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43338 and previous config saved to /var/cache/conftool/dbconfig/20230125-104049-root.json
10:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: After recloning', diff saved to https://phabricator.wikimedia.org/P43337 and previous config saved to /var/cache/conftool/dbconfig/20230125-103938-root.json
10:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: After recloning', diff saved to https://phabricator.wikimedia.org/P43336 and previous config saved to /var/cache/conftool/dbconfig/20230125-103914-root.json
10:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43335 and previous config saved to /var/cache/conftool/dbconfig/20230125-102544-root.json
10:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: After recloning', diff saved to https://phabricator.wikimedia.org/P43334 and previous config saved to /var/cache/conftool/dbconfig/20230125-102433-root.json
10:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: After recloning', diff saved to https://phabricator.wikimedia.org/P43333 and previous config saved to /var/cache/conftool/dbconfig/20230125-102409-root.json
10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43332 and previous config saved to /var/cache/conftool/dbconfig/20230125-101039-root.json
10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: After recloning', diff saved to https://phabricator.wikimedia.org/P43331 and previous config saved to /var/cache/conftool/dbconfig/20230125-100928-root.json
10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: After recloning', diff saved to https://phabricator.wikimedia.org/P43330 and previous config saved to /var/cache/conftool/dbconfig/20230125-100904-root.json
09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43329 and previous config saved to /var/cache/conftool/dbconfig/20230125-095534-root.json
09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: After recloning', diff saved to https://phabricator.wikimedia.org/P43328 and previous config saved to /var/cache/conftool/dbconfig/20230125-095423-root.json
09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: After recloning', diff saved to https://phabricator.wikimedia.org/P43327 and previous config saved to /var/cache/conftool/dbconfig/20230125-095400-root.json
09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 1%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43326 and previous config saved to /var/cache/conftool/dbconfig/20230125-094029-root.json
09:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: After recloning', diff saved to https://phabricator.wikimedia.org/P43325 and previous config saved to /var/cache/conftool/dbconfig/20230125-093918-root.json
09:30 Emperor: rolling depool & update of thanos front-ends T327871
08:40 XioNoX: bump SGIX max prefix limit
08:13 ladsgroup@deploy1002: Finished scap: Backport for Add sandbox link to Serbo-Croatian Wikipedia (T327833) (duration: 10m 13s)
08:05 ladsgroup@deploy1002: ladsgroup and aleksandar: Backport for Add sandbox link to Serbo-Croatian Wikipedia (T327833) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
08:03 ladsgroup@deploy1002: Started scap: Backport for Add sandbox link to Serbo-Croatian Wikipedia (T327833)
07:49 marostegui: Cloning db1196 from db1206 (lag will appear on s1 wiki replicas) T327859
07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1206 to clone db1196 T327859', diff saved to https://phabricator.wikimedia.org/P43322 and previous config saved to /var/cache/conftool/dbconfig/20230125-074601-marostegui.json
07:34 phedenskog@deploy1002: Finished deploy [performance/navtiming@bfff15d]: (no justification provided) (duration: 00m 05s)
07:34 phedenskog@deploy1002: Started deploy [performance/navtiming@bfff15d]: (no justification provided)
07:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 33
07:31 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 33
07:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 to clone db1198', diff saved to https://phabricator.wikimedia.org/P43320 and previous config saved to /var/cache/conftool/dbconfig/20230125-072033-marostegui.json
07:08 AndyRussG: updated payments (config only) revision 15395d05, config 418160e9
04:10 eileen: config revision changed from dc0a0d3a to 089d0acb
04:01 eileen: civicrm upgraded from 9197ca29 to 3e6b21b6
03:27 eileen: civicrm upgraded from f6093fb2 to 9197ca29
03:05 eileen: config revision changed from 3f641fce to dc0a0d3a
01:17 legoktm: adjusting Gerrit group "Campaigns Team" so it is not recursively a member of itself
00:10 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host centrallog1002.eqiad.wmnet with OS bullseye
00:10 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye

2023-01-24

23:10 zabe@deploy1002: Finished scap: Backport for Start reading from rev_comment_id on testcommonswiki (T299954) (duration: 08m 02s)
23:04 zabe@deploy1002: zabe: Backport for Start reading from rev_comment_id on testcommonswiki (T299954) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
23:02 zabe@deploy1002: Started scap: Backport for Start reading from rev_comment_id on testcommonswiki (T299954)
22:47 TheresNoTime: closing UTC late backport window
22:47 samtar@deploy1002: Finished scap: Backport for Add temporary extra grid-area for content translation extension (T327715), Add temporary extra grid-area for content translation extension (T327715) (duration: 09m 04s)
22:39 samtar@deploy1002: jdrewniak and samtar: Backport for Add temporary extra grid-area for content translation extension (T327715), Add temporary extra grid-area for content translation extension (T327715) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
22:37 samtar@deploy1002: Started scap: Backport for Add temporary extra grid-area for content translation extension (T327715), Add temporary extra grid-area for content translation extension (T327715)
22:30 samtar@deploy1002: Finished scap: Backport for [BETA CLUSTER] Don't try to load Kartographer on Wikifunctions at all (T327724), newiki: Fix wgAddGroups/wgRemoveGroups setting (T327114) (duration: 07m 59s)
22:23 samtar@deploy1002: jforrester and samtar and stang: Backport for [BETA CLUSTER] Don't try to load Kartographer on Wikifunctions at all (T327724), newiki: Fix wgAddGroups/wgRemoveGroups setting (T327114) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
22:22 samtar@deploy1002: Started scap: Backport for [BETA CLUSTER] Don't try to load Kartographer on Wikifunctions at all (T327724), newiki: Fix wgAddGroups/wgRemoveGroups setting (T327114)
22:20 samtar@deploy1002: Finished scap: Backport for newiki: Add new permissions to group reviewer (T327114) (duration: 09m 02s)
22:19 mutante: DNS - adding new project language "gur" (Gurenɛ) - Gurenɛ is a major language of northern Ghana and the predominant language of the Upper East Region of Ghana. It is also widely spoken in Burkina Faso. T327813
22:13 samtar@deploy1002: samtar and stang: Backport for newiki: Add new permissions to group reviewer (T327114) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
22:11 samtar@deploy1002: Started scap: Backport for newiki: Add new permissions to group reviewer (T327114)
22:08 samtar@deploy1002: Finished scap: Backport for Fix Wikitext editor preview layout in Vector 2022 (T327778), Fix Wikitext editor preview layout in Vector 2022 (T327778) (duration: 09m 36s)
22:06 TheresNoTime: extending UTC late backport window due to late start
22:04 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6001.drmrs.wmnet,service=ats-be
22:04 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6001.drmrs.wmnet,service=cdn
22:01 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6001.drmrs.wmnet with OS bullseye
22:00 samtar@deploy1002: samtar and jdrewniak: Backport for Fix Wikitext editor preview layout in Vector 2022 (T327778), Fix Wikitext editor preview layout in Vector 2022 (T327778) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
21:59 samtar@deploy1002: Started scap: Backport for Fix Wikitext editor preview layout in Vector 2022 (T327778), Fix Wikitext editor preview layout in Vector 2022 (T327778)
21:56 samtar@deploy1002: Finished scap: Backport for Work around sticky-positioned layers disabling subpixel rendering (T327460) (duration: 13m 31s)
21:45 samtar@deploy1002: nray and samtar: Backport for Work around sticky-positioned layers disabling subpixel rendering (T327460) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
21:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1009.eqiad.wmnet with OS bullseye
21:44 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
21:43 samtar@deploy1002: Started scap: Backport for Work around sticky-positioned layers disabling subpixel rendering (T327460)
21:43 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
21:38 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6001.drmrs.wmnet with reason: host reimage
21:38 zabe: running migrateRevisionCommentTemp.php on testcommonswiki (s4) with --sleep 10 # T275246
21:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6001.drmrs.wmnet with reason: host reimage
21:32 samtar@deploy1002: backport aborted: (duration: 06m 28s)
21:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1009.eqiad.wmnet with reason: host reimage
21:25 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1009.eqiad.wmnet with reason: host reimage
21:15 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6001.drmrs.wmnet with OS bullseye
21:05 eevans@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
21:03 TheresNoTime: holding UTC late backport window for outage, T327815
21:01 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host sessionstore1001.eqiad.wmnet
20:50 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
20:50 urandom: rebooting sessionstore1001.eqiad.wmnet -- T325132
20:49 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host sessionstore1001.eqiad.wmnet
20:49 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
20:39 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2027.codfw.wmnet
20:32 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2027.codfw.wmnet
20:31 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5025.eqsin.wmnet,service=ats-be
20:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2026.codfw.wmnet
20:31 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5025.eqsin.wmnet,service=cdn
20:29 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet
20:29 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5025.eqsin.wmnet with OS bullseye
20:28 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet
20:24 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2026.codfw.wmnet
20:20 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2023.codfw.wmnet
20:20 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6009.drmrs.wmnet,service=ats-be
20:19 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6009.drmrs.wmnet,service=cdn
20:18 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet,service=cdn
20:18 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet,service=ats-be
20:16 bblack: pool cp5032
20:16 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=5017.eqsin.wmnet,service=ats-be
20:16 mutante: contint2001 - restarted zuul
20:16 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=5017.eqsin.wmnet,service=cdn
20:14 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=ats-be
20:14 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=cdn
20:14 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2041.codfw.wmnet,service=ats-be
20:14 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2041.codfw.wmnet,service=cdn
20:12 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2023.codfw.wmnet
20:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6009.drmrs.wmnet with OS bullseye
20:09 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet,service=ats-be
20:08 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet,service=cdn
20:08 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2018.codfw.wmnet
20:08 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2041.codfw.wmnet,service=ats-be
20:08 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2041.codfw.wmnet,service=cdn
20:05 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5017.eqsin.wmnet with OS bullseye
20:00 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2018.codfw.wmnet
19:58 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2017.codfw.wmnet
19:56 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5025.eqsin.wmnet with reason: host reimage
19:54 sukhe: reprepro -C main include bullseye-wikimedia libvmod-netmapper_1.9-3_amd64.changes: T326634
19:53 sukhe: reprepro -C main include bullseye-wikimedia libvmod-re2_1.5.3-4_amd64.changes: T326634
19:53 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5025.eqsin.wmnet with reason: host reimage
19:51 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2017.codfw.wmnet
19:47 sukhe: reprepro -C main include bullseye-wikimedia libvmod-querysort_0.4_amd64.changes: T326634
19:46 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2012.codfw.wmnet
19:40 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
19:39 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2012.codfw.wmnet
19:39 urandom: rebooting restbase cassandra nodes, row d -- T325132
19:33 bblack: cp5032: restart varnish-frontend
19:30 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2025.codfw.wmnet
19:28 sukhe: reprepro -C main include bullseye-wikimedia varnish-modules_0.15.0-3_amd64.changes: T326634
19:27 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1011.eqiad.wmnet with reason: host reimage
19:24 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1011.eqiad.wmnet with reason: host reimage
19:22 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2025.codfw.wmnet
19:19 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5025.eqsin.wmnet with OS bullseye
19:19 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5025.eqsin.wmnet with OS bullseye
19:10 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.20 refs T325583
19:06 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host druid1011.eqiad.wmnet with OS bullseye
19:05 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1010.eqiad.wmnet with OS bullseye
19:05 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
19:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6009.drmrs.wmnet with reason: host reimage
19:01 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6009.drmrs.wmnet with reason: host reimage
18:55 jynus: deploy new dump grants for analytics dbs at db1108 T327155
18:43 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5025.eqsin.wmnet with OS bullseye
18:40 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6009.drmrs.wmnet with OS bullseye
18:17 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5017.eqsin.wmnet with reason: host reimage
18:14 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5017.eqsin.wmnet with reason: host reimage
18:12 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2022.codfw.wmnet
18:05 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2022.codfw.wmnet
17:44 bblack: cp5032: upgrading packages (varnish, trafficserver)
17:40 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host restbase2020.codfw.wmnet
17:37 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5017.eqsin.wmnet with OS bullseye
17:36 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5017.eqsin.wmnet with OS bullseye
17:28 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2020.codfw.wmnet
17:21 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2016.codfw.wmnet
17:19 thcipriani: restarting ci jenkins for updates
17:13 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2016.codfw.wmnet
17:13 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2015.codfw.wmnet
17:10 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5017.eqsin.wmnet with OS bullseye
17:04 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2015.codfw.wmnet
17:04 urandom: rebooting restbase cassandra nodes, row c -- T325132
16:29 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
16:29 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
16:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2042.codfw.wmnet with OS bullseye
16:23 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
16:23 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
16:23 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
16:23 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
16:22 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
16:22 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
16:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2042.codfw.wmnet with reason: host reimage
16:10 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
16:10 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
16:09 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2042.codfw.wmnet with reason: host reimage
15:54 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
15:53 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2042.codfw.wmnet with OS bullseye
15:43 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
15:31 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
15:30 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: sync on main
15:26 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
15:17 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad (duration: 01m 40s)
15:15 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad
15:12 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@15e6aa7] (codfw): Revert "codfw: Disable traffic mirroring" (duration: 00m 33s)
15:11 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@15e6aa7] (codfw): Revert "codfw: Disable traffic mirroring"
14:58 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
14:58 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
14:57 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: sync on main
14:55 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
14:52 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
14:52 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: sync on main
14:51 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
14:41 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
14:41 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1010.eqiad.wmnet with reason: host reimage
14:39 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
14:38 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1010.eqiad.wmnet with reason: host reimage
14:36 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:36 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force update after switch upgrade - volans@cumin1001"
14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
14:35 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force update after switch upgrade - volans@cumin1001"
14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
14:34 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
14:33 volans@cumin1001: START - Cookbook sre.dns.netbox
14:29 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
14:29 effie: switch maps (kartotherian) from eqiad to codfw (attempt #2)
14:28 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
14:28 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
14:25 TheresNoTime: close UTC afternoon backport window
14:24 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
14:20 XioNoX: repool ulsfo (maintenance over)
14:20 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host druid1010.eqiad.wmnet with OS bullseye
14:17 samtar@deploy1002: Finished scap: Backport for Increase PC writes from parsoid API to 10% (T320534) (duration: 07m 41s)
14:14 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
14:11 samtar@deploy1002: daniel and samtar: Backport for Increase PC writes from parsoid API to 10% (T320534) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
14:09 samtar@deploy1002: Started scap: Backport for Increase PC writes from parsoid API to 10% (T320534)
13:50 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
13:44 XioNoX: reboot ulsfo switches for software upgrade
13:40 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
13:38 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
13:36 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:34 cmooney@cumin1001: START - Cookbook sre.dns.netbox
13:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping1002.eqiad.wmnet
13:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
13:29 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
13:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
13:18 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping1002.eqiad.wmnet
13:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping2002.codfw.wmnet
13:18 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:18 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
13:14 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
13:11 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
13:11 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
13:10 topranks: enabling tunnel services on cr2-eqdfw fpc 0 pic 1
13:08 jmm@cumin2002: START - Cookbook sre.dns.netbox
13:04 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping2002.codfw.wmnet
12:56 zabe@deploy1002: Finished scap: Backport for Remove PoolCounter from extension-list (T327336) (duration: 44m 09s)
12:51 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
12:51 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
12:50 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-proxies (exit_code=0) rolling restart_daemons on A:eqiad and A:swift-fe or A:thanos-fe
12:48 XioNoX: restart ulsfo switches for network maintenance
12:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 36 hosts with reason: network maintenance
12:43 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 36 hosts with reason: network maintenance
12:40 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-proxies rolling restart_daemons on A:eqiad and A:swift-fe or A:thanos-fe
12:38 zabe@deploy1002: zabe: Backport for Remove PoolCounter from extension-list (T327336) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
12:21 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thumbor2004.codfw.wmnet
12:12 zabe@deploy1002: Started scap: Backport for Remove PoolCounter from extension-list (T327336)
11:54 volans: uploaded python3-gjson_1.0.0 to apt.wikimedia.org bullseye-wikimedia,unstable-wikimedia
11:49 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
11:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43311 and previous config saved to /var/cache/conftool/dbconfig/20230124-114255-root.json
11:39 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
11:36 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
11:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping3002.esams.wmnet
11:35 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping3002.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
11:33 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
11:33 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping3002.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
11:28 jmm@cumin2002: START - Cookbook sre.dns.netbox
11:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43310 and previous config saved to /var/cache/conftool/dbconfig/20230124-112750-root.json
11:26 zabe@deploy1002: Finished scap: Backport for Stop loading PoolCounter extension (T327336) (duration: 09m 19s)
11:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1176.eqiad.wmnet with OS bullseye
11:23 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
11:22 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
11:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping3002.esams.wmnet
11:19 zabe@deploy1002: zabe: Backport for Stop loading PoolCounter extension (T327336) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
11:17 zabe@deploy1002: Started scap: Backport for Stop loading PoolCounter extension (T327336)
11:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43308 and previous config saved to /var/cache/conftool/dbconfig/20230124-111245-root.json
11:11 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
11:11 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
11:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1176.eqiad.wmnet with reason: host reimage
11:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1176.eqiad.wmnet with reason: host reimage
11:03 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
11:03 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
11:03 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
11:02 effie: depooling maps (kartotherian) from codfw, leaving eqiad as pooled
11:00 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
10:59 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
10:58 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
10:58 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
10:58 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43306 and previous config saved to /var/cache/conftool/dbconfig/20230124-105740-root.json
10:55 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
10:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1176.eqiad.wmnet with OS bullseye
10:52 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
10:49 XioNoX: depool ulsfo for network maintenance - T316532
10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1106 to dbctl in s1 T326116', diff saved to https://phabricator.wikimedia.org/P43305 and previous config saved to /var/cache/conftool/dbconfig/20230124-104336-marostegui.json
10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43304 and previous config saved to /var/cache/conftool/dbconfig/20230124-104235-root.json
10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1176 from s1 T326116', diff saved to https://phabricator.wikimedia.org/P43303 and previous config saved to /var/cache/conftool/dbconfig/20230124-104219-root.json
10:33 vgutierrez: repool cp4046
10:32 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
10:31 vgutierrez: restarting varnish on cp4046
10:30 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
10:29 vgutierrez: depool cp4046
10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43302 and previous config saved to /var/cache/conftool/dbconfig/20230124-102730-root.json
10:25 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
10:22 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
10:19 moritzm: rolling Apache/FPM restarts on mw canaries to pick up libtasn security update
10:19 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2165 T327754', diff saved to https://phabricator.wikimedia.org/P43301 and previous config saved to /var/cache/conftool/dbconfig/20230124-101825-root.json
10:17 effie: depooling maps from eqiad && pooling maps on codfw
10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2161 to s8 primary T327754', diff saved to https://phabricator.wikimedia.org/P43300 and previous config saved to /var/cache/conftool/dbconfig/20230124-101727-root.json
10:14 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
10:14 marostegui: Starting s8 codfw failover from db2165 to db2161 - T327754
10:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2041.codfw.wmnet with OS bullseye
10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43299 and previous config saved to /var/cache/conftool/dbconfig/20230124-101025-root.json
09:59 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: apply on main
09:59 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
09:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2041.codfw.wmnet with reason: host reimage
09:55 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2041.codfw.wmnet with reason: host reimage
09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43298 and previous config saved to /var/cache/conftool/dbconfig/20230124-095520-root.json
09:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 35 hosts with reason: Primary switchover s8 T327754
09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2161 with weight 0 T327754', diff saved to https://phabricator.wikimedia.org/P43297 and previous config saved to /var/cache/conftool/dbconfig/20230124-095235-marostegui.json
09:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 35 hosts with reason: Primary switchover s8 T327754
09:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43296 and previous config saved to /var/cache/conftool/dbconfig/20230124-094725-root.json
09:41 moritzm: installing libtasn1-6 security updates on buster
09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43295 and previous config saved to /var/cache/conftool/dbconfig/20230124-094016-root.json
09:39 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
09:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2041.codfw.wmnet with OS bullseye
09:39 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
09:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43294 and previous config saved to /var/cache/conftool/dbconfig/20230124-093220-root.json
09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43293 and previous config saved to /var/cache/conftool/dbconfig/20230124-092511-root.json
09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43292 and previous config saved to /var/cache/conftool/dbconfig/20230124-091715-root.json
09:14 kart_: Done: UTC morning backport window
09:13 kartik@deploy1002: Finished scap: Backport for Remove Kartographer versioned mapdata flags (T326288) (duration: 09m 44s)
09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43291 and previous config saved to /var/cache/conftool/dbconfig/20230124-091006-root.json
09:05 kartik@deploy1002: awight and kartik: Backport for Remove Kartographer versioned mapdata flags (T326288) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
09:03 kartik@deploy1002: Started scap: Backport for Remove Kartographer versioned mapdata flags (T326288)
09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43290 and previous config saved to /var/cache/conftool/dbconfig/20230124-090210-root.json
09:01 kartik@deploy1002: Finished scap: Backport for Deprecate the EnableMapFrame feature flag (T326288) (duration: 10m 42s)
08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43289 and previous config saved to /var/cache/conftool/dbconfig/20230124-085501-root.json
08:52 kartik@deploy1002: awight and kartik: Backport for Deprecate the EnableMapFrame feature flag (T326288) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
08:50 kartik@deploy1002: Started scap: Backport for Deprecate the EnableMapFrame feature flag (T326288)
08:48 kartik@deploy1002: Finished scap: Backport for Enable write new for CheckUserLog comment fields on testwikis (T233004) (duration: 15m 20s)
08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43288 and previous config saved to /var/cache/conftool/dbconfig/20230124-084705-root.json
08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Add some weight to db2115 in x1 codfw', diff saved to https://phabricator.wikimedia.org/P43287 and previous config saved to /var/cache/conftool/dbconfig/20230124-084552-marostegui.json
08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2096 T327745', diff saved to https://phabricator.wikimedia.org/P43286 and previous config saved to /var/cache/conftool/dbconfig/20230124-084508-marostegui.json
08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2115 to x1 codfw T327745', diff saved to https://phabricator.wikimedia.org/P43285 and previous config saved to /var/cache/conftool/dbconfig/20230124-084206-marostegui.json
08:39 marostegui: Starting x1 codfw failover from db2096 to db2115 - T327745
08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2115 with weight 0 T327745', diff saved to https://phabricator.wikimedia.org/P43284 and previous config saved to /var/cache/conftool/dbconfig/20230124-083643-marostegui.json
08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Primary switchover x1 T327745
08:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Primary switchover x1 T327745
08:35 kartik@deploy1002: dreamyjazz and kartik: Backport for Enable write new for CheckUserLog comment fields on testwikis (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
08:34 phedenskog@deploy1002: Finished deploy [performance/navtiming@8c87ca6]: (no justification provided) (duration: 00m 06s)
08:34 phedenskog@deploy1002: Started deploy [performance/navtiming@8c87ca6]: (no justification provided)
08:33 kartik@deploy1002: Started scap: Backport for Enable write new for CheckUserLog comment fields on testwikis (T233004)
08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43283 and previous config saved to /var/cache/conftool/dbconfig/20230124-083200-root.json
08:28 kartik@deploy1002: Finished scap: Backport for Add "Page Frame" to DiscussionTools beta feature on almost all wikis (T323727) (duration: 09m 09s)
08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2110 from API T327739', diff saved to https://phabricator.wikimedia.org/P43282 and previous config saved to /var/cache/conftool/dbconfig/20230124-082440-marostegui.json
08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2140 T327739', diff saved to https://phabricator.wikimedia.org/P43281 and previous config saved to /var/cache/conftool/dbconfig/20230124-082138-marostegui.json
08:21 kartik@deploy1002: kartik and matmarex: Backport for Add "Page Frame" to DiscussionTools beta feature on almost all wikis (T323727) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2110 to s4 primary T327739', diff saved to https://phabricator.wikimedia.org/P43280 and previous config saved to /var/cache/conftool/dbconfig/20230124-082025-root.json
08:19 kartik@deploy1002: Started scap: Backport for Add "Page Frame" to DiscussionTools beta feature on almost all wikis (T323727)
08:18 marostegui: Starting s4 codfw failover from db2140 to db2110 - T327739
08:16 kartik@deploy1002: Finished scap: Backport for Content Translation: Add campaign for Wiki Loves Living Heritage (T327587) (duration: 10m 25s)
08:07 kartik@deploy1002: kartik: Backport for Content Translation: Add campaign for Wiki Loves Living Heritage (T327587) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
08:05 kartik@deploy1002: Started scap: Backport for Content Translation: Add campaign for Wiki Loves Living Heritage (T327587)
07:59 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T327739
07:58 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T327739
07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2110 with weight 0 T327739', diff saved to https://phabricator.wikimedia.org/P43279 and previous config saved to /var/cache/conftool/dbconfig/20230124-075824-root.json
07:50 moritzm: installing Linux 5.10.162 on Bullseye hosts
07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1106 from dbctl T327616', diff saved to https://phabricator.wikimedia.org/P43278 and previous config saved to /var/cache/conftool/dbconfig/20230124-074323-marostegui.json
06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 (T322618)', diff saved to https://phabricator.wikimedia.org/P43277 and previous config saved to /var/cache/conftool/dbconfig/20230124-064905-ladsgroup.json
06:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 (T322618)', diff saved to https://phabricator.wikimedia.org/P43276 and previous config saved to /var/cache/conftool/dbconfig/20230124-064554-ladsgroup.json
06:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118', diff saved to https://phabricator.wikimedia.org/P43275 and previous config saved to /var/cache/conftool/dbconfig/20230124-063358-ladsgroup.json
06:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43274 and previous config saved to /var/cache/conftool/dbconfig/20230124-063048-ladsgroup.json
06:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118', diff saved to https://phabricator.wikimedia.org/P43273 and previous config saved to /var/cache/conftool/dbconfig/20230124-061852-ladsgroup.json
06:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43272 and previous config saved to /var/cache/conftool/dbconfig/20230124-061541-ladsgroup.json
06:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 (T322618)', diff saved to https://phabricator.wikimedia.org/P43271 and previous config saved to /var/cache/conftool/dbconfig/20230124-060345-ladsgroup.json
06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2118 (T322618)', diff saved to https://phabricator.wikimedia.org/P43270 and previous config saved to /var/cache/conftool/dbconfig/20230124-060129-ladsgroup.json
06:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
06:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 (T322618)', diff saved to https://phabricator.wikimedia.org/P43269 and previous config saved to /var/cache/conftool/dbconfig/20230124-060035-ladsgroup.json
05:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2107 (T322618)', diff saved to https://phabricator.wikimedia.org/P43268 and previous config saved to /var/cache/conftool/dbconfig/20230124-055816-ladsgroup.json
05:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
05:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
04:57 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.18 (duration: 02m 07s)
04:55 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.20 refs T325583 (duration: 53m 01s)
04:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.20 refs T325583
03:30 AndyRussG: payments-wiki upgraded from 3d882ac7 to 15395d05
02:35 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2024.codfw.wmnet
02:27 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2024.codfw.wmnet
02:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2021.codfw.wmnet
02:18 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2021.codfw.wmnet
02:16 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host restbase2019.codfw.wmnet
02:04 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2019.codfw.wmnet
02:03 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2014.codfw.wmnet
01:55 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2014.codfw.wmnet
01:51 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2013.codfw.wmnet
01:44 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2013.codfw.wmnet
01:36 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1033.eqiad.wmnet
01:26 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1033.eqiad.wmnet
01:26 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1032.eqiad.wmnet
01:18 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1032.eqiad.wmnet
01:16 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1031.eqiad.wmnet
01:06 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1031.eqiad.wmnet
01:03 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1030.eqiad.wmnet
00:55 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1030.eqiad.wmnet
00:55 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1027.eqiad.wmnet
00:47 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1027.eqiad.wmnet
00:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1026.eqiad.wmnet
00:38 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1026.eqiad.wmnet
00:36 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1025.eqiad.wmnet
00:28 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1025.eqiad.wmnet
00:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1018.eqiad.wmnet
00:18 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1018.eqiad.wmnet
00:14 zabe@deploy1002: Finished scap: Backport for Use core's PoolCounterClient (T327336) (duration: 12m 47s)
00:03 zabe@deploy1002: zabe: Backport for Use core's PoolCounterClient (T327336) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
00:01 zabe@deploy1002: Started scap: Backport for Use core's PoolCounterClient (T327336)

2023-01-23

23:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1029.eqiad.wmnet
23:24 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1029.eqiad.wmnet
23:24 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1024.eqiad.wmnet
23:17 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1024.eqiad.wmnet
23:16 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1023.eqiad.wmnet
23:07 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1023.eqiad.wmnet
22:59 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1022.eqiad.wmnet
22:57 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
22:57 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
22:57 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
22:56 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@544f5f3]: 0.3.119 (duration: 07m 30s)
22:52 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1022.eqiad.wmnet
22:49 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.119` on canary `wdqs1003`; proceeding to rest of fleet
22:48 ryankemper@deploy1002: Started deploy [wdqs/wdqs@544f5f3]: 0.3.119
22:46 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.119`. Pre-deploy tests passing on canary `wdqs1003`
22:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1017.eqiad.wmnet
22:37 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1017.eqiad.wmnet
22:31 maryum: Deployed patch for T285159
21:42 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1028.eqiad.wmnet
21:40 zabe@deploy1002: Finished scap: Backport for throttle: Remove expired rule (duration: 07m 45s)
21:35 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1028.eqiad.wmnet
21:34 zabe@deploy1002: zabe: Backport for throttle: Remove expired rule synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
21:32 zabe@deploy1002: Started scap: Backport for throttle: Remove expired rule
21:29 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1021.eqiad.wmnet
21:22 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1021.eqiad.wmnet
21:12 kindrobot: close UTC late backport window
21:12 kindrobot@deploy1002: Finished scap: Backport for Enable Page Tools for logged-in users on enwiki (T327686) (duration: 09m 00s)
21:04 kindrobot@deploy1002: jdrewniak and kindrobot: Backport for Enable Page Tools for logged-in users on enwiki (T327686) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
21:03 kindrobot@deploy1002: Started scap: Backport for Enable Page Tools for logged-in users on enwiki (T327686)
21:01 kindrobot: start UTC late backport window
20:56 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
20:56 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
20:45 taavi: restart T315510 on group1 after mwmaint restart, currently running on wikidatawiki
19:48 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1020.eqiad.wmnet
19:41 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1020.eqiad.wmnet
19:37 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1019.eqiad.wmnet
19:30 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1019.eqiad.wmnet
19:24 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1016.eqiad.wmnet
19:18 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
19:17 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host restbase1016.eqiad.wmnet
19:17 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
19:16 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host restbase1016.eqiad.wmnet
19:16 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
18:48 mutante: miscweb1002 - unload CAS apache module and config; apt-get remove libapache2-mod-auth-cas
18:19 mutante: miscweb2002 - unlink /etc/apache2/mods-enabled/auth_cas.conf - unlink /etc/apache2/mods-enabled/auth_cas.load - apt-get remove libapache2-mod-auth-cas - T327405
18:08 mutante: miscweb2002 - unlink /etc/apache2/mods-enabled/auth_cas.conf - unlink /etc/apache2/mods-enabled/auth_cas.load
18:05 mutante: miscweb1002 - disabling puppet because latest merge would break apache if it runs, debugging in progress on inactive miscweb2002
18:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on miscweb2002.codfw.wmnet with reason: debugging on inactive server
18:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on miscweb2002.codfw.wmnet with reason: debugging on inactive server
17:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P43265 and previous config saved to /var/cache/conftool/dbconfig/20230123-175241-ladsgroup.json
17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P43264 and previous config saved to /var/cache/conftool/dbconfig/20230123-173736-ladsgroup.json
17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P43263 and previous config saved to /var/cache/conftool/dbconfig/20230123-172231-ladsgroup.json
17:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P43262 and previous config saved to /var/cache/conftool/dbconfig/20230123-170726-ladsgroup.json
17:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
16:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
16:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
16:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
16:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
16:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
16:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
16:48 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 48s)
16:42 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 48s)
16:41 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
16:41 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
16:40 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
16:40 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
16:35 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
16:35 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
16:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 100%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43261 and previous config saved to /var/cache/conftool/dbconfig/20230123-163207-root.json
16:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43260 and previous config saved to /var/cache/conftool/dbconfig/20230123-163138-root.json
16:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 75%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43259 and previous config saved to /var/cache/conftool/dbconfig/20230123-161702-root.json
16:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43258 and previous config saved to /var/cache/conftool/dbconfig/20230123-161633-root.json
16:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 50%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43257 and previous config saved to /var/cache/conftool/dbconfig/20230123-160157-root.json
16:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43256 and previous config saved to /var/cache/conftool/dbconfig/20230123-160126-root.json
15:53 sukhe: reprepro -C main include bullseye-wikimedia varnish_6.0.11-1wm1_amd64.changes: T326634
15:50 urbanecm: Deploy security patch for T327613
15:48 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
15:48 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
15:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 25%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43255 and previous config saved to /var/cache/conftool/dbconfig/20230123-154652-root.json
15:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43254 and previous config saved to /var/cache/conftool/dbconfig/20230123-154621-root.json
15:44 papaul: ongoing maintenance on fasw-codfw
15:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 10%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43253 and previous config saved to /var/cache/conftool/dbconfig/20230123-153147-root.json
15:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 10%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43252 and previous config saved to /var/cache/conftool/dbconfig/20230123-153116-root.json
15:17 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.1.4-1wm1_amd64.changes: T325563
15:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 5%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43251 and previous config saved to /var/cache/conftool/dbconfig/20230123-151642-root.json
15:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 5%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43250 and previous config saved to /var/cache/conftool/dbconfig/20230123-151611-root.json
15:09 taavi@deploy1002: Finished scap: Backport for Revert "Enable Linter write namespace tag and template using core config" (duration: 07m 28s)
15:03 taavi@deploy1002: taavi and trainbranchbot: Backport for Revert "Enable Linter write namespace tag and template using core config" synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
15:02 taavi@deploy1002: Started scap: Backport for Revert "Enable Linter write namespace tag and template using core config"
15:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170:3317', diff saved to https://phabricator.wikimedia.org/P43248 and previous config saved to /var/cache/conftool/dbconfig/20230123-150110-marostegui.json
15:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312', diff saved to https://phabricator.wikimedia.org/P43247 and previous config saved to /var/cache/conftool/dbconfig/20230123-150018-marostegui.json
15:00 taavi@deploy1002: Finished scap: Backport for Enable Linter write namespace tag and template using core config (T299612) (duration: 07m 56s)
14:59 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
14:59 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
14:53 taavi@deploy1002: taavi and sbailey: Backport for Enable Linter write namespace tag and template using core config (T299612) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
14:52 taavi@deploy1002: Started scap: Backport for Enable Linter write namespace tag and template using core config (T299612)
14:46 taavi@deploy1002: Finished scap: Backport for SpecialUserrights: Allow updating the expiry of user groups (T327605) (duration: 08m 48s)
14:42 sukhe: rolling out pybal 1.15.10: T321191
14:39 taavi@deploy1002: taavi and func: Backport for SpecialUserrights: Allow updating the expiry of user groups (T327605) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
14:37 taavi@deploy1002: Started scap: Backport for SpecialUserrights: Allow updating the expiry of user groups (T327605)
14:37 taavi@deploy1002: Finished scap: Backport for zhwiki: Install PageAssessments (T326387) (duration: 11m 24s)
14:27 taavi@deploy1002: stang and taavi: Backport for zhwiki: Install PageAssessments (T326387) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
14:26 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
14:26 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
14:25 taavi@deploy1002: Started scap: Backport for zhwiki: Install PageAssessments (T326387)
14:25 taavi@deploy1002: Finished scap: Backport for bnwikiquote: Update logo (T323131), shnwikibooks: Add project logo (T327380) (duration: 09m 22s)
14:25 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
14:25 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
14:20 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
14:20 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
14:18 taavi: mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=zhwiki pageassessments # T326387
14:17 taavi@deploy1002: taavi and stang: Backport for bnwikiquote: Update logo (T323131), shnwikibooks: Add project logo (T327380) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
14:16 taavi@deploy1002: Started scap: Backport for bnwikiquote: Update logo (T323131), shnwikibooks: Add project logo (T327380)
12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 (T323827)', diff saved to https://phabricator.wikimedia.org/P43246 and previous config saved to /var/cache/conftool/dbconfig/20230123-124532-ladsgroup.json
12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43245 and previous config saved to /var/cache/conftool/dbconfig/20230123-123025-ladsgroup.json
12:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43242 and previous config saved to /var/cache/conftool/dbconfig/20230123-121519-ladsgroup.json
12:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
12:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
12:06 marostegui: dbmaint Reboot db2135 (m5 codfw master)
12:06 marostegui: dbmaint Reboot db2134 (m3 codfw master)
12:05 Emperor: removing /usr/local/bin/prometheus-puppet-agent-stats from prometheus crontab on snapshot1014
12:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 (T323827)', diff saved to https://phabricator.wikimedia.org/P43241 and previous config saved to /var/cache/conftool/dbconfig/20230123-120012-ladsgroup.json
11:58 marostegui: dbmaint Reboot db2133 (m2 codfw master)
11:57 marostegui: dbmaint Reboot db2132 (m1 codfw master)
11:57 marostegui: Reboot db2132 (m1 codfw master)
11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2113 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P43239 and previous config saved to /var/cache/conftool/dbconfig/20230123-113506-ladsgroup.json
11:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
11:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
11:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
11:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2114 T327644', diff saved to https://phabricator.wikimedia.org/P43236 and previous config saved to /var/cache/conftool/dbconfig/20230123-112134-ladsgroup.json
11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2113 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P43235 and previous config saved to /var/cache/conftool/dbconfig/20230123-112001-ladsgroup.json
11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2129 to s6 primary T327644', diff saved to https://phabricator.wikimedia.org/P43234 and previous config saved to /var/cache/conftool/dbconfig/20230123-111813-ladsgroup.json
11:17 Amir1: Starting s6 codfw failover from db2114 to db2129 - T327644
11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2107 (T323827)', diff saved to https://phabricator.wikimedia.org/P43233 and previous config saved to /var/cache/conftool/dbconfig/20230123-111147-ladsgroup.json
11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2107.codfw.wmnet with reason: Maintenance
11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2107.codfw.wmnet with reason: Maintenance
11:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2113 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P43232 and previous config saved to /var/cache/conftool/dbconfig/20230123-110456-ladsgroup.json
10:55 XioNoX: update management routers ACLs to add new bast hosts
10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2129 with weight 0 T327644', diff saved to https://phabricator.wikimedia.org/P43231 and previous config saved to /var/cache/conftool/dbconfig/20230123-105520-ladsgroup.json
10:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s6 T327644
10:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s6 T327644
10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2113 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P43230 and previous config saved to /var/cache/conftool/dbconfig/20230123-104951-ladsgroup.json
10:48 vgutierrez: rolling upgrade to HAProxy 2.4.20 on ulsfo
10:40 btullis@deploy1002: Finished deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided) (duration: 00m 06s)
10:40 btullis@deploy1002: Started deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided)
10:40 btullis@deploy1002: Finished deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided) (duration: 00m 20s)
10:40 btullis@deploy1002: Started deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided)
10:39 btullis@deploy1002: Installation of scap version "4.33.1" completed for 1 hosts
10:39 btullis@deploy1002: Installing scap version "4.33.1" for 1 hosts
10:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-tool1010.eqiad.wmnet with OS bullseye
10:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-tool1010.eqiad.wmnet with reason: host reimage
10:16 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-tool1010.eqiad.wmnet with reason: host reimage
10:07 ladsgroup@deploy1002: Finished scap: Backport for Remove Flow as default in techconductwiki (duration: 07m 51s)
10:03 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-tool1010.eqiad.wmnet with OS bullseye
10:01 ladsgroup@deploy1002: ladsgroup: Backport for Remove Flow as default in techconductwiki synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
09:59 ladsgroup@deploy1002: Started scap: Backport for Remove Flow as default in techconductwiki
09:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2113.codfw.wmnet with reason: Maintenance
09:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2113.codfw.wmnet with reason: Maintenance
09:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
09:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
09:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
09:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
09:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
09:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
09:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
08:49 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:49 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-create ripe-atlas-esams records as the host is back up - volans@cumin1001"
08:48 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-create ripe-atlas-esams records as the host is back up - volans@cumin1001"
08:46 volans@cumin1001: START - Cookbook sre.dns.netbox
08:45 zabe@deploy1002: Finished scap: Backport for Remove oversight group from privileged groups (T112147), Start reading from cuc_comment_id on wikidatawiki (T233004) (duration: 07m 48s)
08:43 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 to vslow and dump group T326669', diff saved to https://phabricator.wikimedia.org/P43229 and previous config saved to /var/cache/conftool/dbconfig/20230123-084326-marostegui.json
08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 to vslow and dump group T326669', diff saved to https://phabricator.wikimedia.org/P43228 and previous config saved to /var/cache/conftool/dbconfig/20230123-084239-marostegui.json
08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43227 and previous config saved to /var/cache/conftool/dbconfig/20230123-084055-root.json
08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43226 and previous config saved to /var/cache/conftool/dbconfig/20230123-084045-root.json
08:39 zabe@deploy1002: zabe: Backport for Remove oversight group from privileged groups (T112147), Start reading from cuc_comment_id on wikidatawiki (T233004) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
08:37 zabe@deploy1002: Started scap: Backport for Remove oversight group from privileged groups (T112147), Start reading from cuc_comment_id on wikidatawiki (T233004)
08:37 ayounsi@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 01m 08s)
08:36 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
08:30 ladsgroup@deploy1002: Finished scap: Backport for Tweaks for new heading HTML structure (T327328 T327469) (duration: 17m 12s)
08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43225 and previous config saved to /var/cache/conftool/dbconfig/20230123-082550-root.json
08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43224 and previous config saved to /var/cache/conftool/dbconfig/20230123-082540-root.json
08:22 ladsgroup@deploy1002: ladsgroup and matmarex: Backport for Tweaks for new heading HTML structure (T327328 T327469) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
08:12 ladsgroup@deploy1002: Started scap: Backport for Tweaks for new heading HTML structure (T327328 T327469)
08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43223 and previous config saved to /var/cache/conftool/dbconfig/20230123-081045-root.json
08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43222 and previous config saved to /var/cache/conftool/dbconfig/20230123-081035-root.json
08:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P43221 and previous config saved to /var/cache/conftool/dbconfig/20230123-080824-ladsgroup.json
07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43220 and previous config saved to /var/cache/conftool/dbconfig/20230123-075540-root.json
07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43219 and previous config saved to /var/cache/conftool/dbconfig/20230123-075530-root.json
07:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P43218 and previous config saved to /var/cache/conftool/dbconfig/20230123-075319-ladsgroup.json
07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43217 and previous config saved to /var/cache/conftool/dbconfig/20230123-074035-root.json
07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 10%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43216 and previous config saved to /var/cache/conftool/dbconfig/20230123-074025-root.json
07:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P43215 and previous config saved to /var/cache/conftool/dbconfig/20230123-073814-ladsgroup.json
07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43214 and previous config saved to /var/cache/conftool/dbconfig/20230123-072530-root.json
07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 5%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43213 and previous config saved to /var/cache/conftool/dbconfig/20230123-072520-root.json
07:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P43212 and previous config saved to /var/cache/conftool/dbconfig/20230123-072309-ladsgroup.json
07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 db1206 T326669', diff saved to https://phabricator.wikimedia.org/P43211 and previous config saved to /var/cache/conftool/dbconfig/20230123-071323-marostegui.json
07:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2107.codfw.wmnet with reason: Maintenance
07:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2107.codfw.wmnet with reason: Maintenance
07:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
07:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
06:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
06:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
06:23 kart_: Updated cxserver to 2023-01-20-051603-production (T323840, T326236)
06:19 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
06:18 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
06:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
06:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
06:17 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
06:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
06:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
06:16 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
06:12 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
06:12 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
05:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
05:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
05:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
05:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
04:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2113 T327611', diff saved to https://phabricator.wikimedia.org/P43210 and previous config saved to /var/cache/conftool/dbconfig/20230123-045939-ladsgroup.json
04:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2123 to s5 primary T327611', diff saved to https://phabricator.wikimedia.org/P43209 and previous config saved to /var/cache/conftool/dbconfig/20230123-045740-ladsgroup.json
04:57 Amir1: Starting s5 codfw failover from db2113 to db2123 - T327611
04:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
04:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2123 with weight 0 T327611', diff saved to https://phabricator.wikimedia.org/P43208 and previous config saved to /var/cache/conftool/dbconfig/20230123-043324-ladsgroup.json
04:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T327611
04:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T327611
04:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
04:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
03:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
03:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
03:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2107 T327609', diff saved to https://phabricator.wikimedia.org/P43207 and previous config saved to /var/cache/conftool/dbconfig/20230123-035458-ladsgroup.json
03:52 Amir1: Starting s2 codfw failover from db2107 to db2104 - T327609

2023-01-20

18:22 jynus: deploying new grants for backups on m1 T327155
16:15 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
16:15 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
16:15 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
16:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
16:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
16:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
16:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
14:28 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
14:27 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
14:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
14:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
13:08 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ping host - jmm@cumin2002"
13:08 moritzm: installing node-minimatch security updates
13:01 moritzm: installing libxstream-java security updates
13:00 sukhe: reprepro --ignore=wrongdistribution -C main include bullseye-wikimedia cadvisor_0.44.0+ds1-1~wmf1_amd64.changes: T325557
12:45 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ping host - jmm@cumin2002"
12:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2040.codfw.wmnet with OS bullseye
12:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2040.codfw.wmnet with reason: host reimage
12:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2040.codfw.wmnet with reason: host reimage
12:17 moritzm: installing ping1003 T273509
12:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2040.codfw.wmnet with OS bullseye
12:03 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
12:02 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
10:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ping host - jmm@cumin2002"
10:49 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ping host - jmm@cumin2002"
10:32 elukey: restart kubelet on ml-staging200* nodes (some fs-inotify-related issues with the istio-proxy of newly created containers)
10:27 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
10:13 moritzm: installing emacs security updates on bullseye
10:13 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
10:12 moritzm: imported jenkins 2.375-2 to thirdparty/ci T326531
10:00 jnuche@deploy1002: Installation of scap version "4.33.1" completed for 1 hosts
10:00 jnuche@deploy1002: Installing scap version "4.33.1" for 1 hosts
08:59 moritzm: installing ping2003 T273509
08:10 elukey: restart kubelet on kubernetes2007 - node reported issues with it, marked as "notready" by the control plane
07:58 elukey: `apt-get clean` on doh4001 to free space (root partition almost filled)
01:55 ejegg: payments-wiki upgraded from 3cf03933 to 3d882ac7
01:12 ejegg: payments-wiki upgraded from fcb9ab60 to 3cf03933

2023-01-19

21:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2039.codfw.wmnet with OS bullseye
21:42 jdrewniak@deploy1002: Finished scap: Backport for Enable Page tools on viwiki and itwiki (T327348) (duration: 10m 38s)
21:33 jdrewniak@deploy1002: jdlrobson and jdrewniak: Backport for Enable Page tools on viwiki and itwiki (T327348) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
21:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2039.codfw.wmnet with reason: host reimage
21:31 jdrewniak@deploy1002: Started scap: Backport for Enable Page tools on viwiki and itwiki (T327348)
21:27 jdrewniak@deploy1002: Finished scap: Backport for Fix grid blowout with limited width turned off (T327423) (duration: 08m 26s)
21:27 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2039.codfw.wmnet with reason: host reimage
21:20 cwhite@deploy1002: Finished deploy [releng/phatality@e0bb573]: (no justification provided) (duration: 00m 13s)
21:20 cwhite@deploy1002: Started deploy [releng/phatality@e0bb573]: (no justification provided)
21:20 jdrewniak@deploy1002: jdlrobson and jdrewniak: Backport for Fix grid blowout with limited width turned off (T327423) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
21:18 jdrewniak@deploy1002: Started scap: Backport for Fix grid blowout with limited width turned off (T327423)
21:11 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2039.codfw.wmnet with OS bullseye
20:13 zabe@deploy1002: Finished scap: fix k8s drift (duration: 08m 02s)
20:05 zabe@deploy1002: Started scap: fix k8s drift
20:02 zabe@deploy1002: Finished scap: Backport for Start reading from cuc_comment_id everywhere except wikidatawiki (T233004) (duration: 14m 01s)
19:49 zabe@deploy1002: zabe: Backport for Start reading from cuc_comment_id everywhere except wikidatawiki (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
19:48 zabe@deploy1002: Started scap: Backport for Start reading from cuc_comment_id everywhere except wikidatawiki (T233004)
18:36 zabe: re-start populateCucComment on wikidatawiki post-mwmaint-reboot in screen with --sleep 2, will take ~30 hours # T233004
18:17 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
18:17 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
18:16 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
18:16 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
18:13 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
18:12 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
18:08 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
18:08 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
18:06 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
18:05 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
18:02 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
18:01 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
17:36 Amir1: bash Krinkle> Vatican Interim Papacy Runbook, § 5.1: Notify Wikipedia about incoming traffic.
17:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2038.codfw.wmnet with OS bullseye
17:13 zabe@deploy1002: Finished scap: T233004 (duration: 18m 50s)
17:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2038.codfw.wmnet with reason: host reimage
16:58 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2038.codfw.wmnet with reason: host reimage
16:54 zabe@deploy1002: Started scap: T233004
16:54 zabe@deploy1002: backport aborted: (duration: 15m 22s)
16:48 godog: roll-restart opensearch-dashboards in logstash collectors eqiad - T327161
16:44 zabe@deploy1002: Started scap: Backport for Add ability to start from cuc_id to populateCucComment (T233004)
16:42 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2038.codfw.wmnet with OS bullseye
16:27 moritzm: installing cryptsetup updates for bullseye
16:18 jmm@cumin2002: END (FAIL) - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors (exit_code=1) rolling restart_daemons on A:logstash-collector
16:13 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1009']
16:11 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1009']
16:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
16:08 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors rolling restart_daemons on A:logstash-collector
16:06 jclark@cumin1001: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
15:55 sukhe: update pybal to 1.15.10 on lvs4010: T321191
15:45 effie: enable puppet on C:memcached hosts
15:42 godog: bounce opensearch on logstash102[34] - T327161
15:30 sukhe: reprepro -C main include buster-wikimedia pybal_1.15.10_amd64.changes: T321191
15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P43194 and previous config saved to /var/cache/conftool/dbconfig/20230119-151917-ladsgroup.json
15:17 effie: disable puppet on all C:memcached servers to deploy 812173
15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P43193 and previous config saved to /var/cache/conftool/dbconfig/20230119-150412-ladsgroup.json
14:57 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P43192 and previous config saved to /var/cache/conftool/dbconfig/20230119-144907-ladsgroup.json
14:47 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
14:40 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P43191 and previous config saved to /var/cache/conftool/dbconfig/20230119-143402-ladsgroup.json
14:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
14:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
14:32 zabe: run populateCulComment on group2 wikis # T327290
14:30 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
14:09 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
13:58 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
12:27 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host maps2009.codfw.wmnet
12:19 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host maps2009.codfw.wmnet
12:06 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
12:06 moritzm: stopping/masking slapd on ldap-corp1001/ldap-corp2001 T323820
11:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1054.eqiad.wmnet with OS bullseye
11:30 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-cluster
11:29 hnowlan: rebooting maps-codfw for updates
11:29 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host maps1009.eqiad.wmnet
11:24 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts webperf2004.codfw.wmnet
11:24 filippo@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:24 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf2004.codfw.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
11:22 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host maps1009.eqiad.wmnet
11:20 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
11:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1054.eqiad.wmnet with reason: host reimage
11:18 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf2004.codfw.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
11:17 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1054.eqiad.wmnet with reason: host reimage
11:13 filippo@cumin1001: START - Cookbook sre.dns.netbox
11:09 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts webperf2004.codfw.wmnet
11:08 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts webperf1004.eqiad.wmnet
11:08 filippo@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:08 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
11:06 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
11:06 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1054.eqiad.wmnet with OS bullseye
11:02 filippo@cumin1001: START - Cookbook sre.dns.netbox
10:58 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts webperf1004.eqiad.wmnet
10:44 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-cluster
10:44 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
10:44 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-cluster
10:44 hnowlan: rebooting maps-eqiad for updates
10:27 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
10:27 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
10:27 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
10:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
10:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:27 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
10:27 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
10:27 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
10:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
10:27 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
10:27 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
10:27 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
10:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
10:24 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on webperf2004.codfw.wmnet with reason: decom
10:24 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on webperf2004.codfw.wmnet with reason: decom
10:19 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ping host - jmm@cumin2002"
10:17 claime: Restarted maintenance scripts on mwmaint1002.eqiad.wmnet
10:17 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ping host - jmm@cumin2002"
10:17 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
10:15 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
10:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwmaint1002.eqiad.wmnet
10:07 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwmaint1002.eqiad.wmnet
10:06 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
10:06 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
10:05 claime: Stopping maintenance scripts on mwmaint1002.eqiad.wmnet for reboot
09:55 moritzm: installing ping3003 T273509
09:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ldap-corp[1001,2001].wikimedia.org with reason: Decommissioning
09:27 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on ldap-corp[1001,2001].wikimedia.org with reason: Decommissioning
09:24 jnuche@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.19 refs T325582
09:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2118.codfw.wmnet with reason: Maintenance
09:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2118.codfw.wmnet with reason: Maintenance
09:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
09:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
08:26 moritzm: installing sudo security updates
07:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
07:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
06:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
06:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
06:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
06:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
06:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
06:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
06:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
06:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
06:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2118 T327372', diff saved to https://phabricator.wikimedia.org/P43190 and previous config saved to /var/cache/conftool/dbconfig/20230119-060449-ladsgroup.json
06:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2121 to s7 primary T327372', diff saved to https://phabricator.wikimedia.org/P43189 and previous config saved to /var/cache/conftool/dbconfig/20230119-060316-ladsgroup.json
06:02 Amir1: Starting s7 codfw failover from db2118 to db2121 - T327372
05:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2121 with weight 0 T327372', diff saved to https://phabricator.wikimedia.org/P43188 and previous config saved to /var/cache/conftool/dbconfig/20230119-054243-ladsgroup.json
05:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s7 T327372
05:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s7 T327372

2023-01-18

23:47 zabe: run populateCulComment.php on all group0 and group1 wikis # T327290
23:42 cstone: civicrm upgraded from 164270b0 to f6093fb2
22:35 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: raise heap memory to 12G - bking@cumin1001 - T323646
22:03 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: raise heap memory to 12G - bking@cumin1001 - T323646
21:50 kindrobot: close UTC late backport window
21:50 kindrobot@deploy1002: Finished scap: Backport for [config]: Undeploy GDI Safety Survey Wave 4 (T327296) (duration: 10m 45s)
21:41 kindrobot@deploy1002: essexigyan and kindrobot: Backport for [config]: Undeploy GDI Safety Survey Wave 4 (T327296) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
21:39 kindrobot@deploy1002: Started scap: Backport for [config]: Undeploy GDI Safety Survey Wave 4 (T327296)
21:36 kindrobot@deploy1002: Finished scap: Backport for Bump English Wikipedia event logging from 0.5 to 1% (T326892), Legacy Vector is not a responsive skin (T327256) (duration: 13m 01s)
21:25 kindrobot@deploy1002: kindrobot and jdlrobson: Backport for Bump English Wikipedia event logging from 0.5 to 1% (T326892), Legacy Vector is not a responsive skin (T327256) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
21:23 kindrobot@deploy1002: Started scap: Backport for Bump English Wikipedia event logging from 0.5 to 1% (T326892), Legacy Vector is not a responsive skin (T327256)
21:08 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1037.eqiad.wmnet with OS bullseye
21:05 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1036.eqiad.wmnet with OS bullseye
21:03 kindrobot: start UTC late backport window
20:54 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1037.eqiad.wmnet with reason: host reimage
20:51 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1036.eqiad.wmnet with reason: host reimage
20:49 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1037.eqiad.wmnet with reason: host reimage
20:48 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1036.eqiad.wmnet with reason: host reimage
20:36 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1037.eqiad.wmnet with OS bullseye
20:35 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1036.eqiad.wmnet with OS bullseye
20:34 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
20:34 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
19:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1037.eqiad.wmnet with OS buster
19:54 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:52 bblack: db1129 and lvs1017: removed misconfigured IP address in wrong vlan from eno1 and /e/n/i
19:51 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1036.eqiad.wmnet with OS buster
19:47 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:40 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1037.eqiad.wmnet with reason: host reimage
19:32 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1037.eqiad.wmnet with reason: host reimage
19:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1036.eqiad.wmnet with reason: host reimage
19:23 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1036.eqiad.wmnet with reason: host reimage
19:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1037.eqiad.wmnet with OS buster
18:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1036.eqiad.wmnet with OS buster
18:21 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Enable the REST API on test-wikidata (T324999) (duration: 09m 38s)
18:14 lucaswerkmeister-wmde@deploy1002: migr and lucaswerkmeister-wmde: Backport for Enable the REST API on test-wikidata (T324999) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
18:12 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Enable the REST API on test-wikidata (T324999)
17:55 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
17:55 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
17:44 jnuche@deploy1002: Installation of scap version "4.33.0" completed for 560 hosts
17:44 jnuche@deploy1002: Installing scap version "4.33.0" for 560 hosts
17:42 jnuche@deploy1002: install-world aborted: (duration: 07m 17s)
17:42 btullis@deploy1002: Installation of scap version "4.33.0" completed for 1 hosts
17:41 btullis@deploy1002: Installing scap version "4.33.0" for 1 hosts
17:35 jnuche@deploy1002: Installing scap version "4.33.0" for 561 hosts
17:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['logstash1037']
17:10 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1037']
17:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1037']
17:09 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1037']
17:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['logstash1036']
16:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1036']
16:45 jnuche@deploy1002: Installation of scap version "4.33.0" completed for 1 hosts
16:45 jnuche@deploy1002: Installing scap version "4.33.0" for 1 hosts
16:39 jdrewniak@deploy1002: Finished scap: Backport for [100%] English Wikipedia uses Vector 2022 skin (duration: 09m 27s)
16:31 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [100%] English Wikipedia uses Vector 2022 skin synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
16:29 jdrewniak@deploy1002: Started scap: Backport for [100%] English Wikipedia uses Vector 2022 skin
16:20 jdrewniak@deploy1002: Finished scap: Backport for [75%] English Wikipedia uses Vector 2022 skin (T326892) (duration: 09m 24s)
16:13 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [75%] English Wikipedia uses Vector 2022 skin (T326892) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
16:11 jdrewniak@deploy1002: Started scap: Backport for [75%] English Wikipedia uses Vector 2022 skin (T326892)
16:06 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
16:06 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
15:58 jdrewniak@deploy1002: Finished scap: Backport for [50%] English Wikipedia uses Vector 2022 skin, adds instrumentation (T326892) (duration: 08m 52s)
15:51 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [50%] English Wikipedia uses Vector 2022 skin, adds instrumentation (T326892) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
15:49 jdrewniak@deploy1002: Started scap: Backport for [50%] English Wikipedia uses Vector 2022 skin, adds instrumentation (T326892)
15:44 jdrewniak@deploy1002: Finished scap: Backport for [25%] English Wikipedia uses Vector 2022 skin (T326892) (duration: 09m 06s)
15:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1052.eqiad.wmnet with OS bullseye
15:37 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
15:37 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
15:36 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [25%] English Wikipedia uses Vector 2022 skin (T326892) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
15:35 jdrewniak@deploy1002: Started scap: Backport for [25%] English Wikipedia uses Vector 2022 skin (T326892)
15:31 urandom: re-enabling Cassandra hinted-handoff for codfw -- T327001
15:29 jdrewniak@deploy1002: Finished scap: Backport for [10%] English Wikipedia uses Vector 2022 skin (T326892) (duration: 11m 30s)
15:21 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1052.eqiad.wmnet with reason: host reimage
15:19 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [10%] English Wikipedia uses Vector 2022 skin (T326892) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
15:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1052.eqiad.wmnet with reason: host reimage
15:17 jdrewniak@deploy1002: Started scap: Backport for [10%] English Wikipedia uses Vector 2022 skin (T326892)
15:14 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Revert gallery changes in 1.40.0-wmf.18 & .19 (T326990) (duration: 09m 11s)
15:13 bblack: cp2031: rebooting to gather more information (still downtimed + depooled)
15:07 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1052.eqiad.wmnet with OS bullseye
15:06 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and matmarex: Backport for Revert gallery changes in 1.40.0-wmf.18 & .19 (T326990) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
15:05 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Revert gallery changes in 1.40.0-wmf.18 & .19 (T326990)
15:04 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Revert gallery changes in 1.40.0-wmf.18 (T326990) (duration: 13m 04s)
15:01 bblack: cp2031: rebooting to gather more information (still downtimed + depooled)
14:57 moritzm: uploaded python-jose 3.3.0+dfsg-4~wmf11u1 to apt.wikimedia.org (needed by python-social-auth/Bitu)
14:53 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and matmarex: Backport for Revert gallery changes in 1.40.0-wmf.18 (T326990) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
14:51 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Revert gallery changes in 1.40.0-wmf.18 (T326990)
14:46 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Revert "Breaking upgrade: mapdata" (T327151) (duration: 10m 33s)
14:37 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and wmde-fisch: Backport for Revert "Breaking upgrade: mapdata" (T327151) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
14:35 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Revert "Breaking upgrade: mapdata" (T327151)
14:34 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Write to cul_reason[_plaintext]_id everywhere (T233004) (duration: 19m 54s)
14:23 moritzm: installing mod-wsgi security updates
14:16 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and dreamyjazz: Backport for Write to cul_reason[_plaintext]_id everywhere (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
14:14 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Write to cul_reason[_plaintext]_id everywhere (T233004)
13:17 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on webperf1004.eqiad.wmnet with reason: decom
13:16 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on webperf1004.eqiad.wmnet with reason: decom
12:20 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0)
11:54 volans: upgraded cumin on cumin1001 to 4.2.0-1+deb11u1
11:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on 10 hosts with reason: Still not ready to add these new presto servers to the cluster - btullis
11:47 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on 10 hosts with reason: Still not ready to add these new presto servers to the cluster - btullis
11:42 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
11:27 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0)
11:16 volans@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
11:16 volans@cumin1001: START - Cookbook sre.network.cf
11:15 volans@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
11:15 volans@cumin1001: START - Cookbook sre.network.cf
11:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1050.eqiad.wmnet with OS bullseye
11:11 volans@cumin2002: END (FAIL) - Cookbook sre.network.cf (exit_code=1)
11:11 volans@cumin2002: START - Cookbook sre.network.cf
11:10 volans@cumin1001: END (FAIL) - Cookbook sre.network.cf (exit_code=1)
11:10 volans@cumin1001: START - Cookbook sre.network.cf
11:10 volans@cumin1001: END (FAIL) - Cookbook sre.network.cf (exit_code=1)
11:10 volans@cumin1001: START - Cookbook sre.network.cf
11:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1176 T326116', diff saved to https://phabricator.wikimedia.org/P43185 and previous config saved to /var/cache/conftool/dbconfig/20230118-110716-marostegui.json
10:59 volans@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
10:59 volans@cumin1001: START - Cookbook sre.network.cf
10:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1050.eqiad.wmnet with reason: host reimage
10:54 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1050.eqiad.wmnet with reason: host reimage
10:51 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1176 to LB with just 1% weight T326116', diff saved to https://phabricator.wikimedia.org/P43184 and previous config saved to /var/cache/conftool/dbconfig/20230118-105106-marostegui.json
10:49 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
10:48 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)
10:43 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1050.eqiad.wmnet with OS bullseye
10:21 zabe@deploy1002: Finished scap: Backport for Start reading from cuc_comment_id from a few wikis (T233004) (duration: 09m 17s)
10:14 zabe@deploy1002: zabe and zabe: Backport for Start reading from cuc_comment_id from a few wikis (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
10:12 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
10:12 zabe@deploy1002: Started scap: Backport for Start reading from cuc_comment_id from a few wikis (T233004)
09:51 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
09:51 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
09:49 godog: start migration from webperf1004 to arclamp1001 - T319434
09:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp2001.codfw.wmnet
09:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp1001.eqiad.wmnet
09:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host arclamp2001.codfw.wmnet
09:33 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)
09:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host arclamp1001.eqiad.wmnet
09:24 jnuche@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.19 refs T325582 (duration: 08m 20s)
09:15 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.19 refs T325582
08:54 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
08:34 mvernon@cumin1001: conftool action : set/pooled=yes; selector: name=thanos-fe2002.codfw.wmnet
08:34 mvernon@cumin1001: conftool action : set/pooled=yes; selector: name=ms-fe2010.codfw.wmnet
08:32 mvernon@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-query,name=codfw
08:32 mvernon@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-swift,name=codfw
08:32 mvernon@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=codfw
08:30 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)
07:56 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
02:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable
02:37 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable
02:36 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=ats-be
02:36 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=cdn
01:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=ats-be
01:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=cdn
01:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2031.codfw.wmnet
01:06 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2031.codfw.wmnet
01:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable
01:02 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable
01:02 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=ats-be
01:02 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=cdn
00:28 zabe: enwiki: rename the "discretionary sanctions alert" tag to "contentious topics alert" # T327118
00:26 zabe@deploy1002: Finished scap: Backport for Add script to rename a change tag in wmf prod (T327118) (duration: 08m 29s)
00:20 zabe@deploy1002: zabe and zabe: Backport for Add script to rename a change tag in wmf prod (T327118) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
00:18 zabe@deploy1002: Started scap: Backport for Add script to rename a change tag in wmf prod (T327118)
00:08 zabe: mwscript extensions/TimedMediaHandler/maintenance/requeueTranscodes.php --wiki=testwiki --key=180p.vp9.webm # T312153
00:07 zabe: mwscript extensions/TimedMediaHandler/maintenance/requeueTranscodes.php --wiki=testwiki --key=120p.vp9.webm # T312153

2023-01-17

23:51 zabe: mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "User:Amire80/frg" "Movement Multilingual Termbase" "Zabe" "per request T327149" # T327149
23:33 zabe@deploy1002: Finished scap: Backport for Start reading from cuc_comment_id on testwiki (T233004), Start reading from cuc_actor everywhere (T233004) (duration: 09m 58s)
23:25 zabe@deploy1002: zabe and zabe: Backport for Start reading from cuc_comment_id on testwiki (T233004), Start reading from cuc_actor everywhere (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
23:24 zabe@deploy1002: Started scap: Backport for Start reading from cuc_comment_id on testwiki (T233004), Start reading from cuc_actor everywhere (T233004)
23:19 zabe@deploy1002: Finished scap: Backport for Revert "Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis"" (T233004), Revert "Add read new support for cu_log comment ID columns" (T327219) (duration: 11m 46s)
23:09 zabe@deploy1002: zabe and dreamyjazz and zabe: Backport for Revert "Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis"" (T233004), Revert "Add read new support for cu_log comment ID columns" (T327219) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
23:07 zabe@deploy1002: Started scap: Backport for Revert "Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis"" (T233004), Revert "Add read new support for cu_log comment ID columns" (T327219)
23:06 zabe@deploy1002: Finished scap: Backport for Stop writing to cul_user and cul_user_text everywhere (T233004), Start writing to rev_comment_id everywhere (T299954) (duration: 10m 29s)
22:57 zabe@deploy1002: zabe and zabe: Backport for Stop writing to cul_user and cul_user_text everywhere (T233004), Start writing to rev_comment_id everywhere (T299954) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
22:55 zabe@deploy1002: Started scap: Backport for Stop writing to cul_user and cul_user_text everywhere (T233004), Start writing to rev_comment_id everywhere (T299954)
22:51 bblack: repooling codfw
22:48 ebernhardson@deploy1002: Finished scap: Backport for Make sticky header edit button default for all wikis (T324799) (duration: 10m 34s)
22:39 ebernhardson@deploy1002: ebernhardson and jdrewniak: Backport for Make sticky header edit button default for all wikis (T324799) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
22:38 ebernhardson@deploy1002: Started scap: Backport for Make sticky header edit button default for all wikis (T324799)
22:30 volans@cumin1001: conftool action : set/pooled=inactive; selector: name=non-existent1001
22:27 ebernhardson@deploy1002: Finished scap: Backport for Resolve deprecations and type changes in elastica 7.3.0, UpdateSuggesterIndex: Properly cleanup bad indices (duration: 09m 42s)
22:25 bblack: cp2031: restart ats-be
22:20 ebernhardson@deploy1002: ebernhardson and ebernhardson: Backport for Resolve deprecations and type changes in elastica 7.3.0, UpdateSuggesterIndex: Properly cleanup bad indices synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
22:18 ebernhardson@deploy1002: Started scap: Backport for Resolve deprecations and type changes in elastica 7.3.0, UpdateSuggesterIndex: Properly cleanup bad indices
22:14 ebernhardson@deploy1002: Finished scap: Backport for Show edit button in sticky header for desktop-improvement wikis (T324799) (duration: 10m 43s)
22:05 ebernhardson@deploy1002: ebernhardson and jdrewniak: Backport for Show edit button in sticky header for desktop-improvement wikis (T324799) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
22:04 ebernhardson@deploy1002: Started scap: Backport for Show edit button in sticky header for desktop-improvement wikis (T324799)
21:54 ebernhardson: Finished scap: Backport for Table of contents Collapse/Expand not working (T327064)
21:54 ebernhardson@deploy1002: Finished scap: Backport for Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis" (duration: 09m 20s)
21:52 zabe: zabe@mwmaint1002:~$ mwscript extensions/CheckUser/maintenance/populateCulComment.php --wiki testwiki
21:46 ebernhardson@deploy1002: ebernhardson and trainbranchbot: Backport for Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis" synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
21:44 ebernhardson@deploy1002: Started scap: Backport for Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis"
21:42 ebernhardson@deploy1002: Sync cancelled.
21:35 ebernhardson@deploy1002: ebernhardson and dreamyjazz: Backport for Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis (T233004) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
21:34 ebernhardson: scap also backporting Table of contents Collapse/Expand not working (T327064)
21:34 ebernhardson@deploy1002: Started scap: Backport for Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis (T233004)
21:29 ebernhardson@deploy1002: Finished scap: Backport for Enable Phonos on afwiktionary and arwiki (T324561) (duration: 12m 21s)
21:18 ebernhardson@deploy1002: ebernhardson and hmonroy: Backport for Enable Phonos on afwiktionary and arwiki (T324561) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
21:17 ebernhardson@deploy1002: Started scap: Backport for Enable Phonos on afwiktionary and arwiki (T324561)
21:00 ryankemper: [WDQS] `ryankemper@wdqs1005:~$ sudo pool` (had been left depooled from previous powercycle)
20:47 ryankemper: [WDQS] Depooled `wdqs1016`
20:25 herron: ran preferred-replica-election on kafka-logging codfw to clear replica imbalance
20:18 ryankemper: [WDQS] Restart blazegraph on `wdqs1016` to clear alert: `ryankemper@wdqs1016:~$ sudo systemctl restart wdqs-blazegraph`
20:06 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.19 refs T325582
20:04 eileen: config revision changed from 2e5cee3c to 7425df0b
19:50 ryankemper: T327175 Reprocessing last several hours of updates (`2023-01-17T12:00:00Z` -> `2023-01-17T17:30:00Z`) on codfw elasticsearch, running on `ryankemper@mwmaint2002` tmux session `reindex`
19:43 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
19:43 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
19:41 zabe@deploy1002: Finished scap: Backport for Revert "Revert "Enable visual enhancements on all talk namespaces"" (duration: 10m 25s)
19:32 zabe@deploy1002: zabe and zabe: Backport for Revert "Revert "Enable visual enhancements on all talk namespaces"" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
19:30 zabe@deploy1002: Started scap: Backport for Revert "Revert "Enable visual enhancements on all talk namespaces""
18:48 zabe@deploy1002: Finished scap: Backport for Revert "Enable visual enhancements on all talk namespaces" (duration: 09m 08s)
18:44 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
18:44 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
18:41 zabe@deploy1002: zabe and zabe: Backport for Revert "Enable visual enhancements on all talk namespaces" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
18:41 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
18:41 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
18:39 zabe@deploy1002: Started scap: Backport for Revert "Enable visual enhancements on all talk namespaces"
18:39 zabe@deploy1002: backport aborted: (duration: 00m 26s)
18:35 zabe@deploy1002: backport aborted: (duration: 19m 41s)
18:29 otto@deploy1002: Finished deploy [analytics/refinery@55f90ac]: Regular analytics weekly train [analytics/refinery@55f90ac] (duration: 04m 28s)
18:29 otto@deploy1002: Finished deploy [airflow-dags/analytics@8d0e919]: Regular analytics weekly train @8d0e919] (duration: 00m 15s)
18:29 otto@deploy1002: Started deploy [airflow-dags/analytics@8d0e919]: Regular analytics weekly train @8d0e919]
18:25 otto@deploy1002: Started deploy [analytics/refinery@55f90ac]: Regular analytics weekly train [analytics/refinery@55f90ac]
18:25 zabe@deploy1002: zabe and matmarex and zabe: Backport for objectcache: Fix DI for MultiWriteBagOStuff sub caches (T327158), Use new DiscussionTools heading markup on enwiki (T314714), Add "Clear Affordances" to DiscussionTools beta feature on remaining wikis (T321955), Add "Page Frame" to DiscussionTools beta feature on partner wikis (T317907), [[
18:23 zabe@deploy1002: Started scap: Backport for objectcache: Fix DI for MultiWriteBagOStuff sub caches (T327158), Use new DiscussionTools heading markup on enwiki (T314714), Add "Clear Affordances" to DiscussionTools beta feature on remaining wikis (T321955), Add "Page Frame" to DiscussionTools beta feature on partner wikis (T317907), [[gerrit:879103|
18:13 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
18:10 mutante: gerrit1002/gerrit2002: sudo rmdir /srv/gerrit/jvmlogs
18:07 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
18:07 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
18:05 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
18:01 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=k8s-ingress-wikikube-rw,name=codfw
17:58 jynus: restarted es5 codfw backup
17:54 bblack: authdns1001: restart confd
17:27 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=aqs,name=codfw
17:19 effie: pooling back codfw services
17:17 bblack: removing errant 2620:0:860:118: IPs from primary interfaces of hosts in B2
17:01 effie: restarting confd on deploy1002
16:59 effie: pooling back depooled mw servers in codfw
16:44 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on an-worker1086.eqiad.wmnet with reason: Shutting down for RAID controller BBU replacement
16:44 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on an-worker1086.eqiad.wmnet with reason: Shutting down for RAID controller BBU replacement
16:32 sukhe: reprepro --ignore=wrongdistribution -C main include bullseye-wikimedia cadvisor_0.44.0+ds1-1_amd64.changes: T325557
16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P43179 and previous config saved to /var/cache/conftool/dbconfig/20230117-162100-ladsgroup.json
16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P43178 and previous config saved to /var/cache/conftool/dbconfig/20230117-160555-ladsgroup.json
15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P43177 and previous config saved to /var/cache/conftool/dbconfig/20230117-155050-ladsgroup.json
15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P43175 and previous config saved to /var/cache/conftool/dbconfig/20230117-153545-ladsgroup.json
15:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
15:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
15:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1173.eqiad.wmnet with reason: Maintenance
15:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1173.eqiad.wmnet with reason: Maintenance
15:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
15:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
15:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
15:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
15:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
15:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
14:56 urandom: truncating hints for Cassandra nodes in codfw row b -- T327001
14:52 urandom: disabling Cassandra hinted-handoff for codfw -- T327001
14:27 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
14:26 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
14:12 _joe_: try to restart cassandra-a on aqs2005
13:37 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=recommendation-api,name=codfw
13:35 mvernon@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=thanos-query,name=codfw
13:35 mvernon@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=thanos-swift,name=codfw
13:27 jynus: restarting manually replication on es2020, may require data check afterwards
13:26 _joe_: depooling all services in codfw
13:19 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool mobileapps in codfw: maintenance
13:15 mvernon@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=codfw
13:14 oblivian@cumin1001: START - Cookbook sre.discovery.service-route depool mobileapps in codfw: maintenance
13:13 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) check citoid: maintenance
13:13 oblivian@cumin1001: START - Cookbook sre.discovery.service-route check citoid: maintenance
13:08 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)
13:01 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=codfw
13:01 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async,name=.*
12:35 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
12:35 moritzm: installing ipython security updates
11:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1048.eqiad.wmnet with OS bullseye
11:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1048.eqiad.wmnet with reason: host reimage
11:16 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1048.eqiad.wmnet with reason: host reimage
11:08 volans: upgraded cumin on cumin2002 to 4.2.0-1+deb11u1
11:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1048.eqiad.wmnet with OS bullseye
10:16 godog: restart opensearch_2@production-elk7-eqiad.service on logstash102[34]
10:12 jnuche@deploy1002: scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org for details)
10:11 jnuche@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.19 refs T325582 (duration: 42m 26s)
09:42 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@9568478]: (no justification provided) (duration: 00m 12s)
09:42 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@9568478]: (no justification provided)
09:28 jnuche@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.19 refs T325582
09:26 jnuche@deploy1002: scap failed: PermissionError [Errno 13] Permission denied: '/home/jnuche/scap-image-build-and-push-log' (duration: 00m 50s)
09:26 jnuche@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.19 refs T325582
08:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
08:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
08:47 ladsgroup@deploy1002: Finished scap: Backport for Start writing to cul_reason_id and cul_reason_plaintext_id on testwiki (T233004) (duration: 13m 50s)
08:35 ladsgroup@deploy1002: ladsgroup and dreamyjazz: Backport for Start writing to cul_reason_id and cul_reason_plaintext_id on testwiki (T233004) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
08:33 ladsgroup@deploy1002: Started scap: Backport for Start writing to cul_reason_id and cul_reason_plaintext_id on testwiki (T233004)
08:29 kartik@deploy1002: Finished scap: Backport for testwiki: Use Parsoid in Mediawiki Core for Content Translation (T323667) (duration: 20m 56s)
08:26 zabe: zabe@mwmaint1002:~$ mwscript extensions/Flow/maintenance/FlowFixInconsistentBoards.php --wiki=zhwiki --namespaceName='USER_TALK' # T327146
08:13 kartik@deploy1002: kartik and kartik: Backport for testwiki: Use Parsoid in Mediawiki Core for Content Translation (T323667) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
08:08 kartik@deploy1002: Started scap: Backport for testwiki: Use Parsoid in Mediawiki Core for Content Translation (T323667)
07:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P43168 and previous config saved to /var/cache/conftool/dbconfig/20230117-075222-ladsgroup.json
07:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P43167 and previous config saved to /var/cache/conftool/dbconfig/20230117-073717-ladsgroup.json
07:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P43166 and previous config saved to /var/cache/conftool/dbconfig/20230117-072212-ladsgroup.json
07:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
07:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
07:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
07:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
07:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P43165 and previous config saved to /var/cache/conftool/dbconfig/20230117-070707-ladsgroup.json
07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1173 T326134', diff saved to https://phabricator.wikimedia.org/P43164 and previous config saved to /var/cache/conftool/dbconfig/20230117-070532-ladsgroup.json
07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1131 to s6 primary and set section read-write T326134', diff saved to https://phabricator.wikimedia.org/P43163 and previous config saved to /var/cache/conftool/dbconfig/20230117-070102-ladsgroup.json
07:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - T326134', diff saved to https://phabricator.wikimedia.org/P43162 and previous config saved to /var/cache/conftool/dbconfig/20230117-070035-ladsgroup.json
07:00 Amir1: Starting s6 eqiad failover from db1173 to db1131 - T326134
06:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1131 with weight 0 T326134', diff saved to https://phabricator.wikimedia.org/P43160 and previous config saved to /var/cache/conftool/dbconfig/20230117-060710-ladsgroup.json
06:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s6 T326134
06:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s6 T326134

2023-01-16

17:39 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=kubernetes101[1234].eqiad.wmnet
17:07 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[1234].eqiad.wmnet
17:06 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
17:04 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
17:04 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
16:53 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
16:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1044.eqiad.wmnet with OS bullseye
16:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1044.eqiad.wmnet with reason: host reimage
16:35 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1044.eqiad.wmnet with reason: host reimage
16:23 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1044.eqiad.wmnet with OS bullseye
16:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1042.eqiad.wmnet with OS bullseye
16:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1042.eqiad.wmnet with reason: host reimage
15:59 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1042.eqiad.wmnet with reason: host reimage
15:47 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1042.eqiad.wmnet with OS bullseye
13:35 XioNoX: disable one of 3 cr1-cr2 eqiad links - T304712
13:34 XioNoX: repool eqiad-eqord link - T304712
12:56 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[1234].eqiad.wmnet
12:55 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes100[5-9].eqiad.wmnet
12:50 XioNoX: drain eqiad-eqord link - T304712
12:47 hnowlan@puppetmaster1001: conftool action : set/weight=10:pooled=yes; selector: service=thumbor,name=kubernetes100[5-9].eqiad.wmnet
12:43 Amir1: power cycled db1198
12:36 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes100[5-9].eqiad.wmnet
12:35 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes101[5-9].eqiad.wmnet
12:35 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes102[012].eqiad.wmnet
12:34 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes102.eqiad.wmnet
12:05 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes101[123].eqiad.wmnet
12:02 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[123].eqiad.wmnet
11:51 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
11:49 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
11:48 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
11:38 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
11:32 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
11:25 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
11:15 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
10:59 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
10:58 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
10:58 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
10:57 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
10:56 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
10:55 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
10:54 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
10:48 moritzm: installing libtasn1-6 security updates on Bullseye
10:36 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons.
08:55 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons.
08:46 elukey: powercycle an-worker1125 - soft lockup traces registered in the tty, host frozen
08:14 oblivian@deploy1002: Synchronized README: test null deployment for T327041 (duration: 07m 12s)
08:09 Emperor: stopped swift_rclone_sync on ms-be1069
07:48 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=parse20(0[6-9]|10).codfw.wmnet
07:44 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw23([12][0-9]|3[0-4]).codfw.wmnet
07:41 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw22(59|6[0-9]|70).codfw.wmnet
07:26 _joe_: restarting pybal on lvs2009
07:10 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=(mw.*|appservers|api)-ro,name=codfw
07:10 _joe_: depooling mediawiki in codfw
06:47 XioNoX: add 2001:67c:930::/48 to network:external in data.yaml
06:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maint
06:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maint
06:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1198 maint', diff saved to https://phabricator.wikimedia.org/P43157 and previous config saved to /var/cache/conftool/dbconfig/20230116-062211-ladsgroup.json
02:25 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,service=parsoid-php
02:05 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=appserver,service=nginx
02:01 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=api_appserver,service=nginx
01:51 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: name=mw2283.codfw.wmnet
01:35 Amir1: rolling restart of php-fpm across the fleet
01:30 thcipriani: 01:29:56 php-fpm-restart: 100% (in-flight: 0; ok: 184; fail: 112; left: 0)
01:29 thcipriani@deploy1002: Finished scap: Backport for LanguageDropdown: Check if the page is in talk namespaces instead (T316559 T326788) (duration: 24m 47s)
01:15 thcipriani@deploy1002: thcipriani and func: Backport for LanguageDropdown: Check if the page is in talk namespaces instead (T316559 T326788) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
01:05 thcipriani@deploy1002: Started scap: Backport for LanguageDropdown: Check if the page is in talk namespaces instead (T316559 T326788)

2023-01-14

09:46 godog: issue 'request system reboot member 2' - T327001
09:20 mvernon@cumin2002: conftool action : set/pooled=no; selector: name=thanos-fe2002.codfw.wmnet
09:19 Emperor: depool thanos-fe2002 T327001
09:19 mvernon@cumin2002: conftool action : set/pooled=no; selector: name=ms-fe2010.codfw.wmnet
09:19 Emperor: depool ms-fe2010 T327001

2023-01-13

23:39 mutante: people2002 - systemctl reset-failed after removing auto_restart_rsync timers
22:26 mutante: mirror1001 - systemctl start update-ubuntu-mirror (sometimes sync fails)
20:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1011']
20:58 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1011']
20:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['druid1011']
20:49 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1011']
20:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['druid1011']
20:37 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1011']
20:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1010']
20:36 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1010']
20:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['druid1010']
20:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1009']
20:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1009']
20:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['druid1009']
20:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1010']
20:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1009']
20:04 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aphlict2001.codfw.wmnet
19:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-mariadb1002.eqiad.wmnet with OS bullseye
19:58 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:54 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aphlict2001.codfw.wmnet on all recursors
19:54 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache aphlict2001.codfw.wmnet on all recursors
19:54 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:54 dzahn@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aphlict2001.codfw.wmnet - dzahn@cumin2002"
19:52 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aphlict2001.codfw.wmnet - dzahn@cumin2002"
19:51 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:49 dzahn@cumin2002: START - Cookbook sre.dns.netbox
19:49 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host aphlict2001.codfw.wmnet
19:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-mariadb1001.eqiad.wmnet with OS bullseye
19:40 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:38 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-mariadb1002.eqiad.wmnet with reason: host reimage
19:34 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-mariadb1002.eqiad.wmnet with reason: host reimage
19:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-mariadb1001.eqiad.wmnet with reason: host reimage
19:22 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-mariadb1001.eqiad.wmnet with reason: host reimage
19:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host an-mariadb1002.eqiad.wmnet with OS bullseye
18:25 zabe: mwscript extensions/GlobalBlocking/maintenance/FixBlockerUsername.php --wiki metawiki "Green Giant" "Cromium" # T298707
17:34 thcipriani@deploy1002: Finished scap: Backport for TranslationNotificationsSubmitJob: Ensure LanguageSet is in proper format (T63125) (duration: 13m 25s)
17:22 thcipriani@deploy1002: thcipriani and abi: Backport for TranslationNotificationsSubmitJob: Ensure LanguageSet is in proper format (T63125) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
17:20 thcipriani@deploy1002: Started scap: Backport for TranslationNotificationsSubmitJob: Ensure LanguageSet is in proper format (T63125)
15:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-coord1004.eqiad.wmnet with OS bullseye
15:34 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
15:31 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
15:24 jynus: restarted update-ubuntu-mirror on mirror1001 again due to remote server concurrency issues
15:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new bastion - jmm@cumin2002"
15:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host an-mariadb1001.eqiad.wmnet with OS bullseye
15:19 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new bastion - jmm@cumin2002"
15:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-coord1003.eqiad.wmnet with OS bullseye
15:18 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
15:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-coord1004.eqiad.wmnet with reason: host reimage
15:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-coord1004.eqiad.wmnet with reason: host reimage
15:11 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
15:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host an-coord1004.eqiad.wmnet with OS bullseye
14:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-coord1003.eqiad.wmnet with reason: host reimage
14:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-coord1003.eqiad.wmnet with reason: host reimage
14:49 volans: uploaded cumin_4.2.0 to apt.wikimedia.org bullseye-wikimedia
14:34 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host an-coord1003.eqiad.wmnet with OS bullseye
12:48 moritzm: installing bast6002 T324974
12:39 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab2002.wikimedia.org with reason: troubleshoot backup restore on gitlab replica
12:38 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab2002.wikimedia.org with reason: troubleshoot backup restore on gitlab replica
11:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new bastions - jmm@cumin2002"
11:45 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new bastions - jmm@cumin2002"
10:53 moritzm: installing bast5003 T324974
10:49 jynus: restarting update-ubuntu-mirror on mirror1001 due to remote server concurrency issues
09:41 moritzm: installing bast4004 T324974
09:06 moritzm: installing bast3006 T324974
02:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1011.mgmt.eqiad.wmnet with reboot policy FORCED
02:09 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1011.mgmt.eqiad.wmnet with reboot policy FORCED
02:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1010.mgmt.eqiad.wmnet with reboot policy FORCED
02:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
01:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
01:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
01:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
01:52 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
01:36 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1010.mgmt.eqiad.wmnet with reboot policy FORCED
01:36 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
01:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['an-mariadb1002']
01:26 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-mariadb1002']
01:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['an-mariadb1001']
01:25 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-mariadb1001']
01:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['an-mariadb1002']
01:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['an-mariadb1001']
01:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-mariadb1002']
01:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-mariadb1001']
01:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['an-coord1004']
01:03 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-coord1004']
01:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['an-coord1003']
01:02 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-coord1003']
00:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['an-coord1004']
00:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['an-coord1003']
00:41 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-coord1004']
00:40 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-coord1003']
00:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-mariadb1002.mgmt.eqiad.wmnet with reboot policy FORCED
00:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-mariadb1001.mgmt.eqiad.wmnet with reboot policy FORCED
00:16 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host an-mariadb1002.mgmt.eqiad.wmnet with reboot policy FORCED
00:15 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host an-mariadb1001.mgmt.eqiad.wmnet with reboot policy FORCED
00:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-coord1004.mgmt.eqiad.wmnet with reboot policy FORCED
00:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-coord1003.mgmt.eqiad.wmnet with reboot policy FORCED

2023-01-12

23:53 zabe: start running cuc_comment_id population script on rest of sections in screens with --sleep 2 # T233004
23:50 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host an-coord1004.mgmt.eqiad.wmnet with reboot policy FORCED
23:44 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host an-coord1003.mgmt.eqiad.wmnet with reboot policy FORCED
23:13 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@99a3e6f]: import_cirrus_index: use spark3 (duration: 02m 31s)
23:10 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@99a3e6f]: import_cirrus_index: use spark3
23:08 sbassett: Deployed (temporary) security mitigations for T326691
22:45 mutante: people2002 - apt-get remove --purge rsync
22:08 zabe: start of "foreachwikiindblist s3.dblist extensions/CheckUser/maintenance/populateCucComment.php" in a screen in mwmaint1002 # T233004
22:07 thcipriani: end UTC late backport
22:06 thcipriani@deploy1002: Finished scap: Backport for cirrus: Divert requests with x-public-cloud set to a dedicated pool counter (T326757), cirrus: Disable incoming link counting (T317023) (duration: 09m 23s)
21:59 krinkle@deploy1002: Finished deploy [performance/navtiming@172cc22]: (no justification provided) (duration: 00m 08s)
21:59 krinkle@deploy1002: Started deploy [performance/navtiming@172cc22]: (no justification provided)
21:59 Krinkle: krinkle@deploy1002$ `scap install-world -v --limit-hosts` for webperf1003.eqiad and webperf2003.codfw, ref T326668
21:58 krinkle@deploy1002: Installation of scap version "4.32.0" completed for 1 hosts
21:58 krinkle@deploy1002: Installing scap version "4.32.0" for 1 hosts
21:58 thcipriani@deploy1002: thcipriani and ebernhardson: Backport for cirrus: Divert requests with x-public-cloud set to a dedicated pool counter (T326757), cirrus: Disable incoming link counting (T317023) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
21:58 krinkle@deploy1002: Installation of scap version "4.32.0" completed for 1 hosts
21:58 krinkle@deploy1002: Installing scap version "4.32.0" for 1 hosts
21:57 thcipriani@deploy1002: Started scap: Backport for cirrus: Divert requests with x-public-cloud set to a dedicated pool counter (T326757), cirrus: Disable incoming link counting (T317023)
21:56 zabe: run populateCucComment.php on testwiki # T233004
21:48 thcipriani@deploy1002: Finished scap: Backport for nlwiki: Add block right to checkuser group (T326355) (duration: 09m 04s)
21:41 thcipriani@deploy1002: thcipriani and stang: Backport for nlwiki: Add block right to checkuser group (T326355) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
21:39 thcipriani@deploy1002: Started scap: Backport for nlwiki: Add block right to checkuser group (T326355)
21:37 thcipriani@deploy1002: Finished scap: Backport for looksLikeAutomation: Allow flagging requests from arbitrary headers (T326757) (duration: 09m 10s)
21:30 thcipriani@deploy1002: thcipriani and ebernhardson: Backport for looksLikeAutomation: Allow flagging requests from arbitrary headers (T326757) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
21:28 thcipriani@deploy1002: Started scap: Backport for looksLikeAutomation: Allow flagging requests from arbitrary headers (T326757)
21:27 thcipriani@deploy1002: Finished scap: Backport for etwikiquote: Switch logo variant back (T313698) (duration: 09m 25s)
21:21 ejegg: restarted fundraising scheduled jobs
21:19 ejegg: civicrm upgraded from 9afd2789 to 7ecb5038
21:19 thcipriani@deploy1002: thcipriani and stang: Backport for etwikiquote: Switch logo variant back (T313698) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
21:17 thcipriani@deploy1002: Started scap: Backport for etwikiquote: Switch logo variant back (T313698)
21:16 thcipriani@deploy1002: Finished scap: Backport for Remove Beta Feature for Realtime Preview and enable on plwiki (T323033) (duration: 10m 43s)
21:07 thcipriani@deploy1002: thcipriani and samwilson: Backport for Remove Beta Feature for Realtime Preview and enable on plwiki (T323033) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
21:05 thcipriani@deploy1002: Started scap: Backport for Remove Beta Feature for Realtime Preview and enable on plwiki (T323033)
20:43 ejegg: rolled back CiviCRM to 9afd2789
20:31 ejegg: civicrm upgraded from 9afd2789 to 7ecb5038
20:29 ejegg: disabled fundraising scheduled jobs for civi deploy
20:08 brett: Setting thread_pool_max for varnish-frontend to 12000
19:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1176 T326116', diff saved to https://phabricator.wikimedia.org/P43148 and previous config saved to /var/cache/conftool/dbconfig/20230112-195922-marostegui.json
19:56 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1176 to LB with just 1% weight T326116', diff saved to https://phabricator.wikimedia.org/P43147 and previous config saved to /var/cache/conftool/dbconfig/20230112-195651-marostegui.json
19:55 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1176 (mariadb 11) to dbctl, depooled T326116', diff saved to https://phabricator.wikimedia.org/P43146 and previous config saved to /var/cache/conftool/dbconfig/20230112-195514-marostegui.json
19:11 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.18 refs T325581
18:36 mutante: stat1008 - systemctl reset-failed - clears Icinga alerts from failed things of the past
18:35 mutante: stat1007 - systemctl reset-failed - clears Icinga alerts
18:19 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mc2040.codfw.wmnet with reason: hardware troubleshooting
18:18 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mc2040.codfw.wmnet with reason: hardware troubleshooting
17:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
17:45 mutante: powercycling mc2040 via mgmt console
17:34 ejegg: civicrm rolled back from 7ecb5038 to 9afd2789
17:08 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
17:08 btullis@cumin1001: Added views for new wiki: aswikiquote T321294
17:05 ejegg: civicrm upgraded from 9afd2789 to 7ecb5038
16:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
16:48 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
16:48 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
16:47 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
16:43 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
16:34 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
16:31 zabe@deploy1002: Finished scap: Backport for Stop writing to cul_user and cul_user_text on a few wikis (T233004), Start writing to rev_comment_id on group1 wikis (T299954) (duration: 09m 49s)
16:23 zabe@deploy1002: zabe and zabe: Backport for Stop writing to cul_user and cul_user_text on a few wikis (T233004), Start writing to rev_comment_id on group1 wikis (T299954) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
16:21 zabe@deploy1002: Started scap: Backport for Stop writing to cul_user and cul_user_text on a few wikis (T233004), Start writing to rev_comment_id on group1 wikis (T299954)
16:14 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
16:08 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
16:08 btullis@cumin1001: Added views for new wiki: bjnwiktionary T312214
15:47 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
15:46 hnowlan@puppetmaster1001: conftool action : set/weight=8; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
15:44 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
15:36 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
15:36 btullis@cumin1001: Added views for new wiki: shnwikibooks T321256
15:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
15:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1118.eqiad.wmnet with reason: Maintenance
15:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1118.eqiad.wmnet with reason: Maintenance
15:28 effie: Planet import in codfw (on maps2009) started at 15:26 UTC - T314472
15:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1041.eqiad.wmnet
15:11 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dborch1001.wikimedia.org
15:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dborch1001.wikimedia.org
15:05 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1041.eqiad.wmnet
14:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe2002.codfw.wmnet
14:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
14:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
14:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T321391)', diff saved to https://phabricator.wikimedia.org/P43138 and previous config saved to /var/cache/conftool/dbconfig/20230112-145441-marostegui.json
14:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-fe2002.codfw.wmnet
14:50 moritzm: installing postgresql-11 security updates on puppetdb1002
14:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe1002.eqiad.wmnet
14:42 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
14:42 btullis@cumin1001: Added views for new wiki: guwwikiquote T321288
14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P43137 and previous config saved to /var/cache/conftool/dbconfig/20230112-143934-marostegui.json
14:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-fe1002.eqiad.wmnet
14:37 moritzm: installing sqlite3 security updates on buster
14:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1040.eqiad.wmnet with OS bullseye
14:34 taavi: UTC afternoon backports done
14:28 taavi@deploy1002: Finished scap: Backport for Track callers of parseRevisionParsoidHtml. (duration: 09m 34s)
14:26 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0)
14:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P43136 and previous config saved to /var/cache/conftool/dbconfig/20230112-142428-marostegui.json
14:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1001.wikimedia.org
14:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1040.eqiad.wmnet with reason: host reimage
14:20 taavi@deploy1002: taavi and matmarex: Backport for Track callers of parseRevisionParsoidHtml. synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
14:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1001.wikimedia.org
14:18 taavi@deploy1002: Started scap: Backport for Track callers of parseRevisionParsoidHtml.
14:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1040.eqiad.wmnet with reason: host reimage
14:17 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
14:16 taavi@deploy1002: Finished scap: Backport for Allow administrators to revoke autopatroller rights on sh.WP (T325938) (duration: 13m 30s)
14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T321391)', diff saved to https://phabricator.wikimedia.org/P43135 and previous config saved to /var/cache/conftool/dbconfig/20230112-140921-marostegui.json
14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1206 (T321391)', diff saved to https://phabricator.wikimedia.org/P43134 and previous config saved to /var/cache/conftool/dbconfig/20230112-140659-marostegui.json
14:06 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1040.eqiad.wmnet with OS bullseye
14:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1206.eqiad.wmnet with reason: Maintenance
14:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1206.eqiad.wmnet with reason: Maintenance
14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T321391)', diff saved to https://phabricator.wikimedia.org/P43133 and previous config saved to /var/cache/conftool/dbconfig/20230112-140649-marostegui.json
14:05 taavi@deploy1002: taavi and aleksandar: Backport for Allow administrators to revoke autopatroller rights on sh.WP (T325938) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
14:03 taavi@deploy1002: Started scap: Backport for Allow administrators to revoke autopatroller rights on sh.WP (T325938)
13:53 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P43132 and previous config saved to /var/cache/conftool/dbconfig/20230112-135143-marostegui.json
13:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P43131 and previous config saved to /var/cache/conftool/dbconfig/20230112-133636-marostegui.json
13:30 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
13:29 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
13:28 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
13:28 ladsgroup@deploy1002: Finished scap: Backport for Remove obsolete MWMinimalScriptInit and MEDIAWIKI_MAINT_INIT_ONLY. (duration: 21m 44s)
13:26 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
13:26 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T321391)', diff saved to https://phabricator.wikimedia.org/P43130 and previous config saved to /var/cache/conftool/dbconfig/20230112-132130-marostegui.json
13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T321391)', diff saved to https://phabricator.wikimedia.org/P43129 and previous config saved to /var/cache/conftool/dbconfig/20230112-131908-marostegui.json
13:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1196.eqiad.wmnet with reason: Maintenance
13:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1196.eqiad.wmnet with reason: Maintenance
13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T321391)', diff saved to https://phabricator.wikimedia.org/P43128 and previous config saved to /var/cache/conftool/dbconfig/20230112-131847-marostegui.json
13:16 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
13:13 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
13:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
13:08 ladsgroup@deploy1002: ladsgroup and daniel: Backport for Remove obsolete MWMinimalScriptInit and MEDIAWIKI_MAINT_INIT_ONLY. synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
13:06 ladsgroup@deploy1002: Started scap: Backport for Remove obsolete MWMinimalScriptInit and MEDIAWIKI_MAINT_INIT_ONLY.
13:05 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
13:05 btullis@cumin1001: Added views for new wiki: gorwiktionary T326138
13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P43127 and previous config saved to /var/cache/conftool/dbconfig/20230112-130341-marostegui.json
12:58 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
12:56 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
12:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P43125 and previous config saved to /var/cache/conftool/dbconfig/20230112-124834-marostegui.json
12:41 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
12:41 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T321391)', diff saved to https://phabricator.wikimedia.org/P43123 and previous config saved to /var/cache/conftool/dbconfig/20230112-123328-marostegui.json
12:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T321391)', diff saved to https://phabricator.wikimedia.org/P43122 and previous config saved to /var/cache/conftool/dbconfig/20230112-123106-marostegui.json
12:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1186.eqiad.wmnet with reason: Maintenance
12:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1186.eqiad.wmnet with reason: Maintenance
12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T321391)', diff saved to https://phabricator.wikimedia.org/P43121 and previous config saved to /var/cache/conftool/dbconfig/20230112-123045-marostegui.json
12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P43120 and previous config saved to /var/cache/conftool/dbconfig/20230112-121538-marostegui.json
12:13 XioNoX: repool esams
12:10 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
12:09 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
12:09 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
12:09 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
12:08 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
12:08 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
12:08 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
12:08 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P43119 and previous config saved to /var/cache/conftool/dbconfig/20230112-120032-marostegui.json
11:54 XioNoX: re-seating cr2-esams fpc0 linecard - T318783
11:52 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T321391)', diff saved to https://phabricator.wikimedia.org/P43116 and previous config saved to /var/cache/conftool/dbconfig/20230112-114524-marostegui.json
11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T321391)', diff saved to https://phabricator.wikimedia.org/P43115 and previous config saved to /var/cache/conftool/dbconfig/20230112-114302-marostegui.json
11:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
11:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
11:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1176.eqiad.wmnet with reason: Maintenance
11:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1176.eqiad.wmnet with reason: Maintenance
11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T321391)', diff saved to https://phabricator.wikimedia.org/P43114 and previous config saved to /var/cache/conftool/dbconfig/20230112-114212-marostegui.json
11:41 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
11:39 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
11:37 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
11:29 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
11:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P43113 and previous config saved to /var/cache/conftool/dbconfig/20230112-112705-marostegui.json
11:24 urbanecm@deploy1002: Finished scap: Backport for throttle: Add new rule for cswiki course (T326792) (duration: 07m 47s)
11:17 urbanecm@deploy1002: Started scap: Backport for throttle: Add new rule for cswiki course (T326792)
11:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 25885
11:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 25885
11:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3303
11:13 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3303
11:12 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 3302
11:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P43112 and previous config saved to /var/cache/conftool/dbconfig/20230112-111159-marostegui.json
11:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3302
11:11 zabe: mwscript extensions/GlobalBlocking/maintenance/FixBlockerUsername.php --wiki metawiki "Defender" "Elton" # T298707
10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T321391)', diff saved to https://phabricator.wikimedia.org/P43111 and previous config saved to /var/cache/conftool/dbconfig/20230112-105652-marostegui.json
10:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T321391)', diff saved to https://phabricator.wikimedia.org/P43110 and previous config saved to /var/cache/conftool/dbconfig/20230112-105430-marostegui.json
10:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
10:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
10:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T321391)', diff saved to https://phabricator.wikimedia.org/P43109 and previous config saved to /var/cache/conftool/dbconfig/20230112-105358-marostegui.json
10:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 36 hosts
10:49 ayounsi@cumin1001: START - Cookbook sre.hosts.remove-downtime for 36 hosts
10:41 hashar@deploy1002: Finished deploy [integration/docroot@577d68a]: zuul: Link to report_url if available (duration: 00m 14s)
10:41 hashar@deploy1002: Started deploy [integration/docroot@577d68a]: zuul: Link to report_url if available
10:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8674
10:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8674
10:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8932
10:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8932
10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P43108 and previous config saved to /var/cache/conftool/dbconfig/20230112-103852-marostegui.json
10:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-presto1001.eqiad.wmnet
10:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-presto1001.eqiad.wmnet
10:24 XioNoX: rollback redirect ns2 to authdns1001 - T316532
10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P43107 and previous config saved to /var/cache/conftool/dbconfig/20230112-102345-marostegui.json
10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T321391)', diff saved to https://phabricator.wikimedia.org/P43106 and previous config saved to /var/cache/conftool/dbconfig/20230112-100839-marostegui.json
10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T321391)', diff saved to https://phabricator.wikimedia.org/P43105 and previous config saved to /var/cache/conftool/dbconfig/20230112-100616-marostegui.json
10:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
10:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
10:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
10:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T321391)', diff saved to https://phabricator.wikimedia.org/P43104 and previous config saved to /var/cache/conftool/dbconfig/20230112-100456-marostegui.json
10:01 XioNoX: reboot asw2-esams for upgrade - T316532
09:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping3003.esams.wmnet
09:58 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
09:56 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwmaint2002.codfw.wmnet
09:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping3003.esams.wmnet on all recursors
09:54 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ping3003.esams.wmnet on all recursors
09:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping3003.esams.wmnet - jmm@cumin2002"
09:53 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping3003.esams.wmnet - jmm@cumin2002"
09:50 jmm@cumin2002: START - Cookbook sre.dns.netbox
09:50 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ping3003.esams.wmnet
09:50 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwmaint2002.codfw.wmnet
09:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P43103 and previous config saved to /var/cache/conftool/dbconfig/20230112-094950-marostegui.json
09:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping2003.codfw.wmnet
09:47 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
09:47 btullis@cumin1001: Added views for new wiki: pcmwiki T310879
09:46 XioNoX: redirect ns2 to authdns1001 - T316532
09:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping2003.codfw.wmnet on all recursors
09:43 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ping2003.codfw.wmnet on all recursors
09:42 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping2003.codfw.wmnet - jmm@cumin2002"
09:41 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping2003.codfw.wmnet - jmm@cumin2002"
09:39 jmm@cumin2002: START - Cookbook sre.dns.netbox
09:39 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ping2003.codfw.wmnet
09:37 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
09:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P43102 and previous config saved to /var/cache/conftool/dbconfig/20230112-093443-marostegui.json
09:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 36 hosts with reason: nework maintenance
09:31 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 36 hosts with reason: nework maintenance
09:25 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mc1039.eqiad.wmnet
09:24 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1039.eqiad.wmnet
09:24 jiji@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host mc1039.eqiad.wmnet
09:22 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
09:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T321391)', diff saved to https://phabricator.wikimedia.org/P43101 and previous config saved to /var/cache/conftool/dbconfig/20230112-091937-marostegui.json
09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T321391)', diff saved to https://phabricator.wikimedia.org/P43100 and previous config saved to /var/cache/conftool/dbconfig/20230112-091716-marostegui.json
09:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
09:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
09:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T321391)', diff saved to https://phabricator.wikimedia.org/P43099 and previous config saved to /var/cache/conftool/dbconfig/20230112-091654-marostegui.json
09:11 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1039.eqiad.wmnet
09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P43098 and previous config saved to /var/cache/conftool/dbconfig/20230112-090148-marostegui.json
09:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping1003.eqiad.wmnet
08:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping1003.eqiad.wmnet on all recursors
08:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ping1003.eqiad.wmnet on all recursors
08:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping1003.eqiad.wmnet - jmm@cumin2002"
08:55 phedenskog@deploy1002: Finished deploy [performance/navtiming@172cc22]: (no justification provided) (duration: 00m 22s)
08:54 phedenskog@deploy1002: Started deploy [performance/navtiming@172cc22]: (no justification provided)
08:54 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping1003.eqiad.wmnet - jmm@cumin2002"
08:54 phedenskog@deploy1002: Finished deploy [performance/navtiming@172cc22]: (no justification provided) (duration: 00m 17s)
08:53 phedenskog@deploy1002: Started deploy [performance/navtiming@172cc22]: (no justification provided)
08:50 XioNoX: depool esams for network maintenance - T316532
08:50 jmm@cumin2002: START - Cookbook sre.dns.netbox
08:50 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ping1003.eqiad.wmnet
08:49 zabe: deployed updated patch for T311337
08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P43097 and previous config saved to /var/cache/conftool/dbconfig/20230112-084641-marostegui.json
08:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast5003.wikimedia.org
08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T321391)', diff saved to https://phabricator.wikimedia.org/P43096 and previous config saved to /var/cache/conftool/dbconfig/20230112-083135-marostegui.json
08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T321391)', diff saved to https://phabricator.wikimedia.org/P43095 and previous config saved to /var/cache/conftool/dbconfig/20230112-082813-marostegui.json
08:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
08:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T321391)', diff saved to https://phabricator.wikimedia.org/P43094 and previous config saved to /var/cache/conftool/dbconfig/20230112-082752-marostegui.json
08:17 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bast5003.wikimedia.org on all recursors
08:17 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast5003.wikimedia.org on all recursors
08:17 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:17 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast5003.wikimedia.org - jmm@cumin2002"
08:16 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast5003.wikimedia.org - jmm@cumin2002"
08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P43093 and previous config saved to /var/cache/conftool/dbconfig/20230112-081245-marostegui.json
07:59 jmm@cumin2002: START - Cookbook sre.dns.netbox
07:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast5003.wikimedia.org
07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P43092 and previous config saved to /var/cache/conftool/dbconfig/20230112-075739-marostegui.json
07:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 9584
07:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T321391)', diff saved to https://phabricator.wikimedia.org/P43091 and previous config saved to /var/cache/conftool/dbconfig/20230112-074232-marostegui.json
07:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 9584
07:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 37002
07:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 37002
07:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1132 (T321391)', diff saved to https://phabricator.wikimedia.org/P43090 and previous config saved to /var/cache/conftool/dbconfig/20230112-074010-marostegui.json
07:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1132.eqiad.wmnet with reason: Maintenance
07:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1132.eqiad.wmnet with reason: Maintenance
07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T321391)', diff saved to https://phabricator.wikimedia.org/P43089 and previous config saved to /var/cache/conftool/dbconfig/20230112-073949-marostegui.json
07:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 112
07:38 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 112
07:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P43088 and previous config saved to /var/cache/conftool/dbconfig/20230112-072443-marostegui.json
07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P43087 and previous config saved to /var/cache/conftool/dbconfig/20230112-070936-marostegui.json
06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T321391)', diff saved to https://phabricator.wikimedia.org/P43086 and previous config saved to /var/cache/conftool/dbconfig/20230112-065430-marostegui.json
06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T321391)', diff saved to https://phabricator.wikimedia.org/P43085 and previous config saved to /var/cache/conftool/dbconfig/20230112-065208-marostegui.json
06:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1128.eqiad.wmnet with reason: Maintenance
06:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1128.eqiad.wmnet with reason: Maintenance
06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T321391)', diff saved to https://phabricator.wikimedia.org/P43084 and previous config saved to /var/cache/conftool/dbconfig/20230112-065147-marostegui.json
06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P43083 and previous config saved to /var/cache/conftool/dbconfig/20230112-063640-marostegui.json
06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P43082 and previous config saved to /var/cache/conftool/dbconfig/20230112-062134-marostegui.json
06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T321391)', diff saved to https://phabricator.wikimedia.org/P43081 and previous config saved to /var/cache/conftool/dbconfig/20230112-060627-marostegui.json
06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T321391)', diff saved to https://phabricator.wikimedia.org/P43080 and previous config saved to /var/cache/conftool/dbconfig/20230112-060404-marostegui.json
06:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
06:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 (T321391)', diff saved to https://phabricator.wikimedia.org/P43079 and previous config saved to /var/cache/conftool/dbconfig/20230112-060343-marostegui.json
05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P43078 and previous config saved to /var/cache/conftool/dbconfig/20230112-054837-marostegui.json
05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P43077 and previous config saved to /var/cache/conftool/dbconfig/20230112-053330-marostegui.json
05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 (T321391)', diff saved to https://phabricator.wikimedia.org/P43076 and previous config saved to /var/cache/conftool/dbconfig/20230112-051823-marostegui.json
05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1107 (T321391)', diff saved to https://phabricator.wikimedia.org/P43075 and previous config saved to /var/cache/conftool/dbconfig/20230112-051601-marostegui.json
05:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1107.eqiad.wmnet with reason: Maintenance
05:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1107.eqiad.wmnet with reason: Maintenance
05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T321391)', diff saved to https://phabricator.wikimedia.org/P43074 and previous config saved to /var/cache/conftool/dbconfig/20230112-051539-marostegui.json
05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P43073 and previous config saved to /var/cache/conftool/dbconfig/20230112-050033-marostegui.json
04:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P43072 and previous config saved to /var/cache/conftool/dbconfig/20230112-044526-marostegui.json
04:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T321391)', diff saved to https://phabricator.wikimedia.org/P43071 and previous config saved to /var/cache/conftool/dbconfig/20230112-043020-marostegui.json
04:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T321391)', diff saved to https://phabricator.wikimedia.org/P43070 and previous config saved to /var/cache/conftool/dbconfig/20230112-042757-marostegui.json
04:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
04:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
04:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
04:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
04:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43069 and previous config saved to /var/cache/conftool/dbconfig/20230112-042741-marostegui.json
04:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P43068 and previous config saved to /var/cache/conftool/dbconfig/20230112-041234-marostegui.json
03:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P43067 and previous config saved to /var/cache/conftool/dbconfig/20230112-035727-marostegui.json
03:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43066 and previous config saved to /var/cache/conftool/dbconfig/20230112-034221-marostegui.json
03:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43065 and previous config saved to /var/cache/conftool/dbconfig/20230112-033958-marostegui.json
03:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
03:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
03:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43064 and previous config saved to /var/cache/conftool/dbconfig/20230112-033937-marostegui.json
03:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P43063 and previous config saved to /var/cache/conftool/dbconfig/20230112-032430-marostegui.json
03:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P43062 and previous config saved to /var/cache/conftool/dbconfig/20230112-030924-marostegui.json
02:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43061 and previous config saved to /var/cache/conftool/dbconfig/20230112-025417-marostegui.json
02:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43060 and previous config saved to /var/cache/conftool/dbconfig/20230112-025153-marostegui.json
02:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1099.eqiad.wmnet with reason: Maintenance
02:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1099.eqiad.wmnet with reason: Maintenance
02:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
02:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
02:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T321391)', diff saved to https://phabricator.wikimedia.org/P43059 and previous config saved to /var/cache/conftool/dbconfig/20230112-020046-marostegui.json
01:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P43058 and previous config saved to /var/cache/conftool/dbconfig/20230112-014539-marostegui.json
01:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P43057 and previous config saved to /var/cache/conftool/dbconfig/20230112-013033-marostegui.json
01:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T321391)', diff saved to https://phabricator.wikimedia.org/P43056 and previous config saved to /var/cache/conftool/dbconfig/20230112-011526-marostegui.json
01:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T321391)', diff saved to https://phabricator.wikimedia.org/P43055 and previous config saved to /var/cache/conftool/dbconfig/20230112-011302-marostegui.json
01:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2176.codfw.wmnet with reason: Maintenance
01:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2176.codfw.wmnet with reason: Maintenance
01:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T321391)', diff saved to https://phabricator.wikimedia.org/P43054 and previous config saved to /var/cache/conftool/dbconfig/20230112-011241-marostegui.json
00:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P43053 and previous config saved to /var/cache/conftool/dbconfig/20230112-005734-marostegui.json
00:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P43052 and previous config saved to /var/cache/conftool/dbconfig/20230112-004228-marostegui.json
00:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T321391)', diff saved to https://phabricator.wikimedia.org/P43051 and previous config saved to /var/cache/conftool/dbconfig/20230112-002721-marostegui.json
00:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T321391)', diff saved to https://phabricator.wikimedia.org/P43050 and previous config saved to /var/cache/conftool/dbconfig/20230112-002457-marostegui.json
00:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2174.codfw.wmnet with reason: Maintenance
00:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2174.codfw.wmnet with reason: Maintenance
00:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T321391)', diff saved to https://phabricator.wikimedia.org/P43049 and previous config saved to /var/cache/conftool/dbconfig/20230112-002436-marostegui.json
00:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P43048 and previous config saved to /var/cache/conftool/dbconfig/20230112-000929-marostegui.json

2023-01-11

23:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P43047 and previous config saved to /var/cache/conftool/dbconfig/20230111-235423-marostegui.json
23:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T321391)', diff saved to https://phabricator.wikimedia.org/P43045 and previous config saved to /var/cache/conftool/dbconfig/20230111-233916-marostegui.json
23:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T321391)', diff saved to https://phabricator.wikimedia.org/P43044 and previous config saved to /var/cache/conftool/dbconfig/20230111-233652-marostegui.json
23:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2094.codfw.wmnet with reason: Maintenance
23:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2094.codfw.wmnet with reason: Maintenance
23:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2173.codfw.wmnet with reason: Maintenance
23:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2173.codfw.wmnet with reason: Maintenance
23:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43043 and previous config saved to /var/cache/conftool/dbconfig/20230111-233616-marostegui.json
23:22 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.18 refs T325581 (duration: 06m 57s)
23:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P43042 and previous config saved to /var/cache/conftool/dbconfig/20230111-232109-marostegui.json
23:15 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.18 refs T325581
23:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P43041 and previous config saved to /var/cache/conftool/dbconfig/20230111-230603-marostegui.json
22:51 zabe@deploy1002: Finished scap: Backport for Start reading from cuc_actor on group0 and group1 wikis (T233004), Start writing to rev_comment_id on group0 wikis (T299954), Stop writing to cul_user and cul_user_text on testwiki (T233004) (duration: 09m 28s)
22:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43040 and previous config saved to /var/cache/conftool/dbconfig/20230111-225056-marostegui.json
22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43039 and previous config saved to /var/cache/conftool/dbconfig/20230111-224832-marostegui.json
22:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
22:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43038 and previous config saved to /var/cache/conftool/dbconfig/20230111-224810-marostegui.json
22:44 zabe@deploy1002: zabe and zabe: Backport for Start reading from cuc_actor on group0 and group1 wikis (T233004), Start writing to rev_comment_id on group0 wikis (T299954), Stop writing to cul_user and cul_user_text on testwiki (T233004) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
22:42 zabe@deploy1002: Started scap: Backport for Start reading from cuc_actor on group0 and group1 wikis (T233004), Start writing to rev_comment_id on group0 wikis (T299954), Stop writing to cul_user and cul_user_text on testwiki (T233004)
22:40 effie: upload memkeys_20181031-2~bullseye0_ on bullseye-wikimedia
22:39 kindrobot: close UTC late backport window
22:38 kindrobot@deploy1002: Finished scap: Backport for Fix exception in `<gallery mode="slideshow">` with missing images, Fix phan error when Excimer is enabled, Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view" (T301063 T326399), [[gerrit:879099|Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view" (T30106
22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P43037 and previous config saved to /var/cache/conftool/dbconfig/20230111-223304-marostegui.json
22:21 kindrobot@deploy1002: kindrobot and matmarex: Backport for Fix exception in `<gallery mode="slideshow">` with missing images, Fix phan error when Excimer is enabled, Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view" (T301063 T326399), [[gerrit:879099|Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view
22:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P43036 and previous config saved to /var/cache/conftool/dbconfig/20230111-221757-marostegui.json
22:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43035 and previous config saved to /var/cache/conftool/dbconfig/20230111-220251-marostegui.json
22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43034 and previous config saved to /var/cache/conftool/dbconfig/20230111-220026-marostegui.json
22:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance
22:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance
22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T321391)', diff saved to https://phabricator.wikimedia.org/P43033 and previous config saved to /var/cache/conftool/dbconfig/20230111-220005-marostegui.json
21:58 kindrobot@deploy1002: Started scap: Backport for Fix exception in `<gallery mode="slideshow">` with missing images, Fix phan error when Excimer is enabled, Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view" (T301063 T326399), [[gerrit:879099|Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view" (T301063
21:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P43031 and previous config saved to /var/cache/conftool/dbconfig/20230111-214458-marostegui.json
21:34 kindrobot@deploy1002: Finished scap: Backport for Fix mustache template rendering when TOC is rerendered after an edit (T326682), Enable page tools on beta cluster (duration: 10m 17s)
21:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P43030 and previous config saved to /var/cache/conftool/dbconfig/20230111-212952-marostegui.json
21:25 kindrobot@deploy1002: kindrobot and jdrewniak and jdlrobson: Backport for Fix mustache template rendering when TOC is rerendered after an edit (T326682), Enable page tools on beta cluster synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
21:23 kindrobot@deploy1002: Started scap: Backport for Fix mustache template rendering when TOC is rerendered after an edit (T326682), Enable page tools on beta cluster
21:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T321391)', diff saved to https://phabricator.wikimedia.org/P43029 and previous config saved to /var/cache/conftool/dbconfig/20230111-211445-marostegui.json
21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T321391)', diff saved to https://phabricator.wikimedia.org/P43028 and previous config saved to /var/cache/conftool/dbconfig/20230111-211222-marostegui.json
21:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2153.codfw.wmnet with reason: Maintenance
21:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2153.codfw.wmnet with reason: Maintenance
21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T321391)', diff saved to https://phabricator.wikimedia.org/P43027 and previous config saved to /var/cache/conftool/dbconfig/20230111-211200-marostegui.json
21:06 kindrobot: start UTC late backport window
20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P43025 and previous config saved to /var/cache/conftool/dbconfig/20230111-205654-marostegui.json
20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P43024 and previous config saved to /var/cache/conftool/dbconfig/20230111-204147-marostegui.json
20:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43023 and previous config saved to /var/cache/conftool/dbconfig/20230111-203141-root.json
20:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T321391)', diff saved to https://phabricator.wikimedia.org/P43022 and previous config saved to /var/cache/conftool/dbconfig/20230111-202641-marostegui.json
20:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T321391)', diff saved to https://phabricator.wikimedia.org/P43021 and previous config saved to /var/cache/conftool/dbconfig/20230111-202417-marostegui.json
20:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2146.codfw.wmnet with reason: Maintenance
20:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2146.codfw.wmnet with reason: Maintenance
20:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T321391)', diff saved to https://phabricator.wikimedia.org/P43020 and previous config saved to /var/cache/conftool/dbconfig/20230111-202345-marostegui.json
20:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43019 and previous config saved to /var/cache/conftool/dbconfig/20230111-201636-root.json
20:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P43018 and previous config saved to /var/cache/conftool/dbconfig/20230111-200838-marostegui.json
20:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43017 and previous config saved to /var/cache/conftool/dbconfig/20230111-200131-root.json
19:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P43016 and previous config saved to /var/cache/conftool/dbconfig/20230111-195332-marostegui.json
19:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43015 and previous config saved to /var/cache/conftool/dbconfig/20230111-194626-root.json
19:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T321391)', diff saved to https://phabricator.wikimedia.org/P43014 and previous config saved to /var/cache/conftool/dbconfig/20230111-193825-marostegui.json
19:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T321391)', diff saved to https://phabricator.wikimedia.org/P43013 and previous config saved to /var/cache/conftool/dbconfig/20230111-193601-marostegui.json
19:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2145.codfw.wmnet with reason: Maintenance
19:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2145.codfw.wmnet with reason: Maintenance
19:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
19:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
19:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T321391)', diff saved to https://phabricator.wikimedia.org/P43012 and previous config saved to /var/cache/conftool/dbconfig/20230111-193506-marostegui.json
19:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43011 and previous config saved to /var/cache/conftool/dbconfig/20230111-193121-root.json
19:20 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
19:20 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
19:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P43010 and previous config saved to /var/cache/conftool/dbconfig/20230111-192000-marostegui.json
19:19 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
19:19 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
19:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43009 and previous config saved to /var/cache/conftool/dbconfig/20230111-191616-root.json
19:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P43008 and previous config saved to /var/cache/conftool/dbconfig/20230111-190453-marostegui.json
19:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 1%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43007 and previous config saved to /var/cache/conftool/dbconfig/20230111-190111-root.json
18:57 marostegui: dbmaint deploy schema change with replication on s3 eqiad T321391
18:52 brett: Removing legacy vips from dns servers - T239993
18:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T321391)', diff saved to https://phabricator.wikimedia.org/P43006 and previous config saved to /var/cache/conftool/dbconfig/20230111-184946-marostegui.json
18:47 marostegui: dbmaint deploy schema change with replication on s2 eqiad T321391
18:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T321391)', diff saved to https://phabricator.wikimedia.org/P43005 and previous config saved to /var/cache/conftool/dbconfig/20230111-184723-marostegui.json
18:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2130.codfw.wmnet with reason: Maintenance
18:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2130.codfw.wmnet with reason: Maintenance
18:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T321391)', diff saved to https://phabricator.wikimedia.org/P43004 and previous config saved to /var/cache/conftool/dbconfig/20230111-184701-marostegui.json
18:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P43003 and previous config saved to /var/cache/conftool/dbconfig/20230111-184051-root.json
18:36 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@5a19b9d]: drop-snapshots: Accept snapshot= partition from any level (duration: 02m 33s)
18:33 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@5a19b9d]: drop-snapshots: Accept snapshot= partition from any level
18:33 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
18:32 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
18:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P43002 and previous config saved to /var/cache/conftool/dbconfig/20230111-183155-marostegui.json
18:30 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
18:30 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
18:28 bblack: repool eqsin edge DC
18:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P43001 and previous config saved to /var/cache/conftool/dbconfig/20230111-182546-root.json
18:22 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
18:22 btullis@cumin1001: Added views for new wiki: blkwiki T310872
18:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P43000 and previous config saved to /var/cache/conftool/dbconfig/20230111-181648-marostegui.json
18:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42999 and previous config saved to /var/cache/conftool/dbconfig/20230111-181041-root.json
18:09 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
18:09 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
18:08 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
18:08 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
18:07 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
18:02 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
18:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T321391)', diff saved to https://phabricator.wikimedia.org/P42998 and previous config saved to /var/cache/conftool/dbconfig/20230111-180142-marostegui.json
18:01 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
17:59 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
17:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T321391)', diff saved to https://phabricator.wikimedia.org/P42997 and previous config saved to /var/cache/conftool/dbconfig/20230111-175919-marostegui.json
17:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2116.codfw.wmnet with reason: Maintenance
17:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2116.codfw.wmnet with reason: Maintenance
17:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112 (T321391)', diff saved to https://phabricator.wikimedia.org/P42996 and previous config saved to /var/cache/conftool/dbconfig/20230111-175857-marostegui.json
17:58 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
17:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker[1080,1084].eqiad.wmnet with reason: Shutting down to enable RAID battery replacement
17:55 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-worker[1080,1084].eqiad.wmnet with reason: Shutting down to enable RAID battery replacement
17:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42995 and previous config saved to /var/cache/conftool/dbconfig/20230111-175536-root.json
17:50 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
17:50 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
17:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112', diff saved to https://phabricator.wikimedia.org/P42994 and previous config saved to /var/cache/conftool/dbconfig/20230111-174351-marostegui.json
17:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 10%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42993 and previous config saved to /var/cache/conftool/dbconfig/20230111-174031-root.json
17:40 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
17:39 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
17:29 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
17:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112', diff saved to https://phabricator.wikimedia.org/P42992 and previous config saved to /var/cache/conftool/dbconfig/20230111-172844-marostegui.json
17:28 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
17:28 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
17:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 5%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42991 and previous config saved to /var/cache/conftool/dbconfig/20230111-172526-root.json
17:21 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
17:21 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
17:21 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
17:20 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
17:18 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
17:18 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
17:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112 (T321391)', diff saved to https://phabricator.wikimedia.org/P42989 and previous config saved to /var/cache/conftool/dbconfig/20230111-171338-marostegui.json
17:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2112 (T321391)', diff saved to https://phabricator.wikimedia.org/P42988 and previous config saved to /var/cache/conftool/dbconfig/20230111-171114-marostegui.json
17:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2112.codfw.wmnet with reason: Maintenance
17:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2112.codfw.wmnet with reason: Maintenance
17:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2102.codfw.wmnet with reason: Maintenance
17:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2102.codfw.wmnet with reason: Maintenance
17:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 1%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42987 and previous config saved to /var/cache/conftool/dbconfig/20230111-171021-root.json
17:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance
17:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance
17:04 marostegui: dbmaint deploy schema change with replication on s7 eqiad T321391
17:03 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
17:03 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
16:38 marostegui: dbmaint deploy schema change with replication on s5 eqiad T321391
16:31 marostegui: dbmaint deploy schema change with replication on s4 eqiad T321391
16:25 marostegui: dbmaint deploy schema change with replication on s8 eqiad T321391
16:22 marostegui: dbmaint deploy schema change with replication on s6 eqiad T321391
16:06 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:06 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force update after eqsin outage is over - volans@cumin1001"
16:05 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force update after eqsin outage is over - volans@cumin1001"
16:03 volans@cumin1001: START - Cookbook sre.dns.netbox
16:01 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host mc1038.eqiad.wmnet with OS bullseye
16:00 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox
15:53 zabe@deploy1002: Finished scap: T233004 (duration: 07m 54s)
15:45 zabe@deploy1002: Started scap: T233004
15:38 zabe@deploy1002: backport aborted: (duration: 04m 25s)
15:38 zabe@deploy1002: sync-world aborted: Backport for Start reading from cul_actor everywhere (T233004) (duration: 04m 00s)
15:36 zabe@deploy1002: zabe and zabe: Backport for Start reading from cul_actor everywhere (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
15:34 zabe@deploy1002: Started scap: Backport for Start reading from cul_actor everywhere (T233004)
15:31 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
15:21 marostegui: Stop mariadb on db1106 to reclone db1206 (there will be lag on s1 on wikireplicas) T326669
15:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106', diff saved to https://phabricator.wikimedia.org/P42982 and previous config saved to /var/cache/conftool/dbconfig/20230111-151712-marostegui.json
14:56 pt1979@cumin2002: START - Cookbook sre.dns.netbox
14:47 Lucas_WMDE: UTC afternoon backport+config window done
14:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1005.eqiad.wmnet with OS bullseye
14:46 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
14:46 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.18/extensions/Wikibase/repo/tests/jest/wikibase.vector.searchClient.spec.js: Backport: Add missing parentheses to vector search match text (T326633) (2/2) (duration: 06m 46s)
14:42 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
14:39 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.18/extensions/Wikibase/repo/resources/wikibase.vector.searchClient.js: Backport: Add missing parentheses to vector search match text (T326633) (1/2) (duration: 07m 09s)
14:28 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Fix test constructing HTMLFormField without parent (T326621) (duration: 08m 38s)
14:25 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1005.eqiad.wmnet with reason: host reimage
14:22 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1005.eqiad.wmnet with reason: host reimage
14:21 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and lucaswerkmeister-wmde: Backport for Fix test constructing HTMLFormField without parent (T326621) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
14:19 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Fix test constructing HTMLFormField without parent (T326621)
14:14 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)
14:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1001.eqiad.wmnet
14:10 moritzm: installing postgresql 11 security updates on maps/eqiad
14:06 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bullseye
14:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1004.eqiad.wmnet with OS bullseye
14:02 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
14:01 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1001.eqiad.wmnet
13:55 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
13:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 37002
13:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 37002
13:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3302
13:45 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
13:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3302
13:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9584
13:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9584
13:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 35753
13:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 35753
13:38 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1004.eqiad.wmnet with reason: host reimage
13:35 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1004.eqiad.wmnet with reason: host reimage
13:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast6002.wikimedia.org
13:21 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
13:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
13:12 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) bast6002.wikimedia.org on all recursors
13:11 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast6002.wikimedia.org on all recursors
13:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast6002.wikimedia.org - jmm@cumin2002"
13:11 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast6002.wikimedia.org - jmm@cumin2002"
13:07 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1038.eqiad.wmnet with OS bullseye
13:03 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc1038.eqiad.wmnet with OS bullseye
13:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
13:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast6002.wikimedia.org
12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast4004.wikimedia.org
12:42 moritzm: installing postgresql 11 security updates on maps/codfw
12:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8849
12:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8849
12:35 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) bast4004.wikimedia.org on all recursors
12:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast4004.wikimedia.org on all recursors
12:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast4004.wikimedia.org - jmm@cumin2002"
12:33 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast4004.wikimedia.org - jmm@cumin2002"
12:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 56630
12:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 56630
12:24 btullis@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.add-wiki (exit_code=99)
12:24 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
12:18 jmm@cumin2002: START - Cookbook sre.dns.netbox
12:18 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast4004.wikimedia.org
12:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
12:13 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
12:10 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1004.eqiad.wmnet with OS bullseye
12:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1003.eqiad.wmnet with OS bullseye
12:10 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
12:08 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
11:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1003.eqiad.wmnet with reason: host reimage
11:51 claime: repooled mw1486 in api_appserver eqiad after hardware investigation - T326425
11:50 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1003.eqiad.wmnet with reason: host reimage
11:50 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1486.eqiad.wmnet
11:50 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw1486.eqiad.wmnet
11:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast3006.wikimedia.org
11:47 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,name=mw1486.eqiad.wmnet
11:38 cgoubert@cumin1001: conftool action : set/pooled=yes:weight=10; selector: cluster=aux-k8s,service=kubesvc
11:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
11:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
11:30 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) bast3006.wikimedia.org on all recursors
11:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast3006.wikimedia.org on all recursors
11:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:29 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast3006.wikimedia.org - jmm@cumin2002"
11:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast3006.wikimedia.org - jmm@cumin2002"
11:22 btullis@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.add-wiki (exit_code=99)
11:22 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
11:21 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1038.eqiad.wmnet with OS bullseye
11:19 jmm@cumin2002: START - Cookbook sre.dns.netbox
11:19 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast3006.wikimedia.org
11:16 btullis@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.add-wiki (exit_code=99)
11:15 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
11:15 btullis@cumin1001: END (FAIL) - Cookbook sre.druid.reboot-workers (exit_code=99) for Druid test cluster: Reboot Druid nodes
11:12 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1003.eqiad.wmnet with OS bullseye
10:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1001.eqiad.wmnet with OS bullseye
10:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
10:34 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
10:31 zabe@deploy1002: Finished scap: Backport for Simplify expensive check (T326690), Start reading from cuc_actor on test wikis (T233004) (duration: 09m 34s)
10:25 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw1486.eqiad.wmnet with reason: hardware troubleshooting
10:24 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw1486.eqiad.wmnet with reason: hardware troubleshooting
10:23 btullis@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid test cluster: Reboot Druid nodes
10:23 zabe@deploy1002: zabe and zabe: Backport for Simplify expensive check (T326690), Start reading from cuc_actor on test wikis (T233004) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
10:21 zabe@deploy1002: Started scap: Backport for Simplify expensive check (T326690), Start reading from cuc_actor on test wikis (T233004)
10:18 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1001.eqiad.wmnet with OS bullseye
10:16 moritzm: installing postgresql-11 security updates
10:02 XioNoX: asw1-eqsin> request system reboot all-members - T316532
09:49 moritzm: installing python3.7 security updates
08:31 kartik@deploy1002: Finished scap: Backport for CX: Fix transformation of TranslationUnitDTO to custom array (T326278) (duration: 11m 45s)
08:21 kartik@deploy1002: kartik and kartik: Backport for CX: Fix transformation of TranslationUnitDTO to custom array (T326278) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
08:20 kartik@deploy1002: Started scap: Backport for CX: Fix transformation of TranslationUnitDTO to custom array (T326278)
05:55 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1003.eqiad.wmnet
05:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2003.codfw.wmnet
05:48 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2003.codfw.wmnet
05:48 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1003.eqiad.wmnet

2023-01-10

23:58 krinkle@deploy1002: Finished deploy [integration/docroot@b7c82a3]: (no justification provided) (duration: 00m 15s)
23:58 krinkle@deploy1002: Started deploy [integration/docroot@b7c82a3]: (no justification provided)
23:46 mutante: cumin2002 - sudo systemctl status httpbb_hourly_appserver
23:30 zabe@deploy1002: Finished scap: Backport for Start writing to rev_comment_id on test wikis (T299954) (duration: 09m 39s)
23:22 zabe@deploy1002: zabe and zabe: Backport for Start writing to rev_comment_id on test wikis (T299954) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
23:21 zabe@deploy1002: Started scap: Backport for Start writing to rev_comment_id on test wikis (T299954)
22:42 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.18 refs T325581
22:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: plugin upgrade - ryankemper@cumin1001 - T324247
22:28 jhuneidi@deploy1002: Pruned MediaWiki: 1.40.0-wmf.14, 1.40.0-wmf.13 (duration: 02m 35s)
22:21 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.18 refs T325581 (duration: 45m 04s)
22:10 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
22:09 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
22:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1206 T325046', diff saved to https://phabricator.wikimedia.org/P42980 and previous config saved to /var/cache/conftool/dbconfig/20230110-220942-marostegui.json
22:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2002.codfw.wmnet
22:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1002.eqiad.wmnet
21:54 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2002.codfw.wmnet
21:54 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1002.eqiad.wmnet
21:54 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
21:52 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
21:36 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.18 refs T325581
21:27 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1001.eqiad.wmnet
21:27 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2001.codfw.wmnet
21:20 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1001.eqiad.wmnet
21:19 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2001.codfw.wmnet
21:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P42979 and previous config saved to /var/cache/conftool/dbconfig/20230110-211826-root.json
21:18 zabe@deploy1002: Finished scap: Backport for Use new DiscussionTools heading markup on group2 wikis except enwiki (T314714), Start reading from cul_actor on group1 wikis (T233004) (duration: 10m 08s)
21:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1040.eqiad.wmnet
21:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2055.codfw.wmnet
21:11 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2055.codfw.wmnet
21:11 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1040.eqiad.wmnet
21:09 zabe@deploy1002: zabe and zabe and matmarex: Backport for Use new DiscussionTools heading markup on group2 wikis except enwiki (T314714), Start reading from cul_actor on group1 wikis (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
21:08 zabe@deploy1002: Started scap: Backport for Use new DiscussionTools heading markup on group2 wikis except enwiki (T314714), Start reading from cul_actor on group1 wikis (T233004)
21:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P42978 and previous config saved to /var/cache/conftool/dbconfig/20230110-210321-root.json
20:55 mutante: repooling eqsin
20:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P42977 and previous config saved to /var/cache/conftool/dbconfig/20230110-204816-root.json
20:37 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
20:37 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
20:33 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
20:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P42976 and previous config saved to /var/cache/conftool/dbconfig/20230110-203311-root.json
20:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
20:31 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
20:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
20:29 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
20:28 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
20:28 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
20:28 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
20:26 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
20:26 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
20:18 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: plugin upgrade - ryankemper@cumin1001 - T324247
20:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P42975 and previous config saved to /var/cache/conftool/dbconfig/20230110-201807-ladsgroup.json
20:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P42974 and previous config saved to /var/cache/conftool/dbconfig/20230110-201806-root.json
20:08 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
20:08 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
20:08 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
20:08 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
20:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1038.eqiad.wmnet
20:07 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
20:06 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
20:05 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2054.codfw.wmnet
20:04 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
20:04 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
20:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P42972 and previous config saved to /var/cache/conftool/dbconfig/20230110-200302-ladsgroup.json
20:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P42971 and previous config saved to /var/cache/conftool/dbconfig/20230110-200301-root.json
20:02 ayounsi@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 01m 42s)
20:01 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
20:01 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
20:00 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
20:00 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1038.eqiad.wmnet
19:58 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2054.codfw.wmnet
19:52 ayounsi@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 01m 06s)
19:51 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
19:49 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
19:49 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
19:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P42970 and previous config saved to /var/cache/conftool/dbconfig/20230110-194757-ladsgroup.json
19:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P42969 and previous config saved to /var/cache/conftool/dbconfig/20230110-194756-root.json
19:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P42968 and previous config saved to /var/cache/conftool/dbconfig/20230110-194750-ladsgroup.json
19:43 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
19:42 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
19:39 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
19:38 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
19:38 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
19:38 dancy@deploy1002: Installation of scap version "4.32.0" completed for 1 hosts
19:37 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
19:37 dancy@deploy1002: Installing scap version "4.32.0" for 1 hosts
19:35 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
19:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P42965 and previous config saved to /var/cache/conftool/dbconfig/20230110-193253-ladsgroup.json
19:32 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
19:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P42964 and previous config saved to /var/cache/conftool/dbconfig/20230110-193245-ladsgroup.json
19:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
19:31 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
19:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
19:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
19:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
19:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
19:31 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
19:30 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
19:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1158 maint', diff saved to https://phabricator.wikimedia.org/P42963 and previous config saved to /var/cache/conftool/dbconfig/20230110-192929-ladsgroup.json
19:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P42962 and previous config saved to /var/cache/conftool/dbconfig/20230110-191740-ladsgroup.json
19:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2053.codfw.wmnet
19:08 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2053.codfw.wmnet
19:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P42958 and previous config saved to /var/cache/conftool/dbconfig/20230110-190235-ladsgroup.json
19:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
19:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
18:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2052.codfw.wmnet
18:44 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2052.codfw.wmnet
18:38 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2002.codfw.wmnet with OS bullseye
18:35 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2001.codfw.wmnet with OS bullseye
18:29 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes-staging,service=kubesvc
18:23 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagemaster2001.codfw.wmnet with OS bullseye
18:23 jayme@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=kubernetes-staging,service=kubesvc
18:21 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
18:20 jayme@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=kubernetes-staging,service=kubesvc
18:19 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
18:16 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
18:16 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
18:09 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2001.codfw.wmnet with reason: host reimage
18:06 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2001.codfw.wmnet with reason: host reimage
18:01 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubestage2002.codfw.wmnet with OS bullseye
18:01 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubestage2001.codfw.wmnet with OS bullseye
17:55 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagemaster2001.codfw.wmnet with OS bullseye
17:51 zabe: run populateCulActor on all wikis # T325484
17:48 claime: Finished rolling reboots of eqiad appservers
17:48 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
17:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
17:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1130 maint', diff saved to https://phabricator.wikimedia.org/P42956 and previous config saved to /var/cache/conftool/dbconfig/20230110-173807-ladsgroup.json
17:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 T325652', diff saved to https://phabricator.wikimedia.org/P42955 and previous config saved to /var/cache/conftool/dbconfig/20230110-173027-marostegui.json
17:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P42954 and previous config saved to /var/cache/conftool/dbconfig/20230110-173002-ladsgroup.json
17:29 ayounsi@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 00m 11s)
17:28 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
17:28 ayounsi@deploy1002: deploy aborted: help (duration: 00m 01s)
17:28 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: help
17:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P42953 and previous config saved to /var/cache/conftool/dbconfig/20230110-171457-ladsgroup.json
17:14 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
17:10 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
17:03 ayounsi@deploy1002: deploy aborted: netbox-next to 3.2.9 (duration: 00m 07s)
17:03 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P42952 and previous config saved to /var/cache/conftool/dbconfig/20230110-165952-ladsgroup.json
16:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: After the incident', diff saved to https://phabricator.wikimedia.org/P42951 and previous config saved to /var/cache/conftool/dbconfig/20230110-165406-root.json
16:48 bblack: depooling eqsin from DNS
16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P42950 and previous config saved to /var/cache/conftool/dbconfig/20230110-164447-ladsgroup.json
16:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: After the incident', diff saved to https://phabricator.wikimedia.org/P42949 and previous config saved to /var/cache/conftool/dbconfig/20230110-163901-root.json
16:36 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagetcd2003.codfw.wmnet with OS bullseye
16:24 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagetcd2001.codfw.wmnet with OS bullseye
16:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: After the incident', diff saved to https://phabricator.wikimedia.org/P42948 and previous config saved to /var/cache/conftool/dbconfig/20230110-162356-root.json
16:23 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd2003.codfw.wmnet with reason: host reimage
16:21 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd2003.codfw.wmnet with reason: host reimage
16:14 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagetcd2002.codfw.wmnet with OS bullseye
16:10 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
16:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: After the incident', diff saved to https://phabricator.wikimedia.org/P42947 and previous config saved to /var/cache/conftool/dbconfig/20230110-160851-root.json
16:08 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
16:08 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd2003.codfw.wmnet with OS bullseye
16:04 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd2002.codfw.wmnet with reason: host reimage
16:04 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
16:03 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
16:01 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd2002.codfw.wmnet with reason: host reimage
15:59 SandraEbele: reran failed pageview-druid-hourly-coord oozie job for 2023-1-10-10.
15:59 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:58 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw[1373,1384-1385,1387].eqiad.wmnet
15:55 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw[1373,1384-1385,1387].eqiad.wmnet
15:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 10%: After the incident', diff saved to https://phabricator.wikimedia.org/P42946 and previous config saved to /var/cache/conftool/dbconfig/20230110-155346-root.json
15:52 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd2002.codfw.wmnet with OS bullseye
15:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 5%: After the incident', diff saved to https://phabricator.wikimedia.org/P42945 and previous config saved to /var/cache/conftool/dbconfig/20230110-153841-root.json
15:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2051.codfw.wmnet
15:30 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
15:29 claime: Restarting rolling reboots of eqiad appservers
15:28 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2051.codfw.wmnet
15:25 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:25 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 1%: After the incident', diff saved to https://phabricator.wikimedia.org/P42944 and previous config saved to /var/cache/conftool/dbconfig/20230110-152336-root.json
15:21 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host search-loader2001.codfw.wmnet
15:17 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host search-loader2001.codfw.wmnet
15:14 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd2001.codfw.wmnet with reason: host reimage
15:11 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd2001.codfw.wmnet with reason: host reimage
15:09 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2050.codfw.wmnet
15:02 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd2001.codfw.wmnet with OS bullseye
15:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2037.codfw.wmnet
15:01 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:01 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2037.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
14:56 XioNoX: start VC link maintenance in eqiad - T325803
14:55 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagetcd2001.codfw.wmnet with OS bullseye
14:55 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd2001.codfw.wmnet with OS bullseye
14:53 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host search-loader1001.eqiad.wmnet
14:49 zabe: UTC afternoon deploys done
14:49 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host search-loader1001.eqiad.wmnet
14:48 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2037.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
14:47 zabe@deploy1002: Finished scap: Backport for Start reading from cul_actor on remaining test wikis and group0 wikis (T233004) (duration: 08m 59s)
14:46 jiji@cumin1001: START - Cookbook sre.dns.netbox
14:40 zabe@deploy1002: zabe and zabe: Backport for Start reading from cul_actor on remaining test wikis and group0 wikis (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
14:38 zabe@deploy1002: Started scap: Backport for Start reading from cul_actor on remaining test wikis and group0 wikis (T233004)
14:36 zabe: run populateCulActor on group0 wikis # T325484
14:35 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2050.codfw.wmnet
14:35 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2037.codfw.wmnet
14:34 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host apifeatureusage2001.codfw.wmnet
14:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2036.codfw.wmnet
14:33 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:33 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2036.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
14:28 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2036.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
14:28 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagetcd2001.codfw.wmnet with OS bullseye
14:28 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd2001.codfw.wmnet with OS bullseye
14:26 jiji@cumin1001: START - Cookbook sre.dns.netbox
14:25 zabe@deploy1002: Finished scap: Backport for [config]: GDI Safety Survey Wave 4 (T325136) (duration: 17m 42s)
14:21 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host apifeatureusage2001.codfw.wmnet
14:19 claime: Pausing reboots of eqiad appservers for deployments
14:18 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw[1369-1372].eqiad.wmnet
14:18 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw[1369-1372].eqiad.wmnet
14:14 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apifeatureusage1001.eqiad.wmnet
14:11 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2036.codfw.wmnet
14:10 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
14:09 zabe@deploy1002: zabe and essexigyan: Backport for [config]: GDI Safety Survey Wave 4 (T325136) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
14:07 zabe@deploy1002: Started scap: Backport for [config]: GDI Safety Survey Wave 4 (T325136)
14:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: Reinitialize staging-codfw with k8s 1.23
14:06 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host apifeatureusage1001.eqiad.wmnet
14:06 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: Reinitialize staging-codfw with k8s 1.23
14:03 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2035.codfw.wmnet
14:03 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:03 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2035.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
13:49 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2035.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
13:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1002.eqiad.wmnet with OS bullseye
13:46 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
13:46 jiji@cumin1001: START - Cookbook sre.dns.netbox
13:44 godog: delete grafana dashboards from "sre dashboards for deletion" folder - T178690
13:43 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2049.codfw.wmnet
13:37 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2035.codfw.wmnet
13:36 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2049.codfw.wmnet
13:34 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
13:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2001.wikimedia.org
13:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2001.wikimedia.org
13:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1002.eqiad.wmnet with reason: host reimage
13:16 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1002.eqiad.wmnet with reason: host reimage
13:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetdb-test2001.codfw.wmnet
13:08 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:08 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetdb-test2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
12:59 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1002.eqiad.wmnet with OS bullseye
12:59 btullis@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cephosd1002.eqiad.wmnet with OS bullseye
12:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetdb-test2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
12:53 jmm@cumin2002: START - Cookbook sre.dns.netbox
12:50 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
12:50 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
12:50 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
12:50 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts puppetdb-test2001.codfw.wmnet
12:49 claime: Starting rolling reboot of eqiad appservers
12:47 btullis@cumin1001: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid analytics cluster: Reboot Druid nodes
12:36 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1002.eqiad.wmnet with OS bullseye
12:34 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd1002.eqiad.wmnet with OS bullseye
12:31 oblivian@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
12:31 oblivian@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
12:31 oblivian@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
12:31 oblivian@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
12:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2048.codfw.wmnet
12:19 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2048.codfw.wmnet
12:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2034.codfw.wmnet
12:18 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:18 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2034.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
12:12 claime: Finished rolling reboot of eqiad jobrunners
12:07 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
12:06 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
12:06 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
12:05 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
12:02 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2034.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
11:59 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
11:58 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
11:57 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
11:57 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
11:56 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
11:53 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
11:52 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
11:48 jiji@cumin1001: START - Cookbook sre.dns.netbox
11:35 btullis@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid analytics cluster: Reboot Druid nodes
11:33 btullis@cumin1001: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid public cluster: Reboot Druid nodes
11:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2047.codfw.wmnet
11:00 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2047.codfw.wmnet
11:00 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2034.codfw.wmnet
10:31 godog: upgrade thanos to 0.30.1 on thanos-fe2* - T303154
10:24 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1002.eqiad.wmnet with OS bullseye
10:23 btullis@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid public cluster: Reboot Druid nodes
10:21 claime: Starting rolling reboot of eqiad jobrunners
10:21 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
10:18 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd1002.eqiad.wmnet with OS bullseye
10:14 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2046.codfw.wmnet
10:14 claime: repooled parse1002.eqiad.wmnet - T326119
10:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1002.eqiad.wmnet
10:13 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1002.eqiad.wmnet
10:07 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2046.codfw.wmnet
10:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2033.codfw.wmnet
10:06 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:06 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2033.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
10:02 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2033.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
09:59 cgoubert@cumin1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1002.eqiad.wmnet
09:55 godog: upgrade thanos to 0.30.1 on prometheus hosts - T303154
09:53 moritzm: installing systemd bugfix updates from Bullseye point release
09:45 aqu@deploy1002: Finished deploy [airflow-dags/analytics@9568478]: Fix bug fix in HDFS usage pipeline [airflow-dags@9568478] (duration: 00m 13s)
09:45 aqu@deploy1002: Started deploy [airflow-dags/analytics@9568478]: Fix bug fix in HDFS usage pipeline [airflow-dags@9568478]
09:43 godog: upgrade thanos to 0.30.1 on thanos-fe100[2-3] - T303154
09:34 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@9568478]: Fix bug fix in HDFS usage pipeline TEST [airflow-dags@9568478] (duration: 00m 11s)
09:34 jiji@cumin1001: START - Cookbook sre.dns.netbox
09:34 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@9568478]: Fix bug fix in HDFS usage pipeline TEST [airflow-dags@9568478]
09:25 XioNoX: repool ulsfo (maintenance cancelled) - T316532
09:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2045.codfw.wmnet
09:22 taavi: added zabe to wmf-deployment gerrit group T326327
09:19 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2033.codfw.wmnet
09:18 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2045.codfw.wmnet
09:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2032.codfw.wmnet
09:17 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:17 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2032.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
09:15 kart_: Done: UTC morning backport window
09:14 kartik@deploy1002: Finished scap: Backport for CX: Fix transformation of TranslationUnitDTO to custom array (T326278) (duration: 09m 20s)
09:07 kartik@deploy1002: kartik and kartik: Backport for CX: Fix transformation of TranslationUnitDTO to custom array (T326278) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
09:05 kartik@deploy1002: Started scap: Backport for CX: Fix transformation of TranslationUnitDTO to custom array (T326278)
08:58 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2032.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
08:56 godog: upgrade thanos to 0.30.1 on thanos-fe1001 - T303154
08:54 godog: upgrade thanos to 0.30.1 on prometheus2006 - T303154
08:49 kartik@deploy1002: Finished scap: Backport for CX: Fix usage of categories translation unit as array (T326278) (duration: 12m 08s)
08:38 kartik@deploy1002: kartik and kartik: Backport for CX: Fix usage of categories translation unit as array (T326278) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
08:37 kartik@deploy1002: Started scap: Backport for CX: Fix usage of categories translation unit as array (T326278)
08:20 kartik@deploy1002: Finished scap: Backport for ContentTranslation: Increase MT threshold for publishing in cswiki by 20% (T324721) (duration: 17m 21s)
08:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
08:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
08:08 kartik@deploy1002: kartik and kartik: Backport for ContentTranslation: Increase MT threshold for publishing in cswiki by 20% (T324721) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
08:03 kartik@deploy1002: Started scap: Backport for ContentTranslation: Increase MT threshold for publishing in cswiki by 20% (T324721)
08:02 jiji@cumin1001: START - Cookbook sre.dns.netbox
07:45 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2032.codfw.wmnet
07:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2031.codfw.wmnet
07:37 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:37 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2031.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
07:36 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2031.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
07:33 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host mc2044.codfw.wmnet
07:28 XioNoX: depool ulsfo for network maintenance - T316532
07:27 jiji@cumin1001: START - Cookbook sre.dns.netbox
07:22 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2031.codfw.wmnet
07:22 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2044.codfw.wmnet
07:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: check if dns update is needed after change of rec-dns-lb IPs status - ayounsi@cumin1001"
07:14 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: check if dns update is needed after change of rec-dns-lb IPs status - ayounsi@cumin1001"
07:11 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
07:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
07:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
07:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1130 T326133', diff saved to https://phabricator.wikimedia.org/P42941 and previous config saved to /var/cache/conftool/dbconfig/20230110-070628-ladsgroup.json
07:03 XioNoX: remove static routes for legacy dns-rec-lb IPs - T239993
07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1100 to s5 primary and set section read-write T326133', diff saved to https://phabricator.wikimedia.org/P42940 and previous config saved to /var/cache/conftool/dbconfig/20230110-070223-ladsgroup.json
07:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s5 eqiad as read-only for maintenance - T326133', diff saved to https://phabricator.wikimedia.org/P42939 and previous config saved to /var/cache/conftool/dbconfig/20230110-070152-ladsgroup.json
07:01 Amir1: Starting s5 eqiad failover from db1130 to db1100 - T326133
06:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1100 with weight 0 T326133', diff saved to https://phabricator.wikimedia.org/P42938 and previous config saved to /var/cache/conftool/dbconfig/20230110-062309-ladsgroup.json
06:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T326133
06:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T326133
05:39 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Sync idm-test1001 - slyngshede@cumin1001"
05:38 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Sync idm-test1001 - slyngshede@cumin1001"
03:14 eileen: civicrm upgraded from 391e8482 to 9afd2789
03:12 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: plugin upgrade - ryankemper@cumin1001 - T324247
02:46 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: plugin upgrade - ryankemper@cumin1001 - T324247
02:41 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: plugin upgrade - ryankemper@cumin1001 - T324247
02:08 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: plugin upgrade - ryankemper@cumin1001 - T324247
01:50 krinkle@deploy1002: Finished deploy [integration/docroot@f59119c]: (no justification provided) (duration: 00m 14s)
01:50 krinkle@deploy1002: Started deploy [integration/docroot@f59119c]: (no justification provided)
01:28 eileen: civicrm upgraded from e3405a4e to 391e8482
00:48 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: plugin upgrade - bking@cumin1001 - T324247

2023-01-09

22:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2043.codfw.wmnet
22:33 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: plugin upgrade - bking@cumin1001 - T324247
22:32 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: plugin upgrade - bking@cumin1001 - T324247
22:28 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2043.codfw.wmnet
22:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2030.codfw.wmnet
22:25 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:25 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2030.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
22:15 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2030.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
22:11 jiji@cumin1001: START - Cookbook sre.dns.netbox
22:05 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2030.codfw.wmnet
22:03 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2029.codfw.wmnet
22:03 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:03 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2029.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
22:00 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2029.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
21:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2042.codfw.wmnet
21:52 kindrobot: close UTC late backport window
21:50 jiji@cumin1001: START - Cookbook sre.dns.netbox
21:47 kindrobot@deploy1002: Sync cancelled.
21:47 kindrobot@deploy1002: kindrobot and trainbranchbot: Backport for Revert "[config]: Deploy GDI Safety Survey Wave 4" synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
21:47 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2042.codfw.wmnet
21:46 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: plugin upgrade - bking@cumin1001 - T324247
21:45 kindrobot@deploy1002: Started scap: Backport for Revert "[config]: Deploy GDI Safety Survey Wave 4"
21:39 kindrobot@deploy1002: Sync cancelled.
21:38 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2029.codfw.wmnet
21:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2027.codfw.wmnet
21:37 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:37 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2027.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
21:34 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade - bking@cumin1001 - T324247
21:29 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2027.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
21:27 jiji@cumin1001: START - Cookbook sre.dns.netbox
21:26 kindrobot@deploy1002: kindrobot and essexigyan: Backport for [config]: Deploy GDI Safety Survey Wave 4 (T325136) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
21:24 kindrobot@deploy1002: Started scap: Backport for [config]: Deploy GDI Safety Survey Wave 4 (T325136)
21:21 kindrobot: starting UTC late backport window
21:21 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2027.codfw.wmnet
21:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2026.codfw.wmnet
21:18 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:18 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2026.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
21:09 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2026.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
21:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143', diff saved to https://phabricator.wikimedia.org/P42936 and previous config saved to /var/cache/conftool/dbconfig/20230109-210940-marostegui.json
21:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2041.codfw.wmnet
21:03 jiji@cumin1001: START - Cookbook sre.dns.netbox
20:57 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2041.codfw.wmnet
20:57 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2026.codfw.wmnet
20:52 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade - bking@cumin1001 - T324247
20:52 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
20:44 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
20:44 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
20:44 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
20:36 Amir1: deleting global usage coming from commons in commons (T322588)
20:36 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
20:35 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
20:34 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
20:33 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
20:25 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
20:24 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
20:21 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
20:20 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
20:20 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
20:20 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
19:37 bblack: cp5032: set param transit_buffer=1M via varnishadm
19:33 bblack: cp5032: set param transit_buffer=4M via varnishadm
19:26 bblack: cp5032: set param transit_buffer=1M via varnishadm
19:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2025.codfw.wmnet
19:22 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:22 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2025.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
19:15 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2025.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
19:11 jiji@cumin1001: START - Cookbook sre.dns.netbox
19:05 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2025.codfw.wmnet
19:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2024.codfw.wmnet
19:04 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:04 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2024.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
19:00 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2024.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
18:57 jiji@cumin1001: START - Cookbook sre.dns.netbox
18:48 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2024.codfw.wmnet
18:43 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2023.codfw.wmnet
18:43 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:43 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2023.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
18:41 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2023.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
18:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2040.codfw.wmnet
18:36 jiji@cumin1001: START - Cookbook sre.dns.netbox
18:30 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2040.codfw.wmnet
18:30 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2023.codfw.wmnet
18:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2022.codfw.wmnet
18:07 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:07 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2022.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
18:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
18:02 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2022.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
18:00 jiji@cumin1001: START - Cookbook sre.dns.netbox
17:56 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2022.codfw.wmnet
17:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2039.codfw.wmnet
17:47 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2039.codfw.wmnet
17:46 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc2021.codfw.wmnet
17:46 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:46 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2021.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
17:42 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
17:41 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
17:41 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
17:41 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
17:36 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2021.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
17:35 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
17:35 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
17:34 claime: Finished codfw jobrunner rolling reboot
17:32 jiji@cumin1001: START - Cookbook sre.dns.netbox
17:31 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
16:59 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2021.codfw.wmnet
16:49 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
16:48 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
16:46 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc2020.codfw.wmnet
16:46 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:46 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2020.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
16:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2038.codfw.wmnet
16:40 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2038.codfw.wmnet
16:40 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2020.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
16:32 jiji@cumin1001: START - Cookbook sre.dns.netbox
16:11 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2020.codfw.wmnet
16:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2019.codfw.wmnet
16:11 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:11 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2019.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
16:08 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2019.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
16:04 XioNoX: start VC link maintenance in eqiad - T325803
16:03 jiji@cumin1001: START - Cookbook sre.dns.netbox
15:58 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2019.codfw.wmnet
15:37 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
15:37 claime: Starting codfw jobrunner rolling reboot
15:35 Lucas_WMDE: UTC afternoon backport+config window done
15:34 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for CX: Allow composer/installers plugin (duration: 10m 03s)
15:29 claime: Not starting codfw jobrunner rolling reboot, deploy in progress
15:28 claime: Starting codfw jobrunner rolling reboot
15:26 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and kartik: Backport for CX: Allow composer/installers plugin synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
15:24 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for CX: Allow composer/installers plugin
15:17 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on maps2009.codfw.wmnet,maps1009.eqiad.wmnet with reason: Removing redis service
15:17 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on maps2009.codfw.wmnet,maps1009.eqiad.wmnet with reason: Removing redis service
15:11 effie: disable puppet on all 'P:mediawiki::mcrouter_wancache' hosts to merge 875894
15:09 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for extwiki: Install SandboxLink extension (T326450) (duration: 08m 37s)
15:09 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry2004.codfw.wmnet
15:04 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry2004.codfw.wmnet
15:02 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for extwiki: Install SandboxLink extension (T326450) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
15:00 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for extwiki: Install SandboxLink extension (T326450)
15:00 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry2003.codfw.wmnet
14:59 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ echo 'https://en.wikipedia.org/static/images/project-logos/jawikisource.png' | mwscript purgeList.php # T326488
14:56 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for jawikisource: Update project logo and wordmark (T326488) (duration: 09m 24s)
14:55 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry2003.codfw.wmnet
14:52 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry1004.eqiad.wmnet
14:48 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for jawikisource: Update project logo and wordmark (T326488) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
14:47 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry1004.eqiad.wmnet
14:47 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for jawikisource: Update project logo and wordmark (T326488)
14:45 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for arwiki: Create extendedmover group (T326434) (duration: 08m 56s)
14:38 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for arwiki: Create extendedmover group (T326434) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
14:36 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for arwiki: Create extendedmover group (T326434)
14:31 godog: upgrade thanos to 0.30.1 on prometheus2005 - T303154
14:27 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for mediawikiwiki: Disable Flow on new pages by default (T325907) (duration: 18m 19s)
14:19 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for mediawikiwiki: Disable Flow on new pages by default (T325907) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
14:09 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for mediawikiwiki: Disable Flow on new pages by default (T325907)
13:55 moritzm: installing systemd bugfix updates from Bullseye point release
13:41 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry1003.eqiad.wmnet
13:36 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry1003.eqiad.wmnet
13:35 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
13:35 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=eqiad
12:53 hnowlan@deploy1002: Finished deploy [restbase/deploy@bcb0a69]: New wikis T321284 T321290 T321296 T326140 (duration: 18m 56s)
12:34 hnowlan@deploy1002: Started deploy [restbase/deploy@bcb0a69]: New wikis T321284 T321290 T321296 T326140
12:18 vgutierrez: repool cp5025
11:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15954
11:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15954
11:29 vgutierrez: restart purged on cp5025
11:28 vgutierrez: depool cp5025 due to purging issues
11:23 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host chartmuseum1001.eqiad.wmnet
11:19 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host chartmuseum1001.eqiad.wmnet
11:06 XioNoX: repool ulsfo - T316532
11:01 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
10:55 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
10:55 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
10:54 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=eqiad
10:54 claime: Starting codfw appserver rolling reboot
10:54 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=codfw
10:54 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
10:54 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host chartmuseum2001.codfw.wmnet
10:51 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dragonfly-supernode1001.eqiad.wmnet
10:49 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host chartmuseum2001.codfw.wmnet
10:49 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode1001.eqiad.wmnet
10:48 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dragonfly-supernode2001.codfw.wmnet
10:46 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
10:46 effie: switching maps to eqiad
10:45 moritzm: installing avahi security updates
10:44 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode2001.codfw.wmnet
10:41 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=codfw
09:35 dcausse: restarting blazegraph on wdqs1006 (BlazegraphFreeAllocatorsDecreasingRapidly)
09:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
09:04 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
08:58 moritzm: installing glibc security updates
08:56 XioNoX: depool ulsfo for network maintenance - T316532
08:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 327700
08:26 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 327700
08:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 48237
08:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 48237
08:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32035
08:21 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idm-test1001.wikimedia.org
08:21 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32035
08:12 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idm-test1001.wikimedia.org on all recursors
08:12 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache idm-test1001.wikimedia.org on all recursors
08:12 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:12 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idm-test1001.wikimedia.org - slyngshede@cumin1001"
08:08 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idm-test1001.wikimedia.org - slyngshede@cumin1001"
08:06 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
08:06 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host idm-test1001.wikimedia.org

2023-01-06

18:57 mutante: systemctl start docker-gc on all gitlab-runners via cumin T310593
18:56 mutante: gitlab-runner1002 - systemctl start docker-gc; run puppet on all gitlab-runners T310593
18:49 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: debugging
18:49 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on 6 hosts with reason: debugging
18:36 sukhe: pool cp5032 [bullseye upgrade completed]: T325797
18:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5032.eqsin.wmnet,service=ats-be
18:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5032.eqsin.wmnet,service=cdn
18:20 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on mw1486.eqiad.wmnet with reason: downtimed, hw failure: T326425
18:20 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on mw1486.eqiad.wmnet with reason: downtimed, hw failure: T326425
18:13 Krinkle: krinkle@cloudweb1003$ Run `UPDATE actor SET actor_user=31136 WHERE actor_id=14640;` to partially fix T326431
17:58 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5032.eqsin.wmnet with OS bullseye
17:29 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5032.eqsin.wmnet with reason: host reimage
17:26 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5032.eqsin.wmnet with reason: host reimage
16:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5032.eqsin.wmnet with OS bullseye
16:53 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5032.eqsin.wmnet with OS bullseye
16:26 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5032.eqsin.wmnet with OS bullseye
16:18 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5032.eqsin.wmnet with OS bullseye
16:05 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on parse1002.eqiad.wmnet with reason: CPU1 machine check error
16:05 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on parse1002.eqiad.wmnet with reason: CPU1 machine check error
15:54 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1486.eqiad.wmnet
15:53 claime: depooling mw1486.eqiad.wmnet for hardware troubleshooting
15:31 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5032.eqsin.wmnet with OS bullseye
15:30 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5032.eqsin.wmnet with OS bullseye
15:10 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5032.eqsin.wmnet with OS bullseye
15:08 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts cp5032.eqsin.wmnet
15:08 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp5032.eqsin.wmnet
15:07 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5032.eqsin.wmnet,service=ats-be
15:07 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5032.eqsin.wmnet,service=cdn
15:07 sukhe: depool cp5032 for bullseye upgrade (starting with NIC firmware upgrade): T325797
14:42 jbond: remove bgpalerter from apt
14:06 reedy@deploy1002: Synchronized php-1.40.0-wmf.17/extensions/SecurePoll/cli/wm-scripts/ucoc2023/populateEditCount.php: T326408 (duration: 07m 09s)
12:42 stevemunene@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
12:36 tzatziki: running extensions/SecurePoll/cli/wm-scripts/ucoc2023/ucoc2023_tables.sql on each wiki
12:29 stevemunene@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
11:38 jbond: upload bgpalerter to bullseye-wikimedia
11:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2113.codfw.wmnet with reason: Maintenance
11:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2113.codfw.wmnet with reason: Maintenance
11:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1130.eqiad.wmnet with reason: Maintenance
11:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1130.eqiad.wmnet with reason: Maintenance
10:10 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 21245
10:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 21245
09:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36994
09:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 36994
09:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 266925
09:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 266925
09:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9038
09:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9038
09:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 5713
09:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 5713
09:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 37473
09:03 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 37473
09:03 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 4788
09:02 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4788
09:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 32035
09:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 32035
09:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15954
09:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15954
09:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 60427
09:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 60427
09:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 58717
09:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 58717
09:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45489
08:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 45489
08:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 24482
08:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 24482
08:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9119
08:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9119
08:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049
08:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 64049
08:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 263237
08:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 263237
08:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 51185
08:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 51185
08:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 201746
08:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 201746
08:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 62597
08:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 62597
08:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 327700
08:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 327700
08:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 56630
08:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 56630
08:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 21245
08:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 21245
08:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 37282
08:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 37282
08:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 37558
08:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 37558
08:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13113
08:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 13113
08:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 41095
08:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 41095
08:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61573
08:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 61573
08:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 21320
08:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 21320
08:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 39405
08:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 39405
08:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 48237
08:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 48237
08:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 47794
08:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 47794
08:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 22822
08:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 22822
08:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 58715
08:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 58715
08:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 51254
08:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 51254
08:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 35432
08:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 35432
08:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 132602
08:41 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 132602
08:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 42473
08:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 42473
08:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16347
08:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16347
08:05 XioNoX: drmrs offload Vodafone from Tata - T324955
01:08 urbanecm@deploy1002: Finished scap: Backport for Revert "GlobalRename: Convert DB selects to use SelectQueryBuilder" (T326377 T312394) (duration: 08m 48s)
01:01 urbanecm@deploy1002: urbanecm and urbanecm: Backport for Revert "GlobalRename: Convert DB selects to use SelectQueryBuilder" (T326377 T312394) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
00:59 urbanecm@deploy1002: Started scap: Backport for Revert "GlobalRename: Convert DB selects to use SelectQueryBuilder" (T326377 T312394)
00:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T326156)', diff saved to https://phabricator.wikimedia.org/P42928 and previous config saved to /var/cache/conftool/dbconfig/20230106-004102-ladsgroup.json
00:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42927 and previous config saved to /var/cache/conftool/dbconfig/20230106-002556-ladsgroup.json
00:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42926 and previous config saved to /var/cache/conftool/dbconfig/20230106-001049-ladsgroup.json

2023-01-05

23:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T326156)', diff saved to https://phabricator.wikimedia.org/P42925 and previous config saved to /var/cache/conftool/dbconfig/20230105-235543-ladsgroup.json
23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T326156)', diff saved to https://phabricator.wikimedia.org/P42924 and previous config saved to /var/cache/conftool/dbconfig/20230105-235325-ladsgroup.json
23:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2178.codfw.wmnet with reason: Maintenance
23:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2178.codfw.wmnet with reason: Maintenance
23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42923 and previous config saved to /var/cache/conftool/dbconfig/20230105-235304-ladsgroup.json
23:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P42922 and previous config saved to /var/cache/conftool/dbconfig/20230105-233758-ladsgroup.json
23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P42921 and previous config saved to /var/cache/conftool/dbconfig/20230105-232251-ladsgroup.json
23:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42920 and previous config saved to /var/cache/conftool/dbconfig/20230105-230745-ladsgroup.json
23:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42919 and previous config saved to /var/cache/conftool/dbconfig/20230105-230629-ladsgroup.json
23:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2171.codfw.wmnet with reason: Maintenance
23:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2171.codfw.wmnet with reason: Maintenance
23:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T326156)', diff saved to https://phabricator.wikimedia.org/P42918 and previous config saved to /var/cache/conftool/dbconfig/20230105-230607-ladsgroup.json
22:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P42917 and previous config saved to /var/cache/conftool/dbconfig/20230105-225101-ladsgroup.json
22:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P42916 and previous config saved to /var/cache/conftool/dbconfig/20230105-223554-ladsgroup.json
22:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T326156)', diff saved to https://phabricator.wikimedia.org/P42915 and previous config saved to /var/cache/conftool/dbconfig/20230105-222048-ladsgroup.json
22:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T326156)', diff saved to https://phabricator.wikimedia.org/P42914 and previous config saved to /var/cache/conftool/dbconfig/20230105-221932-ladsgroup.json
22:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2157.codfw.wmnet with reason: Maintenance
22:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2157.codfw.wmnet with reason: Maintenance
22:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42913 and previous config saved to /var/cache/conftool/dbconfig/20230105-221911-ladsgroup.json
22:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P42912 and previous config saved to /var/cache/conftool/dbconfig/20230105-220404-ladsgroup.json
21:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P42911 and previous config saved to /var/cache/conftool/dbconfig/20230105-214858-ladsgroup.json
21:43 TheresNoTime: closing UTC late backport window
21:42 samtar@deploy1002: Finished scap: Backport for Turn off wgNavigationTimingOversampleFactor campaigns (T286703) (duration: 08m 45s)
21:35 samtar@deploy1002: samtar and krinkle: Backport for Turn off wgNavigationTimingOversampleFactor campaigns (T286703) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
21:33 samtar@deploy1002: Started scap: Backport for Turn off wgNavigationTimingOversampleFactor campaigns (T286703)
21:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42910 and previous config saved to /var/cache/conftool/dbconfig/20230105-213351-ladsgroup.json
21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42909 and previous config saved to /var/cache/conftool/dbconfig/20230105-213235-ladsgroup.json
21:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2137.codfw.wmnet with reason: Maintenance
21:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2137.codfw.wmnet with reason: Maintenance
21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T326156)', diff saved to https://phabricator.wikimedia.org/P42908 and previous config saved to /var/cache/conftool/dbconfig/20230105-213214-ladsgroup.json
21:31 samtar@deploy1002: Finished scap: Backport for actions: Actually store CommentFormatter in McrUndoAction (T326336) (duration: 10m 31s)
21:23 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
21:23 samtar@deploy1002: samtar and zabe: Backport for actions: Actually store CommentFormatter in McrUndoAction (T326336) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
21:21 samtar@deploy1002: Started scap: Backport for actions: Actually store CommentFormatter in McrUndoAction (T326336)
21:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P42907 and previous config saved to /var/cache/conftool/dbconfig/20230105-211707-ladsgroup.json
21:16 samtar@deploy1002: Finished scap: Backport for Start writing to cuc_comment_id everywhere (T233004) (duration: 10m 07s)
21:08 samtar@deploy1002: samtar and zabe: Backport for Start writing to cuc_comment_id everywhere (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
21:06 samtar@deploy1002: Started scap: Backport for Start writing to cuc_comment_id everywhere (T233004)
21:04 samtar@deploy1002: backport aborted: (duration: 01m 22s)
21:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P42906 and previous config saved to /var/cache/conftool/dbconfig/20230105-210201-ladsgroup.json
20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T326156)', diff saved to https://phabricator.wikimedia.org/P42905 and previous config saved to /var/cache/conftool/dbconfig/20230105-204654-ladsgroup.json
20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T326156)', diff saved to https://phabricator.wikimedia.org/P42904 and previous config saved to /var/cache/conftool/dbconfig/20230105-204438-ladsgroup.json
20:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2094.codfw.wmnet with reason: Maintenance
20:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2094.codfw.wmnet with reason: Maintenance
20:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2128.codfw.wmnet with reason: Maintenance
20:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2128.codfw.wmnet with reason: Maintenance
20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T326156)', diff saved to https://phabricator.wikimedia.org/P42903 and previous config saved to /var/cache/conftool/dbconfig/20230105-204403-ladsgroup.json
20:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P42902 and previous config saved to /var/cache/conftool/dbconfig/20230105-202856-ladsgroup.json
20:17 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@9568478]: Bumping platform_eng airflow instance to latest (duration: 00m 09s)
20:17 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@9568478]: Bumping platform_eng airflow instance to latest
20:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P42901 and previous config saved to /var/cache/conftool/dbconfig/20230105-201350-ladsgroup.json
19:59 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.17 refs T325580
19:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T326156)', diff saved to https://phabricator.wikimedia.org/P42900 and previous config saved to /var/cache/conftool/dbconfig/20230105-195843-ladsgroup.json
19:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T326156)', diff saved to https://phabricator.wikimedia.org/P42899 and previous config saved to /var/cache/conftool/dbconfig/20230105-195627-ladsgroup.json
19:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2123.codfw.wmnet with reason: Maintenance
19:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2123.codfw.wmnet with reason: Maintenance
19:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T326156)', diff saved to https://phabricator.wikimedia.org/P42898 and previous config saved to /var/cache/conftool/dbconfig/20230105-195606-ladsgroup.json
19:48 taavi@deploy1002: Finished scap: Backport for actions: Pass CommentFormatter to McrRestoreAction (T326275) (duration: 10m 11s)
19:41 taavi@deploy1002: taavi and zabe: Backport for actions: Pass CommentFormatter to McrRestoreAction (T326275) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
19:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P42897 and previous config saved to /var/cache/conftool/dbconfig/20230105-194059-ladsgroup.json
19:38 sukhe: reprepro -C main include bullseye-wikimedia varnish_6.0.10-1wm3_amd64.changes: T325797
19:37 taavi@deploy1002: Started scap: Backport for actions: Pass CommentFormatter to McrRestoreAction (T326275)
19:31 Amir1: creating new cu tables
19:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P42896 and previous config saved to /var/cache/conftool/dbconfig/20230105-192553-ladsgroup.json
19:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T326156)', diff saved to https://phabricator.wikimedia.org/P42895 and previous config saved to /var/cache/conftool/dbconfig/20230105-191046-ladsgroup.json
19:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T326156)', diff saved to https://phabricator.wikimedia.org/P42894 and previous config saved to /var/cache/conftool/dbconfig/20230105-190830-ladsgroup.json
19:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2111.codfw.wmnet with reason: Maintenance
19:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2111.codfw.wmnet with reason: Maintenance
19:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2101.codfw.wmnet with reason: Maintenance
19:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2101.codfw.wmnet with reason: Maintenance
19:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
19:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T326156)', diff saved to https://phabricator.wikimedia.org/P42893 and previous config saved to /var/cache/conftool/dbconfig/20230105-190724-ladsgroup.json
18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P42892 and previous config saved to /var/cache/conftool/dbconfig/20230105-185217-ladsgroup.json
18:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P42891 and previous config saved to /var/cache/conftool/dbconfig/20230105-183711-ladsgroup.json
18:22 taavi: delete some nostalgiawiki pages using maintenance/deleteBatch.php for T326334
18:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T326156)', diff saved to https://phabricator.wikimedia.org/P42890 and previous config saved to /var/cache/conftool/dbconfig/20230105-182204-ladsgroup.json
18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T326156)', diff saved to https://phabricator.wikimedia.org/P42889 and previous config saved to /var/cache/conftool/dbconfig/20230105-181949-ladsgroup.json
18:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1200.eqiad.wmnet with reason: Maintenance
18:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1200.eqiad.wmnet with reason: Maintenance
18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T326156)', diff saved to https://phabricator.wikimedia.org/P42888 and previous config saved to /var/cache/conftool/dbconfig/20230105-181928-ladsgroup.json
18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P42887 and previous config saved to /var/cache/conftool/dbconfig/20230105-180421-ladsgroup.json
17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P42886 and previous config saved to /var/cache/conftool/dbconfig/20230105-174915-ladsgroup.json
17:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T326156)', diff saved to https://phabricator.wikimedia.org/P42885 and previous config saved to /var/cache/conftool/dbconfig/20230105-173408-ladsgroup.json
17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T326156)', diff saved to https://phabricator.wikimedia.org/P42884 and previous config saved to /var/cache/conftool/dbconfig/20230105-173154-ladsgroup.json
17:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1185.eqiad.wmnet with reason: Maintenance
17:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1185.eqiad.wmnet with reason: Maintenance
17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T326156)', diff saved to https://phabricator.wikimedia.org/P42883 and previous config saved to /var/cache/conftool/dbconfig/20230105-173133-ladsgroup.json
17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P42882 and previous config saved to /var/cache/conftool/dbconfig/20230105-171626-ladsgroup.json
17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P42880 and previous config saved to /var/cache/conftool/dbconfig/20230105-170119-ladsgroup.json
16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T326156)', diff saved to https://phabricator.wikimedia.org/P42878 and previous config saved to /var/cache/conftool/dbconfig/20230105-164612-ladsgroup.json
16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T326156)', diff saved to https://phabricator.wikimedia.org/P42877 and previous config saved to /var/cache/conftool/dbconfig/20230105-164358-ladsgroup.json
16:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1161.eqiad.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1161.eqiad.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42876 and previous config saved to /var/cache/conftool/dbconfig/20230105-164258-ladsgroup.json
16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P42875 and previous config saved to /var/cache/conftool/dbconfig/20230105-162751-ladsgroup.json
16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P42874 and previous config saved to /var/cache/conftool/dbconfig/20230105-161245-ladsgroup.json
16:05 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
16:04 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
16:04 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
16:03 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
15:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42873 and previous config saved to /var/cache/conftool/dbconfig/20230105-155738-ladsgroup.json
15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42872 and previous config saved to /var/cache/conftool/dbconfig/20230105-155524-ladsgroup.json
15:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1144.eqiad.wmnet with reason: Maintenance
15:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1144.eqiad.wmnet with reason: Maintenance
15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42871 and previous config saved to /var/cache/conftool/dbconfig/20230105-155503-ladsgroup.json
15:52 matthiasmullie: UTC afternoon backports done
15:51 mlitn@deploy1002: Finished scap: Backport for Fix URL construction (duration: 12m 21s)
15:41 mlitn@deploy1002: mlitn and mlitn: Backport for Fix URL construction synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P42870 and previous config saved to /var/cache/conftool/dbconfig/20230105-153956-ladsgroup.json
15:39 mlitn@deploy1002: Started scap: Backport for Fix URL construction
15:37 mlitn@deploy1002: Finished scap: Backport for Fix URL construction (duration: 08m 04s)
15:31 mlitn@deploy1002: mlitn and mlitn: Backport for Fix URL construction synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
15:29 mlitn@deploy1002: Started scap: Backport for Fix URL construction
15:26 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
15:26 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P42869 and previous config saved to /var/cache/conftool/dbconfig/20230105-152447-ladsgroup.json
15:22 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
15:14 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
15:14 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
15:10 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
15:10 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
15:10 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
15:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42868 and previous config saved to /var/cache/conftool/dbconfig/20230105-150939-ladsgroup.json
15:09 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42867 and previous config saved to /var/cache/conftool/dbconfig/20230105-150825-ladsgroup.json
15:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1113.eqiad.wmnet with reason: Maintenance
15:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1113.eqiad.wmnet with reason: Maintenance
15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T326156)', diff saved to https://phabricator.wikimedia.org/P42866 and previous config saved to /var/cache/conftool/dbconfig/20230105-150804-ladsgroup.json
14:58 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
14:58 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
14:56 claime: hard resetting mw1486
14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P42865 and previous config saved to /var/cache/conftool/dbconfig/20230105-145257-ladsgroup.json
14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P42864 and previous config saved to /var/cache/conftool/dbconfig/20230105-143751-ladsgroup.json
14:30 mlitn@deploy1002: Finished scap: Backport for Also get central description (T325831) (duration: 08m 32s)
14:23 mlitn@deploy1002: mlitn and mlitn: Backport for Also get central description (T325831) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T326156)', diff saved to https://phabricator.wikimedia.org/P42862 and previous config saved to /var/cache/conftool/dbconfig/20230105-142244-ladsgroup.json
14:21 mlitn@deploy1002: Started scap: Backport for Also get central description (T325831)
14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T326156)', diff saved to https://phabricator.wikimedia.org/P42861 and previous config saved to /var/cache/conftool/dbconfig/20230105-142029-ladsgroup.json
14:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1110.eqiad.wmnet with reason: Maintenance
14:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1110.eqiad.wmnet with reason: Maintenance
14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T326156)', diff saved to https://phabricator.wikimedia.org/P42860 and previous config saved to /var/cache/conftool/dbconfig/20230105-142008-ladsgroup.json
14:17 mlitn@deploy1002: Finished scap: Backport for Also get central description (T325831) (duration: 07m 57s)
14:11 mlitn@deploy1002: mlitn and mlitn: Backport for Also get central description (T325831) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
14:09 mlitn@deploy1002: Started scap: Backport for Also get central description (T325831)
14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P42859 and previous config saved to /var/cache/conftool/dbconfig/20230105-140501-ladsgroup.json
13:58 Amir1: start of externallinks migration in elwiki (and rest of large wikis in s3) (T326314)
13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P42858 and previous config saved to /var/cache/conftool/dbconfig/20230105-134955-ladsgroup.json
13:46 ladsgroup@deploy1002: Finished scap: Backport for Enable write both for externallinks in ten largest s3 wikis (T321662) (duration: 08m 54s)
13:42 urbanecm: aswikiquote: Run importDump.php to import an XML dump (per new wiki importers request, running into issues with a largish page)
13:39 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Enable write both for externallinks in ten largest s3 wikis (T321662) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
13:38 XioNoX: start [eqiad] faulty VC optics maintenance - T325803
13:37 ladsgroup@deploy1002: Started scap: Backport for Enable write both for externallinks in ten largest s3 wikis (T321662)
13:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T326156)', diff saved to https://phabricator.wikimedia.org/P42857 and previous config saved to /var/cache/conftool/dbconfig/20230105-133448-ladsgroup.json
13:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T326156)', diff saved to https://phabricator.wikimedia.org/P42856 and previous config saved to /var/cache/conftool/dbconfig/20230105-133234-ladsgroup.json
13:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1100.eqiad.wmnet with reason: Maintenance
13:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1100.eqiad.wmnet with reason: Maintenance
13:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42855 and previous config saved to /var/cache/conftool/dbconfig/20230105-133211-ladsgroup.json
13:30 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
13:29 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
13:21 effie: enable puppet on all mw servers
13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P42854 and previous config saved to /var/cache/conftool/dbconfig/20230105-131705-ladsgroup.json
13:03 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
13:03 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
13:03 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
13:03 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
13:03 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
13:02 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
13:02 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
13:02 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
13:02 oblivian@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
13:02 oblivian@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
13:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P42853 and previous config saved to /var/cache/conftool/dbconfig/20230105-130158-ladsgroup.json
13:02 oblivian@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
13:01 oblivian@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
13:01 oblivian@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
13:01 oblivian@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
13:01 oblivian@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
13:01 oblivian@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
13:01 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
13:01 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
13:01 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
13:00 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
13:00 hashar: Restarted Gerrit for a plugin update
12:58 hashar@deploy1002: Finished deploy [gerrit/gerrit@b1ae5b4]: wm-checks-api: fix PCC handling of empty messages (duration: 00m 08s)
12:58 hashar@deploy1002: Started deploy [gerrit/gerrit@b1ae5b4]: wm-checks-api: fix PCC handling of empty messages
12:52 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
12:49 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
12:49 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
12:48 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
12:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42852 and previous config saved to /var/cache/conftool/dbconfig/20230105-124651-ladsgroup.json
12:45 hashar@deploy1002: Finished deploy [gerrit/gerrit@b1ae5b4]: wm-checks-api: fix PCC handling of empty messages (duration: 00m 10s)
12:45 hashar@deploy1002: Started deploy [gerrit/gerrit@b1ae5b4]: wm-checks-api: fix PCC handling of empty messages
12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42851 and previous config saved to /var/cache/conftool/dbconfig/20230105-124437-ladsgroup.json
12:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1096.eqiad.wmnet with reason: Maintenance
12:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1096.eqiad.wmnet with reason: Maintenance
12:44 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
12:42 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
12:42 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
12:31 ladsgroup: Deployed security patch for T233004 T326293
12:02 hashar: gerrit: running `copy-approvals` script to prepare for Gerrit 3.6 upgrade (T309870): `ssh -p 29418 gerrit.wikimedia.org gerrit copy-approvals --verbose`
11:59 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
11:58 hashar: Restarting Gerrit
11:57 hashar@deploy1002: Finished deploy [gerrit/gerrit@32f984a]: wm-checks-api: add support for Puppet Catalogue Compiler (duration: 00m 09s)
11:57 hashar@deploy1002: Started deploy [gerrit/gerrit@32f984a]: wm-checks-api: add support for Puppet Catalogue Compiler
11:57 hashar: Stopping Gerrit for plugin deployment
11:45 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
11:40 effie: disabling puppet on all hosts running mcrouter to merge 860102
11:24 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=mwdebug,name=eqiad
11:23 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
11:23 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=mwdebug,name=eqiad
11:23 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
11:22 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=mwdebug,name=codfw
11:20 hashar@deploy1002: Finished deploy [gerrit/gerrit@32f984a]: wm-checks-api: add support for Puppet Catalogue Compiler (duration: 00m 10s)
11:20 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
11:20 hashar@deploy1002: Started deploy [gerrit/gerrit@32f984a]: wm-checks-api: add support for Puppet Catalogue Compiler
11:19 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
11:19 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=mwdebug,name=codfw
11:14 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
11:13 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
11:13 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
11:13 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
11:13 cgoubert@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
11:13 cgoubert@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
11:12 cgoubert@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
11:12 cgoubert@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: After cloning db1176', diff saved to https://phabricator.wikimedia.org/P42850 and previous config saved to /var/cache/conftool/dbconfig/20230105-105808-root.json
10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: After cloning db1176', diff saved to https://phabricator.wikimedia.org/P42849 and previous config saved to /var/cache/conftool/dbconfig/20230105-104303-root.json
10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: After cloning db1176', diff saved to https://phabricator.wikimedia.org/P42848 and previous config saved to /var/cache/conftool/dbconfig/20230105-102758-root.json
10:26 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
10:26 claime: Rolling reboot of api_appserver hosts in eqiad
10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 100%: Pooling in s6', diff saved to https://phabricator.wikimedia.org/P42847 and previous config saved to /var/cache/conftool/dbconfig/20230105-102357-root.json
10:22 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
10:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: After cloning db1176', diff saved to https://phabricator.wikimedia.org/P42846 and previous config saved to /var/cache/conftool/dbconfig/20230105-101253-root.json
10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 75%: Pooling in s6', diff saved to https://phabricator.wikimedia.org/P42845 and previous config saved to /var/cache/conftool/dbconfig/20230105-100852-root.json
10:07 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
10:06 claime: Restarting rolling reboot of api_appserver hosts in codfw
09:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: After cloning db1176', diff saved to https://phabricator.wikimedia.org/P42844 and previous config saved to /var/cache/conftool/dbconfig/20230105-095748-root.json
09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 60%: Pooling in s6', diff saved to https://phabricator.wikimedia.org/P42843 and previous config saved to /var/cache/conftool/dbconfig/20230105-095347-root.json
09:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: After cloning db1176', diff saved to https://phabricator.wikimedia.org/P42841 and previous config saved to /var/cache/conftool/dbconfig/20230105-094243-root.json
09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 50%: Pooling in s6', diff saved to https://phabricator.wikimedia.org/P42840 and previous config saved to /var/cache/conftool/dbconfig/20230105-093842-root.json
09:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 1%: After cloning db1176', diff saved to https://phabricator.wikimedia.org/P42839 and previous config saved to /var/cache/conftool/dbconfig/20230105-092738-root.json
09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 40%: Pooling in s6', diff saved to https://phabricator.wikimedia.org/P42838 and previous config saved to /var/cache/conftool/dbconfig/20230105-092336-root.json
09:14 XioNoX: turn up BGP to NTT in drmrs - T314929
09:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 25%: Pooling in s6', diff saved to https://phabricator.wikimedia.org/P42837 and previous config saved to /var/cache/conftool/dbconfig/20230105-090831-root.json
08:56 hashar@deploy1002: Finished scap: Backport for [SearchVue] Enable extension on ptwiki, ruwiki & idwiki (T310367) (duration: 11m 38s)
08:46 hashar@deploy1002: hashar and mlitn: Backport for [SearchVue] Enable extension on ptwiki, ruwiki & idwiki (T310367) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
08:44 hashar@deploy1002: Started scap: Backport for [SearchVue] Enable extension on ptwiki, ruwiki & idwiki (T310367)
07:58 moritzm: installing glibc security updates on bullseye
07:50 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db2151 in s6 T326206', diff saved to https://phabricator.wikimedia.org/P42836 and previous config saved to /var/cache/conftool/dbconfig/20230105-075046-marostegui.json
07:28 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
07:27 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
07:26 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
07:25 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
06:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 to clone db1176 T326211', diff saved to https://phabricator.wikimedia.org/P42833 and previous config saved to /var/cache/conftool/dbconfig/20230105-064153-marostegui.json
06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2151 for the first time in s6 T326206', diff saved to https://phabricator.wikimedia.org/P42832 and previous config saved to /var/cache/conftool/dbconfig/20230105-063937-marostegui.json
06:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1157.eqiad.wmnet with reason: Maintenance
06:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1157.eqiad.wmnet with reason: Maintenance

2023-01-04

23:01 mutante: deploy2002 - re-arming keyholder T324014
23:00 mutante: deploy1002 - re-arming keyholder T324014
22:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
22:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
22:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T326011)', diff saved to https://phabricator.wikimedia.org/P42831 and previous config saved to /var/cache/conftool/dbconfig/20230104-223545-marostegui.json
22:27 kindrobot: finished UTC late backport window
22:27 kindrobot@deploy1002: Finished scap: Backport for Fix underlinkedness rescore logic (T301096), Fix underlinkedness rescore logic (T301096) (duration: 15m 20s)
22:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P42828 and previous config saved to /var/cache/conftool/dbconfig/20230104-222038-marostegui.json
22:13 kindrobot@deploy1002: kindrobot and tgr: Backport for Fix underlinkedness rescore logic (T301096), Fix underlinkedness rescore logic (T301096) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
22:11 kindrobot@deploy1002: Started scap: Backport for Fix underlinkedness rescore logic (T301096), Fix underlinkedness rescore logic (T301096)
22:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P42827 and previous config saved to /var/cache/conftool/dbconfig/20230104-220532-marostegui.json
21:51 kindrobot@deploy1002: backport aborted: (duration: 02m 12s)
21:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T326011)', diff saved to https://phabricator.wikimedia.org/P42826 and previous config saved to /var/cache/conftool/dbconfig/20230104-215025-marostegui.json
21:48 taavi: mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki mediawikiwiki "African Wikimedia Technical Community/Project Scope" "Africa Wikimedia Technical Community/Project Scope" "Taavi" --reason "per request phab:T318292" # T318292
21:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T326011)', diff saved to https://phabricator.wikimedia.org/P42825 and previous config saved to /var/cache/conftool/dbconfig/20230104-214616-marostegui.json
21:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1198.eqiad.wmnet with reason: Maintenance
21:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1198.eqiad.wmnet with reason: Maintenance
21:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T326011)', diff saved to https://phabricator.wikimedia.org/P42824 and previous config saved to /var/cache/conftool/dbconfig/20230104-214555-marostegui.json
21:44 kindrobot@deploy1002: Finished scap: Backport for Add namespace to gorwiktionary (T326253) (duration: 11m 26s)
21:35 kindrobot@deploy1002: kindrobot and jhsoby: Backport for Add namespace to gorwiktionary (T326253) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
21:33 kindrobot@deploy1002: Started scap: Backport for Add namespace to gorwiktionary (T326253)
21:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P42823 and previous config saved to /var/cache/conftool/dbconfig/20230104-213049-marostegui.json
21:28 kindrobot@deploy1002: Finished scap: Backport for Start writing to cuc_comment_id on group0 and group1 wikis (T233004) (duration: 17m 28s)
21:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P42820 and previous config saved to /var/cache/conftool/dbconfig/20230104-211542-marostegui.json
21:12 kindrobot@deploy1002: kindrobot and zabe: Backport for Start writing to cuc_comment_id on group0 and group1 wikis (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
21:10 kindrobot@deploy1002: Started scap: Backport for Start writing to cuc_comment_id on group0 and group1 wikis (T233004)
21:05 kindrobot: starting UTC late backport window
21:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T326011)', diff saved to https://phabricator.wikimedia.org/P42819 and previous config saved to /var/cache/conftool/dbconfig/20230104-210036-marostegui.json
20:58 Amir1: running refreshGlobalimagelinks.php on all wikis (T322588)
20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T326011)', diff saved to https://phabricator.wikimedia.org/P42818 and previous config saved to /var/cache/conftool/dbconfig/20230104-205628-marostegui.json
20:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1189.eqiad.wmnet with reason: Maintenance
20:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1189.eqiad.wmnet with reason: Maintenance
20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T326011)', diff saved to https://phabricator.wikimedia.org/P42817 and previous config saved to /var/cache/conftool/dbconfig/20230104-205607-marostegui.json
20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P42816 and previous config saved to /var/cache/conftool/dbconfig/20230104-204100-marostegui.json
20:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P42815 and previous config saved to /var/cache/conftool/dbconfig/20230104-202554-marostegui.json
20:14 cstone: payments-wiki upgraded from ede93d62 to f075991f
20:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T326011)', diff saved to https://phabricator.wikimedia.org/P42814 and previous config saved to /var/cache/conftool/dbconfig/20230104-201047-marostegui.json
20:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T326011)', diff saved to https://phabricator.wikimedia.org/P42813 and previous config saved to /var/cache/conftool/dbconfig/20230104-200638-marostegui.json
20:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1179.eqiad.wmnet with reason: Maintenance
20:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1179.eqiad.wmnet with reason: Maintenance
20:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T326011)', diff saved to https://phabricator.wikimedia.org/P42812 and previous config saved to /var/cache/conftool/dbconfig/20230104-200617-marostegui.json
19:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P42811 and previous config saved to /var/cache/conftool/dbconfig/20230104-195110-marostegui.json
19:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P42810 and previous config saved to /var/cache/conftool/dbconfig/20230104-193604-marostegui.json
19:32 dduvall@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.17 refs T325580 (duration: 06m 58s)
19:25 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.17 refs T325580
19:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T326011)', diff saved to https://phabricator.wikimedia.org/P42809 and previous config saved to /var/cache/conftool/dbconfig/20230104-192057-marostegui.json
19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T326011)', diff saved to https://phabricator.wikimedia.org/P42808 and previous config saved to /var/cache/conftool/dbconfig/20230104-191648-marostegui.json
19:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1175.eqiad.wmnet with reason: Maintenance
19:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1175.eqiad.wmnet with reason: Maintenance
19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T326011)', diff saved to https://phabricator.wikimedia.org/P42807 and previous config saved to /var/cache/conftool/dbconfig/20230104-191627-marostegui.json
19:07 dancy@deploy1002: Installing scap version "4.32.0" for 560 hosts
19:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P42806 and previous config saved to /var/cache/conftool/dbconfig/20230104-190121-marostegui.json
18:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P42805 and previous config saved to /var/cache/conftool/dbconfig/20230104-184614-marostegui.json
18:40 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@84f5f50]: (no justification provided) (duration: 00m 05s)
18:40 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@84f5f50]: (no justification provided)
18:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T326011)', diff saved to https://phabricator.wikimedia.org/P42804 and previous config saved to /var/cache/conftool/dbconfig/20230104-183108-marostegui.json
18:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T326011)', diff saved to https://phabricator.wikimedia.org/P42803 and previous config saved to /var/cache/conftool/dbconfig/20230104-182700-marostegui.json
18:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1166.eqiad.wmnet with reason: Maintenance
18:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1166.eqiad.wmnet with reason: Maintenance
18:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1145.eqiad.wmnet with reason: Maintenance
18:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1145.eqiad.wmnet with reason: Maintenance
18:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T326011)', diff saved to https://phabricator.wikimedia.org/P42802 and previous config saved to /var/cache/conftool/dbconfig/20230104-182425-marostegui.json
18:15 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: pushing wmf-puppet-dashboard updates for enc git handling (after remembering to update the submodules) (duration: 00m 54s)
18:14 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: pushing wmf-puppet-dashboard updates for enc git handling (after remembering to update the submodules)
18:13 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: pushing wmf-puppet-dashboard updates for enc git handling (duration: 03m 54s)
18:09 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: pushing wmf-puppet-dashboard updates for enc git handling
18:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P42801 and previous config saved to /var/cache/conftool/dbconfig/20230104-180918-marostegui.json
18:00 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1002.eqiad.wmnet with OS bullseye
17:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P42800 and previous config saved to /var/cache/conftool/dbconfig/20230104-175412-marostegui.json
17:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T326011)', diff saved to https://phabricator.wikimedia.org/P42799 and previous config saved to /var/cache/conftool/dbconfig/20230104-173905-marostegui.json
17:37 dancy@deploy1002: Installing scap version "4.31.1" for 560 hosts
17:36 dancy@deploy1002: Finished scap: testing (duration: 07m 50s)
17:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T326011)', diff saved to https://phabricator.wikimedia.org/P42798 and previous config saved to /var/cache/conftool/dbconfig/20230104-173455-marostegui.json
17:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1123.eqiad.wmnet with reason: Maintenance
17:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1123.eqiad.wmnet with reason: Maintenance
17:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T326011)', diff saved to https://phabricator.wikimedia.org/P42797 and previous config saved to /var/cache/conftool/dbconfig/20230104-173434-marostegui.json
17:28 dancy@deploy1002: Started scap: testing
17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P42796 and previous config saved to /var/cache/conftool/dbconfig/20230104-171928-marostegui.json
17:10 mutante: new Wikipedia (and other projects) language added: guc - https://en.wikipedia.org/wiki/Wayuu_language - https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Wayuu T321880
17:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P42795 and previous config saved to /var/cache/conftool/dbconfig/20230104-170421-marostegui.json
17:02 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
17:00 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
16:55 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@84f5f50]: Bumping platform_eng airflow instance to latest (duration: 00m 17s)
16:54 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@84f5f50]: Bumping platform_eng airflow instance to latest
16:49 dancy@deploy1002: Installing scap version "4.30.3-1" for 560 hosts
16:49 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
16:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T326011)', diff saved to https://phabricator.wikimedia.org/P42794 and previous config saved to /var/cache/conftool/dbconfig/20230104-164915-marostegui.json
16:48 dancy@deploy1002: Finished scap: testing (duration: 13m 16s)
16:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T326011)', diff saved to https://phabricator.wikimedia.org/P42793 and previous config saved to /var/cache/conftool/dbconfig/20230104-164504-marostegui.json
16:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
16:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
16:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1112.eqiad.wmnet with reason: Maintenance
16:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1112.eqiad.wmnet with reason: Maintenance
16:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1102.eqiad.wmnet with reason: Maintenance
16:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1102.eqiad.wmnet with reason: Maintenance
16:41 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
16:41 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
16:41 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
16:37 dancy@deploy1002: Started scap: testing
16:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2105.codfw.wmnet with reason: Maintenance
16:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2105.codfw.wmnet with reason: Maintenance
16:33 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 4:00:00 on db2105.codfw.wmnet with reason: Maintenance
16:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2105.codfw.wmnet with reason: Maintenance
16:30 dancy@deploy1002: Installing scap version "4.31.0" for 560 hosts
16:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T326011)', diff saved to https://phabricator.wikimedia.org/P42792 and previous config saved to /var/cache/conftool/dbconfig/20230104-162828-marostegui.json
16:29 dancy@deploy1002: sync-world aborted: (no justification provided) (duration: 00m 13s)
16:27 dancy@deploy1002: Started scap: (no justification provided)
16:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P42791 and previous config saved to /var/cache/conftool/dbconfig/20230104-161321-marostegui.json
15:59 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,name=mw2402.*
15:59 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,name=mw2401.*
15:59 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,name=mw2400.*
15:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P42790 and previous config saved to /var/cache/conftool/dbconfig/20230104-155815-marostegui.json
15:51 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T326011)', diff saved to https://phabricator.wikimedia.org/P42789 and previous config saved to /var/cache/conftool/dbconfig/20230104-154308-marostegui.json
15:34 moritzm: installing glibc security updates on bullseye
15:34 moritzm: installing glibc security updates
15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T326011)', diff saved to https://phabricator.wikimedia.org/P42788 and previous config saved to /var/cache/conftool/dbconfig/20230104-153435-marostegui.json
15:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2177.codfw.wmnet with reason: Maintenance
15:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2177.codfw.wmnet with reason: Maintenance
15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T326011)', diff saved to https://phabricator.wikimedia.org/P42787 and previous config saved to /var/cache/conftool/dbconfig/20230104-153413-marostegui.json
15:33 ladsgroup@deploy1002: Finished scap: Backport for Disable LoadMonitor in CLI (T322156) (duration: 09m 48s)
15:32 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
15:32 claime: Restarting rolling reboot of api_appserver hosts in codfw
15:25 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Disable LoadMonitor in CLI (T322156) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
15:23 ladsgroup@deploy1002: Started scap: Backport for Disable LoadMonitor in CLI (T322156)
15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P42786 and previous config saved to /var/cache/conftool/dbconfig/20230104-151907-marostegui.json
15:06 marostegui: dbmaint deploy schema change on s5 eqiad T326224
15:05 marostegui: dbmaint deploy schema change on s3 eqiad T326224
15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P42785 and previous config saved to /var/cache/conftool/dbconfig/20230104-150400-marostegui.json
15:00 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1001.eqiad.wmnet with OS bullseye
15:00 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
14:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T326011)', diff saved to https://phabricator.wikimedia.org/P42784 and previous config saved to /var/cache/conftool/dbconfig/20230104-144853-marostegui.json
14:46 marostegui: dbmaint deploy schema change on s3 eqiad T326222
14:44 marostegui: dbmaint deploy schema change on s5 eqiad T326222
14:42 XioNoX: fix inconsistent mtu between cr1-eqiad<->lsw1-f1 - T315838
14:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T326011)', diff saved to https://phabricator.wikimedia.org/P42783 and previous config saved to /var/cache/conftool/dbconfig/20230104-144025-marostegui.json
14:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2094.codfw.wmnet with reason: Maintenance
14:40 urbanecm: UTC afternoon B&C window done
14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2094.codfw.wmnet with reason: Maintenance
14:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2156.codfw.wmnet with reason: Maintenance
14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2156.codfw.wmnet with reason: Maintenance
14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T326011)', diff saved to https://phabricator.wikimedia.org/P42782 and previous config saved to /var/cache/conftool/dbconfig/20230104-143949-marostegui.json
14:38 marostegui: dbmaint deploy schema change on s3 eqiad T326223
14:38 urbanecm@deploy1002: Finished scap: Backport for Start reading from cul_actor on testwiki (T233004), aswikiquote: Set timezone to Asia/Kolkata (T321246) (duration: 09m 50s)
14:37 marostegui: dbmaint deploy schema change on s5 eqiad T326223
14:32 XioNoX: fix inconsistent mtu on mr1-eqiad - T315838
14:30 urbanecm@deploy1002: urbanecm and urbanecm and zabe: Backport for Start reading from cul_actor on testwiki (T233004), aswikiquote: Set timezone to Asia/Kolkata (T321246) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
14:28 urbanecm@deploy1002: Started scap: Backport for Start reading from cul_actor on testwiki (T233004), aswikiquote: Set timezone to Asia/Kolkata (T321246)
14:27 urbanecm@deploy1002: Finished scap: Backport for plwiki: Add editcontentmodel to interface-admin (T325819), Mark active sections even when their headings are in wrapper elements (T318044 T324869) (duration: 09m 32s)
14:27 XioNoX: fix inconsistent mtu on mr1-codfw - T315838
14:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P42781 and previous config saved to /var/cache/conftool/dbconfig/20230104-142442-marostegui.json
14:24 marostegui: dbmaint deploy schema change on s7 eqiad T326227
14:22 XioNoX: fix inconsistent mtu on mr1-eqsin - T315838
14:19 urbanecm@deploy1002: urbanecm and stang and matmarex: Backport for plwiki: Add editcontentmodel to interface-admin (T325819), Mark active sections even when their headings are in wrapper elements (T318044 T324869) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
14:18 urbanecm@deploy1002: Started scap: Backport for plwiki: Add editcontentmodel to interface-admin (T325819), Mark active sections even when their headings are in wrapper elements (T318044 T324869)
14:16 urbanecm@deploy1002: backport aborted: (duration: 00m 07s)
14:16 urbanecm@deploy1002: Finished scap: Backport for Revert "trwiki: Add 20 years celebration logos" (T325823), kuwiki: Install SandboxLink (T325469) (duration: 09m 37s)
14:16 marostegui: Sanitize new wikis T326138 T321294 T321288 T321256
14:15 XioNoX: fix inconsistent mtu on mr1-esams - T315838
14:14 marostegui: dbmaint deploy schema change on s7 eqiad T326228
14:13 marostegui: dbmaint deploy schema change on s7 eqiad T326226
14:11 marostegui: dbmaint deploy schema change on s8 eqiad T326221
14:11 marostegui: dbmaint deploy schema change on s7 eqiad T326221
14:11 marostegui: dbmaint deploy schema change on s6 eqiad T326221
14:11 marostegui: dbmaint deploy schema change on s5 eqiad T326221
14:11 marostegui: dbmaint deploy schema change on s4 eqiad T326221
14:11 marostegui: dbmaint deploy schema change on s3 eqiad T326221
14:11 marostegui: dbmaint deploy schema change on s2 eqiad T326221
14:11 marostegui: dbmaint deploy schema change on s1 eqiad T326221
14:10 marostegui: dbmaint deploy schema change on s7 eqiad T326225
14:10 marostegui: dbmaint deploy schema change on s7 T326225
14:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on puppetdb2002.codfw.wmnet with reason: maintenance
14:09 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on puppetdb2002.codfw.wmnet with reason: maintenance
14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P42780 and previous config saved to /var/cache/conftool/dbconfig/20230104-140936-marostegui.json
14:08 urbanecm@deploy1002: urbanecm and stang: Backport for Revert "trwiki: Add 20 years celebration logos" (T325823), kuwiki: Install SandboxLink (T325469) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
14:06 urbanecm@deploy1002: Started scap: Backport for Revert "trwiki: Add 20 years celebration logos" (T325823), kuwiki: Install SandboxLink (T325469)
14:04 XioNoX: fix inconsistent mtu on mr1-ulsfo - T315838
14:02 marostegui: dbmaint deploy schema change on s3 T326221
14:02 moritzm: updating buster nodes running 5.10 to 5.10.158-2~deb10u1 (only rollout of the new kernel, no reboots)
14:02 urbanecm@deploy1002: Finished scap: Backport for Update interwiki cache (duration: 08m 00s)
13:58 marostegui: dbmaint deploy schema change on s7 T326221
13:57 marostegui: dbmaint deploy schema change on s8 T326221
13:57 marostegui: dbmaint deploy schema change on s6 T326221
13:56 marostegui: dbmaint deploy schema change on s5 T326221
13:55 marostegui: dbmaint deploy schema change on s4 T326221
13:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T326011)', diff saved to https://phabricator.wikimedia.org/P42779 and previous config saved to /var/cache/conftool/dbconfig/20230104-135429-marostegui.json
13:54 urbanecm@deploy1002: Started scap: Backport for Update interwiki cache
13:54 marostegui: dbmaint deploy schema change on s2 T326221
13:53 marostegui: dbmaint deploy schema change on s1 T326221
13:52 urbanecm@deploy1002: Finished scap: Creating gorwiktionary (T326137), fixing aswikiquote logo (T321246) (duration: 07m 52s)
13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T326011)', diff saved to https://phabricator.wikimedia.org/P42778 and previous config saved to /var/cache/conftool/dbconfig/20230104-134544-marostegui.json
13:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2149.codfw.wmnet with reason: Maintenance
13:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2149.codfw.wmnet with reason: Maintenance
13:45 XioNoX: repool esams-eqiad link for mtu change - T315838
13:44 urbanecm@deploy1002: Started scap: Creating gorwiktionary (T326137), fixing aswikiquote logo (T321246)
13:41 XioNoX: drain esams-eqiad link for mtu change - T315838
13:39 urbanecm@deploy1002: Finished scap: Backport for Add messages for Gorontalo Wiktionary (gorwiktionary) (T326137), Add messages for Gorontalo Wiktionary (gorwiktionary) (T326137) (duration: 38m 23s)
13:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2139.codfw.wmnet with reason: Maintenance
13:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2139.codfw.wmnet with reason: Maintenance
13:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T326011)', diff saved to https://phabricator.wikimedia.org/P42777 and previous config saved to /var/cache/conftool/dbconfig/20230104-133830-marostegui.json
13:33 XioNoX: fix mismatched MTU on pfw3-codfw - T315838
13:31 urbanecm: New wiki creation will run over by a couple of minutes
13:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P42776 and previous config saved to /var/cache/conftool/dbconfig/20230104-132323-marostegui.json
13:15 XioNoX: fix mismatched MTU on cloudsw switches - T315838
13:11 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
13:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P42775 and previous config saved to /var/cache/conftool/dbconfig/20230104-130816-marostegui.json
13:00 urbanecm@deploy1002: Started scap: Backport for Add messages for Gorontalo Wiktionary (gorwiktionary) (T326137), Add messages for Gorontalo Wiktionary (gorwiktionary) (T326137)
12:58 urbanecm@deploy1002: Finished scap: Creating shnwikibooks (T321248) (duration: 07m 38s)
12:56 moritzm: installing emacs security updates
12:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
12:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 100%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42774 and previous config saved to /var/cache/conftool/dbconfig/20230104-125330-root.json
12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T326011)', diff saved to https://phabricator.wikimedia.org/P42773 and previous config saved to /var/cache/conftool/dbconfig/20230104-125310-marostegui.json
12:51 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
12:50 urbanecm@deploy1002: Started scap: Creating shnwikibooks (T321248)
12:48 urbanecm@deploy1002: Finished scap: Creating guwwikiquote (T321247) (duration: 07m 44s)
12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T326011)', diff saved to https://phabricator.wikimedia.org/P42772 and previous config saved to /var/cache/conftool/dbconfig/20230104-124424-marostegui.json
12:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2127.codfw.wmnet with reason: Maintenance
12:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2127.codfw.wmnet with reason: Maintenance
12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T326011)', diff saved to https://phabricator.wikimedia.org/P42771 and previous config saved to /var/cache/conftool/dbconfig/20230104-124403-marostegui.json
12:41 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
12:41 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
12:41 urbanecm@deploy1002: Started scap: Creating guwwikiquote (T321247)
12:40 claime: Rolling reboot of api_appserver hosts in codfw paused for https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230104T1200
12:38 urbanecm@deploy1002: Finished scap: Creating aswikiquote (T321246) (duration: 07m 49s)
12:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 75%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42770 and previous config saved to /var/cache/conftool/dbconfig/20230104-123825-root.json
12:35 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1001.eqiad.wmnet with OS bullseye
12:31 urbanecm@deploy1002: Started scap: Creating aswikiquote (T321246)
12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P42769 and previous config saved to /var/cache/conftool/dbconfig/20230104-122857-marostegui.json
12:27 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
12:26 urbanecm@deploy1002: Finished scap: Backport for Add namespace translations in Wayuu (T321881), Add namespace translations in Wayuu (T321881) (duration: 10m 36s)
12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 50%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42768 and previous config saved to /var/cache/conftool/dbconfig/20230104-122320-root.json
12:18 urbanecm@deploy1002: urbanecm and urbanecm: Backport for Add namespace translations in Wayuu (T321881), Add namespace translations in Wayuu (T321881) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
12:16 urbanecm@deploy1002: Started scap: Backport for Add namespace translations in Wayuu (T321881), Add namespace translations in Wayuu (T321881)
12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P42767 and previous config saved to /var/cache/conftool/dbconfig/20230104-121350-marostegui.json
12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 25%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42766 and previous config saved to /var/cache/conftool/dbconfig/20230104-120815-root.json
11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T326011)', diff saved to https://phabricator.wikimedia.org/P42765 and previous config saved to /var/cache/conftool/dbconfig/20230104-115844-marostegui.json
11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 10%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42764 and previous config saved to /var/cache/conftool/dbconfig/20230104-115310-root.json
11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T326011)', diff saved to https://phabricator.wikimedia.org/P42763 and previous config saved to /var/cache/conftool/dbconfig/20230104-115011-marostegui.json
11:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2109.codfw.wmnet with reason: Maintenance
11:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2109.codfw.wmnet with reason: Maintenance
11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 5%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42761 and previous config saved to /var/cache/conftool/dbconfig/20230104-113805-root.json
11:33 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host puppetdb2003.codfw.wmnet
11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2151 to dbctl depooled T326206', diff saved to https://phabricator.wikimedia.org/P42759 and previous config saved to /var/cache/conftool/dbconfig/20230104-112801-marostegui.json
11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 1%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42758 and previous config saved to /var/cache/conftool/dbconfig/20230104-112300-root.json
11:02 vgutierrez: testing HAProxy 2.4.20 in cp4037 and cp4045
10:56 vgutierrez: (apt1001) import HAProxy 2.4.20 from third-party repo for buster and bullseye
10:49 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AKhatun out of all services on: 1098 hosts
10:48 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AKhatun out of all services on: 1098 hosts
10:48 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AKhatun out of all services on: 894 hosts
10:47 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AKhatun out of all services on: 894 hosts
10:37 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
10:37 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2124 T326206', diff saved to https://phabricator.wikimedia.org/P42756 and previous config saved to /var/cache/conftool/dbconfig/20230104-103109-marostegui.json
10:29 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
10:29 claime: Rolling reboot of api_appserver hosts in codfw
10:24 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
10:14 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
10:14 claime: Rolling reboot of mwdebug hosts in eqiad
10:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
10:04 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
10:04 marostegui: dbmaint eqiad deploy schema change on s5 T326011
10:04 claime: Rolling reboot of mwdebug hosts in codfw
10:04 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
10:04 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
10:04 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
10:03 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
10:03 filippo@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
10:03 filippo@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
10:03 filippo@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
10:03 filippo@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
10:03 filippo@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
10:03 filippo@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
10:03 filippo@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
10:03 filippo@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
10:03 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
10:03 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
10:03 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
10:03 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:03 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
10:03 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
10:03 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
10:02 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
10:02 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
10:01 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
10:01 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
10:00 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
09:53 effie: Upload imposm3_0.11.1-1 to buster-wikimedia - T325293
09:48 XioNoX: drmrs: offload traffic from Tata - T324955
09:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 56286
09:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 56286
09:37 marostegui: dbmaint codfw deploy schema change on s5 T326011
09:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetdb2003.codfw.wmnet
09:29 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
09:08 matthiasmullie: UTC morning backports done
09:07 mlitn@deploy1002: Finished scap: Backport for Squashed diff to catch up to wmf/1.40.0-wmf.17 (duration: 08m 13s)
09:01 mlitn@deploy1002: mlitn and mlitn: Backport for Squashed diff to catch up to wmf/1.40.0-wmf.17 synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
09:00 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host puppetdb1003.eqiad.wmnet
08:59 mlitn@deploy1002: Started scap: Backport for Squashed diff to catch up to wmf/1.40.0-wmf.17
08:57 mlitn@deploy1002: Finished scap: Backport for Change IW breakpoint to be enabled on smaller screen (T321377) (duration: 08m 56s)
08:50 mlitn@deploy1002: mlitn and mlitn: Backport for Change IW breakpoint to be enabled on smaller screen (T321377) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
08:48 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
08:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetdb1003.eqiad.wmnet
08:48 mlitn@deploy1002: Started scap: Backport for Change IW breakpoint to be enabled on smaller screen (T321377)
08:32 mlitn@deploy1002: Finished scap: Backport for Always show search results at full width (T321377) (duration: 08m 22s)
08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 100%: After testing', diff saved to https://phabricator.wikimedia.org/P42755 and previous config saved to /var/cache/conftool/dbconfig/20230104-082942-root.json
08:26 marostegui: dbmaint codfw deploy schema change on s8 T326011
08:26 marostegui: dbmaint eqiad deploy schema change on s8 T326011
08:26 marostegui: dbmaint eqiad deploy schema change on s4 T326011
08:26 marostegui: dbmaint codfw deploy schema change on s4 T326011
08:26 marostegui: dbmaint codfw deploy schema change on s4 T255174
08:26 marostegui: dbmaint eqiad deploy schema change on s4 T255174
08:25 mlitn@deploy1002: mlitn and mlitn: Backport for Always show search results at full width (T321377) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
08:23 mlitn@deploy1002: Started scap: Backport for Always show search results at full width (T321377)
08:22 marostegui: dbmaint eqiad deploy schema change on s8 T255174
08:20 marostegui: dbmaint codfw deploy schema change on s8 T255174
08:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 75%: After testing', diff saved to https://phabricator.wikimedia.org/P42754 and previous config saved to /var/cache/conftool/dbconfig/20230104-081437-root.json
07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 50%: After testing', diff saved to https://phabricator.wikimedia.org/P42753 and previous config saved to /var/cache/conftool/dbconfig/20230104-075932-root.json
07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 25%: After testing', diff saved to https://phabricator.wikimedia.org/P42752 and previous config saved to /var/cache/conftool/dbconfig/20230104-074427-root.json
07:38 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
07:38 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
07:38 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
07:38 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
07:38 marostegui: Switch x1 back to RBR T255174
07:35 marostegui: dbmaint codfw deploy schema change on x1 T255174
07:35 marostegui: dbmaint eqiad deploy schema change on x1 T255174
07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 10%: After testing', diff saved to https://phabricator.wikimedia.org/P42751 and previous config saved to /var/cache/conftool/dbconfig/20230104-072922-root.json
07:20 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
07:20 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
07:19 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
07:19 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 5%: After testing', diff saved to https://phabricator.wikimedia.org/P42750 and previous config saved to /var/cache/conftool/dbconfig/20230104-071417-root.json
06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 1%: After testing', diff saved to https://phabricator.wikimedia.org/P42749 and previous config saved to /var/cache/conftool/dbconfig/20230104-065912-root.json

2023-01-03

22:47 eileen: config revision changed from 34754c69 to 03c4d7a6
22:33 eileen: config revision changed from 5c73975a to 34754c69
21:55 mutante: gitlab-runner* - correction: allowing connections TO kubestagemaster.svc.eqiad.wmnet port 6443 FROM trusted runners, of course - T325385
21:53 mutante: gitlab-runner* - allowing kubestagemaster.svc.eqiad.wmnet to connect to port 6443, run puppet via cumin, deploy gerrit:868737 - T325385
21:47 taavi: UTC late backports done
21:46 taavi@deploy1002: Finished scap: Backport for Specify Citoid RESTBase URL separately (T325425), Use new DiscussionTools heading markup on group1 wikis (T314714) (duration: 12m 12s)
21:35 taavi@deploy1002: taavi and matmarex: Backport for Specify Citoid RESTBase URL separately (T325425), Use new DiscussionTools heading markup on group1 wikis (T314714) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
21:34 taavi@deploy1002: Started scap: Backport for Specify Citoid RESTBase URL separately (T325425), Use new DiscussionTools heading markup on group1 wikis (T314714)
21:30 taavi@deploy1002: Finished scap: Backport for Start writing to cuc_comment_id on test wikis (T233004) (duration: 12m 54s)
21:19 taavi@deploy1002: taavi and zabe: Backport for Start writing to cuc_comment_id on test wikis (T233004) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
21:17 taavi@deploy1002: Started scap: Backport for Start writing to cuc_comment_id on test wikis (T233004)
21:15 taavi@deploy1002: Finished scap: Backport for Stop setting $wgActorTableSchemaMigrationStage (T215466), Pin $wgCommentTempTableSchemaMigrationStage to default value (T299954), Pin cu_changes comment migration to old schema (T233004) (duration: 08m 49s)
21:08 taavi@deploy1002: taavi and zabe: Backport for Stop setting $wgActorTableSchemaMigrationStage (T215466), Pin $wgCommentTempTableSchemaMigrationStage to default value (T299954), Pin cu_changes comment migration to old schema (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
21:06 taavi@deploy1002: Started scap: Backport for Stop setting $wgActorTableSchemaMigrationStage (T215466), Pin $wgCommentTempTableSchemaMigrationStage to default value (T299954), Pin cu_changes comment migration to old schema (T233004)
19:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.17 refs T325580
19:18 dduvall@deploy1002: deploy-promote aborted: (duration: 08m 55s)
19:13 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd1001.eqiad.wmnet with OS bullseye
17:37 claime: Finished rolling reboot of parse* hosts in eqiad
17:36 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
17:30 sukhe: sudo cumin -b 1 -s 5 'A:codfw and P{O:swift::proxy}' 'depool && sleep 3 && systemctl restart swift-proxy && sleep 3 && pool'
16:40 ejegg: fundraising EOY receipt calculation finished, restarted scheduled jobs
16:21 ejegg: fundraising scheduled jobs disabled for EOY receipt calculation
15:37 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1001.eqiad.wmnet with OS bullseye
15:30 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd1001.eqiad.wmnet with OS bullseye
15:14 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
15:13 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
15:13 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
15:13 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
15:13 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
15:11 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
15:10 andrewbogott: upgrading and rebooting wikitech-static
15:07 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
15:06 claime: Starting rolling reboot of parse* hosts in eqiad
15:05 taavi: UTC afternoon backports done
15:04 taavi@deploy1002: Finished scap: Backport for SecurePoll: Add files for UCoC 2023 vote (T324793), ucoc2023: Update populateEditCount to count Flow edits (T324793), ucoc2023: Update populateEditCount to count Flow edits (T324793) (duration: 08m 10s)
15:00 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts graphite1004.eqiad.wmnet
14:59 filippo@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:59 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: graphite1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
14:58 taavi@deploy1002: taavi and taavi: Backport for SecurePoll: Add files for UCoC 2023 vote (T324793), ucoc2023: Update populateEditCount to count Flow edits (T324793), ucoc2023: Update populateEditCount to count Flow edits (T324793) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
14:56 taavi@deploy1002: Started scap: Backport for SecurePoll: Add files for UCoC 2023 vote (T324793), ucoc2023: Update populateEditCount to count Flow edits (T324793), ucoc2023: Update populateEditCount to count Flow edits (T324793)
14:53 taavi@deploy1002: Finished scap: Backport for Revert "Revert "Start mobile DiscussionTools A/B test"" (T321961) (duration: 09m 13s)
14:48 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: graphite1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
14:45 taavi@deploy1002: taavi and matmarex: Backport for Revert "Revert "Start mobile DiscussionTools A/B test"" (T321961) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
14:44 filippo@cumin1001: START - Cookbook sre.dns.netbox
14:44 taavi@deploy1002: Started scap: Backport for Revert "Revert "Start mobile DiscussionTools A/B test"" (T321961)
14:41 taavi@deploy1002: Finished scap: Backport for Log token for the DiscussionTools mobile a/b test (T321961), Log bucket/token for the DiscussionTools mobile a/b test (T321961), a/b test anonymous ID was being reset because of cookie prefixes (T321961), Log bucket/token for the DiscussionTools mobile a/b test (T321961) (duration: 08m 31s)
14:39 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts graphite1004.eqiad.wmnet
14:34 taavi@deploy1002: taavi and matmarex: Backport for Log token for the DiscussionTools mobile a/b test (T321961), Log bucket/token for the DiscussionTools mobile a/b test (T321961), a/b test anonymous ID was being reset because of cookie prefixes (T321961), Log bucket/token for the DiscussionTools mobile a/b test (T321961) synced to the testservers:
14:33 taavi@deploy1002: Started scap: Backport for Log token for the DiscussionTools mobile a/b test (T321961), Log bucket/token for the DiscussionTools mobile a/b test (T321961), a/b test anonymous ID was being reset because of cookie prefixes (T321961), Log bucket/token for the DiscussionTools mobile a/b test (T321961)
14:13 oblivian@deploy1002: Finished scap: Backport for etcd: use the v3-style SRV record (T320397) (duration: 07m 58s)
14:07 oblivian@deploy1002: oblivian and oblivian: Backport for etcd: use the v3-style SRV record (T320397) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
14:05 oblivian@deploy1002: Started scap: Backport for etcd: use the v3-style SRV record (T320397)
13:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
13:46 moritzm: installing libksba security updates
13:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host phab1004.eqiad.wmnet
13:19 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host phab1004.eqiad.wmnet
12:33 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: pushing wmf-puppet-dashboard updates for enc git handling (duration: 02m 49s)
12:30 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: pushing wmf-puppet-dashboard updates for enc git handling
12:28 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): pushing wmf-puppet-dashboard updates for enc git handling (duration: 01m 12s)
12:27 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): pushing wmf-puppet-dashboard updates for enc git handling
11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2131', diff saved to https://phabricator.wikimedia.org/P42744 and previous config saved to /var/cache/conftool/dbconfig/20230103-114030-marostegui.json
11:35 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
11:34 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
11:34 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
11:33 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
11:30 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint2001.wikimedia.org
11:26 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
11:25 claime: Starting rolling reboot of parse* hosts in codfw
11:06 hashar: contint2001: starting Jenkins manually
11:04 marostegui: Change x1 binlog format to STATEMENT T255174
11:00 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker[1080,1084].eqiad.wmnet with reason: Shutting down to enable RAID battery replacement
10:59 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-worker[1080,1084].eqiad.wmnet with reason: Shutting down to enable RAID battery replacement
10:59 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host contint2001.wikimedia.org
10:58 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint2002.wikimedia.org
10:53 marostegui: Restart eqiad sanitarium T326105
10:53 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host contint2002.wikimedia.org
10:50 marostegui: Restart codfw sanitarium masters T326105
10:49 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint1002.wikimedia.org
10:43 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host contint1002.wikimedia.org
10:37 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on parse1002.eqiad.wmnet with reason: CPU1 machine check error
10:36 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on parse1002.eqiad.wmnet with reason: CPU1 machine check error
10:36 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gerrit1001.wikimedia.org
10:31 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gerrit1001.wikimedia.org
10:25 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gerrit2002.wikimedia.org
10:18 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gerrit2002.wikimedia.org
09:27 vgutierrez: restarting varnish on cp5032 to clear VarnishChildRestarted alert - T325797
08:19 kartik@deploy1002: Finished scap: Backport for Content Translation: Move ttwiki out of Beta (T319177) (duration: 16m 09s)
08:16 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=parse1002.eqiad.wmnet
08:12 moritzm: installing Linux 4.19.269 on Buster hosts
08:12 phedenskog@deploy1002: Finished deploy [performance/navtiming@4f8c010]: (no justification provided) (duration: 00m 08s)
08:12 phedenskog@deploy1002: Started deploy [performance/navtiming@4f8c010]: (no justification provided)
08:05 kartik@deploy1002: kartik and kartik: Backport for Content Translation: Move ttwiki out of Beta (T319177) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
08:03 kartik@deploy1002: Started scap: Backport for Content Translation: Move ttwiki out of Beta (T319177)
04:58 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.17 refs T325580 (duration: 55m 31s)
04:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.17 refs T325580

2023-01-02

10:04 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host otrs1001.eqiad.wmnet
10:00 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host otrs1001.eqiad.wmnet

Other archives

2000s

Archive 1: 2004 Jun - 2004 Sep
Archive 2: 2004 Oct - 2004 Nov
Archive 3: 2004 Dec - 2005 Mar
Archive 4: 2005 Apr - 2005 Jul
Archive 5: 2005 Aug - 2005 Oct, with revision history 2004-06-23 to 2005-11-25
Archive 6: 2005 Nov - 2006 Feb
Archive 7: 2006 Mar - 2006 Jun
Archive 8: 2006 Jul - 2006 Sep
Archive 9: 2006 Oct - 2007 Jan, with revision history 2005-11-25 to 2007-02-21
Archive 10: 2007 Feb - 2007 Jun
Archive 11: 2007 Jul - 2007 Dec
Archive 12: 2008 Jan - 2008 Jul
Archive 12a: 2008 Aug
Archive 12b: 2008 Sep
Archive 13: 2008 Oct - 2009 Jun
Archive 14: 2009 Jun - 2009 Dec
