Server Admin Log/Archive 62
2023-01-31
- 23:51 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3055.esams.wmnet with OS bullseye
- 23:45 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3055.esams.wmnet with OS bullseye
- 23:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3054.esams.wmnet with reason: host reimage
- 23:35 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3055.esams.wmnet with OS bullseye
- 23:34 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3054.esams.wmnet with reason: host reimage
- 23:13 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3054.esams.wmnet with OS bullseye
- 22:54 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp2040.codfw.wmnet
- 22:53 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2040.codfw.wmnet with OS bullseye
- 22:35 zabe@deploy1002: Finished scap: Backport for Stop writing to cuc_user and cuc_user_text in group0 wikis (T233004), Stop writing to cuc_comment in testwiki (T233004) (duration: 07m 34s)
- 22:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2040.codfw.wmnet with reason: host reimage
- 22:32 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2040.codfw.wmnet with reason: host reimage
- 22:30 zabe@deploy1002: zabe: Backport for Stop writing to cuc_user and cuc_user_text in group0 wikis (T233004), Stop writing to cuc_comment in testwiki (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
- 22:28 zabe@deploy1002: Started scap: Backport for Stop writing to cuc_user and cuc_user_text in group0 wikis (T233004), Stop writing to cuc_comment in testwiki (T233004)
- 22:26 zabe@deploy1002: Finished scap: Backport for Restrict flow-edit-title to autoconfirmed on mediawikiwiki (T328097) (duration: 08m 43s)
- 22:19 zabe@deploy1002: zabe and bawolff: Backport for Restrict flow-edit-title to autoconfirmed on mediawikiwiki (T328097) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 22:17 zabe@deploy1002: Started scap: Backport for Restrict flow-edit-title to autoconfirmed on mediawikiwiki (T328097)
- 22:13 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2040.codfw.wmnet with OS bullseye
- 22:13 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp2038.codfw.wmnet
- 22:07 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2038.codfw.wmnet with OS bullseye
- 22:07 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5020.eqsin.wmnet,service=ats-be
- 22:07 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5020.eqsin.wmnet,service=cdn
- 22:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5020.eqsin.wmnet with OS bullseye
- 21:47 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2038.codfw.wmnet with reason: host reimage
- 21:44 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cassandra-dev2002.codfw.wmnet
- 21:44 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2038.codfw.wmnet with reason: host reimage
- 21:39 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host cassandra-dev2002.codfw.wmnet
- 21:36 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev2002.codfw.wmnet: Trying to induce errors - eevans@cumin1001
- 21:35 kindrobot: close UTC late backport window. Did not deploy bawolff 884142 as I ran out of time. zabe may reopen the window in around 30 minutes to finish it out
- 21:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5020.eqsin.wmnet with reason: host reimage
- 21:33 kindrobot@deploy1002: Finished scap: Backport for Enable ClientPreferences for group0 (T327979) (duration: 10m 17s)
- 21:31 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5020.eqsin.wmnet with reason: host reimage
- 21:29 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev2002.codfw.wmnet: Trying to induce errors - eevans@cumin1001
- 21:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2038.codfw.wmnet with OS bullseye
- 21:25 kindrobot@deploy1002: kindrobot and nray: Backport for Enable ClientPreferences for group0 (T327979) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 21:24 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp2039.codfw.wmnet
- 21:24 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp2036.codfw.wmnet
- 21:23 kindrobot@deploy1002: Started scap: Backport for Enable ClientPreferences for group0 (T327979)
- 21:17 kindrobot@deploy1002: Finished scap: Backport for Enable Linter write namespace, tag and template for group0 and group1 (T299612) (duration: 13m 20s)
- 21:06 kindrobot@deploy1002: sbailey and kindrobot: Backport for Enable Linter write namespace, tag and template for group0 and group1 (T299612) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 21:04 kindrobot@deploy1002: Started scap: Backport for Enable Linter write namespace, tag and template for group0 and group1 (T299612)
- 21:04 jgleeson: smashpig updated from d1434aeb to 683df497
- 21:03 kindrobot: start UTC late backport window
- 20:58 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
- 20:57 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp5020.eqsin.wmnet with OS bullseye
- 20:52 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2036.codfw.wmnet with OS bullseye
- 20:47 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2039.codfw.wmnet with OS bullseye
- 20:45 zabe: start running "foreachwikiindblist s5.dblist migrateRevisionCommentTemp.php --sleep 2" in screen # T275246
- 20:33 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2036.codfw.wmnet with reason: host reimage
- 20:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2036.codfw.wmnet with reason: host reimage
- 20:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage
- 20:25 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage
- 20:11 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2036.codfw.wmnet with OS bullseye
- 20:09 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5029.eqsin.wmnet,service=ats-be
- 20:09 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5029.eqsin.wmnet,service=cdn
- 20:06 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2039.codfw.wmnet with OS bullseye
- 20:05 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp2037.codfw.wmnet
- 20:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5029.eqsin.wmnet with OS bullseye
- 20:03 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2037.codfw.wmnet with OS bullseye
- 20:00 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
- 19:59 sukhe: sudo rm /etc/dhcp/automation/ttyS1-115200/cp5020.conf
- 19:58 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp5020.eqsin.wmnet with OS bullseye
- 19:58 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
- 19:43 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2037.codfw.wmnet with reason: host reimage
- 19:40 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2037.codfw.wmnet with reason: host reimage
- 19:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5029.eqsin.wmnet with reason: host reimage
- 19:30 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5029.eqsin.wmnet with reason: host reimage
- 19:21 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2037.codfw.wmnet with OS bullseye
- 19:16 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
- 19:16 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5020.eqsin.wmnet with OS bullseye
- 19:12 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.21 refs T325584
- 18:53 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=ats-be
- 18:53 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=cdn
- 18:44 mutante: gitlab-prod-1001.devtools (cloud) - rebooted VM ; ip addr del 172.16.7.146/32 dev eth0 - T318521
- 18:42 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
- 18:42 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5020.eqsin.wmnet with OS bullseye
- 18:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2034.codfw.wmnet with OS bullseye
- 18:26 mutante: gitlab-prod-1001.devtools (cloud) - ip addr del 172.16.7.146/21 dev eth0 - T318521
- 18:25 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 18:25 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 18:24 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1075']
- 18:24 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1075']
- 18:22 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1075.eqiad.wmnet']
- 18:22 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1075.eqiad.wmnet']
- 18:21 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp1075.eqiad.wmnet
- 18:21 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp1075.eqiad.wmnet
- 18:20 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
- 18:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=ats-be
- 18:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=cdn
- 18:19 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5020.eqsin.wmnet with OS bullseye
- 18:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2034.codfw.wmnet with reason: host reimage
- 18:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2034.codfw.wmnet with reason: host reimage
- 18:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
- 17:55 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cp5029.eqsin.wmnet with OS bullseye
- 17:53 sukhe: depool cp1075.eqiad.wmnet for iDRAC firmware testing: T321309
- 17:52 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,service=ats-be
- 17:52 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,service=cdn
- 17:50 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2034.codfw.wmnet with OS bullseye
- 17:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp5019.eqsin.wmnet
- 17:47 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for cp5019.eqsin.wmnet
- 17:39 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp1090.eqiad.wmnet
- 17:38 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp1090.eqiad.wmnet
- 17:38 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp1076.eqiad.wmnet
- 17:37 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp1076.eqiad.wmnet
- 17:35 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5018.eqsin.wmnet,service=ats-be
- 17:35 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5018.eqsin.wmnet,service=cdn
- 17:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5018.eqsin.wmnet with OS bullseye
- 17:33 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp2032.codfw.wmnet
- 17:31 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5028.eqsin.wmnet,service=ats-be
- 17:31 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5028.eqsin.wmnet,service=cdn
- 17:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5028.eqsin.wmnet with OS bullseye
- 17:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp5029.eqsin.wmnet with reason: testing reimaging cookbook stalling failure
- 17:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on cp5029.eqsin.wmnet with reason: testing reimaging cookbook stalling failure
- 17:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5019.eqsin.wmnet,service=ats-be
- 17:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5019.eqsin.wmnet,service=cdn
- 17:29 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5029.eqsin.wmnet,service=ats-be
- 17:29 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5029.eqsin.wmnet,service=cdn
- 17:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2032.codfw.wmnet with OS bullseye
- 17:14 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cp5019.eqsin.wmnet
- 17:08 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2032.codfw.wmnet with reason: host reimage
- 17:05 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2032.codfw.wmnet with reason: host reimage
- 17:03 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cp5019.eqsin.wmnet
- 16:59 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5028.eqsin.wmnet with reason: host reimage
- 16:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5018.eqsin.wmnet with reason: host reimage
- 16:54 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5028.eqsin.wmnet with reason: host reimage
- 16:54 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5018.eqsin.wmnet with reason: host reimage
- 16:52 cwhite@deploy1002: Finished deploy [releng/phatality@e0bb573]: (no justification provided) (duration: 00m 11s)
- 16:52 cwhite@deploy1002: Started deploy [releng/phatality@e0bb573]: (no justification provided)
- 16:52 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
- 16:52 cwhite@deploy1002: Finished deploy [releng/phatality@e0bb573]: (no justification provided) (duration: 00m 10s)
- 16:52 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
- 16:52 cwhite@deploy1002: Started deploy [releng/phatality@e0bb573]: (no justification provided)
- 16:49 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2032.codfw.wmnet with OS bullseye
- 16:49 mutante: mw2271 - re-enabling disabled puppet
- 16:49 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2032.codfw.wmnet with OS bullseye
- 16:46 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
- 16:46 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
- 16:45 cwhite@cumin2002: conftool action : set/weight=10; selector: name=logstash2032.codfw.wmnet
- 16:44 cwhite@cumin2002: conftool action : set/weight=10; selector: name=logstash1032.eqiad.wmnet
- 16:43 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 16:41 jayme@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 16:40 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 16:38 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 16:37 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 16:37 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 16:36 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on cp5019.eqsin.wmnet with reason: testing reimaging cookbook stalling failure
- 16:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 5:00:00 on cp5019.eqsin.wmnet with reason: testing reimaging cookbook stalling failure
- 16:29 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Grants:Programs/Wikimedia Community Fund" "Grants:Programs/Wikimedia Community Fund/General Support Fund" "Zabe" --reason "per request T328456" --skip-subpages # T328456
- 16:29 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5019.eqsin.wmnet,service=ats-be
- 16:29 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5019.eqsin.wmnet,service=cdn
- 16:28 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1004.eqiad.wmnet with OS bullseye
- 16:20 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 16:19 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 16:18 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5018.eqsin.wmnet with OS bullseye
- 16:18 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5028.eqsin.wmnet with OS bullseye
- 16:18 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5028.eqsin.wmnet with OS bullseye
- 16:18 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5018.eqsin.wmnet with OS bullseye
- 16:14 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1003.eqiad.wmnet with OS bullseye
- 16:09 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1004.eqiad.wmnet with reason: host reimage
- 16:06 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1004.eqiad.wmnet with reason: host reimage
- 16:01 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2032.codfw.wmnet with OS bullseye
- 16:01 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
- 16:00 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
- 16:00 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: sync
- 16:00 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: sync
- 15:56 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5028.eqsin.wmnet with OS bullseye
- 15:56 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5018.eqsin.wmnet with OS bullseye
- 15:55 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet,service=ats-be
- 15:55 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet,service=cdn
- 15:54 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubestage1004.eqiad.wmnet with OS bullseye
- 15:49 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubestagemaster1001.eqiad.wmnet with OS bullseye
- 15:40 ladsgroup@deploy1002: Finished scap: Backport for Set 'groupLoadsBySection' for s11 (T326980) (duration: 09m 49s)
- 15:38 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
- 15:35 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
- 15:34 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster1001.eqiad.wmnet with reason: host reimage
- 15:32 ladsgroup@deploy1002: ladsgroup and zabe: Backport for Set 'groupLoadsBySection' for s11 (T326980) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
- 15:31 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster1001.eqiad.wmnet with reason: host reimage
- 15:30 ladsgroup@deploy1002: Started scap: Backport for Set 'groupLoadsBySection' for s11 (T326980)
- 15:24 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2035.codfw.wmnet with OS bullseye
- 15:23 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubestage1003.eqiad.wmnet with OS bullseye
- 15:20 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagemaster1001.eqiad.wmnet with OS bullseye
- 15:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2035.codfw.wmnet with reason: host reimage
- 15:01 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubestagetcd1005.eqiad.wmnet with OS bullseye
- 15:01 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2035.codfw.wmnet with reason: host reimage
- 14:56 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubestagetcd1004.eqiad.wmnet with OS bullseye
- 14:56 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubestagetcd1006.eqiad.wmnet with OS bullseye
- 14:48 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd1005.eqiad.wmnet with reason: host reimage
- 14:46 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
- 14:46 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
- 14:46 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd1004.eqiad.wmnet with reason: host reimage
- 14:44 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
- 14:44 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
- 14:43 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd1006.eqiad.wmnet with reason: host reimage
- 14:42 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2035.codfw.wmnet with OS bullseye
- 14:41 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd1005.eqiad.wmnet with reason: host reimage
- 14:41 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd1004.eqiad.wmnet with reason: host reimage
- 14:41 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd1006.eqiad.wmnet with reason: host reimage
- 14:34 urbanecm@deploy1002: Finished scap: Backport for Disable write old for CheckUserLog reason field for testwiki (T233004), Remove redundant definition of wgCheckUserEnableSpecialInvestigate, Bump parsoid parser cache writes to 25%. (T320534) (duration: 07m 23s)
- 14:33 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd1006.eqiad.wmnet with OS bullseye
- 14:33 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd1005.eqiad.wmnet with OS bullseye
- 14:32 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd1004.eqiad.wmnet with OS bullseye
- 14:28 urbanecm@deploy1002: dreamyjazz and urbanecm and daniel: Backport for Disable write old for CheckUserLog reason field for testwiki (T233004), Remove redundant definition of wgCheckUserEnableSpecialInvestigate, Bump parsoid parser cache writes to 25%. (T320534) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwde
- 14:26 urbanecm@deploy1002: Started scap: Backport for Disable write old for CheckUserLog reason field for testwiki (T233004), Remove redundant definition of wgCheckUserEnableSpecialInvestigate, Bump parsoid parser cache writes to 25%. (T320534)
- 14:20 urbanecm@deploy1002: Finished scap: Backport for Disable write old for CheckUserLog reason field for testwiki (T233004), Remove redundant definition of wgCheckUserEnableSpecialInvestigate, Bump parsoid parser cache writes to 25%. (T320534) (duration: 16m 33s)
- 14:08 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 14:07 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 14:06 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 14:05 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 14:05 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 14:05 urbanecm@deploy1002: urbanecm and dreamyjazz and daniel: Backport for Disable write old for CheckUserLog reason field for testwiki (T233004), Remove redundant definition of wgCheckUserEnableSpecialInvestigate, Bump parsoid parser cache writes to 25%. (T320534) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwde
- 14:05 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 14:04 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: Reinitialize staging-eqiad with k8s 1.23
- 14:04 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: Reinitialize staging-eqiad with k8s 1.23
- 14:03 urbanecm@deploy1002: Started scap: Backport for Disable write old for CheckUserLog reason field for testwiki (T233004), Remove redundant definition of wgCheckUserEnableSpecialInvestigate, Bump parsoid parser cache writes to 25%. (T320534)
- 14:01 urbanecm@deploy1002: Backport cancelled.
- 12:36 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (eqiad): Disable traffic mirroring from codfw to eqiad (duration: 00m 35s)
- 12:36 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (eqiad): Disable traffic mirroring from codfw to eqiad
- 11:51 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@42a07d3] (eqiad): Disable traffic mirroring from codfw to eqiad (duration: 00m 35s)
- 11:50 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@42a07d3] (eqiad): Disable traffic mirroring from codfw to eqiad
- 11:21 moritzm: installing bind9 security updates (client-side tools/libs only)
- 10:57 jayme@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=k8s-ingress-staging
- 10:57 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=k8s-ingress-staging
- 10:18 jayme: switching active kubernetes staging cluster from eqiad to codfw - T327664
- 09:20 marostegui: dbmaint Install MariaDB 10.6 on db2093 (db_inventory) T328408
- 09:05 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 09:00 zabe@deploy1002: Finished scap: Backport for Stop writing to cuc_user and cuc_user_text in testwiki (T233004) (duration: 08m 11s)
- 09:00 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 08:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 08:54 elukey: roll restart kafka on kafka-logging1001 to pick up new pki certs
- 08:53 zabe@deploy1002: zabe: Backport for Stop writing to cuc_user and cuc_user_text in testwiki (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
- 08:51 zabe@deploy1002: Started scap: Backport for Stop writing to cuc_user and cuc_user_text in testwiki (T233004)
- 08:49 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 08:45 elukey: restore previously removed password for keystore to kafka-logging clusters
- 08:39 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 08:36 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 07:56 moritzm: installing bash bugfix updates from Bullseye point release
- 07:22 marostegui: dbmaint Schema change on s3 eqiad T328373
- 07:22 marostegui: dbmaint Schema change on s1 eqiad T328373
- 07:10 marostegui: Failover m2 from db1164 to db1195 - T328253
- 07:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2133,2160].codfw.wmnet,db[1117,1164,1195].eqiad.wmnet with reason: Primary switchover m2 T328253
- 07:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2133,2160].codfw.wmnet,db[1117,1164,1195].eqiad.wmnet with reason: Primary switchover m2 T328253
- 07:03 marostegui: dbmaint Schema change on s5 eqiad T328373
- 06:59 marostegui: dbmaint Schema change on s7 eqiad T328373
- 06:57 marostegui: dbmaint Schema change on s4 eqiad T328373
- 06:52 marostegui: dbmaint Schema change on s8 eqiad T328373
- 05:02 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.19 (duration: 02m 15s)
- 05:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
- 05:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
- 04:55 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.21 refs T325584 (duration: 52m 56s)
- 04:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.21 refs T325584
- 02:44 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3053.esams.wmnet,service=ats-be
- 02:43 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3053.esams.wmnet,service=cdn
- 02:28 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3053.esams.wmnet with OS bullseye
- 02:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3053.esams.wmnet with reason: host reimage
- 01:59 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3053.esams.wmnet with reason: host reimage
- 01:37 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3053.esams.wmnet with OS bullseye
- 01:33 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp3053.esams.wmnet']
- 01:31 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3053.esams.wmnet']
- 00:50 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp5027.eqsin.wmnet
- 00:42 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5027.eqsin.wmnet with OS bullseye
- 00:14 mutante: etherpad - maintenance downtime for about 5 minutes to test monitoring
- 00:09 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5027.eqsin.wmnet with reason: host reimage
- 00:06 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5027.eqsin.wmnet with reason: host reimage
2023-01-30
- 23:30 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5027.eqsin.wmnet with OS bullseye
- 23:29 brett@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp3053.esams.wmnet with OS bullseye
- 23:07 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp3053.esams.wmnet with OS bullseye
- 22:58 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4052.ulsfo.wmnet
- 22:50 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3053.esams.wmnet with OS bullseye
- 22:38 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp3053.esams.wmnet with OS bullseye
- 22:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2030.codfw.wmnet,service=ats-be
- 22:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2030.codfw.wmnet,service=cdn
- 22:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2030.codfw.wmnet with OS bullseye
- 21:56 urbanecm@deploy1002: Finished scap: Backport for Try to determine what's adding to Parsoid init times (T328201), Update interwiki cache (duration: 12m 24s)
- 21:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2030.codfw.wmnet with reason: host reimage
- 21:46 urbanecm@deploy1002: arlolra and urbanecm: Backport for Try to determine what's adding to Parsoid init times (T328201), Update interwiki cache synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
- 21:44 urbanecm@deploy1002: Started scap: Backport for Try to determine what's adding to Parsoid init times (T328201), Update interwiki cache
- 21:43 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2030.codfw.wmnet with reason: host reimage
- 21:42 urbanecm@deploy1002: Finished scap: Backport for GrowthExperiments: Update campaign configuration (T321370) (duration: 08m 47s)
- 21:35 urbanecm@deploy1002: tgr and urbanecm: Backport for GrowthExperiments: Update campaign configuration (T321370) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 21:34 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase2020.codfw.wmnet: Replace Cassandra keys & certs - eevans@cumin1001
- 21:34 urbanecm@deploy1002: Started scap: Backport for GrowthExperiments: Update campaign configuration (T321370)
- 21:33 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4052.ulsfo.wmnet
- 21:31 urbanecm@deploy1002: Finished scap: Backport for Enable WelcomeSurvey at viwiki (T325376), Fix grid blowout with limited width turned off (T327423), Support new style of table of contents (T327942) (duration: 09m 52s)
- 21:26 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS bullseye
- 21:25 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2030.codfw.wmnet with OS bullseye
- 21:24 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2020.codfw.wmnet: Replace Cassandra keys & certs - eevans@cumin1001
- 21:23 urbanecm@deploy1002: tgr and urbanecm and jdlrobson and legoktm: Backport for Enable WelcomeSurvey at viwiki (T325376), Fix grid blowout with limited width turned off (T327423), Support new style of table of contents (T327942) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 21:21 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase2019.codfw.wmnet: Replace Cassandra keys & certs - eevans@cumin1001
- 21:21 urbanecm@deploy1002: Started scap: Backport for Enable WelcomeSurvey at viwiki (T325376), Fix grid blowout with limited width turned off (T327423), Support new style of table of contents (T327942)
- 21:21 urbanecm@deploy1002: Finished scap: Backport for InitialiseSettings: add zhwiki to wgPageAssessmentsSubprojects (T326387) (duration: 19m 51s)
- 21:11 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2019.codfw.wmnet: Replace Cassandra keys & certs - eevans@cumin1001
- 21:03 urbanecm@deploy1002: urbanecm and musikanimal: Backport for InitialiseSettings: add zhwiki to wgPageAssessmentsSubprojects (T326387) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 21:02 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2033.codfw.wmnet,service=ats-be
- 21:02 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2033.codfw.wmnet,service=cdn
- 21:01 urbanecm@deploy1002: Started scap: Backport for InitialiseSettings: add zhwiki to wgPageAssessmentsSubprojects (T326387)
- 20:59 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
- 20:56 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
- 20:51 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
- 20:50 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
- 20:35 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bullseye
- 20:35 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bullseye
- 20:26 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2033.codfw.wmnet with OS bullseye
- 20:23 zabe@deploy1002: Finished scap: Backport for slwiki: Raise AF emergency disable treshold+count (T328366) (duration: 07m 32s)
- 20:17 zabe@deploy1002: zabe: Backport for slwiki: Raise AF emergency disable treshold+count (T328366) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 20:16 zabe@deploy1002: Started scap: Backport for slwiki: Raise AF emergency disable treshold+count (T328366)
- 20:15 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bullseye
- 20:14 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4044.ulsfo.wmnet
- 20:12 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS bullseye
- 20:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2033.codfw.wmnet with reason: host reimage
- 20:03 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2033.codfw.wmnet with reason: host reimage
- 19:57 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3052.esams.wmnet,service=ats-be
- 19:57 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3052.esams.wmnet,service=cdn
- 19:50 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
- 19:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3052.esams.wmnet with OS bullseye
- 19:47 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
- 19:44 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2033.codfw.wmnet with OS bullseye
- 19:36 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 19:35 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 19:26 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS bullseye
- 19:26 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4044.ulsfo.wmnet with OS bullseye
- 19:25 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3052.esams.wmnet with reason: host reimage
- 19:22 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3052.esams.wmnet with reason: host reimage
- 19:21 cstone: payments-wiki upgraded from 653c7cc8 to f20a2208
- 19:16 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS bullseye
- 19:15 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4051.ulsfo.wmnet
- 19:01 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3052.esams.wmnet with OS bullseye
- 18:46 sukhe@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cp3052.esams.wmnet']
- 18:46 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3052.esams.wmnet']
- 18:46 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp3052.esams.wmnet']
- 18:45 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3052.esams.wmnet']
- 18:45 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3052.esams.wmnet with OS bullseye
- 18:38 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3052.esams.wmnet with OS bullseye
- 18:37 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp3052.esams.wmnet with OS bullseye
- 18:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3052.esams.wmnet']
- 18:37 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3052.esams.wmnet']
- 18:34 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4051.ulsfo.wmnet with OS bullseye
- 18:29 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
- 18:29 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
- 18:19 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3052.esams.wmnet with OS bullseye
- 18:19 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3052.esams.wmnet with OS bullseye
- 18:10 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3052.esams.wmnet with OS bullseye
- 18:08 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp3052.esams.wmnet
- 18:07 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4051.ulsfo.wmnet with reason: host reimage
- 18:04 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4051.ulsfo.wmnet with reason: host reimage
- 18:01 urbanecm@deploy1002: Finished scap: Backport for [Growth] Remove wgGERecentChangesUnstarredMenteesFilterEnabled (duration: 07m 59s)
- 17:53 urbanecm@deploy1002: Started scap: Backport for [Growth] Remove wgGERecentChangesUnstarredMenteesFilterEnabled
- 17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43517 and previous config saved to /var/cache/conftool/dbconfig/20230130-174957-ladsgroup.json
- 17:49 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp3052.esams.wmnet
- 17:43 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4051.ulsfo.wmnet with OS bullseye
- 17:43 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4051.ulsfo.wmnet with OS bullseye
- 17:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5026.eqsin.wmnet,service=ats-be
- 17:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5026.eqsin.wmnet,service=cdn
- 17:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P43516 and previous config saved to /var/cache/conftool/dbconfig/20230130-173450-ladsgroup.json
- 17:34 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 17:34 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 17:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3050.esams.wmnet,service=ats-be
- 17:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3050.esams.wmnet,service=cdn
- 17:31 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4051.ulsfo.wmnet with OS bullseye
- 17:31 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4043.ulsfo.wmnet
- 17:27 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4043.ulsfo.wmnet with OS bullseye
- 17:24 inflatador: bking@build2001 rebuilding docker images for 884351 complete
- 17:22 inflatador: bking@build2001 rebuilding docker images for 884351
- 17:21 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5026.eqsin.wmnet with OS bullseye
- 17:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P43515 and previous config saved to /var/cache/conftool/dbconfig/20230130-171944-ladsgroup.json
- 17:12 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3050.esams.wmnet with OS bullseye
- 17:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43514 and previous config saved to /var/cache/conftool/dbconfig/20230130-170437-ladsgroup.json
- 16:59 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4043.ulsfo.wmnet with reason: host reimage
- 16:56 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4043.ulsfo.wmnet with reason: host reimage
- 16:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43513 and previous config saved to /var/cache/conftool/dbconfig/20230130-165359-ladsgroup.json
- 16:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
- 16:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
- 16:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43512 and previous config saved to /var/cache/conftool/dbconfig/20230130-165348-ladsgroup.json
- 16:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5026.eqsin.wmnet with reason: host reimage
- 16:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3050.esams.wmnet with reason: host reimage
- 16:44 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5026.eqsin.wmnet with reason: host reimage
- 16:44 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3050.esams.wmnet with reason: host reimage
- 16:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P43511 and previous config saved to /var/cache/conftool/dbconfig/20230130-163842-ladsgroup.json
- 16:35 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4043.ulsfo.wmnet with OS bullseye
- 16:35 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4043.ulsfo.wmnet with OS bullseye
- 16:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1084.eqiad.wmnet
- 16:25 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4043.ulsfo.wmnet with OS bullseye
- 16:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1084.eqiad.wmnet
- 16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P43510 and previous config saved to /var/cache/conftool/dbconfig/20230130-162336-ladsgroup.json
- 16:22 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3051.esams.wmnet,service=ats-be
- 16:22 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3051.esams.wmnet,service=cdn
- 16:22 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2029.codfw.wmnet,service=ats-be
- 16:22 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2029.codfw.wmnet,service=cdn
- 16:21 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3050.esams.wmnet with OS bullseye
- 16:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp3050.esams.wmnet
- 16:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2029.codfw.wmnet with OS bullseye
- 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43509 and previous config saved to /var/cache/conftool/dbconfig/20230130-161324-root.json
- 16:11 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp3050.esams.wmnet
- 16:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp3050.esams.wmnet
- 16:10 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp3050.esams.wmnet
- 16:10 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3050.esams.wmnet,service=ats-be
- 16:10 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3050.esams.wmnet,service=cdn
- 16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43508 and previous config saved to /var/cache/conftool/dbconfig/20230130-160829-ladsgroup.json
- 16:06 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5026.eqsin.wmnet with OS bullseye
- 16:05 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5026.eqsin.wmnet with OS bullseye
- 16:03 sukhe: racreset cp3050.esams.wmnet: firmware cookbook iDRAC upgrade test
- 16:03 moritzm: upgrading idp-test to latest Java security update
- 15:59 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp3050.esams.wmnet
- 15:59 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp3050.esams.wmnet
- 15:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43507 and previous config saved to /var/cache/conftool/dbconfig/20230130-155819-root.json
- 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43506 and previous config saved to /var/cache/conftool/dbconfig/20230130-155802-ladsgroup.json
- 15:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
- 15:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
- 15:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
- 15:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
- 15:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43505 and previous config saved to /var/cache/conftool/dbconfig/20230130-155747-ladsgroup.json
- 15:54 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5026.eqsin.wmnet with OS bullseye
- 15:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2029.codfw.wmnet with reason: host reimage
- 15:48 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2029.codfw.wmnet with reason: host reimage
- 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43504 and previous config saved to /var/cache/conftool/dbconfig/20230130-154314-root.json
- 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P43503 and previous config saved to /var/cache/conftool/dbconfig/20230130-154241-ladsgroup.json
- 15:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3051.esams.wmnet with OS bullseye
- 15:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2029.codfw.wmnet with OS bullseye
- 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43502 and previous config saved to /var/cache/conftool/dbconfig/20230130-152809-root.json
- 15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P43501 and previous config saved to /var/cache/conftool/dbconfig/20230130-152734-ladsgroup.json
- 15:14 marostegui: Retrospective: Starting s4 codfw failover from db2110 to db2140 - T328022
- 15:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43500 and previous config saved to /var/cache/conftool/dbconfig/20230130-151304-root.json
- 15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43499 and previous config saved to /var/cache/conftool/dbconfig/20230130-151228-ladsgroup.json
- 15:07 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3051.esams.wmnet with reason: host reimage
- 15:04 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3051.esams.wmnet with reason: host reimage
- 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43498 and previous config saved to /var/cache/conftool/dbconfig/20230130-150132-ladsgroup.json
- 15:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
- 15:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
- 14:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43497 and previous config saved to /var/cache/conftool/dbconfig/20230130-145759-root.json
- 14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2110 T328022', diff saved to https://phabricator.wikimedia.org/P43496 and previous config saved to /var/cache/conftool/dbconfig/20230130-145508-root.json
- 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2140 to s4 primary T328022', diff saved to https://phabricator.wikimedia.org/P43495 and previous config saved to /var/cache/conftool/dbconfig/20230130-145421-root.json
- 14:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
- 14:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
- 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43494 and previous config saved to /var/cache/conftool/dbconfig/20230130-145229-ladsgroup.json
- 14:47 moritzm: updating puppetdb 7 hosts to 7.12.1 T321783
- 14:46 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Enable Linter write namespace, tag and template from core, group0 (T299612) (duration: 11m 11s)
- 14:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3051.esams.wmnet with OS bullseye
- 14:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P43493 and previous config saved to /var/cache/conftool/dbconfig/20230130-144213-ladsgroup.json
- 14:38 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main'.
- 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P43492 and previous config saved to /var/cache/conftool/dbconfig/20230130-143723-ladsgroup.json
- 14:36 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and sbailey: Backport for Enable Linter write namespace, tag and template from core, group0 (T299612) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 14:35 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Enable Linter write namespace, tag and template from core, group0 (T299612)
- 14:33 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Revert "Remove references to mediawiki.Uri" (T328143), Revert "Rewrite mw.libs.ve.getTargetDataFromHref with URL API" (T328143) (duration: 12m 07s)
- 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P43491 and previous config saved to /var/cache/conftool/dbconfig/20230130-142708-ladsgroup.json
- 14:22 lucaswerkmeister-wmde@deploy1002: matmarex and lucaswerkmeister-wmde: Backport for Revert "Remove references to mediawiki.Uri" (T328143), Revert "Rewrite mw.libs.ve.getTargetDataFromHref with URL API" (T328143) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P43490 and previous config saved to /var/cache/conftool/dbconfig/20230130-142216-ladsgroup.json
- 14:21 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Revert "Remove references to mediawiki.Uri" (T328143), Revert "Rewrite mw.libs.ve.getTargetDataFromHref with URL API" (T328143)
- 14:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
- 14:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
- 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2140 with weight 0 T328022', diff saved to https://phabricator.wikimedia.org/P43489 and previous config saved to /var/cache/conftool/dbconfig/20230130-141822-root.json
- 14:18 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
- 14:17 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
- 14:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P43488 and previous config saved to /var/cache/conftool/dbconfig/20230130-141203-ladsgroup.json
- 14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43487 and previous config saved to /var/cache/conftool/dbconfig/20230130-140710-ladsgroup.json
- 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P43486 and previous config saved to /var/cache/conftool/dbconfig/20230130-135659-ladsgroup.json
- 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43485 and previous config saved to /var/cache/conftool/dbconfig/20230130-135632-ladsgroup.json
- 13:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
- 13:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
- 13:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
- 13:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
- 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P43484 and previous config saved to /var/cache/conftool/dbconfig/20230130-134406-ladsgroup.json
- 13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
- 13:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
- 13:31 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad (duration: 01m 23s)
- 13:29 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad
- 13:29 godog: bounce logstash on logstash1025 -- GC unhappy causing kafka lag
- 13:29 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (eqiad): Disable traffic mirroring from codfw to eqiad (duration: 01m 13s)
- 13:28 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (eqiad): Disable traffic mirroring from codfw to eqiad
- 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43483 and previous config saved to /var/cache/conftool/dbconfig/20230130-132701-ladsgroup.json
- 13:23 awight@deploy1002: Finished scap: Backport for Revert "Enable kartographer external data parse time fetch for all wikis" (T323113) (duration: 08m 34s)
- 13:21 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (eqiad): Disable traffic mirroring from codfw to eqiad (duration: 00m 11s)
- 13:21 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (eqiad): Disable traffic mirroring from codfw to eqiad
- 13:21 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad (duration: 00m 22s)
- 13:20 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad
- 13:16 awight@deploy1002: awight: Backport for Revert "Enable kartographer external data parse time fetch for all wikis" (T323113) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
- 13:14 awight@deploy1002: Started scap: Backport for Revert "Enable kartographer external data parse time fetch for all wikis" (T323113)
- 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P43482 and previous config saved to /var/cache/conftool/dbconfig/20230130-131155-ladsgroup.json
- 13:00 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
- 12:59 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
- 12:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast3004.wikimedia.org
- 12:58 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:58 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 12:57 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
- 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P43481 and previous config saved to /var/cache/conftool/dbconfig/20230130-125648-ladsgroup.json
- 12:56 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
- 12:55 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
- 12:55 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
- 12:55 awight@deploy1002: scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki=aawiki --force-version "1.40.0-wmf.20" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.2oaGSEpQR1"' returned non-zero exit status 255. (duration: 00m 00s)
- 12:55 awight@deploy1002: Started scap: Backport for Revert "Enable kartographer external data parse time fetch for all wikis" (T323113)
- 12:46 awight@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f]: Roll back kartotherian (duration: 01m 27s)
- 12:45 awight@deploy1002: Started deploy [kartotherian/deploy@5c58f8f]: Roll back kartotherian
- 12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43479 and previous config saved to /var/cache/conftool/dbconfig/20230130-124142-ladsgroup.json
- 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43478 and previous config saved to /var/cache/conftool/dbconfig/20230130-123004-ladsgroup.json
- 12:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
- 12:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
- 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43477 and previous config saved to /var/cache/conftool/dbconfig/20230130-122943-ladsgroup.json
- 12:25 awight@deploy1002: Finished deploy [kartotherian/deploy@42a07d3]: Disable traffic mirroring from codfw to eqiad (duration: 02m 44s)
- 12:25 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 12:23 awight@deploy1002: Started deploy [kartotherian/deploy@42a07d3]: Disable traffic mirroring from codfw to eqiad
- 12:22 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P43476 and previous config saved to /var/cache/conftool/dbconfig/20230130-121437-ladsgroup.json
- 12:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast3004.wikimedia.org
- 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P43475 and previous config saved to /var/cache/conftool/dbconfig/20230130-115930-ladsgroup.json
- 11:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast6001.wikimedia.org
- 11:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast6001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 11:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast6001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 11:54 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 11:49 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast6001.wikimedia.org
- 11:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 42473
- 11:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 42473
- 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43474 and previous config saved to /var/cache/conftool/dbconfig/20230130-114424-ladsgroup.json
- 11:42 moritzm: installing install4002 T327867
- 11:42 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1005.eqiad.wmnet
- 11:41 Amir1: dropping old wikiadmin user (T326802)
- 11:35 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1005.eqiad.wmnet
- 11:35 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1004.eqiad.wmnet
- 11:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43473 and previous config saved to /var/cache/conftool/dbconfig/20230130-113319-ladsgroup.json
- 11:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
- 11:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
- 11:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
- 11:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
- 11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43472 and previous config saved to /var/cache/conftool/dbconfig/20230130-113254-ladsgroup.json
- 11:28 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1004.eqiad.wmnet
- 11:24 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1003.eqiad.wmnet
- 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install4002.wikimedia.org
- 11:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P43471 and previous config saved to /var/cache/conftool/dbconfig/20230130-111748-ladsgroup.json
- 11:17 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1003.eqiad.wmnet
- 11:11 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host htmldumper1001.eqiad.wmnet
- 11:09 phedenskog@deploy1002: Finished deploy [performance/navtiming@4e5ff3f]: (no justification provided) (duration: 00m 05s)
- 11:09 phedenskog@deploy1002: Started deploy [performance/navtiming@4e5ff3f]: (no justification provided)
- 11:05 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host htmldumper1001.eqiad.wmnet
- 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install4002.wikimedia.org on all recursors
- 11:04 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install4002.wikimedia.org on all recursors
- 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install4002.wikimedia.org - jmm@cumin2002"
- 11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install4002.wikimedia.org - jmm@cumin2002"
- 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P43470 and previous config saved to /var/cache/conftool/dbconfig/20230130-110241-ladsgroup.json
- 11:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install4002.wikimedia.org
- 10:49 ladsgroup@deploy1002: Finished scap: Backport for Enable write both for externallinks except s4, s7, s8 (T321662) (duration: 13m 10s)
- 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43468 and previous config saved to /var/cache/conftool/dbconfig/20230130-104735-ladsgroup.json
- 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast4003.wikimedia.org
- 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast4003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 10:40 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast4003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 10:37 ladsgroup@deploy1002: ladsgroup: Backport for Enable write both for externallinks except s4, s7, s8 (T321662) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 10:36 ladsgroup@deploy1002: Started scap: Backport for Enable write both for externallinks except s4, s7, s8 (T321662)
- 10:36 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 10:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43467 and previous config saved to /var/cache/conftool/dbconfig/20230130-103540-ladsgroup.json
- 10:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
- 10:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
- 10:30 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast4003.wikimedia.org
- 10:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
- 10:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
- 10:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43466 and previous config saved to /var/cache/conftool/dbconfig/20230130-102500-ladsgroup.json
- 10:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593
- 10:17 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts bast4003.wikimedia.org
- 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast4003.wikimedia.org
- 10:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 14593
- 10:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 49544
- 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P43465 and previous config saved to /var/cache/conftool/dbconfig/20230130-100954-ladsgroup.json
- 10:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 49544
- 10:00 awight@deploy1002: Finished scap: Backport for Enable kartographer external data parse time fetch for all wikis (T326317) (duration: 07m 53s)
- 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P43464 and previous config saved to /var/cache/conftool/dbconfig/20230130-095447-ladsgroup.json
- 09:54 awight@deploy1002: lilients and awight: Backport for Enable kartographer external data parse time fetch for all wikis (T326317) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 09:52 awight@deploy1002: Started scap: Backport for Enable kartographer external data parse time fetch for all wikis (T326317)
- 09:52 XioNoX: push pfw policies - T328085
- 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43463 and previous config saved to /var/cache/conftool/dbconfig/20230130-093941-ladsgroup.json
- 09:29 jynus: disabling puppet on dbprov2004 to reorganize partitions T327155
- 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43462 and previous config saved to /var/cache/conftool/dbconfig/20230130-092804-ladsgroup.json
- 09:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
- 09:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
- 09:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P43461 and previous config saved to /var/cache/conftool/dbconfig/20230130-092732-ladsgroup.json
- 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P43460 and previous config saved to /var/cache/conftool/dbconfig/20230130-091225-ladsgroup.json
- 08:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P43459 and previous config saved to /var/cache/conftool/dbconfig/20230130-085719-ladsgroup.json
- 08:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T318605)', diff saved to https://phabricator.wikimedia.org/P43458 and previous config saved to /var/cache/conftool/dbconfig/20230130-085530-ladsgroup.json
- 08:48 moritzm: installing install1004 T327867
- 08:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P43457 and previous config saved to /var/cache/conftool/dbconfig/20230130-084213-ladsgroup.json
- 08:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P43456 and previous config saved to /var/cache/conftool/dbconfig/20230130-084024-ladsgroup.json
- 08:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P43455 and previous config saved to /var/cache/conftool/dbconfig/20230130-083034-ladsgroup.json
- 08:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
- 08:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
- 08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P43454 and previous config saved to /var/cache/conftool/dbconfig/20230130-082517-ladsgroup.json
- 08:19 zabe: Deployed security patch for T278365
- 08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T318605)', diff saved to https://phabricator.wikimedia.org/P43452 and previous config saved to /var/cache/conftool/dbconfig/20230130-081011-ladsgroup.json
- 07:54 phedenskog@deploy1002: Finished deploy [performance/navtiming@bfbd6d7]: (no justification provided) (duration: 00m 05s)
- 07:54 phedenskog@deploy1002: Started deploy [performance/navtiming@bfbd6d7]: (no justification provided)
- 07:50 moritzm: installing install2004 T327867
- 07:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T318605)', diff saved to https://phabricator.wikimedia.org/P43451 and previous config saved to /var/cache/conftool/dbconfig/20230130-074502-ladsgroup.json
- 07:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T318605)', diff saved to https://phabricator.wikimedia.org/P43450 and previous config saved to /var/cache/conftool/dbconfig/20230130-073827-ladsgroup.json
- 07:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
- 07:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
- 07:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T318605)', diff saved to https://phabricator.wikimedia.org/P43449 and previous config saved to /var/cache/conftool/dbconfig/20230130-073806-ladsgroup.json
- 07:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P43448 and previous config saved to /var/cache/conftool/dbconfig/20230130-072956-ladsgroup.json
- 07:26 marostegui: dbmaint Schema change on s7 eqiad T328236
- 07:25 marostegui: dbmaint Schema change on s2 eqiad T328236
- 07:25 marostegui: dbmaint Schema change on s1 eqiad T328236
- 07:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P43447 and previous config saved to /var/cache/conftool/dbconfig/20230130-072300-ladsgroup.json
- 07:21 marostegui: dbmaint Schema change on s1 eqiad T328236
- 07:17 marostegui: dbmaint Schema change on s4 eqiad T328236
- 07:16 marostegui: dbmaint Schema change on s6 eqiad T328236
- 07:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P43446 and previous config saved to /var/cache/conftool/dbconfig/20230130-071450-ladsgroup.json
- 07:11 marostegui: dbmaint Schema change on s5 eqiad T328236
- 07:10 marostegui: dbmaint Schema change on s8 eqiad T328236
- 07:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P43445 and previous config saved to /var/cache/conftool/dbconfig/20230130-070753-ladsgroup.json
- 07:05 marostegui: dbmaint Schema change on s3 eqiad T328086
- 07:02 marostegui: dbmaint Schema change on s1 eqiad T328086
- 07:01 marostegui: dbmaint Schema change on s4 eqiad T328086
- 06:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T318605)', diff saved to https://phabricator.wikimedia.org/P43444 and previous config saved to /var/cache/conftool/dbconfig/20230130-065943-ladsgroup.json
- 06:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T318605)', diff saved to https://phabricator.wikimedia.org/P43443 and previous config saved to /var/cache/conftool/dbconfig/20230130-065247-ladsgroup.json
- 06:51 marostegui: dbmaint Schema change on s5 eqiad T328086
- 06:45 marostegui: dbmaint Schema change on s2 eqiad T328086
- 06:43 marostegui: dbmaint Schema change on s7 eqiad T328086
- 06:41 marostegui: dbmaint Schema change on s8 eqiad T328086
- 06:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
- 06:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
- 06:34 marostegui: dbmaint Schema change on s6 eqiad T328086
- 06:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2140 (T318605)', diff saved to https://phabricator.wikimedia.org/P43441 and previous config saved to /var/cache/conftool/dbconfig/20230130-061534-ladsgroup.json
- 06:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
- 06:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
- 06:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T318605)', diff saved to https://phabricator.wikimedia.org/P43440 and previous config saved to /var/cache/conftool/dbconfig/20230130-061401-ladsgroup.json
- 06:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
- 06:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
- 05:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T318605)', diff saved to https://phabricator.wikimedia.org/P43439 and previous config saved to /var/cache/conftool/dbconfig/20230130-053033-ladsgroup.json
- 05:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
- 05:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
- 05:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
- 05:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
2023-01-29
- 14:46 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1002.eqiad.wmnet
- 14:40 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1002.eqiad.wmnet
- 14:39 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1008.eqiad.wmnet
- 14:33 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1008.eqiad.wmnet
2023-01-28
- 00:36 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet
- 00:35 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet
- 00:17 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS bullseye
2023-01-27
- 23:55 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
- 23:52 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
- 23:31 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS bullseye
- 23:31 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4050.ulsfo.wmnet with OS bullseye
- 23:22 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS bullseye
- 23:21 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4042.ulsfo.wmnet
- 22:46 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS bullseye
- 22:24 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
- 22:20 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
- 22:11 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include bullseye-wikimedia /home/rzl/httpbb/bullseye/httpbb_0.0.2-1+deb11u1_amd64.changes # T328162
- 22:11 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include buster-wikimedia /home/rzl/httpbb/buster/httpbb_0.0.2-1_amd64.changes # T328162
- 22:00 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS bullseye
- 21:59 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4042.ulsfo.wmnet with OS bullseye
- 21:51 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS bullseye
- 21:49 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4049.ulsfo.wmnet
- 20:56 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4049.ulsfo.wmnet with OS bullseye
- 20:29 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4049.ulsfo.wmnet with reason: host reimage
- 20:26 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4049.ulsfo.wmnet with reason: host reimage
- 20:05 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4049.ulsfo.wmnet with OS bullseye
- 20:02 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4049.ulsfo.wmnet with OS bullseye
- 19:38 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4049.ulsfo.wmnet with OS bullseye
- 19:38 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4049.ulsfo.wmnet with OS bullseye
- 19:32 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4049.ulsfo.wmnet with OS bullseye
- 19:31 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4041.ulsfo.wmnet
- 19:31 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp404.ulsfo.wmnet
- 19:28 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4041.ulsfo.wmnet with OS bullseye
- 19:02 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4041.ulsfo.wmnet with reason: host reimage
- 18:57 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4041.ulsfo.wmnet with reason: host reimage
- 18:37 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4041.ulsfo.wmnet with OS bullseye
- 18:37 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4041.ulsfo.wmnet with OS bullseye
- 18:25 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4041.ulsfo.wmnet with OS bullseye
- 18:24 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4048.ulsfo.wmnet
- 18:14 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS bullseye
- 17:52 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage
- 17:49 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage
- 17:38 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@907fe2a]: (no justification provided) (duration: 00m 14s)
- 17:38 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@907fe2a]: (no justification provided)
- 17:28 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS bullseye
- 17:28 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4048.ulsfo.wmnet with OS bullseye
- 17:15 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS bullseye
- 15:50 dancy@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 00m 04s)
- 15:50 dancy@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
- 15:42 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4040.ulsfo.wmnet,service=ats-be
- 15:42 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4040.ulsfo.wmnet,service=cdn
- 15:39 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS bullseye
- 15:31 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2027.codfw.wmnet,service=ats-be
- 15:31 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2027.codfw.wmnet,service=cdn
- 15:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2027.codfw.wmnet with OS bullseye
- 15:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage
- 15:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage
- 15:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2027.codfw.wmnet with reason: host reimage
- 14:58 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2027.codfw.wmnet with reason: host reimage
- 14:55 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
- 14:55 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
- 14:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS bullseye
- 14:46 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
- 14:45 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
- 14:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
- 14:41 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
- 14:40 moritzm: installing install3002 T327867
- 14:39 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
- 14:34 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
- 14:34 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
- 14:27 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts clouddb2001-dev.codfw.wmnet
- 14:27 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:26 andrew@cumin1001: START - Cookbook sre.dns.netbox
- 14:22 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts clouddb2001-dev.codfw.wmnet
- 14:20 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts clouddb2001-dev.codfw.wmnet
- 14:20 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:20 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: clouddb2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
- 14:17 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: clouddb2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
- 14:13 andrew@cumin1001: START - Cookbook sre.dns.netbox
- 14:10 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts clouddb2001-dev.codfw.wmnet
- 13:46 moritzm: installing install5002 T327867
- 13:08 moritzm: installing install6002 T327867
- 12:47 hashar: gerrit1001 running Puppet to deploy https://gerrit.wikimedia.org/r/883965 and restarting Apache 2 to change the `Listen` statements # T326125
- 12:42 hashar: Rebooting gerrit2002
- 12:38 hashar: Stopped Puppet on gerrit1001 to prevent auto deployment of https://gerrit.wikimedia.org/r/883965
- 12:25 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2001-dev.codfw.wmnet with OS bullseye
- 12:25 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
- 12:23 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
- 12:03 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@9690bf9]: (no justification provided) (duration: 00m 15s)
- 12:03 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@9690bf9]: (no justification provided)
- 12:01 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
- 12:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 138915
- 12:00 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
- 11:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 138915
- 11:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9318
- 11:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9318
- 11:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 55821
- 11:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 55821
- 11:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 398143
- 11:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 398143
- 11:57 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2001-dev.codfw.wmnet with reason: host reimage
- 11:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 26077
- 11:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 26077
- 11:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 50266
- 11:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 50266
- 11:54 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2001-dev.codfw.wmnet with reason: host reimage
- 11:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 14593
- 11:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 14593
- 11:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 56898
- 11:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 56898
- 11:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8368
- 11:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8368
- 11:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8560
- 11:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8560
- 11:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34309
- 11:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 34309
- 11:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12033
- 11:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12033
- 11:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 62537
- 11:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 62537
- 11:41 XioNoX: restart keyholder on deploy1002
- 11:41 stevemunene@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
- 11:40 stevemunene@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
- 11:38 stevemunene@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
- 11:36 stevemunene@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
- 11:27 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
- 11:26 ayounsi@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 00m 56s)
- 11:25 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2001-dev.codfw.wmnet with OS bullseye
- 11:25 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
- 11:24 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb2001-dev.codfw.wmnet with OS bullseye
- 11:24 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
- 11:15 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2001-dev.codfw.wmnet with OS bullseye
- 11:15 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudlb2001-dev.mgmt.codfw.wmnet on all recursors
- 11:15 aborrero@cumin2002: START - Cookbook sre.dns.wipe-cache cloudlb2001-dev.mgmt.codfw.wmnet on all recursors
- 11:15 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on an-worker1087.eqiad.wmnet with reason: Shutting down an-worker1087 to allow for RAID BBU replacement
- 11:14 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on an-worker1087.eqiad.wmnet with reason: Shutting down an-worker1087 to allow for RAID BBU replacement
- 11:13 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2001-dev.codfw.wmnet with OS bullseye
- 11:12 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2001-dev.codfw.wmnet with OS bullseye
- 11:12 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:12 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Sync for cloudlb2001-dev - aborrero@cumin2002"
- 11:11 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Sync for cloudlb2001-dev - aborrero@cumin2002"
- 11:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-corp1001.wikimedia.org
- 11:10 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:09 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 11:08 aborrero@cumin2002: START - Cookbook sre.dns.netbox
- 11:08 aborrero@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 11:05 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-corp1001.wikimedia.org
- 11:04 stevemunene@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: apply on main
- 11:04 stevemunene@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
- 11:03 aborrero@cumin2002: START - Cookbook sre.dns.netbox
- 11:01 stevemunene@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: apply on main
- 11:01 stevemunene@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
- 10:53 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts ldap-corp1001.wikimedia.org
- 10:52 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-corp1001.wikimedia.org
- 10:45 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:45 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Sync for cloudlb2001-dev - aborrero@cumin2002"
- 10:38 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Sync for cloudlb2001-dev - aborrero@cumin2002"
- 10:37 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: apply on main
- 10:37 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
- 10:26 aborrero@cumin2002: START - Cookbook sre.dns.netbox
- 10:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-corp2001.wikimedia.org
- 10:23 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 10:19 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 10:15 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-corp2001.wikimedia.org
- 09:40 moritzm: disabling old bastions bast3005/bast4003/bast5002/bast6001, use bast3006/bast4004/bast5003/bast6002 instead
- 08:23 marostegui: Apply schema change on labtestwiki (clouddb2002-dev) T328086
- 08:22 marostegui: Apply schema change on db1106 (s1 enwiki) T328086
- 08:06 elukey: restart kube-apiserver on ml-staging-ctrl2* nodes as attempt to mitigate some LIST API high latency
- 07:41 elukey: restart kube-apiserver on ml-serve-ctrl2* nodes as attempt to mitigate some 504 API response errors
- 01:15 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev2*: Applying configuration change to cassandra-dev cluster - eevans@cumin1001
- 01:11 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4047.ulsfo.wmnet
- 01:10 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4047.ulsfo.wmnet with OS bullseye
- 00:56 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev2*: Applying configuration change to cassandra-dev cluster - eevans@cumin1001
- 00:49 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4047.ulsfo.wmnet with reason: host reimage
- 00:45 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4047.ulsfo.wmnet with reason: host reimage
- 00:33 zabe@deploy1002: Finished scap: Backport for Stop setting cul_actor migration var (T233004) (duration: 07m 36s)
- 00:27 zabe@deploy1002: zabe: Backport for Stop setting cul_actor migration var (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 00:26 zabe@deploy1002: Started scap: Backport for Stop setting cul_actor migration var (T233004)
- 00:25 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS bullseye
- 00:24 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4047.ulsfo.wmnet with OS bullseye
- 00:16 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS bullseye
- 00:15 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4047.ulsfo.wmnet with OS bullseye
- 00:11 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS bullseye
- 00:10 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4047.ulsfo.wmnet with OS bullseye
2023-01-26
- 23:59 zabe@deploy1002: Finished scap: Backport for Add a project logo on gorwiktionary (T327987) (duration: 34m 42s)
- 23:54 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS bullseye
- 23:52 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4039.ulsfo.wmnet
- 23:51 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4039.ulsfo.wmnet with OS bullseye
- 23:28 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4039.ulsfo.wmnet with reason: host reimage
- 23:26 zabe@deploy1002: zabe and superpes: Backport for Add a project logo on gorwiktionary (T327987) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
- 23:25 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4039.ulsfo.wmnet with reason: host reimage
- 23:24 zabe@deploy1002: Started scap: Backport for Add a project logo on gorwiktionary (T327987)
- 23:13 sbassett@deploy1002: Synchronized private/PrivateSettings.php: T326691 - remove mitigation and monitor (duration: 06m 52s)
- 23:04 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4039.ulsfo.wmnet with OS bullseye
- 23:04 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4039.ulsfo.wmnet with OS bullseye
- 23:03 zabe@deploy1002: Finished scap: Backport for Pin CheckUserEventTablesMigrationStage to read and write old (T324907) (duration: 08m 36s)
- 22:56 zabe@deploy1002: dreamyjazz and zabe: Backport for Pin CheckUserEventTablesMigrationStage to read and write old (T324907) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 22:54 zabe@deploy1002: Started scap: Backport for Pin CheckUserEventTablesMigrationStage to read and write old (T324907)
- 22:45 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4039.ulsfo.wmnet with OS bullseye
- 22:44 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4046.ulsfo.wmnet
- 22:44 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS bullseye
- 22:23 zabe: running migrateRevisionCommentTemp.php in cebwiki in screen with --sleep 2 # T275246
- 22:22 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage
- 22:18 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage
- 21:58 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS bullseye
- 21:47 thcipriani@deploy1002: Finished scap: Backport for Increase threshold for table of contents collapsing (T328045), Remove redundant block for search descriptions (T324859) (duration: 08m 49s)
- 21:40 thcipriani@deploy1002: thcipriani and jdlrobson: Backport for Increase threshold for table of contents collapsing (T328045), Remove redundant block for search descriptions (T324859) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 21:39 thcipriani@deploy1002: Started scap: Backport for Increase threshold for table of contents collapsing (T328045), Remove redundant block for search descriptions (T324859)
- 21:36 thcipriani@deploy1002: Finished scap: Backport for ApiDiscussionToolsEdit: Unwrap Parsoid sections before parsing (T327704) (duration: 08m 43s)
- 21:35 brett@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4046.ulsfo.wmnet with OS bullseye
- 21:34 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS bullseye
- 21:33 brett@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4046.ulsfo.wmnet with OS bullseye
- 21:33 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS bullseye
- 21:33 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4046.ulsfo.wmnet with OS bullseye
- 21:29 thcipriani@deploy1002: matmarex and thcipriani: Backport for ApiDiscussionToolsEdit: Unwrap Parsoid sections before parsing (T327704) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 21:27 thcipriani@deploy1002: Started scap: Backport for ApiDiscussionToolsEdit: Unwrap Parsoid sections before parsing (T327704)
- 21:25 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS bullseye
- 21:25 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet
- 21:24 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS bullseye
- 21:20 thcipriani@deploy1002: Finished scap: Backport for Enable write new for CheckUserLog comment fields everywhere (T233004) (duration: 11m 18s)
- 21:11 thcipriani@deploy1002: thcipriani and dreamyjazz: Backport for Enable write new for CheckUserLog comment fields everywhere (T233004) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
- 21:09 thcipriani@deploy1002: Started scap: Backport for Enable write new for CheckUserLog comment fields everywhere (T233004)
- 21:01 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
- 20:56 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
- 20:36 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
- 20:13 ryankemper: `ryankemper@thanos-fe1001:~$ sudo run-puppet-agent` following merge of wdqs recording rule patch: https://gerrit.wikimedia.org/r/c/operations/puppet/+/883610
- 20:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on cp2027.codfw.wmnet with reason: reimaging
- 20:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on cp2027.codfw.wmnet with reason: reimaging
- 20:05 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2027.codfw.wmnet with OS bullseye
- 19:56 brett@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4038.ulsfo.wmnet with OS bullseye
- 19:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cp2027.codfw.wmnet with reason: reimaging
- 19:10 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on cp2027.codfw.wmnet with reason: reimaging
- 19:09 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.20 refs T325583
- 19:00 brennen: 1.40.0-wmf.20 train (T325583): no current blockers, rolling to all wikis.
- 18:59 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
- 18:57 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6008.drmrs.wmnet
- 18:46 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6008.drmrs.wmnet with OS bullseye
- 18:20 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6008.drmrs.wmnet with reason: host reimage
- 18:17 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6008.drmrs.wmnet with reason: host reimage
- 18:17 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 18:16 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 18:16 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 18:15 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 18:15 cgoubert@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
- 18:15 cgoubert@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
- 18:15 cgoubert@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
- 18:15 cgoubert@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
- 18:15 cgoubert@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
- 18:15 cgoubert@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
- 18:15 cgoubert@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
- 18:15 cgoubert@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
- 18:14 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 18:14 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 18:14 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 18:13 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 18:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 18:12 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 18:12 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 18:11 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 18:11 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 18:10 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 18:10 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 18:09 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 17:59 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6008.drmrs.wmnet with OS bullseye
- 17:55 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6016.drmrs.wmnet
- 17:49 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6016.drmrs.wmnet with OS bullseye
- 17:30 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1015.eqiad.wmnet
- 17:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43427 and previous config saved to /var/cache/conftool/dbconfig/20230126-172806-root.json
- 17:27 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
- 17:24 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1015.eqiad.wmnet
- 17:24 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
- 17:22 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1014.eqiad.wmnet
- 17:19 dancy@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 00m 11s)
- 17:19 dancy@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
- 17:16 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1014.eqiad.wmnet
- 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43426 and previous config saved to /var/cache/conftool/dbconfig/20230126-171302-root.json
- 17:12 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1013.eqiad.wmnet
- 17:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2027.codfw.wmnet with reason: host reimage
- 17:07 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2027.codfw.wmnet with reason: host reimage
- 17:06 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1013.eqiad.wmnet
- 17:06 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6016.drmrs.wmnet with OS bullseye
- 17:05 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6016.drmrs.wmnet
- 17:05 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 17:05 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 17:04 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6007.drmrs.wmnet
- 17:03 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6007.drmrs.wmnet with OS bullseye
- 17:02 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1012.eqiad.wmnet
- 16:59 cgoubert@deploy1002: Synchronized tox.ini: Rebuilding mediawiki-webserver (duration: 07m 19s)
- 16:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43425 and previous config saved to /var/cache/conftool/dbconfig/20230126-165757-root.json
- 16:56 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1012.eqiad.wmnet
- 16:53 claime: Running scap sync-file -D php_fpm_restart_script:/bin/true tox.ini "Rebuilding mediawiki-webserver image" - T326794
- 16:51 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
- 16:49 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['cp2027']
- 16:48 sukhe: correcting earlier log: pooling lvs2007 after T326564
- 16:48 sukhe: pooling lvs2009 after T326564
- 16:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43424 and previous config saved to /var/cache/conftool/dbconfig/20230126-164252-root.json
- 16:41 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6007.drmrs.wmnet with reason: host reimage
- 16:41 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2027']
- 16:38 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2027.codfw.wmnet with OS bullseye
- 16:38 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6007.drmrs.wmnet with reason: host reimage
- 16:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1084.eqiad.wmnet
- 16:31 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1011.eqiad.wmnet
- 16:28 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
- 16:27 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1084.eqiad.wmnet
- 16:27 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2027.codfw.wmnet with OS bullseye
- 16:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43423 and previous config saved to /var/cache/conftool/dbconfig/20230126-162747-root.json
- 16:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
- 16:26 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1080.eqiad.wmnet
- 16:24 aborrero@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudlb1001-dev
- 16:23 aborrero@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb1001-dev
- 16:23 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1011.eqiad.wmnet
- 16:21 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1010.eqiad.wmnet
- 16:21 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 16:20 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 16:20 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:20 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 16:19 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1080.eqiad.wmnet
- 16:19 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 16:19 aborrero@cumin2002: START - Cookbook sre.dns.netbox
- 16:18 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6007.drmrs.wmnet with OS bullseye
- 16:14 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1010.eqiad.wmnet
- 16:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp3051.esams.wmnet with reason: extending downtime: T323717
- 16:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp3051.esams.wmnet with reason: extending downtime: T323717
- 16:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43422 and previous config saved to /var/cache/conftool/dbconfig/20230126-161242-root.json
- 16:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2161 T328024', diff saved to https://phabricator.wikimedia.org/P43421 and previous config saved to /var/cache/conftool/dbconfig/20230126-161137-root.json
- 16:11 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2165 to s8 primary T328024', diff saved to https://phabricator.wikimedia.org/P43420 and previous config saved to /var/cache/conftool/dbconfig/20230126-161058-marostegui.json
- 16:10 marostegui: Starting s8 codfw failover from db2161 to db2165 - T328024
- 16:09 moritzm: installing distro-info-data updates from Bullseye point release
- 16:08 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudgw2001-dev.codfw.wmnet
- 16:08 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:08 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
- 16:06 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
- 16:05 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1009.eqiad.wmnet
- 15:55 jbond: enable-puppet post deploy requestctl ferm change gerrit:883935
- 15:55 aborrero@cumin2002: START - Cookbook sre.dns.netbox
- 15:51 hashar: Restarting CI Jenkins for upgrade
- 15:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 35 hosts with reason: Primary switchover s8 T328024
- 15:50 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2165 with weight 0 T328024', diff saved to https://phabricator.wikimedia.org/P43419 and previous config saved to /var/cache/conftool/dbconfig/20230126-155000-root.json
- 15:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 35 hosts with reason: Primary switchover s8 T328024
- 15:49 aborrero@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudgw2001-dev.codfw.wmnet
- 15:46 hashar: Restart Jenkins for upgrade
- 15:39 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1009.eqiad.wmnet
- 15:30 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2027.codfw.wmnet with OS bullseye
- 15:30 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
- 15:30 sukhe: install2003: rm /etc/dhcp/automation/ttyS1-115200/cp2027.conf
- 15:29 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2027.codfw.wmnet with OS bullseye
- 15:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
- 15:27 sukhe: poweroff lvs2007: T326564
- 15:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43418 and previous config saved to /var/cache/conftool/dbconfig/20230126-152329-root.json
- 15:12 jbond: disable-puppet deploy requestctl ferm change gerrit:883935
- 15:09 sukhe: stop pybal on lvs2007: T326564
- 15:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on lvs2007.codfw.wmnet with reason: powering off for T326564
- 15:09 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on lvs2007.codfw.wmnet with reason: powering off for T326564
- 15:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43417 and previous config saved to /var/cache/conftool/dbconfig/20230126-150824-root.json
- 15:04 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2027.codfw.wmnet with OS bullseye
- 15:04 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
- 15:02 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2027.codfw.wmnet with OS bullseye
- 15:02 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
- 14:55 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
- 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43415 and previous config saved to /var/cache/conftool/dbconfig/20230126-145319-root.json
- 14:40 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:40 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-run DNS cookbook after updating zone files - remove esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
- 14:40 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
- 14:39 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-run DNS cookbook after updating zone files - remove esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
- 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43414 and previous config saved to /var/cache/conftool/dbconfig/20230126-143814-root.json
- 14:37 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 14:37 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:37 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-run DNS cookbook after updating zone files - remove esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
- 14:37 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
- 14:36 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-run DNS cookbook after updating zone files - remove esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
- 14:34 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 14:32 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:31 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 14:31 ladsgroup@deploy1002: Synchronized private/PrivateSettings.php: Rotating wikiadmin password (T326802) (duration: 07m 04s)
- 14:27 moritzm: installing containerd security updates
- 14:23 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
- 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43413 and previous config saved to /var/cache/conftool/dbconfig/20230126-142309-root.json
- 14:16 Lucas_WMDE: UTC afternoon backport+config window done
- 14:15 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Enable write new for CheckUserLog comment fields on group 0 and 1 (T233004) (duration: 09m 16s)
- 14:11 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 14:11 jbond: disable puppet fleet wide to roll out etcd ferm change gerrit:883888
- 14:11 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 14:09 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43412 and previous config saved to /var/cache/conftool/dbconfig/20230126-140804-root.json
- 14:07 lucaswerkmeister-wmde@deploy1002: dreamyjazz and lucaswerkmeister-wmde: Backport for Enable write new for CheckUserLog comment fields on group 0 and 1 (T233004) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2123 T328023', diff saved to https://phabricator.wikimedia.org/P43411 and previous config saved to /var/cache/conftool/dbconfig/20230126-140716-root.json
- 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2113 to s5 primary T328023', diff saved to https://phabricator.wikimedia.org/P43410 and previous config saved to /var/cache/conftool/dbconfig/20230126-140630-root.json
- 14:06 marostegui: Starting s5 codfw failover from db2123 to db2113 - T328023
- 14:06 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Enable write new for CheckUserLog comment fields on group 0 and 1 (T233004)
- 14:00 moritzm: restarting etherpad-lite to pick up nodejs security update
- 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Remove vslow from db2113, future s5 codfw master T328023', diff saved to https://phabricator.wikimedia.org/P43409 and previous config saved to /var/cache/conftool/dbconfig/20230126-135509-marostegui.json
- 13:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T328023
- 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2113 with weight 0 T328023', diff saved to https://phabricator.wikimedia.org/P43408 and previous config saved to /var/cache/conftool/dbconfig/20230126-135215-root.json
- 13:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T328023
- 13:45 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 13:45 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 13:44 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 13:38 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:38 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove DNS records for removed esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
- 13:37 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove DNS records for removed esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
- 13:32 ladsgroup@deploy1002: Finished scap: Backport for Change time zone setting on gorwiktionary (T327986) (duration: 12m 02s)
- 13:32 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 13:25 moritzm: restarting turnilo for nodejs security update
- 13:22 ladsgroup@deploy1002: superpes and ladsgroup: Backport for Change time zone setting on gorwiktionary (T327986) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
- 13:20 ladsgroup@deploy1002: Started scap: Backport for Change time zone setting on gorwiktionary (T327986)
- 13:10 moritzm: installing nodejs security updates on bullseye
- 13:09 hashar: Rebooting gerrit2002.wikimedia.org host to validate that Apache 2 services start AFTER the network comes online | T326125
- 13:06 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 13:04 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop analytics cluster
- 12:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flerovium.eqiad.wmnet
- 12:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cp3051.esams.wmnet with reason: T323717
- 12:42 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on cp3051.esams.wmnet with reason: T323717
- 12:42 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3051.esams.wmnet,service=ats-be
- 12:42 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3051.esams.wmnet,service=cdn
- 12:41 sukhe: depool cp3051.esams.wmnet for firmware update testing: T323717
- 12:41 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host flerovium.eqiad.wmnet
- 12:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host furud.codfw.wmnet
- 12:29 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-proxies (exit_code=0) rolling restart_daemons on A:eqiad and not A:thanos-fe and A:swift-fe or A:thanos-fe
- 12:15 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host furud.codfw.wmnet
- 12:10 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
- 12:10 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-proxies rolling restart_daemons on A:eqiad and not A:thanos-fe and A:swift-fe or A:thanos-fe
- 12:03 jbond: enable profile::base::firewall::defs_from_etcd: true globally
- 11:56 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd-client-ssl._tcp.wikimedia.org on all recursors
- 11:56 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd-client-ssl._tcp.wikimedia.org on all recursors
- 11:49 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
- 11:49 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes1010.eqiad.wmnet
- 11:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts flowspec1001
- 11:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flowspec1001 decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1001"
- 11:46 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flowspec1001 decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1001"
- 11:44 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
- 11:40 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts flowspec1001
- 11:36 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-aux
- 11:29 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
- 11:29 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
- 11:28 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
- 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43405 and previous config saved to /var/cache/conftool/dbconfig/20230126-110822-root.json
- 11:03 hashar: Restarted Apache 2 on gerrit.wikimedia.org
- 10:55 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
- 10:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename aux-k8s-ingress service to k8s-ingress-aux - cgoubert@cumin1001"
- 10:54 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
- 10:54 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
- 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43404 and previous config saved to /var/cache/conftool/dbconfig/20230126-105317-root.json
- 10:53 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
- 10:46 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename aux-k8s-ingress service to k8s-ingress-aux - cgoubert@cumin1001"
- 10:45 moritzm: installing postgresql-13 security updates
- 10:43 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
- 10:42 joal@deploy1002: Finished deploy [airflow-dags/analytics@e52205b]: (no justification provided) (duration: 00m 11s)
- 10:42 joal@deploy1002: Started deploy [airflow-dags/analytics@e52205b]: (no justification provided)
- 10:41 claime: cgoubert@authdns1001:~$ sudo -i authdns-update
- 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43403 and previous config saved to /var/cache/conftool/dbconfig/20230126-103812-root.json
- 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43402 and previous config saved to /var/cache/conftool/dbconfig/20230126-103448-root.json
- 10:32 joal@deploy1002: Finished deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - third after failure [analytics/refinery@8ed8435] (duration: 01m 16s)
- 10:31 joal@deploy1002: Started deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - third after failure [analytics/refinery@8ed8435]
- 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43401 and previous config saved to /var/cache/conftool/dbconfig/20230126-102307-root.json
- 10:21 joal@deploy1002: Finished deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - Second after failure [analytics/refinery@8ed8435] (duration: 00m 04s)
- 10:21 joal@deploy1002: Started deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - Second after failure [analytics/refinery@8ed8435]
- 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43400 and previous config saved to /var/cache/conftool/dbconfig/20230126-101943-root.json
- 10:08 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts sretest1002.eqiad.wmnet
- 10:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
- 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43399 and previous config saved to /var/cache/conftool/dbconfig/20230126-100802-root.json
- 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43398 and previous config saved to /var/cache/conftool/dbconfig/20230126-100438-root.json
- 09:59 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
- 09:58 joal@deploy1002: Finished deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@8ed8435] (duration: 01m 08s)
- 09:57 joal@deploy1002: Started deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@8ed8435]
- 09:57 joal@deploy1002: Finished deploy [analytics/refinery@8ed8435] (thin): Regular analytics weekly train THIN [analytics/refinery@8ed8435] (duration: 00m 05s)
- 09:57 joal@deploy1002: Started deploy [analytics/refinery@8ed8435] (thin): Regular analytics weekly train THIN [analytics/refinery@8ed8435]
- 09:56 joal@deploy1002: Finished deploy [analytics/refinery@8ed8435]: Regular analytics weekly train [analytics/refinery@8ed8435] (duration: 07m 00s)
- 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43397 and previous config saved to /var/cache/conftool/dbconfig/20230126-095257-root.json
- 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43396 and previous config saved to /var/cache/conftool/dbconfig/20230126-095205-root.json
- 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43395 and previous config saved to /var/cache/conftool/dbconfig/20230126-094933-root.json
- 09:49 joal@deploy1002: Started deploy [analytics/refinery@8ed8435]: Regular analytics weekly train [analytics/refinery@8ed8435]
- 09:48 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
- 09:48 jbond@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
- 09:47 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
- 09:47 jbond@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
- 09:47 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
- 09:47 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
- 09:46 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
- 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43394 and previous config saved to /var/cache/conftool/dbconfig/20230126-093700-root.json
- 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43393 and previous config saved to /var/cache/conftool/dbconfig/20230126-093620-root.json
- 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43392 and previous config saved to /var/cache/conftool/dbconfig/20230126-093428-root.json
- 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43391 and previous config saved to /var/cache/conftool/dbconfig/20230126-093303-root.json
- 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2144 to x2 primary T313811', diff saved to https://phabricator.wikimedia.org/P43390 and previous config saved to /var/cache/conftool/dbconfig/20230126-092512-root.json
- 09:24 marostegui: Starting x2 codfw failover from db2142 to db2144 - T328001
- 09:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover x2 T328001
- 09:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover x2 T328001
- 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43389 and previous config saved to /var/cache/conftool/dbconfig/20230126-092155-root.json
- 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43388 and previous config saved to /var/cache/conftool/dbconfig/20230126-092115-root.json
- 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43387 and previous config saved to /var/cache/conftool/dbconfig/20230126-091923-root.json
- 09:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover x2 T328001
- 09:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover x2 T328001
- 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43386 and previous config saved to /var/cache/conftool/dbconfig/20230126-091758-root.json
- 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43385 and previous config saved to /var/cache/conftool/dbconfig/20230126-090650-root.json
- 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43384 and previous config saved to /var/cache/conftool/dbconfig/20230126-090610-root.json
- 09:05 phedenskog@deploy1002: Finished deploy [performance/navtiming@e5fdd6e]: (no justification provided) (duration: 00m 06s)
- 09:05 phedenskog@deploy1002: Started deploy [performance/navtiming@e5fdd6e]: (no justification provided)
- 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 1%: After switchover', diff saved to https://phabricator.wikimedia.org/P43383 and previous config saved to /var/cache/conftool/dbconfig/20230126-090418-root.json
- 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2121 T328000', diff saved to https://phabricator.wikimedia.org/P43382 and previous config saved to /var/cache/conftool/dbconfig/20230126-090302-root.json
- 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43381 and previous config saved to /var/cache/conftool/dbconfig/20230126-090253-root.json
- 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2118 to s7 primary T328000', diff saved to https://phabricator.wikimedia.org/P43380 and previous config saved to /var/cache/conftool/dbconfig/20230126-090212-root.json
- 09:02 marostegui: Starting s7 codfw failover from db2121 to db2118 - T328000
- 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43379 and previous config saved to /var/cache/conftool/dbconfig/20230126-085145-root.json
- 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43378 and previous config saved to /var/cache/conftool/dbconfig/20230126-085105-root.json
- 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43377 and previous config saved to /var/cache/conftool/dbconfig/20230126-084748-root.json
- 08:44 moritzm: added Eoghan to pwstore
- 08:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s7 T328000
- 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2118 with weight 0 T328000', diff saved to https://phabricator.wikimedia.org/P43376 and previous config saved to /var/cache/conftool/dbconfig/20230126-084112-root.json
- 08:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s7 T328000
- 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43375 and previous config saved to /var/cache/conftool/dbconfig/20230126-083640-root.json
- 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43374 and previous config saved to /var/cache/conftool/dbconfig/20230126-083600-root.json
- 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2105 T327999', diff saved to https://phabricator.wikimedia.org/P43373 and previous config saved to /var/cache/conftool/dbconfig/20230126-083543-root.json
- 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2127 to s3 primary T327999', diff saved to https://phabricator.wikimedia.org/P43372 and previous config saved to /var/cache/conftool/dbconfig/20230126-083459-root.json
- 08:34 marostegui: Starting s3 codfw failover from db2105 to db2127 - T327999
- 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43371 and previous config saved to /var/cache/conftool/dbconfig/20230126-083243-root.json
- 08:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s3 T327999
- 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2127 with weight 0 T327999', diff saved to https://phabricator.wikimedia.org/P43370 and previous config saved to /var/cache/conftool/dbconfig/20230126-082432-root.json
- 08:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 23 hosts with reason: Primary switchover s3 T327999
- 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43369 and previous config saved to /var/cache/conftool/dbconfig/20230126-082055-root.json
- 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 100%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43368 and previous config saved to /var/cache/conftool/dbconfig/20230126-082038-root.json
- 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 T327998', diff saved to https://phabricator.wikimedia.org/P43367 and previous config saved to /var/cache/conftool/dbconfig/20230126-081916-root.json
- 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2107 to s2 primary T327998', diff saved to https://phabricator.wikimedia.org/P43366 and previous config saved to /var/cache/conftool/dbconfig/20230126-081818-root.json
- 08:17 marostegui: Starting s2 codfw failover from db2104 to db2107 - T327998
- 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43365 and previous config saved to /var/cache/conftool/dbconfig/20230126-081738-root.json
- 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 75%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43364 and previous config saved to /var/cache/conftool/dbconfig/20230126-080533-root.json
- 08:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s2 T327998
- 08:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s2 T327998
- 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2107 with weight 0 T327998', diff saved to https://phabricator.wikimedia.org/P43363 and previous config saved to /var/cache/conftool/dbconfig/20230126-080427-root.json
- 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 1%: After switchover', diff saved to https://phabricator.wikimedia.org/P43362 and previous config saved to /var/cache/conftool/dbconfig/20230126-080233-root.json
- 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2103 T327997', diff saved to https://phabricator.wikimedia.org/P43361 and previous config saved to /var/cache/conftool/dbconfig/20230126-080159-root.json
- 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2112 to s1 primary T327997', diff saved to https://phabricator.wikimedia.org/P43360 and previous config saved to /var/cache/conftool/dbconfig/20230126-080033-root.json
- 08:00 marostegui: Starting s1 codfw failover from db2103 to db2112 - T327997
- 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 50%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43359 and previous config saved to /var/cache/conftool/dbconfig/20230126-075028-root.json
- 07:49 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2012.*
- 07:49 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2011.*
- 07:49 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2010.*
- 07:48 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2009.*
- 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2112 with weight 0 T327997', diff saved to https://phabricator.wikimedia.org/P43358 and previous config saved to /var/cache/conftool/dbconfig/20230126-073616-root.json
- 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 38 hosts with reason: Primary switchover s1 T327997
- 07:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 38 hosts with reason: Primary switchover s1 T327997
- 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 25%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43357 and previous config saved to /var/cache/conftool/dbconfig/20230126-073523-root.json
- 07:25 marostegui@deploy1002: Finished scap: Backport for ProductionServices.php: Depool pc2011 (T327925) (duration: 11m 19s)
- 07:25 dcausse: T322869: depooling wdqs2009 wdqs2010 wdqs2011 wdqs2012; these hosts should not serve user traffic yet, as they don't have the database loaded
- 07:23 marostegui: Failover m1 from db1195 to db1176 - T327800
- 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 10%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43356 and previous config saved to /var/cache/conftool/dbconfig/20230126-072017-root.json
- 07:18 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1001.eqiad.wmnet with reason: m1 switchover
- 07:17 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1001.eqiad.wmnet with reason: m1 switchover
- 07:17 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backupmon1001.eqiad.wmnet with reason: m1 switchover
- 07:17 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backupmon1001.eqiad.wmnet with reason: m1 switchover
- 07:16 marostegui@deploy1002: marostegui: Backport for ProductionServices.php: Depool pc2011 (T327925) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 07:14 marostegui@deploy1002: Started scap: Backport for ProductionServices.php: Depool pc2011 (T327925)
- 07:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2132,2160].codfw.wmnet,db[1117,1176,1195].eqiad.wmnet with reason: Primary switchover m1 T327800
- 07:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2132,2160].codfw.wmnet,db[1117,1176,1195].eqiad.wmnet with reason: Primary switchover m1 T327800
- 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 5%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43354 and previous config saved to /var/cache/conftool/dbconfig/20230126-070512-root.json
- 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Add some weight to db1103', diff saved to https://phabricator.wikimedia.org/P43353 and previous config saved to /var/cache/conftool/dbconfig/20230126-070220-marostegui.json
- 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 T327861', diff saved to https://phabricator.wikimedia.org/P43352 and previous config saved to /var/cache/conftool/dbconfig/20230126-070158-root.json
- 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1103 to x1 primary and set section read-write T327861', diff saved to https://phabricator.wikimedia.org/P43351 and previous config saved to /var/cache/conftool/dbconfig/20230126-070035-marostegui.json
- 07:00 marostegui: Starting x1 eqiad failover from db1120 to db1103 - T327861
- 06:48 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6015.drmrs.wmnet
- 06:48 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6015.drmrs.wmnet with OS bullseye
- 06:32 ladsgroup@deploy1002: Synchronized private/PrivateSettings.php: Rotating wikiuser password (T326802) (duration: 07m 23s)
- 06:20 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
- 06:18 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
- 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1103 with weight 0 T327861', diff saved to https://phabricator.wikimedia.org/P43350 and previous config saved to /var/cache/conftool/dbconfig/20230126-061751-root.json
- 06:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Primary switchover x1 T327861
- 06:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Primary switchover x1 T327861
- 05:57 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6015.drmrs.wmnet with OS bullseye
- 05:53 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6006.drmrs.wmnet
- 05:53 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6006.drmrs.wmnet with OS bullseye
- 05:32 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6006.drmrs.wmnet with reason: host reimage
- 05:28 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6006.drmrs.wmnet with reason: host reimage
- 05:10 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6006.drmrs.wmnet with OS bullseye
- 05:09 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6014.drmrs.wmnet
- 05:07 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6014.drmrs.wmnet with OS bullseye
- 04:45 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6014.drmrs.wmnet with reason: host reimage
- 04:42 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6014.drmrs.wmnet with reason: host reimage
- 04:24 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6014.drmrs.wmnet with OS bullseye
- 04:22 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6005.drmrs.wmnet
- 04:17 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6005.drmrs.wmnet with OS bullseye
- 03:52 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6005.drmrs.wmnet with reason: host reimage
- 03:49 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6005.drmrs.wmnet with reason: host reimage
- 03:29 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6005.drmrs.wmnet with OS bullseye
- 03:27 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6013.drmrs.wmnet
- 03:27 ejegg: payments-wiki upgraded from 08b8c3bc to 82d89841
- 03:26 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6013.drmrs.wmnet with OS bullseye
- 03:04 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6013.drmrs.wmnet with reason: host reimage
- 03:01 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6013.drmrs.wmnet with reason: host reimage
- 02:41 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6013.drmrs.wmnet with OS bullseye
- 02:30 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2027.codfw.wmnet with OS bullseye
- 02:17 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
- 02:17 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2027.codfw.wmnet with OS bullseye
- 01:58 ejegg: restarted fundraising scheduled jobs after queue server reboot
- 01:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
- 01:49 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2028.codfw.wmnet,service=ats-be
- 01:49 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2028.codfw.wmnet,service=cdn
- 01:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp2027.codfw.wmnet with reason: firmware test
- 01:48 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp2027.codfw.wmnet with reason: firmware test
- 01:46 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2027.codfw.wmnet,service=ats-be
- 01:46 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2027.codfw.wmnet,service=cdn
- 01:46 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2028.codfw.wmnet with OS bullseye
- 01:24 ejegg: payments-wiki upgraded from 15395d05 to 08b8c3bc (upgraded from MW 1.35 to MW 1.39)
- 01:23 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2028.codfw.wmnet with reason: host reimage
- 01:20 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2028.codfw.wmnet with reason: host reimage
- 01:19 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev2*: Enable internode encryption - eevans@cumin1001
- 01:14 ejegg: disabled fundraising scheduled jobs for queue server reboot
- 01:05 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2028.codfw.wmnet with OS bullseye
- 01:03 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
- 01:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2028.codfw.wmnet
- 01:00 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev2*: Enable internode encryption - eevans@cumin1001
- 01:00 ejegg: turned pending transaction resolvers back on after civi deploy
- 00:51 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2028.codfw.wmnet
- 00:50 ejegg: civicrm upgraded from 3e6b21b6 to b5d6a790
- 00:50 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
- 00:49 sukhe: depool cp2028 for testing firmware update cookbook: T321309
- 00:49 ejegg: disabled pending transaction resolvers for civi deploy
- 00:48 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2028.codfw.wmnet,service=ats-be
- 00:48 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2028.codfw.wmnet,service=cdn
2023-01-25
- 23:57 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6004.drmrs.wmnet
- 23:57 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6004.drmrs.wmnet with OS bullseye
- 23:36 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6004.drmrs.wmnet with reason: host reimage
- 23:33 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6004.drmrs.wmnet with reason: host reimage
- 23:29 zabe@deploy1002: Finished scap: (no justification provided) (duration: 07m 34s)
- 23:21 zabe@deploy1002: Started scap: (no justification provided)
- 23:20 zabe@deploy1002: Backport cancelled.
- 23:14 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6004.drmrs.wmnet with OS bullseye
- 23:13 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6012.drmrs.wmnet
- 23:07 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6012.drmrs.wmnet with OS bullseye
- 22:43 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6012.drmrs.wmnet with reason: host reimage
- 22:40 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6012.drmrs.wmnet with reason: host reimage
- 22:21 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6012.drmrs.wmnet with OS bullseye
- 22:14 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6003.drmrs.wmnet
- 21:49 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
- 21:49 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
- 21:44 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
- 21:44 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
- 21:34 samtar@deploy1002: Finished scap: Backport for Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714), Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714) (duration: 09m 27s)
- 21:26 samtar@deploy1002: jdrewniak and samtar: Backport for Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714), Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714) synced to the testservers: mwdebug2002.cod
- 21:25 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 21:24 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 21:24 samtar@deploy1002: Started scap: Backport for Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714), Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714)
- 21:06 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
- 20:59 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
- 20:59 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6003.drmrs.wmnet with OS bullseye
- 20:59 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts cp2028.codfw.wmnet
- 20:58 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
- 20:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
- 20:49 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
- 20:49 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
- 20:49 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
- 20:49 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
- 20:49 ejegg: updated employers.csv on paymentswiki
- 20:49 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
- 20:33 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6003.drmrs.wmnet with reason: host reimage
- 20:32 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka jumbo-eqiad cluster: Reboot kafka nodes
- 20:30 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6003.drmrs.wmnet with reason: host reimage
- 20:10 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6003.drmrs.wmnet with OS bullseye
- 20:00 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6011.drmrs.wmnet
- 19:58 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6011.drmrs.wmnet with OS bullseye
- 19:52 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host centrallog1002.eqiad.wmnet with OS bullseye
- 19:38 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
- 19:36 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
- 19:33 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
- 19:33 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
- 19:21 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye
- 19:17 brennen@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.20 refs T325583 (duration: 07m 04s)
- 19:12 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6011.drmrs.wmnet with OS bullseye
- 19:10 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.20 refs T325583
- 19:06 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6002.drmrs.wmnet
- 19:01 brennen: 1.40.0-wmf.20 train (T325583): no blockers, rolling to group1.
- 19:00 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host centrallog1002.eqiad.wmnet with OS bullseye
- 19:00 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye
- 18:59 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6002.drmrs.wmnet with OS bullseye
- 18:37 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6002.drmrs.wmnet with reason: host reimage
- 18:35 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
- 18:34 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6002.drmrs.wmnet with reason: host reimage
- 18:33 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
- 18:33 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
- 18:32 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
- 18:14 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6002.drmrs.wmnet with OS bullseye
- 18:11 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
- 18:11 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
- 18:11 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
- 18:10 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
- 18:05 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6010.drmrs.wmnet
- 17:58 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6010.drmrs.wmnet with OS bullseye
- 17:32 mutante: removing racktables.wikimedia.org from DNS - that's it for this ancient service T327405
- 16:57 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=ats-be
- 16:57 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=cdn
- 16:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2031.codfw.wmnet with OS bullseye
- 16:50 btullis@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka jumbo-eqiad cluster: Reboot kafka nodes
- 16:46 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6010.drmrs.wmnet with reason: host reimage
- 16:43 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6010.drmrs.wmnet with reason: host reimage
- 16:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet,service=ats-be
- 16:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet,service=cdn
- 16:33 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS bullseye
- 16:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2031.codfw.wmnet with reason: host reimage
- 16:28 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2031.codfw.wmnet with reason: host reimage
- 16:24 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6010.drmrs.wmnet with OS bullseye
- 16:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1002.eqiad.wmnet
- 16:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
- 16:09 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2031.codfw.wmnet with OS bullseye
- 16:08 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1002.eqiad.wmnet
- 16:08 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
- 16:04 btullis@cumin1001: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 16:03 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
- 15:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2031']
- 15:56 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031']
- 15:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['cp2031']
- 15:53 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
- 15:50 robh: db1139 ilom wins/netbios disabled and ilom reset T327877
- 15:48 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
- 15:47 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS bullseye
- 15:46 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031']
- 15:45 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2031']
- 15:45 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031']
- 15:44 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2031.codfw.wmnet']
- 15:44 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031.codfw.wmnet']
- 15:43 robh: netbios wins disabled on db1140 ilom and ilom reset T327877
- 15:43 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2031.codfw.wmnet with OS bullseye
- 15:38 papaul: ongoing maintenance on fasw-c-eqiad
- 15:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1001.eqiad.wmnet
- 15:33 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2031.codfw.wmnet with OS bullseye
- 15:33 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2031.codfw.wmnet with OS bullseye
- 15:29 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1001.eqiad.wmnet
- 15:23 btullis@cumin1001: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
- 15:21 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
- 15:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1003.eqiad.wmnet
- 15:17 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet,service=ats-be
- 15:17 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet,service=cdn
- 15:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4045.ulsfo.wmnet with OS bullseye
- 15:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1003.eqiad.wmnet
- 15:13 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
- 15:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 15:12 urbanecm@deploy1002: Finished scap: triggering i18n refresh for T327824 (duration: 07m 57s)
- 15:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2031.codfw.wmnet with OS bullseye
- 15:04 urbanecm@deploy1002: Started scap: triggering i18n refresh for T327824
- 15:04 urbanecm@deploy1002: Finished scap: Backport for Enable the Wikibase REST API on Wikidata (T324999) (duration: 08m 43s)
- 15:02 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet,service=ats-be
- 15:02 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet,service=cdn
- 15:01 urbanecm: Overrunning B&C window
- 14:57 urbanecm@deploy1002: urbanecm and migr: Backport for Enable the Wikibase REST API on Wikidata (T324999) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 14:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4037.ulsfo.wmnet with OS bullseye
- 14:55 urbanecm@deploy1002: Started scap: Backport for Enable the Wikibase REST API on Wikidata (T324999)
- 14:53 btullis@cumin1001: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 14:53 urbanecm@deploy1002: Finished scap: Backport for REST: Use error log level for unexpected errors (T327490), User impact: amend incorrect parameter for the single day streak text (T327824) (duration: 32m 21s)
- 14:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
- 14:50 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
- 14:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install6002.wikimedia.org
- 14:40 urbanecm@deploy1002: jakob and sgimeno and urbanecm: Backport for REST: Use error log level for unexpected errors (T327490), User impact: amend incorrect parameter for the single day streak text (T327824) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 14:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
- 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install6002.wikimedia.org on all recursors
- 14:30 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install6002.wikimedia.org on all recursors
- 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install6002.wikimedia.org - jmm@cumin2002"
- 14:30 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
- 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 14:29 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install6002.wikimedia.org - jmm@cumin2002"
- 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 14:28 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
- 14:25 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 14:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install6002.wikimedia.org
- 14:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5002.wikimedia.org
- 14:21 urbanecm@deploy1002: Started scap: Backport for REST: Use error log level for unexpected errors (T327490), User impact: amend incorrect parameter for the single day streak text (T327824)
- 14:16 urbanecm@deploy1002: Finished scap: Backport for Enable Draft namespace on Serbo-Croatian Wikipedia (T327864) (duration: 12m 59s)
- 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5002.wikimedia.org on all recursors
- 14:09 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5002.wikimedia.org on all recursors
- 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5002.wikimedia.org - jmm@cumin2002"
- 14:08 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5002.wikimedia.org - jmm@cumin2002"
- 14:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
- 14:05 urbanecm@deploy1002: aleksandar and urbanecm: Backport for Enable Draft namespace on Serbo-Croatian Wikipedia (T327864) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
- 14:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 14:04 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5002.wikimedia.org
- 14:03 urbanecm@deploy1002: Started scap: Backport for Enable Draft namespace on Serbo-Croatian Wikipedia (T327864)
- 13:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install4002.wikimedia.org
- 13:51 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install4002.wikimedia.org
- 13:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install4002.wikimedia.org
- 13:50 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install4002.wikimedia.org
- 13:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install3002.wikimedia.org
- 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install3002.wikimedia.org on all recursors
- 13:31 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install3002.wikimedia.org on all recursors
- 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install3002.wikimedia.org - jmm@cumin2002"
- 13:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install3002.wikimedia.org - jmm@cumin2002"
- 13:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 13:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install3002.wikimedia.org
- 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install2004.wikimedia.org
- 13:11 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4037.ulsfo.wmnet with OS bullseye
- 13:04 jbond: puppet now using vendored version of augeas-core https://gerrit.wikimedia.org/r/c/operations/puppet/+/883233
- 13:04 jbond: enable puppet fleet-wide post deploy of gerrit:883233
- 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install2004.wikimedia.org on all recursors
- 13:00 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install2004.wikimedia.org on all recursors
- 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install2004.wikimedia.org - jmm@cumin2002"
- 12:59 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install2004.wikimedia.org - jmm@cumin2002"
- 12:54 jbond: disable puppet fleet wide to deploy gerrit:883233
- 12:54 jnuche@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 00m 21s)
- 12:54 jnuche@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
- 12:45 moritzm: restarting Exim on MXes to pick up new libtasn
- 12:43 filippo@cumin1001: conftool action : set/pooled=no; selector: service=thanos-web,name=thanos-fe2003.codfw.wmnet
- 12:43 filippo@cumin1001: conftool action : set/pooled=no; selector: service=thanos-web,name=thanos-fe2002.codfw.wmnet
- 12:43 filippo@cumin1001: conftool action : set/pooled=no; selector: service=thanos-web,name=thanos-fe1003.eqiad.wmnet
- 12:42 filippo@cumin1001: conftool action : set/pooled=no; selector: service=thanos-web,name=thanos-fe1002.eqiad.wmnet
- 12:41 moritzm: restarting slapd on r/w servers to pick up new libtasn
- 12:37 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 12:37 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install2004.wikimedia.org
- 12:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install1004.wikimedia.org
- 12:15 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
- 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install1004.wikimedia.org on all recursors
- 12:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install1004.wikimedia.org on all recursors
- 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install1004.wikimedia.org - jmm@cumin2002"
- 12:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install1004.wikimedia.org - jmm@cumin2002"
- 12:12 moritzm: installing libtasn security updates on buster
- 11:58 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 11:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install1004.wikimedia.org
- 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testreduce1001.eqiad.wmnet
- 11:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testreduce1001.eqiad.wmnet
- 11:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host scandium.eqiad.wmnet
- 11:34 Lucas_WMDE: Updated the Wikidata property suggester with data from 20230102's JSON dump (T325942)
- 11:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host scandium.eqiad.wmnet
- 11:27 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
- 11:16 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
- 11:12 hnowlan: restarting lvs on lvs1019 for thumbor healthcheck change
- 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43344 and previous config saved to /var/cache/conftool/dbconfig/20230125-111059-root.json
- 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: After recloning', diff saved to https://phabricator.wikimedia.org/P43343 and previous config saved to /var/cache/conftool/dbconfig/20230125-110924-root.json
- 11:08 hnowlan: restarting lvs on lvs2009 for thumbor healthcheck change
- 11:00 hnowlan: restarting lvs on lvs1020 for thumbor healthcheck change
- 11:00 hnowlan: restarting lvs on lvs1010 for thumbor healthcheck change
- 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43342 and previous config saved to /var/cache/conftool/dbconfig/20230125-105554-root.json
- 10:54 hnowlan: restarting lvs on lvs2010 for thumbor healthcheck change
- 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: After recloning', diff saved to https://phabricator.wikimedia.org/P43341 and previous config saved to /var/cache/conftool/dbconfig/20230125-105443-root.json
- 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: After recloning', diff saved to https://phabricator.wikimedia.org/P43340 and previous config saved to /var/cache/conftool/dbconfig/20230125-105419-root.json
- 10:49 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 10:48 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 10:48 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 10:43 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43338 and previous config saved to /var/cache/conftool/dbconfig/20230125-104049-root.json
- 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: After recloning', diff saved to https://phabricator.wikimedia.org/P43337 and previous config saved to /var/cache/conftool/dbconfig/20230125-103938-root.json
- 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: After recloning', diff saved to https://phabricator.wikimedia.org/P43336 and previous config saved to /var/cache/conftool/dbconfig/20230125-103914-root.json
- 10:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43335 and previous config saved to /var/cache/conftool/dbconfig/20230125-102544-root.json
- 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: After recloning', diff saved to https://phabricator.wikimedia.org/P43334 and previous config saved to /var/cache/conftool/dbconfig/20230125-102433-root.json
- 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: After recloning', diff saved to https://phabricator.wikimedia.org/P43333 and previous config saved to /var/cache/conftool/dbconfig/20230125-102409-root.json
- 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43332 and previous config saved to /var/cache/conftool/dbconfig/20230125-101039-root.json
- 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: After recloning', diff saved to https://phabricator.wikimedia.org/P43331 and previous config saved to /var/cache/conftool/dbconfig/20230125-100928-root.json
- 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: After recloning', diff saved to https://phabricator.wikimedia.org/P43330 and previous config saved to /var/cache/conftool/dbconfig/20230125-100904-root.json
- 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43329 and previous config saved to /var/cache/conftool/dbconfig/20230125-095534-root.json
- 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: After recloning', diff saved to https://phabricator.wikimedia.org/P43328 and previous config saved to /var/cache/conftool/dbconfig/20230125-095423-root.json
- 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: After recloning', diff saved to https://phabricator.wikimedia.org/P43327 and previous config saved to /var/cache/conftool/dbconfig/20230125-095400-root.json
- 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 1%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43326 and previous config saved to /var/cache/conftool/dbconfig/20230125-094029-root.json
- 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: After recloning', diff saved to https://phabricator.wikimedia.org/P43325 and previous config saved to /var/cache/conftool/dbconfig/20230125-093918-root.json
- 09:30 Emperor: rolling depool & update of thanos front-ends T327871
- 08:40 XioNoX: bump SGIX max prefix limit
- 08:13 ladsgroup@deploy1002: Finished scap: Backport for Add sandbox link to Serbo-Croatian Wikipedia (T327833) (duration: 10m 13s)
- 08:05 ladsgroup@deploy1002: ladsgroup and aleksandar: Backport for Add sandbox link to Serbo-Croatian Wikipedia (T327833) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 08:03 ladsgroup@deploy1002: Started scap: Backport for Add sandbox link to Serbo-Croatian Wikipedia (T327833)
- 07:49 marostegui: Cloning db1196 from db1206 (lag will appear on s1 wiki replicas) T327859
- 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1206 to clone db1196 T327859', diff saved to https://phabricator.wikimedia.org/P43322 and previous config saved to /var/cache/conftool/dbconfig/20230125-074601-marostegui.json
- 07:34 phedenskog@deploy1002: Finished deploy [performance/navtiming@bfff15d]: (no justification provided) (duration: 00m 05s)
- 07:34 phedenskog@deploy1002: Started deploy [performance/navtiming@bfff15d]: (no justification provided)
- 07:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 33
- 07:31 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 33
- 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 to clone db1198', diff saved to https://phabricator.wikimedia.org/P43320 and previous config saved to /var/cache/conftool/dbconfig/20230125-072033-marostegui.json
- 07:08 AndyRussG: updated payments (config only) revision 15395d05, config 418160e9
- 04:10 eileen: config revision changed from dc0a0d3a to 089d0acb
- 04:01 eileen: civicrm upgraded from 9197ca29 to 3e6b21b6
- 03:27 eileen: civicrm upgraded from f6093fb2 to 9197ca29
- 03:05 eileen: config revision changed from 3f641fce to dc0a0d3a
- 01:17 legoktm: adjusting Gerrit group "Campaigns Team" so it is not recursively a member of itself
- 00:10 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host centrallog1002.eqiad.wmnet with OS bullseye
- 00:10 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye
2023-01-24
- 23:10 zabe@deploy1002: Finished scap: Backport for Start reading from rev_comment_id on testcommonswiki (T299954) (duration: 08m 02s)
- 23:04 zabe@deploy1002: zabe: Backport for Start reading from rev_comment_id on testcommonswiki (T299954) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 23:02 zabe@deploy1002: Started scap: Backport for Start reading from rev_comment_id on testcommonswiki (T299954)
- 22:47 TheresNoTime: closing UTC late backport window
- 22:47 samtar@deploy1002: Finished scap: Backport for Add temporary extra grid-area for content translation extension (T327715), Add temporary extra grid-area for content translation extension (T327715) (duration: 09m 04s)
- 22:39 samtar@deploy1002: jdrewniak and samtar: Backport for Add temporary extra grid-area for content translation extension (T327715), Add temporary extra grid-area for content translation extension (T327715) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 22:37 samtar@deploy1002: Started scap: Backport for Add temporary extra grid-area for content translation extension (T327715), Add temporary extra grid-area for content translation extension (T327715)
- 22:30 samtar@deploy1002: Finished scap: Backport for [BETA CLUSTER] Don't try to load Kartographer on Wikifunctions at all (T327724), newiki: Fix wgAddGroups/wgRemoveGroups setting (T327114) (duration: 07m 59s)
- 22:23 samtar@deploy1002: jforrester and samtar and stang: Backport for [BETA CLUSTER] Don't try to load Kartographer on Wikifunctions at all (T327724), newiki: Fix wgAddGroups/wgRemoveGroups setting (T327114) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 22:22 samtar@deploy1002: Started scap: Backport for [BETA CLUSTER] Don't try to load Kartographer on Wikifunctions at all (T327724), newiki: Fix wgAddGroups/wgRemoveGroups setting (T327114)
- 22:20 samtar@deploy1002: Finished scap: Backport for newiki: Add new permissions to group reviewer (T327114) (duration: 09m 02s)
- 22:19 mutante: DNS - adding new project language "gur" (Gurenɛ) - Gurenɛ is a major language of northern Ghana and the predominant language of the Upper East Region of Ghana. It is also widely spoken in Burkina Faso. T327813
- 22:13 samtar@deploy1002: samtar and stang: Backport for newiki: Add new permissions to group reviewer (T327114) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 22:11 samtar@deploy1002: Started scap: Backport for newiki: Add new permissions to group reviewer (T327114)
- 22:08 samtar@deploy1002: Finished scap: Backport for Fix Wikitext editor preview layout in Vector 2022 (T327778), Fix Wikitext editor preview layout in Vector 2022 (T327778) (duration: 09m 36s)
- 22:06 TheresNoTime: extending UTC late backport window due to late start
- 22:04 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6001.drmrs.wmnet,service=ats-be
- 22:04 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6001.drmrs.wmnet,service=cdn
- 22:01 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6001.drmrs.wmnet with OS bullseye
- 22:00 samtar@deploy1002: samtar and jdrewniak: Backport for Fix Wikitext editor preview layout in Vector 2022 (T327778), Fix Wikitext editor preview layout in Vector 2022 (T327778) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 21:59 samtar@deploy1002: Started scap: Backport for Fix Wikitext editor preview layout in Vector 2022 (T327778), Fix Wikitext editor preview layout in Vector 2022 (T327778)
- 21:56 samtar@deploy1002: Finished scap: Backport for Work around sticky-positioned layers disabling subpixel rendering (T327460) (duration: 13m 31s)
- 21:45 samtar@deploy1002: nray and samtar: Backport for Work around sticky-positioned layers disabling subpixel rendering (T327460) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 21:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1009.eqiad.wmnet with OS bullseye
- 21:44 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 21:43 samtar@deploy1002: Started scap: Backport for Work around sticky-positioned layers disabling subpixel rendering (T327460)
- 21:43 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 21:38 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6001.drmrs.wmnet with reason: host reimage
- 21:38 zabe: running migrateRevisionCommentTemp.php on testcommonswiki (s4) with --sleep 10 # T275246
- 21:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6001.drmrs.wmnet with reason: host reimage
- 21:32 samtar@deploy1002: backport aborted: (duration: 06m 28s)
- 21:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1009.eqiad.wmnet with reason: host reimage
- 21:25 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1009.eqiad.wmnet with reason: host reimage
- 21:15 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6001.drmrs.wmnet with OS bullseye
- 21:05 eevans@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
- 21:03 TheresNoTime: holding UTC late backport window for outage, T327815
- 21:01 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host sessionstore1001.eqiad.wmnet
- 20:50 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
- 20:50 urandom: rebooting sessionstore1001.eqiad.wmnet -- T325132
- 20:49 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host sessionstore1001.eqiad.wmnet
- 20:49 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
- 20:39 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2027.codfw.wmnet
- 20:32 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2027.codfw.wmnet
- 20:31 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5025.eqsin.wmnet,service=ats-be
- 20:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2026.codfw.wmnet
- 20:31 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5025.eqsin.wmnet,service=cdn
- 20:29 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet
- 20:29 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5025.eqsin.wmnet with OS bullseye
- 20:28 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet
- 20:24 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2026.codfw.wmnet
- 20:20 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2023.codfw.wmnet
- 20:20 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6009.drmrs.wmnet,service=ats-be
- 20:19 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6009.drmrs.wmnet,service=cdn
- 20:18 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet,service=cdn
- 20:18 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet,service=ats-be
- 20:16 bblack: pool cp5032
- 20:16 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=5017.eqsin.wmnet,service=ats-be
- 20:16 mutante: contint2001 - restarted zuul
- 20:16 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=5017.eqsin.wmnet,service=cdn
- 20:14 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=ats-be
- 20:14 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=cdn
- 20:14 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2041.codfw.wmnet,service=ats-be
- 20:14 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2041.codfw.wmnet,service=cdn
- 20:12 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2023.codfw.wmnet
- 20:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6009.drmrs.wmnet with OS bullseye
- 20:09 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet,service=ats-be
- 20:08 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet,service=cdn
- 20:08 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2018.codfw.wmnet
- 20:08 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2041.codfw.wmnet,service=ats-be
- 20:08 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2041.codfw.wmnet,service=cdn
- 20:05 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5017.eqsin.wmnet with OS bullseye
- 20:00 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2018.codfw.wmnet
- 19:58 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2017.codfw.wmnet
- 19:56 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5025.eqsin.wmnet with reason: host reimage
- 19:54 sukhe: reprepro -C main include bullseye-wikimedia libvmod-netmapper_1.9-3_amd64.changes: T326634
- 19:53 sukhe: reprepro -C main include bullseye-wikimedia libvmod-re2_1.5.3-4_amd64.changes: T326634
- 19:53 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5025.eqsin.wmnet with reason: host reimage
- 19:51 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2017.codfw.wmnet
- 19:47 sukhe: reprepro -C main include bullseye-wikimedia libvmod-querysort_0.4_amd64.changes: T326634
- 19:46 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2012.codfw.wmnet
- 19:40 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 19:39 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2012.codfw.wmnet
- 19:39 urandom: rebooting restbase cassandra nodes, row d -- T325132
- 19:33 bblack: cp5032: restart varnish-frontend
- 19:30 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2025.codfw.wmnet
- 19:28 sukhe: reprepro -C main include bullseye-wikimedia varnish-modules_0.15.0-3_amd64.changes: T326634
- 19:27 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1011.eqiad.wmnet with reason: host reimage
- 19:24 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1011.eqiad.wmnet with reason: host reimage
- 19:22 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2025.codfw.wmnet
- 19:19 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5025.eqsin.wmnet with OS bullseye
- 19:19 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5025.eqsin.wmnet with OS bullseye
- 19:10 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.20 refs T325583
- 19:06 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host druid1011.eqiad.wmnet with OS bullseye
- 19:05 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1010.eqiad.wmnet with OS bullseye
- 19:05 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 19:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6009.drmrs.wmnet with reason: host reimage
- 19:01 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6009.drmrs.wmnet with reason: host reimage
- 18:55 jynus: deploy new dump grants for analytics dbs at db1108 T327155
- 18:43 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5025.eqsin.wmnet with OS bullseye
- 18:40 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6009.drmrs.wmnet with OS bullseye
- 18:17 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5017.eqsin.wmnet with reason: host reimage
- 18:14 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5017.eqsin.wmnet with reason: host reimage
- 18:12 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2022.codfw.wmnet
- 18:05 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2022.codfw.wmnet
- 17:44 bblack: cp5032: upgrading packages (varnish, trafficserver)
- 17:40 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host restbase2020.codfw.wmnet
- 17:37 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5017.eqsin.wmnet with OS bullseye
- 17:36 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5017.eqsin.wmnet with OS bullseye
- 17:28 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2020.codfw.wmnet
- 17:21 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2016.codfw.wmnet
- 17:19 thcipriani: restarting ci jenkins for updates
- 17:13 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2016.codfw.wmnet
- 17:13 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2015.codfw.wmnet
- 17:10 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5017.eqsin.wmnet with OS bullseye
- 17:04 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2015.codfw.wmnet
- 17:04 urandom: rebooting restbase cassandra nodes, row c -- T325132
- 16:29 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 16:29 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 16:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2042.codfw.wmnet with OS bullseye
- 16:23 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 16:23 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 16:23 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
- 16:23 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
- 16:22 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
- 16:22 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
- 16:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2042.codfw.wmnet with reason: host reimage
- 16:10 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 16:10 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 16:09 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2042.codfw.wmnet with reason: host reimage
- 15:54 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
- 15:53 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2042.codfw.wmnet with OS bullseye
- 15:43 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
- 15:31 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
- 15:30 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: sync on main
- 15:26 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
- 15:17 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad (duration: 01m 40s)
- 15:15 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad
- 15:12 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@15e6aa7] (codfw): Revert "codfw: Disable traffic mirroring" (duration: 00m 33s)
- 15:11 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@15e6aa7] (codfw): Revert "codfw: Disable traffic mirroring"
- 14:58 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
- 14:58 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
- 14:57 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: sync on main
- 14:55 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 14:52 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
- 14:52 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: sync on main
- 14:51 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
- 14:41 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
- 14:41 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1010.eqiad.wmnet with reason: host reimage
- 14:39 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
- 14:38 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1010.eqiad.wmnet with reason: host reimage
- 14:36 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:36 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force update after switch upgrade - volans@cumin1001"
- 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 14:35 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force update after switch upgrade - volans@cumin1001"
- 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 14:34 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
- 14:33 volans@cumin1001: START - Cookbook sre.dns.netbox
- 14:29 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
- 14:29 effie: switch maps (kartotherian) from eqiad to codfw (attempt #2)
- 14:28 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
- 14:28 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
- 14:25 TheresNoTime: close UTC afternoon backport window
- 14:24 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
- 14:20 XioNoX: repool ulsfo (maintenance over)
- 14:20 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host druid1010.eqiad.wmnet with OS bullseye
- 14:17 samtar@deploy1002: Finished scap: Backport for Increase PC writes from parsoid API to 10% (T320534) (duration: 07m 41s)
- 14:14 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
- 14:11 samtar@deploy1002: daniel and samtar: Backport for Increase PC writes from parsoid API to 10% (T320534) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 14:09 samtar@deploy1002: Started scap: Backport for Increase PC writes from parsoid API to 10% (T320534)
- 13:50 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
- 13:44 XioNoX: reboot ulsfo switches for software upgrade
- 13:40 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
- 13:38 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
- 13:36 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:34 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping1002.eqiad.wmnet
- 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 13:29 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 13:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 13:18 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping1002.eqiad.wmnet
- 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping2002.codfw.wmnet
- 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 13:14 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 13:11 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
- 13:11 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
- 13:10 topranks: enabling tunnel services on cr2-eqdfw fpc 0 pic 1
- 13:08 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 13:04 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping2002.codfw.wmnet
- 12:56 zabe@deploy1002: Finished scap: Backport for Remove PoolCounter from extension-list (T327336) (duration: 44m 09s)
- 12:51 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
- 12:51 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
- 12:50 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-proxies (exit_code=0) rolling restart_daemons on A:eqiad and A:swift-fe or A:thanos-fe
- 12:48 XioNoX: restart ulsfo switches for network maintenance
- 12:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 36 hosts with reason: network maintenance
- 12:43 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 36 hosts with reason: network maintenance
- 12:40 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-proxies rolling restart_daemons on A:eqiad and A:swift-fe or A:thanos-fe
- 12:38 zabe@deploy1002: zabe: Backport for Remove PoolCounter from extension-list (T327336) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
- 12:21 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thumbor2004.codfw.wmnet
- 12:12 zabe@deploy1002: Started scap: Backport for Remove PoolCounter from extension-list (T327336)
- 11:54 volans: uploaded python3-gjson_1.0.0 to apt.wikimedia.org bullseye-wikimedia,unstable-wikimedia
- 11:49 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
- 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43311 and previous config saved to /var/cache/conftool/dbconfig/20230124-114255-root.json
- 11:39 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
- 11:36 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
- 11:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping3002.esams.wmnet
- 11:35 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping3002.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 11:33 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
- 11:33 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping3002.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 11:28 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43310 and previous config saved to /var/cache/conftool/dbconfig/20230124-112750-root.json
- 11:26 zabe@deploy1002: Finished scap: Backport for Stop loading PoolCounter extension (T327336) (duration: 09m 19s)
- 11:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1176.eqiad.wmnet with OS bullseye
- 11:23 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
- 11:22 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
- 11:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping3002.esams.wmnet
- 11:19 zabe@deploy1002: zabe: Backport for Stop loading PoolCounter extension (T327336) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 11:17 zabe@deploy1002: Started scap: Backport for Stop loading PoolCounter extension (T327336)
- 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43308 and previous config saved to /var/cache/conftool/dbconfig/20230124-111245-root.json
- 11:11 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
- 11:11 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
- 11:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1176.eqiad.wmnet with reason: host reimage
- 11:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1176.eqiad.wmnet with reason: host reimage
- 11:03 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
- 11:03 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
- 11:03 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
- 11:02 effie: depooling maps (kartotherian) from codfw, leaving eqiad as pooled
- 11:00 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
- 10:59 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
- 10:58 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
- 10:58 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
- 10:58 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
- 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43306 and previous config saved to /var/cache/conftool/dbconfig/20230124-105740-root.json
- 10:55 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
- 10:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1176.eqiad.wmnet with OS bullseye
- 10:52 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
- 10:49 XioNoX: depool ulsfo for network maintenance - T316532
- 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1106 to dbctl in s1 T326116', diff saved to https://phabricator.wikimedia.org/P43305 and previous config saved to /var/cache/conftool/dbconfig/20230124-104336-marostegui.json
- 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43304 and previous config saved to /var/cache/conftool/dbconfig/20230124-104235-root.json
- 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1176 from s1 T326116', diff saved to https://phabricator.wikimedia.org/P43303 and previous config saved to /var/cache/conftool/dbconfig/20230124-104219-root.json
- 10:33 vgutierrez: repool cp4046
- 10:32 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
- 10:31 vgutierrez: restarting varnish on cp4046
- 10:30 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
- 10:29 vgutierrez: depool cp4046
- 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43302 and previous config saved to /var/cache/conftool/dbconfig/20230124-102730-root.json
- 10:25 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
- 10:22 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
- 10:19 moritzm: rolling Apache/FPM restarts on mw canaries to pick up libtasn security update
- 10:19 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
- 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2165 T327754', diff saved to https://phabricator.wikimedia.org/P43301 and previous config saved to /var/cache/conftool/dbconfig/20230124-101825-root.json
- 10:17 effie: depooling maps from eqiad && pooling maps on codfw
- 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2161 to s8 primary T327754', diff saved to https://phabricator.wikimedia.org/P43300 and previous config saved to /var/cache/conftool/dbconfig/20230124-101727-root.json
- 10:14 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
- 10:14 marostegui: Starting s8 codfw failover from db2165 to db2161 - T327754
- 10:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2041.codfw.wmnet with OS bullseye
- 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43299 and previous config saved to /var/cache/conftool/dbconfig/20230124-101025-root.json
- 09:59 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: apply on main
- 09:59 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
- 09:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2041.codfw.wmnet with reason: host reimage
- 09:55 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2041.codfw.wmnet with reason: host reimage
- 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43298 and previous config saved to /var/cache/conftool/dbconfig/20230124-095520-root.json
- 09:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 35 hosts with reason: Primary switchover s8 T327754
- 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2161 with weight 0 T327754', diff saved to https://phabricator.wikimedia.org/P43297 and previous config saved to /var/cache/conftool/dbconfig/20230124-095235-marostegui.json
- 09:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 35 hosts with reason: Primary switchover s8 T327754
- 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43296 and previous config saved to /var/cache/conftool/dbconfig/20230124-094725-root.json
- 09:41 moritzm: installing libtasn1-6 security updates on buster
- 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43295 and previous config saved to /var/cache/conftool/dbconfig/20230124-094016-root.json
- 09:39 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
- 09:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2041.codfw.wmnet with OS bullseye
- 09:39 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
- 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43294 and previous config saved to /var/cache/conftool/dbconfig/20230124-093220-root.json
- 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43293 and previous config saved to /var/cache/conftool/dbconfig/20230124-092511-root.json
- 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43292 and previous config saved to /var/cache/conftool/dbconfig/20230124-091715-root.json
- 09:14 kart_: Done: UTC morning backport window
- 09:13 kartik@deploy1002: Finished scap: Backport for Remove Kartographer versioned mapdata flags (T326288) (duration: 09m 44s)
- 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43291 and previous config saved to /var/cache/conftool/dbconfig/20230124-091006-root.json
- 09:05 kartik@deploy1002: awight and kartik: Backport for Remove Kartographer versioned mapdata flags (T326288) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 09:03 kartik@deploy1002: Started scap: Backport for Remove Kartographer versioned mapdata flags (T326288)
- 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43290 and previous config saved to /var/cache/conftool/dbconfig/20230124-090210-root.json
- 09:01 kartik@deploy1002: Finished scap: Backport for Deprecate the EnableMapFrame feature flag (T326288) (duration: 10m 42s)
- 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43289 and previous config saved to /var/cache/conftool/dbconfig/20230124-085501-root.json
- 08:52 kartik@deploy1002: awight and kartik: Backport for Deprecate the EnableMapFrame feature flag (T326288) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 08:50 kartik@deploy1002: Started scap: Backport for Deprecate the EnableMapFrame feature flag (T326288)
- 08:48 kartik@deploy1002: Finished scap: Backport for Enable write new for CheckUserLog comment fields on testwikis (T233004) (duration: 15m 20s)
- 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43288 and previous config saved to /var/cache/conftool/dbconfig/20230124-084705-root.json
- 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Add some weight to db2115 in x1 codfw', diff saved to https://phabricator.wikimedia.org/P43287 and previous config saved to /var/cache/conftool/dbconfig/20230124-084552-marostegui.json
- 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2096 T327745', diff saved to https://phabricator.wikimedia.org/P43286 and previous config saved to /var/cache/conftool/dbconfig/20230124-084508-marostegui.json
- 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2115 to x1 codfw T327745', diff saved to https://phabricator.wikimedia.org/P43285 and previous config saved to /var/cache/conftool/dbconfig/20230124-084206-marostegui.json
- 08:39 marostegui: Starting x1 codfw failover from db2096 to db2115 - T327745
- 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2115 with weight 0 T327745', diff saved to https://phabricator.wikimedia.org/P43284 and previous config saved to /var/cache/conftool/dbconfig/20230124-083643-marostegui.json
- 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Primary switchover x1 T327745
- 08:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Primary switchover x1 T327745
- 08:35 kartik@deploy1002: dreamyjazz and kartik: Backport for Enable write new for CheckUserLog comment fields on testwikis (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
- 08:34 phedenskog@deploy1002: Finished deploy [performance/navtiming@8c87ca6]: (no justification provided) (duration: 00m 06s)
- 08:34 phedenskog@deploy1002: Started deploy [performance/navtiming@8c87ca6]: (no justification provided)
- 08:33 kartik@deploy1002: Started scap: Backport for Enable write new for CheckUserLog comment fields on testwikis (T233004)
- 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43283 and previous config saved to /var/cache/conftool/dbconfig/20230124-083200-root.json
- 08:28 kartik@deploy1002: Finished scap: Backport for Add "Page Frame" to DiscussionTools beta feature on almost all wikis (T323727) (duration: 09m 09s)
- 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2110 from API T327739', diff saved to https://phabricator.wikimedia.org/P43282 and previous config saved to /var/cache/conftool/dbconfig/20230124-082440-marostegui.json
- 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2140 T327739', diff saved to https://phabricator.wikimedia.org/P43281 and previous config saved to /var/cache/conftool/dbconfig/20230124-082138-marostegui.json
- 08:21 kartik@deploy1002: kartik and matmarex: Backport for Add "Page Frame" to DiscussionTools beta feature on almost all wikis (T323727) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2110 to s4 primary T327739', diff saved to https://phabricator.wikimedia.org/P43280 and previous config saved to /var/cache/conftool/dbconfig/20230124-082025-root.json
- 08:19 kartik@deploy1002: Started scap: Backport for Add "Page Frame" to DiscussionTools beta feature on almost all wikis (T323727)
- 08:18 marostegui: Starting s4 codfw failover from db2140 to db2110 - T327739
- 08:16 kartik@deploy1002: Finished scap: Backport for Content Translation: Add campaign for Wiki Loves Living Heritage (T327587) (duration: 10m 25s)
- 08:07 kartik@deploy1002: kartik: Backport for Content Translation: Add campaign for Wiki Loves Living Heritage (T327587) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 08:05 kartik@deploy1002: Started scap: Backport for Content Translation: Add campaign for Wiki Loves Living Heritage (T327587)
- 07:59 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T327739
- 07:58 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T327739
- 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2110 with weight 0 T327739', diff saved to https://phabricator.wikimedia.org/P43279 and previous config saved to /var/cache/conftool/dbconfig/20230124-075824-root.json
- 07:50 moritzm: installing Linux 5.10.162 on Bullseye hosts
- 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1106 from dbctl T327616', diff saved to https://phabricator.wikimedia.org/P43278 and previous config saved to /var/cache/conftool/dbconfig/20230124-074323-marostegui.json
- 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 (T322618)', diff saved to https://phabricator.wikimedia.org/P43277 and previous config saved to /var/cache/conftool/dbconfig/20230124-064905-ladsgroup.json
- 06:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 (T322618)', diff saved to https://phabricator.wikimedia.org/P43276 and previous config saved to /var/cache/conftool/dbconfig/20230124-064554-ladsgroup.json
- 06:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118', diff saved to https://phabricator.wikimedia.org/P43275 and previous config saved to /var/cache/conftool/dbconfig/20230124-063358-ladsgroup.json
- 06:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43274 and previous config saved to /var/cache/conftool/dbconfig/20230124-063048-ladsgroup.json
- 06:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118', diff saved to https://phabricator.wikimedia.org/P43273 and previous config saved to /var/cache/conftool/dbconfig/20230124-061852-ladsgroup.json
- 06:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43272 and previous config saved to /var/cache/conftool/dbconfig/20230124-061541-ladsgroup.json
- 06:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 (T322618)', diff saved to https://phabricator.wikimedia.org/P43271 and previous config saved to /var/cache/conftool/dbconfig/20230124-060345-ladsgroup.json
- 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2118 (T322618)', diff saved to https://phabricator.wikimedia.org/P43270 and previous config saved to /var/cache/conftool/dbconfig/20230124-060129-ladsgroup.json
- 06:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
- 06:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
- 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 (T322618)', diff saved to https://phabricator.wikimedia.org/P43269 and previous config saved to /var/cache/conftool/dbconfig/20230124-060035-ladsgroup.json
- 05:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2107 (T322618)', diff saved to https://phabricator.wikimedia.org/P43268 and previous config saved to /var/cache/conftool/dbconfig/20230124-055816-ladsgroup.json
- 05:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
- 05:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
- 04:57 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.18 (duration: 02m 07s)
- 04:55 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.20 refs T325583 (duration: 53m 01s)
- 04:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.20 refs T325583
- 03:30 AndyRussG: payments-wiki upgraded from 3d882ac7 to 15395d05
- 02:35 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2024.codfw.wmnet
- 02:27 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2024.codfw.wmnet
- 02:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2021.codfw.wmnet
- 02:18 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2021.codfw.wmnet
- 02:16 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host restbase2019.codfw.wmnet
- 02:04 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2019.codfw.wmnet
- 02:03 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2014.codfw.wmnet
- 01:55 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2014.codfw.wmnet
- 01:51 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2013.codfw.wmnet
- 01:44 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2013.codfw.wmnet
- 01:36 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1033.eqiad.wmnet
- 01:26 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1033.eqiad.wmnet
- 01:26 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1032.eqiad.wmnet
- 01:18 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1032.eqiad.wmnet
- 01:16 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1031.eqiad.wmnet
- 01:06 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1031.eqiad.wmnet
- 01:03 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1030.eqiad.wmnet
- 00:55 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1030.eqiad.wmnet
- 00:55 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1027.eqiad.wmnet
- 00:47 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1027.eqiad.wmnet
- 00:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1026.eqiad.wmnet
- 00:38 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1026.eqiad.wmnet
- 00:36 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1025.eqiad.wmnet
- 00:28 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1025.eqiad.wmnet
- 00:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1018.eqiad.wmnet
- 00:18 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1018.eqiad.wmnet
- 00:14 zabe@deploy1002: Finished scap: Backport for Use core's PoolCounterClient (T327336) (duration: 12m 47s)
- 00:03 zabe@deploy1002: zabe: Backport for Use core's PoolCounterClient (T327336) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 00:01 zabe@deploy1002: Started scap: Backport for Use core's PoolCounterClient (T327336)
2023-01-23
- 23:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1029.eqiad.wmnet
- 23:24 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1029.eqiad.wmnet
- 23:24 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1024.eqiad.wmnet
- 23:17 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1024.eqiad.wmnet
- 23:16 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1023.eqiad.wmnet
- 23:07 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1023.eqiad.wmnet
- 22:59 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1022.eqiad.wmnet
- 22:57 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
- 22:57 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
- 22:57 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
- 22:56 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@544f5f3]: 0.3.119 (duration: 07m 30s)
- 22:52 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1022.eqiad.wmnet
- 22:49 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.119` on canary `wdqs1003`; proceeding to rest of fleet
- 22:48 ryankemper@deploy1002: Started deploy [wdqs/wdqs@544f5f3]: 0.3.119
- 22:46 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.119`. Pre-deploy tests passing on canary `wdqs1003`
- 22:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1017.eqiad.wmnet
- 22:37 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1017.eqiad.wmnet
- 22:31 maryum: Deployed patch for T285159
- 21:42 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1028.eqiad.wmnet
- 21:40 zabe@deploy1002: Finished scap: Backport for throttle: Remove expired rule (duration: 07m 45s)
- 21:35 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1028.eqiad.wmnet
- 21:34 zabe@deploy1002: zabe: Backport for throttle: Remove expired rule synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 21:32 zabe@deploy1002: Started scap: Backport for throttle: Remove expired rule
- 21:29 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1021.eqiad.wmnet
- 21:22 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1021.eqiad.wmnet
- 21:12 kindrobot: close UTC late backport window
- 21:12 kindrobot@deploy1002: Finished scap: Backport for Enable Page Tools for logged-in users on enwiki (T327686) (duration: 09m 00s)
- 21:04 kindrobot@deploy1002: jdrewniak and kindrobot: Backport for Enable Page Tools for logged-in users on enwiki (T327686) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 21:03 kindrobot@deploy1002: Started scap: Backport for Enable Page Tools for logged-in users on enwiki (T327686)
- 21:01 kindrobot: start UTC late backport window
- 20:56 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
- 20:56 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
- 20:45 taavi: restart T315510 on group1 after mwmaint restart, currently running on wikidatawiki
- 19:48 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1020.eqiad.wmnet
- 19:41 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1020.eqiad.wmnet
- 19:37 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1019.eqiad.wmnet
- 19:30 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1019.eqiad.wmnet
- 19:24 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1016.eqiad.wmnet
- 19:18 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
- 19:17 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host restbase1016.eqiad.wmnet
- 19:17 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
- 19:16 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host restbase1016.eqiad.wmnet
- 19:16 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
- 18:48 mutante: miscweb1002 - unload CAS apache module and config; apt-get remove libapache2-mod-auth-cas
- 18:19 mutante: miscweb2002 - unlink /etc/apache2/mods-enabled/auth_cas.conf - unlink /etc/apache2/mods-enabled/auth_cas.load - apt-get remove libapache2-mod-auth-cas - T327405
- 18:08 mutante: miscweb2002 - unlink /etc/apache2/mods-enabled/auth_cas.conf - unlink /etc/apache2/mods-enabled/auth_cas.load
- 18:05 mutante: miscweb1002 - disabling puppet because latest merge would break apache if it runs, debugging in progress on inactive miscweb2002
- 18:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on miscweb2002.codfw.wmnet with reason: debugging on inactive server
- 18:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on miscweb2002.codfw.wmnet with reason: debugging on inactive server
- 17:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P43265 and previous config saved to /var/cache/conftool/dbconfig/20230123-175241-ladsgroup.json
- 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P43264 and previous config saved to /var/cache/conftool/dbconfig/20230123-173736-ladsgroup.json
- 17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P43263 and previous config saved to /var/cache/conftool/dbconfig/20230123-172231-ladsgroup.json
- 17:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P43262 and previous config saved to /var/cache/conftool/dbconfig/20230123-170726-ladsgroup.json
- 17:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
- 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
- 16:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
- 16:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
- 16:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
- 16:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
- 16:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
- 16:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
- 16:48 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 48s)
- 16:42 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 48s)
- 16:41 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
- 16:41 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
- 16:40 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 16:40 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 16:35 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 16:35 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 100%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43261 and previous config saved to /var/cache/conftool/dbconfig/20230123-163207-root.json
- 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43260 and previous config saved to /var/cache/conftool/dbconfig/20230123-163138-root.json
- 16:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 75%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43259 and previous config saved to /var/cache/conftool/dbconfig/20230123-161702-root.json
- 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43258 and previous config saved to /var/cache/conftool/dbconfig/20230123-161633-root.json
- 16:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 50%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43257 and previous config saved to /var/cache/conftool/dbconfig/20230123-160157-root.json
- 16:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43256 and previous config saved to /var/cache/conftool/dbconfig/20230123-160126-root.json
- 15:53 sukhe: reprepro -C main include bullseye-wikimedia varnish_6.0.11-1wm1_amd64.changes: T326634
- 15:50 urbanecm: Deploy security patch for T327613
- 15:48 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
- 15:48 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
- 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 25%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43255 and previous config saved to /var/cache/conftool/dbconfig/20230123-154652-root.json
- 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43254 and previous config saved to /var/cache/conftool/dbconfig/20230123-154621-root.json
- 15:44 papaul: ongoing maintenance on fasw-codfw
- 15:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 10%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43253 and previous config saved to /var/cache/conftool/dbconfig/20230123-153147-root.json
- 15:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 10%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43252 and previous config saved to /var/cache/conftool/dbconfig/20230123-153116-root.json
- 15:17 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.1.4-1wm1_amd64.changes: T325563
- 15:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 5%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43251 and previous config saved to /var/cache/conftool/dbconfig/20230123-151642-root.json
- 15:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 5%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43250 and previous config saved to /var/cache/conftool/dbconfig/20230123-151611-root.json
- 15:09 taavi@deploy1002: Finished scap: Backport for Revert "Enable Linter write namespace tag and template using core config" (duration: 07m 28s)
- 15:03 taavi@deploy1002: taavi and trainbranchbot: Backport for Revert "Enable Linter write namespace tag and template using core config" synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
- 15:02 taavi@deploy1002: Started scap: Backport for Revert "Enable Linter write namespace tag and template using core config"
- 15:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170:3317', diff saved to https://phabricator.wikimedia.org/P43248 and previous config saved to /var/cache/conftool/dbconfig/20230123-150110-marostegui.json
- 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312', diff saved to https://phabricator.wikimedia.org/P43247 and previous config saved to /var/cache/conftool/dbconfig/20230123-150018-marostegui.json
- 15:00 taavi@deploy1002: Finished scap: Backport for Enable Linter write namespace tag and template using core config (T299612) (duration: 07m 56s)
- 14:59 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
- 14:59 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
- 14:53 taavi@deploy1002: taavi and sbailey: Backport for Enable Linter write namespace tag and template using core config (T299612) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
- 14:52 taavi@deploy1002: Started scap: Backport for Enable Linter write namespace tag and template using core config (T299612)
- 14:46 taavi@deploy1002: Finished scap: Backport for SpecialUserrights: Allow updating the expiry of user groups (T327605) (duration: 08m 48s)
- 14:42 sukhe: rolling out pybal 1.15.10: T321191
- 14:39 taavi@deploy1002: taavi and func: Backport for SpecialUserrights: Allow updating the expiry of user groups (T327605) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
- 14:37 taavi@deploy1002: Started scap: Backport for SpecialUserrights: Allow updating the expiry of user groups (T327605)
- 14:37 taavi@deploy1002: Finished scap: Backport for zhwiki: Install PageAssessments (T326387) (duration: 11m 24s)
- 14:27 taavi@deploy1002: stang and taavi: Backport for zhwiki: Install PageAssessments (T326387) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 14:26 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
- 14:26 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
- 14:25 taavi@deploy1002: Started scap: Backport for zhwiki: Install PageAssessments (T326387)
- 14:25 taavi@deploy1002: Finished scap: Backport for bnwikiquote: Update logo (T323131), shnwikibooks: Add project logo (T327380) (duration: 09m 22s)
- 14:25 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
- 14:25 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
- 14:20 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 14:20 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 14:18 taavi: mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=zhwiki pageassessments # T326387
- 14:17 taavi@deploy1002: taavi and stang: Backport for bnwikiquote: Update logo (T323131), shnwikibooks: Add project logo (T327380) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 14:16 taavi@deploy1002: Started scap: Backport for bnwikiquote: Update logo (T323131), shnwikibooks: Add project logo (T327380)
- 12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 (T323827)', diff saved to https://phabricator.wikimedia.org/P43246 and previous config saved to /var/cache/conftool/dbconfig/20230123-124532-ladsgroup.json
- 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43245 and previous config saved to /var/cache/conftool/dbconfig/20230123-123025-ladsgroup.json
- 12:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43242 and previous config saved to /var/cache/conftool/dbconfig/20230123-121519-ladsgroup.json
- 12:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
- 12:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
- 12:06 marostegui: dbmaint Reboot db2135 (m5 codfw master)
- 12:06 marostegui: dbmaint Reboot db2134 (m3 codfw master)
- 12:05 Emperor: removing /usr/local/bin/prometheus-puppet-agent-stats from prometheus crontab on snapshot1014
- 12:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 (T323827)', diff saved to https://phabricator.wikimedia.org/P43241 and previous config saved to /var/cache/conftool/dbconfig/20230123-120012-ladsgroup.json
- 11:58 marostegui: dbmaint Reboot db2133 (m2 codfw master)
- 11:57 marostegui: dbmaint Reboot db2132 (m1 codfw master)
- 11:57 marostegui: Reboot db2132 (m1 codfw master)
- 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2113 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P43239 and previous config saved to /var/cache/conftool/dbconfig/20230123-113506-ladsgroup.json
- 11:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
- 11:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
- 11:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
- 11:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
- 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2114 T327644', diff saved to https://phabricator.wikimedia.org/P43236 and previous config saved to /var/cache/conftool/dbconfig/20230123-112134-ladsgroup.json
- 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2113 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P43235 and previous config saved to /var/cache/conftool/dbconfig/20230123-112001-ladsgroup.json
- 11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2129 to s6 primary T327644', diff saved to https://phabricator.wikimedia.org/P43234 and previous config saved to /var/cache/conftool/dbconfig/20230123-111813-ladsgroup.json
- 11:17 Amir1: Starting s6 codfw failover from db2114 to db2129 - T327644
- 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2107 (T323827)', diff saved to https://phabricator.wikimedia.org/P43233 and previous config saved to /var/cache/conftool/dbconfig/20230123-111147-ladsgroup.json
- 11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2107.codfw.wmnet with reason: Maintenance
- 11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2107.codfw.wmnet with reason: Maintenance
- 11:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2113 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P43232 and previous config saved to /var/cache/conftool/dbconfig/20230123-110456-ladsgroup.json
- 10:55 XioNoX: update management routers ACLs to add new bast hosts
- 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2129 with weight 0 T327644', diff saved to https://phabricator.wikimedia.org/P43231 and previous config saved to /var/cache/conftool/dbconfig/20230123-105520-ladsgroup.json
- 10:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s6 T327644
- 10:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s6 T327644
- 10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2113 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P43230 and previous config saved to /var/cache/conftool/dbconfig/20230123-104951-ladsgroup.json
- 10:48 vgutierrez: rolling upgrade to HAProxy 2.4.20 on ulsfo
- 10:40 btullis@deploy1002: Finished deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided) (duration: 00m 06s)
- 10:40 btullis@deploy1002: Started deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided)
- 10:40 btullis@deploy1002: Finished deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided) (duration: 00m 20s)
- 10:40 btullis@deploy1002: Started deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided)
- 10:39 btullis@deploy1002: Installation of scap version "4.33.1" completed for 1 hosts
- 10:39 btullis@deploy1002: Installing scap version "4.33.1" for 1 hosts
- 10:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-tool1010.eqiad.wmnet with OS bullseye
- 10:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-tool1010.eqiad.wmnet with reason: host reimage
- 10:16 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-tool1010.eqiad.wmnet with reason: host reimage
- 10:07 ladsgroup@deploy1002: Finished scap: Backport for Remove Flow as default in techconductwiki (duration: 07m 51s)
- 10:03 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-tool1010.eqiad.wmnet with OS bullseye
- 10:01 ladsgroup@deploy1002: ladsgroup: Backport for Remove Flow as default in techconductwiki synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 09:59 ladsgroup@deploy1002: Started scap: Backport for Remove Flow as default in techconductwiki
- 09:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2113.codfw.wmnet with reason: Maintenance
- 09:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2113.codfw.wmnet with reason: Maintenance
- 09:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
- 09:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
- 09:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
- 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
- 09:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
- 09:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
- 09:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
- 09:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
- 08:49 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:49 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-create ripe-atlas-esams records as the host is back up - volans@cumin1001"
- 08:48 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-create ripe-atlas-esams records as the host is back up - volans@cumin1001"
- 08:46 volans@cumin1001: START - Cookbook sre.dns.netbox
- 08:45 zabe@deploy1002: Finished scap: Backport for Remove oversight group from privileged groups (T112147), Start reading from cuc_comment_id on wikidatawiki (T233004) (duration: 07m 48s)
- 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 to vslow and dump group T326669', diff saved to https://phabricator.wikimedia.org/P43229 and previous config saved to /var/cache/conftool/dbconfig/20230123-084326-marostegui.json
- 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 to vslow and dump group T326669', diff saved to https://phabricator.wikimedia.org/P43228 and previous config saved to /var/cache/conftool/dbconfig/20230123-084239-marostegui.json
- 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43227 and previous config saved to /var/cache/conftool/dbconfig/20230123-084055-root.json
- 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43226 and previous config saved to /var/cache/conftool/dbconfig/20230123-084045-root.json
- 08:39 zabe@deploy1002: zabe: Backport for Remove oversight group from privileged groups (T112147), Start reading from cuc_comment_id on wikidatawiki (T233004) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 08:37 zabe@deploy1002: Started scap: Backport for Remove oversight group from privileged groups (T112147), Start reading from cuc_comment_id on wikidatawiki (T233004)
- 08:37 ayounsi@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 01m 08s)
- 08:36 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
- 08:30 ladsgroup@deploy1002: Finished scap: Backport for Tweaks for new heading HTML structure (T327328 T327469) (duration: 17m 12s)
- 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43225 and previous config saved to /var/cache/conftool/dbconfig/20230123-082550-root.json
- 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43224 and previous config saved to /var/cache/conftool/dbconfig/20230123-082540-root.json
- 08:22 ladsgroup@deploy1002: ladsgroup and matmarex: Backport for Tweaks for new heading HTML structure (T327328 T327469) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 08:12 ladsgroup@deploy1002: Started scap: Backport for Tweaks for new heading HTML structure (T327328 T327469)
- 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43223 and previous config saved to /var/cache/conftool/dbconfig/20230123-081045-root.json
- 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43222 and previous config saved to /var/cache/conftool/dbconfig/20230123-081035-root.json
- 08:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P43221 and previous config saved to /var/cache/conftool/dbconfig/20230123-080824-ladsgroup.json
- 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43220 and previous config saved to /var/cache/conftool/dbconfig/20230123-075540-root.json
- 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43219 and previous config saved to /var/cache/conftool/dbconfig/20230123-075530-root.json
- 07:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P43218 and previous config saved to /var/cache/conftool/dbconfig/20230123-075319-ladsgroup.json
- 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43217 and previous config saved to /var/cache/conftool/dbconfig/20230123-074035-root.json
- 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 10%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43216 and previous config saved to /var/cache/conftool/dbconfig/20230123-074025-root.json
- 07:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P43215 and previous config saved to /var/cache/conftool/dbconfig/20230123-073814-ladsgroup.json
- 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43214 and previous config saved to /var/cache/conftool/dbconfig/20230123-072530-root.json
- 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 5%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43213 and previous config saved to /var/cache/conftool/dbconfig/20230123-072520-root.json
- 07:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P43212 and previous config saved to /var/cache/conftool/dbconfig/20230123-072309-ladsgroup.json
- 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 db1206 T326669', diff saved to https://phabricator.wikimedia.org/P43211 and previous config saved to /var/cache/conftool/dbconfig/20230123-071323-marostegui.json
- 07:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2107.codfw.wmnet with reason: Maintenance
- 07:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2107.codfw.wmnet with reason: Maintenance
- 07:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
- 07:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
- 06:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
- 06:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
- 06:23 kart_: Updated cxserver to 2023-01-20-051603-production (T323840, T326236)
- 06:19 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
- 06:18 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
- 06:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
- 06:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
- 06:17 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
- 06:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
- 06:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
- 06:16 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
- 06:12 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
- 06:12 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
- 05:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
- 05:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
- 05:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
- 05:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
- 04:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2113 T327611', diff saved to https://phabricator.wikimedia.org/P43210 and previous config saved to /var/cache/conftool/dbconfig/20230123-045939-ladsgroup.json
- 04:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2123 to s5 primary T327611', diff saved to https://phabricator.wikimedia.org/P43209 and previous config saved to /var/cache/conftool/dbconfig/20230123-045740-ladsgroup.json
- 04:57 Amir1: Starting s5 codfw failover from db2113 to db2123 - T327611
- 04:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
- 04:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
- 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2123 with weight 0 T327611', diff saved to https://phabricator.wikimedia.org/P43208 and previous config saved to /var/cache/conftool/dbconfig/20230123-043324-ladsgroup.json
- 04:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T327611
- 04:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T327611
- 04:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
- 04:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
- 03:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
- 03:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
- 03:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2107 T327609', diff saved to https://phabricator.wikimedia.org/P43207 and previous config saved to /var/cache/conftool/dbconfig/20230123-035458-ladsgroup.json
- 03:52 Amir1: Starting s2 codfw failover from db2107 to db2104 - T327609
2023-01-20
- 18:22 jynus: deploying new grants for backups on m1 T327155
- 16:15 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 16:15 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 16:15 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 16:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 16:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 16:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 16:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 14:28 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 14:27 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 14:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
- 14:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
- 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ping host - jmm@cumin2002"
- 13:08 moritzm: installing node-minimatch security updates
- 13:01 moritzm: installing libxstream-java security updates
- 13:00 sukhe: reprepro --ignore=wrongdistribution -C main include bullseye-wikimedia cadvisor_0.44.0+ds1-1~wmf1_amd64.changes: T325557
- 12:45 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ping host - jmm@cumin2002"
- 12:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2040.codfw.wmnet with OS bullseye
- 12:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2040.codfw.wmnet with reason: host reimage
- 12:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2040.codfw.wmnet with reason: host reimage
- 12:17 moritzm: installing ping1003 T273509
- 12:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2040.codfw.wmnet with OS bullseye
- 12:03 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
- 12:02 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
- 10:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ping host - jmm@cumin2002"
- 10:49 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ping host - jmm@cumin2002"
- 10:32 elukey: restart kubelet on ml-staging200* nodes (some fs-inotify-related issues with the istio-proxy of newly created containers)
- 10:27 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 10:13 moritzm: installing emacs security updates on bullseye
- 10:13 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 10:12 moritzm: imported jenkins 2.375-2 to thirdparty/ci T326531
- 10:00 jnuche@deploy1002: Installation of scap version "4.33.1" completed for 1 hosts
- 10:00 jnuche@deploy1002: Installing scap version "4.33.1" for 1 hosts
- 08:59 moritzm: installing ping2003 T273509
- 08:10 elukey: restart kubelet on kubernetes2007 - node reported issues with it, marked as "notready" by the control plane
- 07:58 elukey: `apt-get clean` on doh4001 to free space (root partition almost filled)
- 01:55 ejegg: payments-wiki upgraded from 3cf03933 to 3d882ac7
- 01:12 ejegg: payments-wiki upgraded from fcb9ab60 to 3cf03933
2023-01-19
- 21:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2039.codfw.wmnet with OS bullseye
- 21:42 jdrewniak@deploy1002: Finished scap: Backport for Enable Page tools on viwiki and itwiki (T327348) (duration: 10m 38s)
- 21:33 jdrewniak@deploy1002: jdlrobson and jdrewniak: Backport for Enable Page tools on viwiki and itwiki (T327348) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
- 21:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2039.codfw.wmnet with reason: host reimage
- 21:31 jdrewniak@deploy1002: Started scap: Backport for Enable Page tools on viwiki and itwiki (T327348)
- 21:27 jdrewniak@deploy1002: Finished scap: Backport for Fix grid blowout with limited width turned off (T327423) (duration: 08m 26s)
- 21:27 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2039.codfw.wmnet with reason: host reimage
- 21:20 cwhite@deploy1002: Finished deploy [releng/phatality@e0bb573]: (no justification provided) (duration: 00m 13s)
- 21:20 cwhite@deploy1002: Started deploy [releng/phatality@e0bb573]: (no justification provided)
- 21:20 jdrewniak@deploy1002: jdlrobson and jdrewniak: Backport for Fix grid blowout with limited width turned off (T327423) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
- 21:18 jdrewniak@deploy1002: Started scap: Backport for Fix grid blowout with limited width turned off (T327423)
- 21:11 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2039.codfw.wmnet with OS bullseye
- 20:13 zabe@deploy1002: Finished scap: fix k8s drift (duration: 08m 02s)
- 20:05 zabe@deploy1002: Started scap: fix k8s drift
- 20:02 zabe@deploy1002: Finished scap: Backport for Start reading from cuc_comment_id everywhere except wikidatawiki (T233004) (duration: 14m 01s)
- 19:49 zabe@deploy1002: zabe: Backport for Start reading from cuc_comment_id everywhere except wikidatawiki (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 19:48 zabe@deploy1002: Started scap: Backport for Start reading from cuc_comment_id everywhere except wikidatawiki (T233004)
- 18:36 zabe: re-start populateCucComment on wikidatawiki post-mwmaint-reboot in screen with --sleep 2, will take ~30 hours # T233004
- 18:17 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
- 18:17 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
- 18:16 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
- 18:16 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
- 18:13 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
- 18:12 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
- 18:08 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 18:08 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 18:06 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 18:05 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 18:02 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 18:01 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 17:36 Amir1: bash Krinkle> Vatican Interim Papacy Runbook, § 5.1: Notify Wikipedia about incoming traffic.
- 17:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2038.codfw.wmnet with OS bullseye
- 17:13 zabe@deploy1002: Finished scap: T233004 (duration: 18m 50s)
- 17:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2038.codfw.wmnet with reason: host reimage
- 16:58 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2038.codfw.wmnet with reason: host reimage
- 16:54 zabe@deploy1002: Started scap: T233004
- 16:54 zabe@deploy1002: backport aborted: (duration: 15m 22s)
- 16:48 godog: roll-restart opensearch-dashboards in logstash collectors eqiad - T327161
- 16:44 zabe@deploy1002: Started scap: Backport for Add ability to start from cuc_id to populateCucComment (T233004)
- 16:42 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2038.codfw.wmnet with OS bullseye
- 16:27 moritzm: installing cryptsetup updates for bullseye
- 16:18 jmm@cumin2002: END (FAIL) - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors (exit_code=1) rolling restart_daemons on A:logstash-collector
- 16:13 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1009']
- 16:11 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1009']
- 16:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
- 16:08 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors rolling restart_daemons on A:logstash-collector
- 16:06 jclark@cumin1001: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:55 sukhe: update pybal to 1.15.10 on lvs4010: T321191
- 15:45 effie: enable puppet on C:memcached hosts
- 15:42 godog: bounce opensearch on logstash102[34] - T327161
- 15:30 sukhe: reprepro -C main include buster-wikimedia pybal_1.15.10_amd64.changes: T321191
- 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P43194 and previous config saved to /var/cache/conftool/dbconfig/20230119-151917-ladsgroup.json
- 15:17 effie: disable puppet on all C:memcached servers to deploy 812173
- 15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P43193 and previous config saved to /var/cache/conftool/dbconfig/20230119-150412-ladsgroup.json
- 14:57 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P43192 and previous config saved to /var/cache/conftool/dbconfig/20230119-144907-ladsgroup.json
- 14:47 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 14:40 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P43191 and previous config saved to /var/cache/conftool/dbconfig/20230119-143402-ladsgroup.json
- 14:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
- 14:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
- 14:32 zabe: run populateCulComment on group2 wikis # T327290
- 14:30 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 14:09 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 13:58 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 12:27 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host maps2009.codfw.wmnet
- 12:19 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host maps2009.codfw.wmnet
- 12:06 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
- 12:06 moritzm: stopping/masking slapd on ldap-corp1001/ldap-corp2001 T323820
- 11:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1054.eqiad.wmnet with OS bullseye
- 11:30 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 11:29 hnowlan: rebooting maps-codfw for updates
- 11:29 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host maps1009.eqiad.wmnet
- 11:24 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts webperf2004.codfw.wmnet
- 11:24 filippo@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:24 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf2004.codfw.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
- 11:22 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host maps1009.eqiad.wmnet
- 11:20 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
- 11:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1054.eqiad.wmnet with reason: host reimage
- 11:18 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf2004.codfw.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
- 11:17 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1054.eqiad.wmnet with reason: host reimage
- 11:13 filippo@cumin1001: START - Cookbook sre.dns.netbox
- 11:09 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts webperf2004.codfw.wmnet
- 11:08 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts webperf1004.eqiad.wmnet
- 11:08 filippo@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:08 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
- 11:06 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
- 11:06 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1054.eqiad.wmnet with OS bullseye
- 11:02 filippo@cumin1001: START - Cookbook sre.dns.netbox
- 10:58 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts webperf1004.eqiad.wmnet
- 10:44 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 10:44 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
- 10:44 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 10:44 hnowlan: rebooting maps-eqiad for updates
- 10:27 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 10:27 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 10:27 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 10:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 10:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 10:27 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 10:27 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 10:27 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 10:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 10:27 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 10:27 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 10:27 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 10:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 10:24 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on webperf2004.codfw.wmnet with reason: decom
- 10:24 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on webperf2004.codfw.wmnet with reason: decom
- 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ping host - jmm@cumin2002"
- 10:17 claime: Restarted maintenance scripts on mwmaint1002.eqiad.wmnet
- 10:17 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ping host - jmm@cumin2002"
- 10:17 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
- 10:15 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
- 10:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwmaint1002.eqiad.wmnet
- 10:07 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwmaint1002.eqiad.wmnet
- 10:06 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
- 10:06 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
- 10:05 claime: Stopping maintenance scripts on mwmaint1002.eqiad.wmnet for reboot
- 09:55 moritzm: installing ping3003 T273509
- 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ldap-corp[1001,2001].wikimedia.org with reason: Decommissioning
- 09:27 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on ldap-corp[1001,2001].wikimedia.org with reason: Decommissioning
- 09:24 jnuche@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.19 refs T325582
- 09:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2118.codfw.wmnet with reason: Maintenance
- 09:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2118.codfw.wmnet with reason: Maintenance
- 09:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
- 09:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
- 08:26 moritzm: installing sudo security updates
- 07:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
- 07:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
- 06:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
- 06:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
- 06:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
- 06:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
- 06:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
- 06:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
- 06:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
- 06:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
- 06:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2118 T327372', diff saved to https://phabricator.wikimedia.org/P43190 and previous config saved to /var/cache/conftool/dbconfig/20230119-060449-ladsgroup.json
- 06:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2121 to s7 primary T327372', diff saved to https://phabricator.wikimedia.org/P43189 and previous config saved to /var/cache/conftool/dbconfig/20230119-060316-ladsgroup.json
- 06:02 Amir1: Starting s7 codfw failover from db2118 to db2121 - T327372
- 05:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2121 with weight 0 T327372', diff saved to https://phabricator.wikimedia.org/P43188 and previous config saved to /var/cache/conftool/dbconfig/20230119-054243-ladsgroup.json
- 05:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s7 T327372
- 05:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s7 T327372
2023-01-18
- 23:47 zabe: run populateCulComment.php on all group0 and group1 wikis # T327290
- 23:42 cstone: civicrm upgraded from 164270b0 to f6093fb2
- 22:35 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: raise heap memory to 12G - bking@cumin1001 - T323646
- 22:03 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: raise heap memory to 12G - bking@cumin1001 - T323646
- 21:50 kindrobot: close UTC late backport window
- 21:50 kindrobot@deploy1002: Finished scap: Backport for [config]: Undeploy GDI Safety Survey Wave 4 (T327296) (duration: 10m 45s)
- 21:41 kindrobot@deploy1002: essexigyan and kindrobot: Backport for [config]: Undeploy GDI Safety Survey Wave 4 (T327296) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 21:39 kindrobot@deploy1002: Started scap: Backport for [config]: Undeploy GDI Safety Survey Wave 4 (T327296)
- 21:36 kindrobot@deploy1002: Finished scap: Backport for Bump English Wikipedia event logging from 0.5 to 1% (T326892), Legacy Vector is not a responsive skin (T327256) (duration: 13m 01s)
- 21:25 kindrobot@deploy1002: kindrobot and jdlrobson: Backport for Bump English Wikipedia event logging from 0.5 to 1% (T326892), Legacy Vector is not a responsive skin (T327256) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 21:23 kindrobot@deploy1002: Started scap: Backport for Bump English Wikipedia event logging from 0.5 to 1% (T326892), Legacy Vector is not a responsive skin (T327256)
- 21:08 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1037.eqiad.wmnet with OS bullseye
- 21:05 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1036.eqiad.wmnet with OS bullseye
- 21:03 kindrobot: start UTC late backport window
- 20:54 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1037.eqiad.wmnet with reason: host reimage
- 20:51 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1036.eqiad.wmnet with reason: host reimage
- 20:49 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1037.eqiad.wmnet with reason: host reimage
- 20:48 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1036.eqiad.wmnet with reason: host reimage
- 20:36 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1037.eqiad.wmnet with OS bullseye
- 20:35 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1036.eqiad.wmnet with OS bullseye
- 20:34 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
- 20:34 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
- 19:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1037.eqiad.wmnet with OS buster
- 19:54 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 19:52 bblack: db1129 and lvs1017: removed misconfigured IP address in wrong vlan from eno1 and /e/n/i
- 19:51 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 19:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1036.eqiad.wmnet with OS buster
- 19:47 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 19:40 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 19:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1037.eqiad.wmnet with reason: host reimage
- 19:32 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1037.eqiad.wmnet with reason: host reimage
- 19:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1036.eqiad.wmnet with reason: host reimage
- 19:23 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1036.eqiad.wmnet with reason: host reimage
- 19:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1037.eqiad.wmnet with OS buster
- 18:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1036.eqiad.wmnet with OS buster
- 18:21 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Enable the REST API on test-wikidata (T324999) (duration: 09m 38s)
- 18:14 lucaswerkmeister-wmde@deploy1002: migr and lucaswerkmeister-wmde: Backport for Enable the REST API on test-wikidata (T324999) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
- 18:12 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Enable the REST API on test-wikidata (T324999)
- 17:55 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
- 17:55 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
- 17:44 jnuche@deploy1002: Installation of scap version "4.33.0" completed for 560 hosts
- 17:44 jnuche@deploy1002: Installing scap version "4.33.0" for 560 hosts
- 17:42 jnuche@deploy1002: install-world aborted: (duration: 07m 17s)
- 17:42 btullis@deploy1002: Installation of scap version "4.33.0" completed for 1 hosts
- 17:41 btullis@deploy1002: Installing scap version "4.33.0" for 1 hosts
- 17:35 jnuche@deploy1002: Installing scap version "4.33.0" for 561 hosts
- 17:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['logstash1037']
- 17:10 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1037']
- 17:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1037']
- 17:09 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1037']
- 17:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['logstash1036']
- 16:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1036']
- 16:45 jnuche@deploy1002: Installation of scap version "4.33.0" completed for 1 hosts
- 16:45 jnuche@deploy1002: Installing scap version "4.33.0" for 1 hosts
- 16:39 jdrewniak@deploy1002: Finished scap: Backport for [100%] English Wikipedia uses Vector 2022 skin (duration: 09m 27s)
- 16:31 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [100%] English Wikipedia uses Vector 2022 skin synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 16:29 jdrewniak@deploy1002: Started scap: Backport for [100%] English Wikipedia uses Vector 2022 skin
- 16:20 jdrewniak@deploy1002: Finished scap: Backport for [75%] English Wikipedia uses Vector 2022 skin (T326892) (duration: 09m 24s)
- 16:13 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [75%] English Wikipedia uses Vector 2022 skin (T326892) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 16:11 jdrewniak@deploy1002: Started scap: Backport for [75%] English Wikipedia uses Vector 2022 skin (T326892)
- 16:06 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
- 16:06 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
- 15:58 jdrewniak@deploy1002: Finished scap: Backport for [50%] English Wikipedia uses Vector 2022 skin, adds instrumentation (T326892) (duration: 08m 52s)
- 15:51 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [50%] English Wikipedia uses Vector 2022 skin, adds instrumentation (T326892) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 15:49 jdrewniak@deploy1002: Started scap: Backport for [50%] English Wikipedia uses Vector 2022 skin, adds instrumentation (T326892)
- 15:44 jdrewniak@deploy1002: Finished scap: Backport for [25%] English Wikipedia uses Vector 2022 skin (T326892) (duration: 09m 06s)
- 15:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1052.eqiad.wmnet with OS bullseye
- 15:37 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 15:37 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 15:36 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [25%] English Wikipedia uses Vector 2022 skin (T326892) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 15:35 jdrewniak@deploy1002: Started scap: Backport for [25%] English Wikipedia uses Vector 2022 skin (T326892)
- 15:31 urandom: re-enabling Cassandra hinted-handoff for codfw -- T327001
- 15:29 jdrewniak@deploy1002: Finished scap: Backport for [10%] English Wikipedia uses Vector 2022 skin (T326892) (duration: 11m 30s)
- 15:21 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1052.eqiad.wmnet with reason: host reimage
- 15:19 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [10%] English Wikipedia uses Vector 2022 skin (T326892) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 15:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1052.eqiad.wmnet with reason: host reimage
- 15:17 jdrewniak@deploy1002: Started scap: Backport for [10%] English Wikipedia uses Vector 2022 skin (T326892)
- 15:14 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Revert gallery changes in 1.40.0-wmf.18 & .19 (T326990) (duration: 09m 11s)
- 15:13 bblack: cp2031: rebooting to gather more information (still downtimed + depooled)
- 15:07 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1052.eqiad.wmnet with OS bullseye
- 15:06 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and matmarex: Backport for Revert gallery changes in 1.40.0-wmf.18 & .19 (T326990) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 15:05 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Revert gallery changes in 1.40.0-wmf.18 & .19 (T326990)
- 15:04 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Revert gallery changes in 1.40.0-wmf.18 (T326990) (duration: 13m 04s)
- 15:01 bblack: cp2031: rebooting to gather more information (still downtimed + depooled)
- 14:57 moritzm: uploaded python-jose 3.3.0+dfsg-4~wmf11u1 to apt.wikimedia.org (needed by python-social-auth/Bitu)
- 14:53 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and matmarex: Backport for Revert gallery changes in 1.40.0-wmf.18 (T326990) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 14:51 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Revert gallery changes in 1.40.0-wmf.18 (T326990)
- 14:46 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Revert "Breaking upgrade: mapdata" (T327151) (duration: 10m 33s)
- 14:37 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and wmde-fisch: Backport for Revert "Breaking upgrade: mapdata" (T327151) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
- 14:35 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Revert "Breaking upgrade: mapdata" (T327151)
- 14:34 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Write to cul_reason[_plaintext]_id everywhere (T233004) (duration: 19m 54s)
- 14:23 moritzm: installing mod-wsgi security updates
- 14:16 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and dreamyjazz: Backport for Write to cul_reason[_plaintext]_id everywhere (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 14:14 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Write to cul_reason[_plaintext]_id everywhere (T233004)
- 13:17 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on webperf1004.eqiad.wmnet with reason: decom
- 13:16 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on webperf1004.eqiad.wmnet with reason: decom
- 12:20 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0)
- 11:54 volans: upgraded cumin on cumin1001 to 4.2.0-1+deb11u1
- 11:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on 10 hosts with reason: Still not ready to add these new presto servers to the cluster - btullis
- 11:47 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on 10 hosts with reason: Still not ready to add these new presto servers to the cluster - btullis
- 11:42 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
- 11:27 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0)
- 11:16 volans@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
- 11:16 volans@cumin1001: START - Cookbook sre.network.cf
- 11:15 volans@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
- 11:15 volans@cumin1001: START - Cookbook sre.network.cf
- 11:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1050.eqiad.wmnet with OS bullseye
- 11:11 volans@cumin2002: END (FAIL) - Cookbook sre.network.cf (exit_code=1)
- 11:11 volans@cumin2002: START - Cookbook sre.network.cf
- 11:10 volans@cumin1001: END (FAIL) - Cookbook sre.network.cf (exit_code=1)
- 11:10 volans@cumin1001: START - Cookbook sre.network.cf
- 11:10 volans@cumin1001: END (FAIL) - Cookbook sre.network.cf (exit_code=1)
- 11:10 volans@cumin1001: START - Cookbook sre.network.cf
- 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1176 T326116', diff saved to https://phabricator.wikimedia.org/P43185 and previous config saved to /var/cache/conftool/dbconfig/20230118-110716-marostegui.json
- 10:59 volans@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
- 10:59 volans@cumin1001: START - Cookbook sre.network.cf
- 10:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1050.eqiad.wmnet with reason: host reimage
- 10:54 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1050.eqiad.wmnet with reason: host reimage
- 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1176 to LB with just 1% weight T326116', diff saved to https://phabricator.wikimedia.org/P43184 and previous config saved to /var/cache/conftool/dbconfig/20230118-105106-marostegui.json
- 10:49 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
- 10:48 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)
- 10:43 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1050.eqiad.wmnet with OS bullseye
- 10:21 zabe@deploy1002: Finished scap: Backport for Start reading from cuc_comment_id from a few wikis (T233004) (duration: 09m 17s)
- 10:14 zabe@deploy1002: zabe and zabe: Backport for Start reading from cuc_comment_id from a few wikis (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 10:12 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
- 10:12 zabe@deploy1002: Started scap: Backport for Start reading from cuc_comment_id from a few wikis (T233004)
- 09:51 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 09:51 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 09:49 godog: start migration from webperf1004 to arclamp1001 - T319434
- 09:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp2001.codfw.wmnet
- 09:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp1001.eqiad.wmnet
- 09:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host arclamp2001.codfw.wmnet
- 09:33 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)
- 09:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host arclamp1001.eqiad.wmnet
- 09:24 jnuche@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.19 refs T325582 (duration: 08m 20s)
- 09:15 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.19 refs T325582
- 08:54 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
- 08:34 mvernon@cumin1001: conftool action : set/pooled=yes; selector: name=thanos-fe2002.codfw.wmnet
- 08:34 mvernon@cumin1001: conftool action : set/pooled=yes; selector: name=ms-fe2010.codfw.wmnet
- 08:32 mvernon@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-query,name=codfw
- 08:32 mvernon@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-swift,name=codfw
- 08:32 mvernon@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=codfw
- 08:30 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)
- 07:56 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
- 02:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable
- 02:37 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable
- 02:36 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=ats-be
- 02:36 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=cdn
- 01:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=ats-be
- 01:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=cdn
- 01:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2031.codfw.wmnet
- 01:06 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2031.codfw.wmnet
- 01:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable
- 01:02 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable
- 01:02 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=ats-be
- 01:02 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=cdn
- 00:28 zabe: enwiki: rename the "discretionary sanctions alert" tag to "contentious topics alert" # T327118
- 00:26 zabe@deploy1002: Finished scap: Backport for Add script to rename a change tag in wmf prod (T327118) (duration: 08m 29s)
- 00:20 zabe@deploy1002: zabe and zabe: Backport for Add script to rename a change tag in wmf prod (T327118) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 00:18 zabe@deploy1002: Started scap: Backport for Add script to rename a change tag in wmf prod (T327118)
- 00:08 zabe: mwscript extensions/TimedMediaHandler/maintenance/requeueTranscodes.php --wiki=testwiki --key=180p.vp9.webm # T312153
- 00:07 zabe: mwscript extensions/TimedMediaHandler/maintenance/requeueTranscodes.php --wiki=testwiki --key=120p.vp9.webm # T312153
2023-01-17
- 23:51 zabe: mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "User:Amire80/frg" "Movement Multilingual Termbase" "Zabe" "per request T327149" # T327149
- 23:33 zabe@deploy1002: Finished scap: Backport for Start reading from cuc_comment_id on testwiki (T233004), Start reading from cuc_actor everywhere (T233004) (duration: 09m 58s)
- 23:25 zabe@deploy1002: zabe and zabe: Backport for Start reading from cuc_comment_id on testwiki (T233004), Start reading from cuc_actor everywhere (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 23:24 zabe@deploy1002: Started scap: Backport for Start reading from cuc_comment_id on testwiki (T233004), Start reading from cuc_actor everywhere (T233004)
- 23:19 zabe@deploy1002: Finished scap: Backport for Revert "Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis"" (T233004), Revert "Add read new support for cu_log comment ID columns" (T327219) (duration: 11m 46s)
- 23:09 zabe@deploy1002: zabe and dreamyjazz and zabe: Backport for Revert "Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis"" (T233004), Revert "Add read new support for cu_log comment ID columns" (T327219) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
- 23:07 zabe@deploy1002: Started scap: Backport for Revert "Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis"" (T233004), Revert "Add read new support for cu_log comment ID columns" (T327219)
- 23:06 zabe@deploy1002: Finished scap: Backport for Stop writing to cul_user and cul_user_text everywhere (T233004), Start writing to rev_comment_id everywhere (T299954) (duration: 10m 29s)
- 22:57 zabe@deploy1002: zabe and zabe: Backport for Stop writing to cul_user and cul_user_text everywhere (T233004), Start writing to rev_comment_id everywhere (T299954) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
- 22:55 zabe@deploy1002: Started scap: Backport for Stop writing to cul_user and cul_user_text everywhere (T233004), Start writing to rev_comment_id everywhere (T299954)
- 22:51 bblack: repooling codfw
- 22:48 ebernhardson@deploy1002: Finished scap: Backport for Make sticky header edit button default for all wikis (T324799) (duration: 10m 34s)
- 22:39 ebernhardson@deploy1002: ebernhardson and jdrewniak: Backport for Make sticky header edit button default for all wikis (T324799) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
- 22:38 ebernhardson@deploy1002: Started scap: Backport for Make sticky header edit button default for all wikis (T324799)
- 22:30 volans@cumin1001: conftool action : set/pooled=inactive; selector: name=non-existent1001
- 22:27 ebernhardson@deploy1002: Finished scap: Backport for Resolve deprecations and type changes in elastica 7.3.0, UpdateSuggesterIndex: Properly cleanup bad indices (duration: 09m 42s)
- 22:25 bblack: cp2031: restart ats-be
- 22:20 ebernhardson@deploy1002: ebernhardson and ebernhardson: Backport for Resolve deprecations and type changes in elastica 7.3.0, UpdateSuggesterIndex: Properly cleanup bad indices synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 22:18 ebernhardson@deploy1002: Started scap: Backport for Resolve deprecations and type changes in elastica 7.3.0, UpdateSuggesterIndex: Properly cleanup bad indices
- 22:14 ebernhardson@deploy1002: Finished scap: Backport for Show edit button in sticky header for desktop-improvement wikis (T324799) (duration: 10m 43s)
- 22:05 ebernhardson@deploy1002: ebernhardson and jdrewniak: Backport for Show edit button in sticky header for desktop-improvement wikis (T324799) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 22:04 ebernhardson@deploy1002: Started scap: Backport for Show edit button in sticky header for desktop-improvement wikis (T324799)
- 21:54 ebernhardson: Finished scap: Backport for Table of contents Collapse/Expand not working (T327064)
- 21:54 ebernhardson@deploy1002: Finished scap: Backport for Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis" (duration: 09m 20s)
- 21:52 zabe: zabe@mwmaint1002:~$ mwscript extensions/CheckUser/maintenance/populateCulComment.php --wiki testwiki
- 21:46 ebernhardson@deploy1002: ebernhardson and trainbranchbot: Backport for Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis" synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 21:44 ebernhardson@deploy1002: Started scap: Backport for Revert "Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis"
- 21:42 ebernhardson@deploy1002: Sync cancelled.
- 21:35 ebernhardson@deploy1002: ebernhardson and dreamyjazz: Backport for Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis (T233004) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
- 21:34 ebernhardson: scap also backporting Table of contents Collapse/Expand not working (T327064)
- 21:34 ebernhardson@deploy1002: Started scap: Backport for Start writing to cul_reason[_plaintext]_id on group0 and group1 wikis (T233004)
- 21:29 ebernhardson@deploy1002: Finished scap: Backport for Enable Phonos on afwiktionary and arwiki (T324561) (duration: 12m 21s)
- 21:18 ebernhardson@deploy1002: ebernhardson and hmonroy: Backport for Enable Phonos on afwiktionary and arwiki (T324561) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
- 21:17 ebernhardson@deploy1002: Started scap: Backport for Enable Phonos on afwiktionary and arwiki (T324561)
- 21:00 ryankemper: [WDQS] `ryankemper@wdqs1005:~$ sudo pool` (had been left depooled from previous powercycle)
- 20:47 ryankemper: [WDQS] Depooled `wdqs1016`
- 20:25 herron: ran preferred-replica-election on kafka-logging codfw to clear replica imbalance
- 20:18 ryankemper: [WDQS] Restart blazegraph on `wdqs1016` to clear alert: `ryankemper@wdqs1016:~$ sudo systemctl restart wdqs-blazegraph`
- 20:06 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.19 refs T325582
- 20:04 eileen: config revision changed from 2e5cee3c to 7425df0b
- 19:50 ryankemper: T327175 Reprocessing last several hours of updates (`2023-01-17T12:00:00Z` -> `2023-01-17T17:30:00Z`) on codfw elasticsearch, running on `ryankemper@mwmaint2002` tmux session `reindex`
- 19:43 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
- 19:43 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
- 19:41 zabe@deploy1002: Finished scap: Backport for Revert "Revert "Enable visual enhancements on all talk namespaces"" (duration: 10m 25s)
- 19:32 zabe@deploy1002: zabe and zabe: Backport for Revert "Revert "Enable visual enhancements on all talk namespaces"" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
- 19:30 zabe@deploy1002: Started scap: Backport for Revert "Revert "Enable visual enhancements on all talk namespaces""
- 18:48 zabe@deploy1002: Finished scap: Backport for Revert "Enable visual enhancements on all talk namespaces" (duration: 09m 08s)
- 18:44 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
- 18:44 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
- 18:41 zabe@deploy1002: zabe and zabe: Backport for Revert "Enable visual enhancements on all talk namespaces" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 18:41 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 18:41 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 18:39 zabe@deploy1002: Started scap: Backport for Revert "Enable visual enhancements on all talk namespaces"
- 18:39 zabe@deploy1002: backport aborted: (duration: 00m 26s)
- 18:35 zabe@deploy1002: backport aborted: (duration: 19m 41s)
- 18:29 otto@deploy1002: Finished deploy [analytics/refinery@55f90ac]: Regular analytics weekly train [analytics/refinery@55f90ac] (duration: 04m 28s)
- 18:29 otto@deploy1002: Finished deploy [airflow-dags/analytics@8d0e919]: Regular analytics weekly train [airflow-dags/analytics@8d0e919] (duration: 00m 15s)
- 18:29 otto@deploy1002: Started deploy [airflow-dags/analytics@8d0e919]: Regular analytics weekly train [airflow-dags/analytics@8d0e919]
- 18:25 otto@deploy1002: Started deploy [analytics/refinery@55f90ac]: Regular analytics weekly train [analytics/refinery@55f90ac]
- 18:25 zabe@deploy1002: zabe and matmarex and zabe: Backport for objectcache: Fix DI for MultiWriteBagOStuff sub caches (T327158), Use new DiscussionTools heading markup on enwiki (T314714), Add "Clear Affordances" to DiscussionTools beta feature on remaining wikis (T321955), Add "Page Frame" to DiscussionTools beta feature on partner wikis (T317907), [[
- 18:23 zabe@deploy1002: Started scap: Backport for objectcache: Fix DI for MultiWriteBagOStuff sub caches (T327158), Use new DiscussionTools heading markup on enwiki (T314714), Add "Clear Affordances" to DiscussionTools beta feature on remaining wikis (T321955), Add "Page Frame" to DiscussionTools beta feature on partner wikis (T317907), [[gerrit:879103|
- 18:13 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
- 18:10 mutante: gerrit1002/gerrit2002: sudo rmdir /srv/gerrit/jvmlogs
- 18:07 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
- 18:07 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
- 18:05 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
- 18:01 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=k8s-ingress-wikikube-rw,name=codfw
- 17:58 jynus: restarted es5 codfw backup
- 17:54 bblack: authdns1001: restart confd
- 17:27 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=aqs,name=codfw
- 17:19 effie: pooling back codfw services
- 17:17 bblack: removing errant 2620:0:860:118: IPs from primary interfaces of hosts in B2
- 17:01 effie: restarting confd on deploy1002
- 16:59 effie: pooling back depooled mw servers in codfw
- 16:44 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on an-worker1086.eqiad.wmnet with reason: Shutting down for RAID controller BBU replacement
- 16:44 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on an-worker1086.eqiad.wmnet with reason: Shutting down for RAID controller BBU replacement
- 16:32 sukhe: reprepro --ignore=wrongdistribution -C main include bullseye-wikimedia cadvisor_0.44.0+ds1-1_amd64.changes: T325557
- 16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P43179 and previous config saved to /var/cache/conftool/dbconfig/20230117-162100-ladsgroup.json
- 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P43178 and previous config saved to /var/cache/conftool/dbconfig/20230117-160555-ladsgroup.json
- 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P43177 and previous config saved to /var/cache/conftool/dbconfig/20230117-155050-ladsgroup.json
- 15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P43175 and previous config saved to /var/cache/conftool/dbconfig/20230117-153545-ladsgroup.json
- 15:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 15:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 15:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 15:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 15:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 15:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 15:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 15:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 15:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 15:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 14:56 urandom: truncating hints for Cassandra nodes in codfw row b -- T327001
- 14:52 urandom: disabling Cassandra hinted-handoff for codfw -- T327001
- 14:27 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
- 14:26 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
- 14:12 _joe_: try to restart cassandra-a on aqs2005
- 13:37 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=recommendation-api,name=codfw
- 13:35 mvernon@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=thanos-query,name=codfw
- 13:35 mvernon@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=thanos-swift,name=codfw
- 13:27 jynus: restarting manually replication on es2020, may require data check afterwards
- 13:26 _joe_: depooling all services in codfw
- 13:19 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool mobileapps in codfw: maintenance
- 13:15 mvernon@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=codfw
- 13:14 oblivian@cumin1001: START - Cookbook sre.discovery.service-route depool mobileapps in codfw: maintenance
- 13:13 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) check citoid: maintenance
- 13:13 oblivian@cumin1001: START - Cookbook sre.discovery.service-route check citoid: maintenance
- 13:08 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)
- 13:01 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=codfw
- 13:01 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async,name=.*
- 12:35 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
- 12:35 moritzm: installing ipython security updates
- 11:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1048.eqiad.wmnet with OS bullseye
- 11:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1048.eqiad.wmnet with reason: host reimage
- 11:16 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1048.eqiad.wmnet with reason: host reimage
- 11:08 volans: upgraded cumin on cumin2002 to 4.2.0-1+deb11u1
- 11:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1048.eqiad.wmnet with OS bullseye
- 10:16 godog: restart opensearch_2@production-elk7-eqiad.service on logstash102[34]
- 10:12 jnuche@deploy1002: scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org for details)
- 10:11 jnuche@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.19 refs T325582 (duration: 42m 26s)
- 09:42 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@9568478]: (no justification provided) (duration: 00m 12s)
- 09:42 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@9568478]: (no justification provided)
- 09:28 jnuche@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.19 refs T325582
- 09:26 jnuche@deploy1002: scap failed: PermissionError [Errno 13] Permission denied: '/home/jnuche/scap-image-build-and-push-log' (duration: 00m 50s)
- 09:26 jnuche@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.19 refs T325582
- 08:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 08:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 08:47 ladsgroup@deploy1002: Finished scap: Backport for Start writing to cul_reason_id and cul_reason_plaintext_id on testwiki (T233004) (duration: 13m 50s)
- 08:35 ladsgroup@deploy1002: ladsgroup and dreamyjazz: Backport for Start writing to cul_reason_id and cul_reason_plaintext_id on testwiki (T233004) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
- 08:33 ladsgroup@deploy1002: Started scap: Backport for Start writing to cul_reason_id and cul_reason_plaintext_id on testwiki (T233004)
- 08:29 kartik@deploy1002: Finished scap: Backport for testwiki: Use Parsoid in Mediawiki Core for Content Translation (T323667) (duration: 20m 56s)
- 08:26 zabe: zabe@mwmaint1002:~$ mwscript extensions/Flow/maintenance/FlowFixInconsistentBoards.php --wiki=zhwiki --namespaceName='USER_TALK' # T327146
- 08:13 kartik@deploy1002: kartik and kartik: Backport for testwiki: Use Parsoid in Mediawiki Core for Content Translation (T323667) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 08:08 kartik@deploy1002: Started scap: Backport for testwiki: Use Parsoid in Mediawiki Core for Content Translation (T323667)
- 07:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P43168 and previous config saved to /var/cache/conftool/dbconfig/20230117-075222-ladsgroup.json
- 07:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P43167 and previous config saved to /var/cache/conftool/dbconfig/20230117-073717-ladsgroup.json
- 07:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P43166 and previous config saved to /var/cache/conftool/dbconfig/20230117-072212-ladsgroup.json
- 07:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 07:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 07:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 07:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 07:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P43165 and previous config saved to /var/cache/conftool/dbconfig/20230117-070707-ladsgroup.json
- 07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1173 T326134', diff saved to https://phabricator.wikimedia.org/P43164 and previous config saved to /var/cache/conftool/dbconfig/20230117-070532-ladsgroup.json
- 07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1131 to s6 primary and set section read-write T326134', diff saved to https://phabricator.wikimedia.org/P43163 and previous config saved to /var/cache/conftool/dbconfig/20230117-070102-ladsgroup.json
- 07:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - T326134', diff saved to https://phabricator.wikimedia.org/P43162 and previous config saved to /var/cache/conftool/dbconfig/20230117-070035-ladsgroup.json
- 07:00 Amir1: Starting s6 eqiad failover from db1173 to db1131 - T326134
- 06:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1131 with weight 0 T326134', diff saved to https://phabricator.wikimedia.org/P43160 and previous config saved to /var/cache/conftool/dbconfig/20230117-060710-ladsgroup.json
- 06:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s6 T326134
- 06:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s6 T326134
2023-01-16
- 17:39 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=kubernetes101[1234].eqiad.wmnet
- 17:07 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[1234].eqiad.wmnet
- 17:06 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
- 17:04 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
- 17:04 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
- 16:53 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
- 16:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1044.eqiad.wmnet with OS bullseye
- 16:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1044.eqiad.wmnet with reason: host reimage
- 16:35 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1044.eqiad.wmnet with reason: host reimage
- 16:23 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1044.eqiad.wmnet with OS bullseye
- 16:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1042.eqiad.wmnet with OS bullseye
- 16:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1042.eqiad.wmnet with reason: host reimage
- 15:59 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1042.eqiad.wmnet with reason: host reimage
- 15:47 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1042.eqiad.wmnet with OS bullseye
- 13:35 XioNoX: disable one of 3 cr1-cr2 eqiad links - T304712
- 13:34 XioNoX: repool eqiad-eqord link - T304712
- 12:56 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[1234].eqiad.wmnet
- 12:55 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes100[5-9].eqiad.wmnet
- 12:50 XioNoX: drain eqiad-eqord link - T304712
- 12:47 hnowlan@puppetmaster1001: conftool action : set/weight=10:pooled=yes; selector: service=thumbor,name=kubernetes100[5-9].eqiad.wmnet
- 12:43 Amir1: power cycled db1198
- 12:36 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes100[5-9].eqiad.wmnet
- 12:35 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes101[5-9].eqiad.wmnet
- 12:35 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes102[012].eqiad.wmnet
- 12:34 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes102.eqiad.wmnet
- 12:05 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes101[123].eqiad.wmnet
- 12:02 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[123].eqiad.wmnet
- 11:51 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
- 11:49 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
- 11:48 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
- 11:38 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
- 11:32 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
- 11:25 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
- 11:15 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
- 10:59 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
- 10:58 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
- 10:58 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
- 10:57 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
- 10:56 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
- 10:55 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
- 10:54 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
- 10:48 moritzm: installing libtasn1-6 security updates on Bullseye
- 10:36 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons.
- 08:55 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons.
- 08:46 elukey: powercycle an-worker1125 - soft lockup traces registered in the tty, host frozen
- 08:14 oblivian@deploy1002: Synchronized README: test null deployment for T327041 (duration: 07m 12s)
- 08:09 Emperor: stopped swift_rclone_sync on ms-be1069
- 07:48 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=parse20(0[6-9]|10).codfw.wmnet
- 07:44 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw23([12][0-9]|3[0-4]).codfw.wmnet
- 07:41 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw22(59|6[0-9]|70).codfw.wmnet
- 07:26 _joe_: restarting pybal on lvs2009
- 07:10 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=(mw.*|appservers|api)-ro,name=codfw
- 07:10 _joe_: depooling mediawiki in codfw
- 06:47 XioNoX: add 2001:67c:930::/48 to network:external in data.yaml
- 06:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maint
- 06:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maint
- 06:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1198 maint', diff saved to https://phabricator.wikimedia.org/P43157 and previous config saved to /var/cache/conftool/dbconfig/20230116-062211-ladsgroup.json
- 02:25 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,service=parsoid-php
- 02:05 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=appserver,service=nginx
- 02:01 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=api_appserver,service=nginx
- 01:51 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: name=mw2283.codfw.wmnet
- 01:35 Amir1: rolling restart of php-fpm across the fleet
- 01:30 thcipriani: 01:29:56 php-fpm-restart: 100% (in-flight: 0; ok: 184; fail: 112; left: 0)
- 01:29 thcipriani@deploy1002: Finished scap: Backport for LanguageDropdown: Check if the page is in talk namespaces instead (T316559 T326788) (duration: 24m 47s)
- 01:15 thcipriani@deploy1002: thcipriani and func: Backport for LanguageDropdown: Check if the page is in talk namespaces instead (T316559 T326788) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 01:05 thcipriani@deploy1002: Started scap: Backport for LanguageDropdown: Check if the page is in talk namespaces instead (T316559 T326788)
2023-01-14
- 09:46 godog: issue 'request system reboot member 2' - T327001
- 09:20 mvernon@cumin2002: conftool action : set/pooled=no; selector: name=thanos-fe2002.codfw.wmnet
- 09:19 Emperor: depool thanos-fe2002 T327001
- 09:19 mvernon@cumin2002: conftool action : set/pooled=no; selector: name=ms-fe2010.codfw.wmnet
- 09:19 Emperor: depool ms-fe2010 T327001
2023-01-13
- 23:39 mutante: people2002 - systemctl reset-failed after removing auto_restart_rsync timers
- 22:26 mutante: mirror1001 - systemctl start update-ubuntu-mirror (sometimes sync fails)
- 20:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1011']
- 20:58 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1011']
- 20:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['druid1011']
- 20:49 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1011']
- 20:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['druid1011']
- 20:37 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1011']
- 20:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1010']
- 20:36 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1010']
- 20:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['druid1010']
- 20:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1009']
- 20:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1009']
- 20:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['druid1009']
- 20:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1010']
- 20:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1009']
- 20:04 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aphlict2001.codfw.wmnet
- 19:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-mariadb1002.eqiad.wmnet with OS bullseye
- 19:58 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 19:54 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aphlict2001.codfw.wmnet on all recursors
- 19:54 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache aphlict2001.codfw.wmnet on all recursors
- 19:54 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:54 dzahn@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aphlict2001.codfw.wmnet - dzahn@cumin2002"
- 19:52 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aphlict2001.codfw.wmnet - dzahn@cumin2002"
- 19:51 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 19:49 dzahn@cumin2002: START - Cookbook sre.dns.netbox
- 19:49 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host aphlict2001.codfw.wmnet
- 19:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-mariadb1001.eqiad.wmnet with OS bullseye
- 19:40 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 19:38 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 19:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-mariadb1002.eqiad.wmnet with reason: host reimage
- 19:34 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-mariadb1002.eqiad.wmnet with reason: host reimage
- 19:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-mariadb1001.eqiad.wmnet with reason: host reimage
- 19:22 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-mariadb1001.eqiad.wmnet with reason: host reimage
- 19:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host an-mariadb1002.eqiad.wmnet with OS bullseye
- 18:25 zabe: mwscript extensions/GlobalBlocking/maintenance/FixBlockerUsername.php --wiki metawiki "Green Giant" "Cromium" # T298707
- 17:34 thcipriani@deploy1002: Finished scap: Backport for TranslationNotificationsSubmitJob: Ensure LanguageSet is in proper format (T63125) (duration: 13m 25s)
- 17:22 thcipriani@deploy1002: thcipriani and abi: Backport for TranslationNotificationsSubmitJob: Ensure LanguageSet is in proper format (T63125) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
- 17:20 thcipriani@deploy1002: Started scap: Backport for TranslationNotificationsSubmitJob: Ensure LanguageSet is in proper format (T63125)
- 15:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-coord1004.eqiad.wmnet with OS bullseye
- 15:34 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 15:31 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 15:24 jynus: restarted update-ubuntu-mirror on mirror1001 again due to remote server concurrency issues
- 15:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new bastion - jmm@cumin2002"
- 15:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host an-mariadb1001.eqiad.wmnet with OS bullseye
- 15:19 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new bastion - jmm@cumin2002"
- 15:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-coord1003.eqiad.wmnet with OS bullseye
- 15:18 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 15:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-coord1004.eqiad.wmnet with reason: host reimage
- 15:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-coord1004.eqiad.wmnet with reason: host reimage
- 15:11 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 15:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host an-coord1004.eqiad.wmnet with OS bullseye
- 14:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-coord1003.eqiad.wmnet with reason: host reimage
- 14:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-coord1003.eqiad.wmnet with reason: host reimage
- 14:49 volans: uploaded cumin_4.2.0 to apt.wikimedia.org bullseye-wikimedia
- 14:34 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host an-coord1003.eqiad.wmnet with OS bullseye
- 12:48 moritzm: installing bast6002 T324974
- 12:39 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab2002.wikimedia.org with reason: troubleshoot backup restore on gitlab replica
- 12:38 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab2002.wikimedia.org with reason: troubleshoot backup restore on gitlab replica
- 11:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new bastions - jmm@cumin2002"
- 11:45 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new bastions - jmm@cumin2002"
- 10:53 moritzm: installing bast5003 T324974
- 10:49 jynus: restarting update-ubuntu-mirror on mirror1001 due to remote server concurrency issues
- 09:41 moritzm: installing bast4004 T324974
- 09:06 moritzm: installing bast3006 T324974
- 02:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1011.mgmt.eqiad.wmnet with reboot policy FORCED
- 02:09 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1011.mgmt.eqiad.wmnet with reboot policy FORCED
- 02:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1010.mgmt.eqiad.wmnet with reboot policy FORCED
- 02:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:52 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:36 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1010.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:36 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['an-mariadb1002']
- 01:26 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-mariadb1002']
- 01:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['an-mariadb1001']
- 01:25 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-mariadb1001']
- 01:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['an-mariadb1002']
- 01:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['an-mariadb1001']
- 01:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-mariadb1002']
- 01:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-mariadb1001']
- 01:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['an-coord1004']
- 01:03 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-coord1004']
- 01:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['an-coord1003']
- 01:02 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-coord1003']
- 00:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['an-coord1004']
- 00:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['an-coord1003']
- 00:41 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-coord1004']
- 00:40 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-coord1003']
- 00:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-mariadb1002.mgmt.eqiad.wmnet with reboot policy FORCED
- 00:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-mariadb1001.mgmt.eqiad.wmnet with reboot policy FORCED
- 00:16 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host an-mariadb1002.mgmt.eqiad.wmnet with reboot policy FORCED
- 00:15 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host an-mariadb1001.mgmt.eqiad.wmnet with reboot policy FORCED
- 00:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-coord1004.mgmt.eqiad.wmnet with reboot policy FORCED
- 00:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-coord1003.mgmt.eqiad.wmnet with reboot policy FORCED
2023-01-12
- 23:53 zabe: start running cuc_comment_id population script on rest of sections in screens with --sleep 2 # T233004
- 23:50 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host an-coord1004.mgmt.eqiad.wmnet with reboot policy FORCED
- 23:44 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host an-coord1003.mgmt.eqiad.wmnet with reboot policy FORCED
- 23:13 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@99a3e6f]: import_cirrus_index: use spark3 (duration: 02m 31s)
- 23:10 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@99a3e6f]: import_cirrus_index: use spark3
- 23:08 sbassett: Deployed (temporary) security mitigations for T326691
- 22:45 mutante: people2002 - apt-get remove --purge rsync
- 22:08 zabe: start of "foreachwikiindblist s3.dblist extensions/CheckUser/maintenance/populateCucComment.php" in a screen in mwmaint1002 # T233004
- 22:07 thcipriani: end UTC late backport
- 22:06 thcipriani@deploy1002: Finished scap: Backport for cirrus: Divert requests with x-public-cloud set to a dedicated pool counter (T326757), cirrus: Disable incoming link counting (T317023) (duration: 09m 23s)
- 21:59 krinkle@deploy1002: Finished deploy [performance/navtiming@172cc22]: (no justification provided) (duration: 00m 08s)
- 21:59 krinkle@deploy1002: Started deploy [performance/navtiming@172cc22]: (no justification provided)
- 21:59 Krinkle: krinkle@deploy1002$ `scap install-world -v --limit-hosts` for webperf1003.eqiad and webperf2003.codfw, ref T326668
- 21:58 krinkle@deploy1002: Installation of scap version "4.32.0" completed for 1 hosts
- 21:58 krinkle@deploy1002: Installing scap version "4.32.0" for 1 hosts
- 21:58 thcipriani@deploy1002: thcipriani and ebernhardson: Backport for cirrus: Divert requests with x-public-cloud set to a dedicated pool counter (T326757), cirrus: Disable incoming link counting (T317023) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 21:58 krinkle@deploy1002: Installation of scap version "4.32.0" completed for 1 hosts
- 21:58 krinkle@deploy1002: Installing scap version "4.32.0" for 1 hosts
- 21:57 thcipriani@deploy1002: Started scap: Backport for cirrus: Divert requests with x-public-cloud set to a dedicated pool counter (T326757), cirrus: Disable incoming link counting (T317023)
- 21:56 zabe: run populateCucComment.php on testwiki # T233004
- 21:48 thcipriani@deploy1002: Finished scap: Backport for nlwiki: Add block right to checkuser group (T326355) (duration: 09m 04s)
- 21:41 thcipriani@deploy1002: thcipriani and stang: Backport for nlwiki: Add block right to checkuser group (T326355) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 21:39 thcipriani@deploy1002: Started scap: Backport for nlwiki: Add block right to checkuser group (T326355)
- 21:37 thcipriani@deploy1002: Finished scap: Backport for looksLikeAutomation: Allow flagging requests from arbitrary headers (T326757) (duration: 09m 10s)
- 21:30 thcipriani@deploy1002: thcipriani and ebernhardson: Backport for looksLikeAutomation: Allow flagging requests from arbitrary headers (T326757) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 21:28 thcipriani@deploy1002: Started scap: Backport for looksLikeAutomation: Allow flagging requests from arbitrary headers (T326757)
- 21:27 thcipriani@deploy1002: Finished scap: Backport for etwikiquote: Switch logo variant back (T313698) (duration: 09m 25s)
- 21:21 ejegg: restarted fundraising scheduled jobs
- 21:19 ejegg: civicrm upgraded from 9afd2789 to 7ecb5038
- 21:19 thcipriani@deploy1002: thcipriani and stang: Backport for etwikiquote: Switch logo variant back (T313698) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
- 21:17 thcipriani@deploy1002: Started scap: Backport for etwikiquote: Switch logo variant back (T313698)
- 21:16 thcipriani@deploy1002: Finished scap: Backport for Remove Beta Feature for Realtime Preview and enable on plwiki (T323033) (duration: 10m 43s)
- 21:07 thcipriani@deploy1002: thcipriani and samwilson: Backport for Remove Beta Feature for Realtime Preview and enable on plwiki (T323033) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
- 21:05 thcipriani@deploy1002: Started scap: Backport for Remove Beta Feature for Realtime Preview and enable on plwiki (T323033)
- 20:43 ejegg: rolled back CiviCRM to 9afd2789
- 20:31 ejegg: civicrm upgraded from 9afd2789 to 7ecb5038
- 20:29 ejegg: disabled fundraising scheduled jobs for civi deploy
- 20:08 brett: Setting thread_pool_max for varnish-frontend to 12000
- 19:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1176 T326116', diff saved to https://phabricator.wikimedia.org/P43148 and previous config saved to /var/cache/conftool/dbconfig/20230112-195922-marostegui.json
- 19:56 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1176 to LB with just 1% weight T326116', diff saved to https://phabricator.wikimedia.org/P43147 and previous config saved to /var/cache/conftool/dbconfig/20230112-195651-marostegui.json
- 19:55 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1176 (mariadb 11) to dbctl, depooled T326116', diff saved to https://phabricator.wikimedia.org/P43146 and previous config saved to /var/cache/conftool/dbconfig/20230112-195514-marostegui.json
- 19:11 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.18 refs T325581
- 18:36 mutante: stat1008 - systemctl reset-failed - clears Icinga alerts from failed things of the past
- 18:35 mutante: stat1007 - systemctl reset-failed - clears Icinga alerts
- 18:19 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mc2040.codfw.wmnet with reason: hardware troubleshooting
- 18:18 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mc2040.codfw.wmnet with reason: hardware troubleshooting
- 17:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
- 17:45 mutante: powercycling mc2040 via mgmt console
- 17:34 ejegg: civicrm rolled back from 7ecb5038 to 9afd2789
- 17:08 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
- 17:08 btullis@cumin1001: Added views for new wiki: aswikiquote T321294
- 17:05 ejegg: civicrm upgraded from 9afd2789 to 7ecb5038
- 16:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
- 16:48 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
- 16:48 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
- 16:47 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
- 16:43 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
- 16:34 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
- 16:31 zabe@deploy1002: Finished scap: Backport for Stop writing to cul_user and cul_user_text on a few wikis (T233004), Start writing to rev_comment_id on group1 wikis (T299954) (duration: 09m 49s)
- 16:23 zabe@deploy1002: zabe and zabe: Backport for Stop writing to cul_user and cul_user_text on a few wikis (T233004), Start writing to rev_comment_id on group1 wikis (T299954) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 16:21 zabe@deploy1002: Started scap: Backport for Stop writing to cul_user and cul_user_text on a few wikis (T233004), Start writing to rev_comment_id on group1 wikis (T299954)
- 16:14 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
- 16:08 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
- 16:08 btullis@cumin1001: Added views for new wiki: bjnwiktionary T312214
- 15:47 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
- 15:46 hnowlan@puppetmaster1001: conftool action : set/weight=8; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
- 15:44 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
- 15:36 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
- 15:36 btullis@cumin1001: Added views for new wiki: shnwikibooks T321256
- 15:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
- 15:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1118.eqiad.wmnet with reason: Maintenance
- 15:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1118.eqiad.wmnet with reason: Maintenance
- 15:28 effie: Planet import in codfw (on maps2009) started at 15:26 UTC - T314472
- 15:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1041.eqiad.wmnet
- 15:11 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
- 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dborch1001.wikimedia.org
- 15:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dborch1001.wikimedia.org
- 15:05 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1041.eqiad.wmnet
- 14:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe2002.codfw.wmnet
- 14:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
- 14:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
- 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T321391)', diff saved to https://phabricator.wikimedia.org/P43138 and previous config saved to /var/cache/conftool/dbconfig/20230112-145441-marostegui.json
- 14:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-fe2002.codfw.wmnet
- 14:50 moritzm: installing postgresql-11 security updates on puppetdb1002
- 14:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe1002.eqiad.wmnet
- 14:42 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
- 14:42 btullis@cumin1001: Added views for new wiki: guwwikiquote T321288
- 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P43137 and previous config saved to /var/cache/conftool/dbconfig/20230112-143934-marostegui.json
- 14:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-fe1002.eqiad.wmnet
- 14:37 moritzm: installing sqlite3 security updates on buster
- 14:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1040.eqiad.wmnet with OS bullseye
- 14:34 taavi: UTC afternoon backports done
- 14:28 taavi@deploy1002: Finished scap: Backport for Track callers of parseRevisionParsoidHtml. (duration: 09m 34s)
- 14:26 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0)
- 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P43136 and previous config saved to /var/cache/conftool/dbconfig/20230112-142428-marostegui.json
- 14:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1001.wikimedia.org
- 14:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1040.eqiad.wmnet with reason: host reimage
- 14:20 taavi@deploy1002: taavi and matmarex: Backport for Track callers of parseRevisionParsoidHtml. synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 14:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1001.wikimedia.org
- 14:18 taavi@deploy1002: Started scap: Backport for Track callers of parseRevisionParsoidHtml.
- 14:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1040.eqiad.wmnet with reason: host reimage
- 14:17 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
- 14:16 taavi@deploy1002: Finished scap: Backport for Allow administrators to revoke autopatroller rights on sh.WP (T325938) (duration: 13m 30s)
- 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T321391)', diff saved to https://phabricator.wikimedia.org/P43135 and previous config saved to /var/cache/conftool/dbconfig/20230112-140921-marostegui.json
- 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1206 (T321391)', diff saved to https://phabricator.wikimedia.org/P43134 and previous config saved to /var/cache/conftool/dbconfig/20230112-140659-marostegui.json
- 14:06 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1040.eqiad.wmnet with OS bullseye
- 14:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1206.eqiad.wmnet with reason: Maintenance
- 14:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1206.eqiad.wmnet with reason: Maintenance
- 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T321391)', diff saved to https://phabricator.wikimedia.org/P43133 and previous config saved to /var/cache/conftool/dbconfig/20230112-140649-marostegui.json
- 14:05 taavi@deploy1002: taavi and aleksandar: Backport for Allow administrators to revoke autopatroller rights on sh.WP (T325938) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
- 14:03 taavi@deploy1002: Started scap: Backport for Allow administrators to revoke autopatroller rights on sh.WP (T325938)
- 13:53 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
- 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P43132 and previous config saved to /var/cache/conftool/dbconfig/20230112-135143-marostegui.json
- 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P43131 and previous config saved to /var/cache/conftool/dbconfig/20230112-133636-marostegui.json
- 13:30 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
- 13:29 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
- 13:28 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
- 13:28 ladsgroup@deploy1002: Finished scap: Backport for Remove obsolete MWMinimalScriptInit and MEDIAWIKI_MAINT_INIT_ONLY. (duration: 21m 44s)
- 13:26 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
- 13:26 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
- 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T321391)', diff saved to https://phabricator.wikimedia.org/P43130 and previous config saved to /var/cache/conftool/dbconfig/20230112-132130-marostegui.json
- 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T321391)', diff saved to https://phabricator.wikimedia.org/P43129 and previous config saved to /var/cache/conftool/dbconfig/20230112-131908-marostegui.json
- 13:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1196.eqiad.wmnet with reason: Maintenance
- 13:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1196.eqiad.wmnet with reason: Maintenance
- 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T321391)', diff saved to https://phabricator.wikimedia.org/P43128 and previous config saved to /var/cache/conftool/dbconfig/20230112-131847-marostegui.json
- 13:16 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
- 13:13 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
- 13:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
- 13:08 ladsgroup@deploy1002: ladsgroup and daniel: Backport for Remove obsolete MWMinimalScriptInit and MEDIAWIKI_MAINT_INIT_ONLY. synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
- 13:06 ladsgroup@deploy1002: Started scap: Backport for Remove obsolete MWMinimalScriptInit and MEDIAWIKI_MAINT_INIT_ONLY.
- 13:05 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
- 13:05 btullis@cumin1001: Added views for new wiki: gorwiktionary T326138
- 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P43127 and previous config saved to /var/cache/conftool/dbconfig/20230112-130341-marostegui.json
- 12:58 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
- 12:56 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
- 12:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
- 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P43125 and previous config saved to /var/cache/conftool/dbconfig/20230112-124834-marostegui.json
- 12:41 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
- 12:41 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
- 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T321391)', diff saved to https://phabricator.wikimedia.org/P43123 and previous config saved to /var/cache/conftool/dbconfig/20230112-123328-marostegui.json
- 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T321391)', diff saved to https://phabricator.wikimedia.org/P43122 and previous config saved to /var/cache/conftool/dbconfig/20230112-123106-marostegui.json
- 12:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1186.eqiad.wmnet with reason: Maintenance
- 12:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1186.eqiad.wmnet with reason: Maintenance
- 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T321391)', diff saved to https://phabricator.wikimedia.org/P43121 and previous config saved to /var/cache/conftool/dbconfig/20230112-123045-marostegui.json
- 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P43120 and previous config saved to /var/cache/conftool/dbconfig/20230112-121538-marostegui.json
- 12:13 XioNoX: repool esams
- 12:10 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 12:09 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 12:09 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 12:09 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 12:08 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 12:08 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 12:08 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 12:08 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P43119 and previous config saved to /var/cache/conftool/dbconfig/20230112-120032-marostegui.json
- 11:54 XioNoX: re-seating cr2-esams fpc0 linecard - T318783
- 11:52 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
- 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T321391)', diff saved to https://phabricator.wikimedia.org/P43116 and previous config saved to /var/cache/conftool/dbconfig/20230112-114524-marostegui.json
- 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T321391)', diff saved to https://phabricator.wikimedia.org/P43115 and previous config saved to /var/cache/conftool/dbconfig/20230112-114302-marostegui.json
- 11:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
- 11:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
- 11:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1176.eqiad.wmnet with reason: Maintenance
- 11:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1176.eqiad.wmnet with reason: Maintenance
- 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T321391)', diff saved to https://phabricator.wikimedia.org/P43114 and previous config saved to /var/cache/conftool/dbconfig/20230112-114212-marostegui.json
- 11:41 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
- 11:39 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
- 11:37 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
- 11:29 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
- 11:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
- 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P43113 and previous config saved to /var/cache/conftool/dbconfig/20230112-112705-marostegui.json
- 11:24 urbanecm@deploy1002: Finished scap: Backport for throttle: Add new rule for cswiki course (T326792) (duration: 07m 47s)
- 11:17 urbanecm@deploy1002: Started scap: Backport for throttle: Add new rule for cswiki course (T326792)
- 11:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 25885
- 11:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 25885
- 11:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3303
- 11:13 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3303
- 11:12 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 3302
- 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P43112 and previous config saved to /var/cache/conftool/dbconfig/20230112-111159-marostegui.json
- 11:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3302
- 11:11 zabe: mwscript extensions/GlobalBlocking/maintenance/FixBlockerUsername.php --wiki metawiki "Defender" "Elton" # T298707
- 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T321391)', diff saved to https://phabricator.wikimedia.org/P43111 and previous config saved to /var/cache/conftool/dbconfig/20230112-105652-marostegui.json
- 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T321391)', diff saved to https://phabricator.wikimedia.org/P43110 and previous config saved to /var/cache/conftool/dbconfig/20230112-105430-marostegui.json
- 10:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
- 10:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
- 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T321391)', diff saved to https://phabricator.wikimedia.org/P43109 and previous config saved to /var/cache/conftool/dbconfig/20230112-105358-marostegui.json
- 10:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 36 hosts
- 10:49 ayounsi@cumin1001: START - Cookbook sre.hosts.remove-downtime for 36 hosts
- 10:41 hashar@deploy1002: Finished deploy [integration/docroot@577d68a]: zuul: Link to report_url if available (duration: 00m 14s)
- 10:41 hashar@deploy1002: Started deploy [integration/docroot@577d68a]: zuul: Link to report_url if available
- 10:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8674
- 10:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8674
- 10:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8932
- 10:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8932
- 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P43108 and previous config saved to /var/cache/conftool/dbconfig/20230112-103852-marostegui.json
- 10:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-presto1001.eqiad.wmnet
- 10:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-presto1001.eqiad.wmnet
- 10:24 XioNoX: rollback redirect ns2 to authdns1001 - T316532
- 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P43107 and previous config saved to /var/cache/conftool/dbconfig/20230112-102345-marostegui.json
- 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T321391)', diff saved to https://phabricator.wikimedia.org/P43106 and previous config saved to /var/cache/conftool/dbconfig/20230112-100839-marostegui.json
- 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T321391)', diff saved to https://phabricator.wikimedia.org/P43105 and previous config saved to /var/cache/conftool/dbconfig/20230112-100616-marostegui.json
- 10:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
- 10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
- 10:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
- 10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
- 10:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
- 10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
- 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T321391)', diff saved to https://phabricator.wikimedia.org/P43104 and previous config saved to /var/cache/conftool/dbconfig/20230112-100456-marostegui.json
- 10:01 XioNoX: reboot asw2-esams for upgrade - T316532
- 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping3003.esams.wmnet
- 09:58 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
- 09:56 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwmaint2002.codfw.wmnet
- 09:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping3003.esams.wmnet on all recursors
- 09:54 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ping3003.esams.wmnet on all recursors
- 09:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping3003.esams.wmnet - jmm@cumin2002"
- 09:53 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping3003.esams.wmnet - jmm@cumin2002"
- 09:50 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 09:50 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ping3003.esams.wmnet
- 09:50 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwmaint2002.codfw.wmnet
- 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P43103 and previous config saved to /var/cache/conftool/dbconfig/20230112-094950-marostegui.json
- 09:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping2003.codfw.wmnet
- 09:47 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
- 09:47 btullis@cumin1001: Added views for new wiki: pcmwiki T310879
- 09:46 XioNoX: redirect ns2 to authdns1001 - T316532
- 09:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping2003.codfw.wmnet on all recursors
- 09:43 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ping2003.codfw.wmnet on all recursors
- 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping2003.codfw.wmnet - jmm@cumin2002"
- 09:41 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping2003.codfw.wmnet - jmm@cumin2002"
- 09:39 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 09:39 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ping2003.codfw.wmnet
- 09:37 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main'.
- 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P43102 and previous config saved to /var/cache/conftool/dbconfig/20230112-093443-marostegui.json
- 09:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 36 hosts with reason: network maintenance
- 09:31 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 36 hosts with reason: network maintenance
- 09:25 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mc1039.eqiad.wmnet
- 09:24 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1039.eqiad.wmnet
- 09:24 jiji@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host mc1039.eqiad.wmnet
- 09:22 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
- 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T321391)', diff saved to https://phabricator.wikimedia.org/P43101 and previous config saved to /var/cache/conftool/dbconfig/20230112-091937-marostegui.json
- 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T321391)', diff saved to https://phabricator.wikimedia.org/P43100 and previous config saved to /var/cache/conftool/dbconfig/20230112-091716-marostegui.json
- 09:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
- 09:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
- 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T321391)', diff saved to https://phabricator.wikimedia.org/P43099 and previous config saved to /var/cache/conftool/dbconfig/20230112-091654-marostegui.json
- 09:11 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1039.eqiad.wmnet
- 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P43098 and previous config saved to /var/cache/conftool/dbconfig/20230112-090148-marostegui.json
- 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping1003.eqiad.wmnet
- 08:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping1003.eqiad.wmnet on all recursors
- 08:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ping1003.eqiad.wmnet on all recursors
- 08:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping1003.eqiad.wmnet - jmm@cumin2002"
- 08:55 phedenskog@deploy1002: Finished deploy [performance/navtiming@172cc22]: (no justification provided) (duration: 00m 22s)
- 08:54 phedenskog@deploy1002: Started deploy [performance/navtiming@172cc22]: (no justification provided)
- 08:54 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping1003.eqiad.wmnet - jmm@cumin2002"
- 08:54 phedenskog@deploy1002: Finished deploy [performance/navtiming@172cc22]: (no justification provided) (duration: 00m 17s)
- 08:53 phedenskog@deploy1002: Started deploy [performance/navtiming@172cc22]: (no justification provided)
- 08:50 XioNoX: depool esams for network maintenance - T316532
- 08:50 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 08:50 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ping1003.eqiad.wmnet
- 08:49 zabe: deployed updated patch for T311337
- 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P43097 and previous config saved to /var/cache/conftool/dbconfig/20230112-084641-marostegui.json
- 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast5003.wikimedia.org
- 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T321391)', diff saved to https://phabricator.wikimedia.org/P43096 and previous config saved to /var/cache/conftool/dbconfig/20230112-083135-marostegui.json
- 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T321391)', diff saved to https://phabricator.wikimedia.org/P43095 and previous config saved to /var/cache/conftool/dbconfig/20230112-082813-marostegui.json
- 08:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
- 08:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
- 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T321391)', diff saved to https://phabricator.wikimedia.org/P43094 and previous config saved to /var/cache/conftool/dbconfig/20230112-082752-marostegui.json
- 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bast5003.wikimedia.org on all recursors
- 08:17 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast5003.wikimedia.org on all recursors
- 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast5003.wikimedia.org - jmm@cumin2002"
- 08:16 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast5003.wikimedia.org - jmm@cumin2002"
- 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P43093 and previous config saved to /var/cache/conftool/dbconfig/20230112-081245-marostegui.json
- 07:59 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 07:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast5003.wikimedia.org
- 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P43092 and previous config saved to /var/cache/conftool/dbconfig/20230112-075739-marostegui.json
- 07:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 9584
- 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T321391)', diff saved to https://phabricator.wikimedia.org/P43091 and previous config saved to /var/cache/conftool/dbconfig/20230112-074232-marostegui.json
- 07:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 9584
- 07:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 37002
- 07:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 37002
- 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1132 (T321391)', diff saved to https://phabricator.wikimedia.org/P43090 and previous config saved to /var/cache/conftool/dbconfig/20230112-074010-marostegui.json
- 07:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1132.eqiad.wmnet with reason: Maintenance
- 07:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1132.eqiad.wmnet with reason: Maintenance
- 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T321391)', diff saved to https://phabricator.wikimedia.org/P43089 and previous config saved to /var/cache/conftool/dbconfig/20230112-073949-marostegui.json
- 07:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 112
- 07:38 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 112
- 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P43088 and previous config saved to /var/cache/conftool/dbconfig/20230112-072443-marostegui.json
- 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P43087 and previous config saved to /var/cache/conftool/dbconfig/20230112-070936-marostegui.json
- 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T321391)', diff saved to https://phabricator.wikimedia.org/P43086 and previous config saved to /var/cache/conftool/dbconfig/20230112-065430-marostegui.json
- 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T321391)', diff saved to https://phabricator.wikimedia.org/P43085 and previous config saved to /var/cache/conftool/dbconfig/20230112-065208-marostegui.json
- 06:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1128.eqiad.wmnet with reason: Maintenance
- 06:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1128.eqiad.wmnet with reason: Maintenance
- 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T321391)', diff saved to https://phabricator.wikimedia.org/P43084 and previous config saved to /var/cache/conftool/dbconfig/20230112-065147-marostegui.json
- 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P43083 and previous config saved to /var/cache/conftool/dbconfig/20230112-063640-marostegui.json
- 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P43082 and previous config saved to /var/cache/conftool/dbconfig/20230112-062134-marostegui.json
- 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T321391)', diff saved to https://phabricator.wikimedia.org/P43081 and previous config saved to /var/cache/conftool/dbconfig/20230112-060627-marostegui.json
- 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T321391)', diff saved to https://phabricator.wikimedia.org/P43080 and previous config saved to /var/cache/conftool/dbconfig/20230112-060404-marostegui.json
- 06:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
- 06:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
- 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 (T321391)', diff saved to https://phabricator.wikimedia.org/P43079 and previous config saved to /var/cache/conftool/dbconfig/20230112-060343-marostegui.json
- 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P43078 and previous config saved to /var/cache/conftool/dbconfig/20230112-054837-marostegui.json
- 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P43077 and previous config saved to /var/cache/conftool/dbconfig/20230112-053330-marostegui.json
- 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 (T321391)', diff saved to https://phabricator.wikimedia.org/P43076 and previous config saved to /var/cache/conftool/dbconfig/20230112-051823-marostegui.json
- 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1107 (T321391)', diff saved to https://phabricator.wikimedia.org/P43075 and previous config saved to /var/cache/conftool/dbconfig/20230112-051601-marostegui.json
- 05:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1107.eqiad.wmnet with reason: Maintenance
- 05:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1107.eqiad.wmnet with reason: Maintenance
- 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T321391)', diff saved to https://phabricator.wikimedia.org/P43074 and previous config saved to /var/cache/conftool/dbconfig/20230112-051539-marostegui.json
- 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P43073 and previous config saved to /var/cache/conftool/dbconfig/20230112-050033-marostegui.json
- 04:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P43072 and previous config saved to /var/cache/conftool/dbconfig/20230112-044526-marostegui.json
- 04:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T321391)', diff saved to https://phabricator.wikimedia.org/P43071 and previous config saved to /var/cache/conftool/dbconfig/20230112-043020-marostegui.json
- 04:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T321391)', diff saved to https://phabricator.wikimedia.org/P43070 and previous config saved to /var/cache/conftool/dbconfig/20230112-042757-marostegui.json
- 04:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 04:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 04:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
- 04:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
- 04:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43069 and previous config saved to /var/cache/conftool/dbconfig/20230112-042741-marostegui.json
- 04:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P43068 and previous config saved to /var/cache/conftool/dbconfig/20230112-041234-marostegui.json
- 03:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P43067 and previous config saved to /var/cache/conftool/dbconfig/20230112-035727-marostegui.json
- 03:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43066 and previous config saved to /var/cache/conftool/dbconfig/20230112-034221-marostegui.json
- 03:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43065 and previous config saved to /var/cache/conftool/dbconfig/20230112-033958-marostegui.json
- 03:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
- 03:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
- 03:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43064 and previous config saved to /var/cache/conftool/dbconfig/20230112-033937-marostegui.json
- 03:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P43063 and previous config saved to /var/cache/conftool/dbconfig/20230112-032430-marostegui.json
- 03:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P43062 and previous config saved to /var/cache/conftool/dbconfig/20230112-030924-marostegui.json
- 02:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43061 and previous config saved to /var/cache/conftool/dbconfig/20230112-025417-marostegui.json
- 02:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43060 and previous config saved to /var/cache/conftool/dbconfig/20230112-025153-marostegui.json
- 02:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1099.eqiad.wmnet with reason: Maintenance
- 02:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1099.eqiad.wmnet with reason: Maintenance
- 02:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
- 02:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
- 02:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T321391)', diff saved to https://phabricator.wikimedia.org/P43059 and previous config saved to /var/cache/conftool/dbconfig/20230112-020046-marostegui.json
- 01:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P43058 and previous config saved to /var/cache/conftool/dbconfig/20230112-014539-marostegui.json
- 01:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P43057 and previous config saved to /var/cache/conftool/dbconfig/20230112-013033-marostegui.json
- 01:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T321391)', diff saved to https://phabricator.wikimedia.org/P43056 and previous config saved to /var/cache/conftool/dbconfig/20230112-011526-marostegui.json
- 01:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T321391)', diff saved to https://phabricator.wikimedia.org/P43055 and previous config saved to /var/cache/conftool/dbconfig/20230112-011302-marostegui.json
- 01:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2176.codfw.wmnet with reason: Maintenance
- 01:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2176.codfw.wmnet with reason: Maintenance
- 01:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T321391)', diff saved to https://phabricator.wikimedia.org/P43054 and previous config saved to /var/cache/conftool/dbconfig/20230112-011241-marostegui.json
- 00:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P43053 and previous config saved to /var/cache/conftool/dbconfig/20230112-005734-marostegui.json
- 00:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P43052 and previous config saved to /var/cache/conftool/dbconfig/20230112-004228-marostegui.json
- 00:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T321391)', diff saved to https://phabricator.wikimedia.org/P43051 and previous config saved to /var/cache/conftool/dbconfig/20230112-002721-marostegui.json
- 00:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T321391)', diff saved to https://phabricator.wikimedia.org/P43050 and previous config saved to /var/cache/conftool/dbconfig/20230112-002457-marostegui.json
- 00:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2174.codfw.wmnet with reason: Maintenance
- 00:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2174.codfw.wmnet with reason: Maintenance
- 00:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T321391)', diff saved to https://phabricator.wikimedia.org/P43049 and previous config saved to /var/cache/conftool/dbconfig/20230112-002436-marostegui.json
- 00:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P43048 and previous config saved to /var/cache/conftool/dbconfig/20230112-000929-marostegui.json
2023-01-11
- 23:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P43047 and previous config saved to /var/cache/conftool/dbconfig/20230111-235423-marostegui.json
- 23:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T321391)', diff saved to https://phabricator.wikimedia.org/P43045 and previous config saved to /var/cache/conftool/dbconfig/20230111-233916-marostegui.json
- 23:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T321391)', diff saved to https://phabricator.wikimedia.org/P43044 and previous config saved to /var/cache/conftool/dbconfig/20230111-233652-marostegui.json
- 23:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2094.codfw.wmnet with reason: Maintenance
- 23:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2094.codfw.wmnet with reason: Maintenance
- 23:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2173.codfw.wmnet with reason: Maintenance
- 23:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2173.codfw.wmnet with reason: Maintenance
- 23:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43043 and previous config saved to /var/cache/conftool/dbconfig/20230111-233616-marostegui.json
- 23:22 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.18 refs T325581 (duration: 06m 57s)
- 23:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P43042 and previous config saved to /var/cache/conftool/dbconfig/20230111-232109-marostegui.json
- 23:15 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.18 refs T325581
- 23:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P43041 and previous config saved to /var/cache/conftool/dbconfig/20230111-230603-marostegui.json
- 22:51 zabe@deploy1002: Finished scap: Backport for Start reading from cuc_actor on group0 and group1 wikis (T233004), Start writing to rev_comment_id on group0 wikis (T299954), Stop writing to cul_user and cul_user_text on testwiki (T233004) (duration: 09m 28s)
- 22:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43040 and previous config saved to /var/cache/conftool/dbconfig/20230111-225056-marostegui.json
- 22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43039 and previous config saved to /var/cache/conftool/dbconfig/20230111-224832-marostegui.json
- 22:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
- 22:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
- 22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43038 and previous config saved to /var/cache/conftool/dbconfig/20230111-224810-marostegui.json
- 22:44 zabe@deploy1002: zabe and zabe: Backport for Start reading from cuc_actor on group0 and group1 wikis (T233004), Start writing to rev_comment_id on group0 wikis (T299954), Stop writing to cul_user and cul_user_text on testwiki (T233004) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 22:42 zabe@deploy1002: Started scap: Backport for Start reading from cuc_actor on group0 and group1 wikis (T233004), Start writing to rev_comment_id on group0 wikis (T299954), Stop writing to cul_user and cul_user_text on testwiki (T233004)
- 22:40 effie: upload memkeys_20181031-2~bullseye0_ on bullseye-wikimedia
- 22:39 kindrobot: close UTC late backport window
- 22:38 kindrobot@deploy1002: Finished scap: Backport for Fix exception in `<gallery mode="slideshow">` with missing images, Fix phan error when Excimer is enabled, Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view" (T301063 T326399), [[gerrit:879099|Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view" (T30106
- 22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P43037 and previous config saved to /var/cache/conftool/dbconfig/20230111-223304-marostegui.json
- 22:21 kindrobot@deploy1002: kindrobot and matmarex: Backport for Fix exception in `<gallery mode="slideshow">` with missing images, Fix phan error when Excimer is enabled, Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view" (T301063 T326399), [[gerrit:879099|Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view
- 22:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P43036 and previous config saved to /var/cache/conftool/dbconfig/20230111-221757-marostegui.json
- 22:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43035 and previous config saved to /var/cache/conftool/dbconfig/20230111-220251-marostegui.json
- 22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43034 and previous config saved to /var/cache/conftool/dbconfig/20230111-220026-marostegui.json
- 22:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance
- 22:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance
- 22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T321391)', diff saved to https://phabricator.wikimedia.org/P43033 and previous config saved to /var/cache/conftool/dbconfig/20230111-220005-marostegui.json
- 21:58 kindrobot@deploy1002: Started scap: Backport for Fix exception in `<gallery mode="slideshow">` with missing images, Fix phan error when Excimer is enabled, Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view" (T301063 T326399), [[gerrit:879099|Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view" (T301063
- 21:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P43031 and previous config saved to /var/cache/conftool/dbconfig/20230111-214458-marostegui.json
- 21:34 kindrobot@deploy1002: Finished scap: Backport for Fix mustache template rendering when TOC is rerendered after an edit (T326682), Enable page tools on beta cluster (duration: 10m 17s)
- 21:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P43030 and previous config saved to /var/cache/conftool/dbconfig/20230111-212952-marostegui.json
- 21:25 kindrobot@deploy1002: kindrobot and jdrewniak and jdlrobson: Backport for Fix mustache template rendering when TOC is rerendered after an edit (T326682), Enable page tools on beta cluster synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
- 21:23 kindrobot@deploy1002: Started scap: Backport for Fix mustache template rendering when TOC is rerendered after an edit (T326682), Enable page tools on beta cluster
- 21:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T321391)', diff saved to https://phabricator.wikimedia.org/P43029 and previous config saved to /var/cache/conftool/dbconfig/20230111-211445-marostegui.json
- 21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T321391)', diff saved to https://phabricator.wikimedia.org/P43028 and previous config saved to /var/cache/conftool/dbconfig/20230111-211222-marostegui.json
- 21:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2153.codfw.wmnet with reason: Maintenance
- 21:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2153.codfw.wmnet with reason: Maintenance
- 21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T321391)', diff saved to https://phabricator.wikimedia.org/P43027 and previous config saved to /var/cache/conftool/dbconfig/20230111-211200-marostegui.json
- 21:06 kindrobot: start UTC late backport window
- 20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P43025 and previous config saved to /var/cache/conftool/dbconfig/20230111-205654-marostegui.json
- 20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P43024 and previous config saved to /var/cache/conftool/dbconfig/20230111-204147-marostegui.json
- 20:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43023 and previous config saved to /var/cache/conftool/dbconfig/20230111-203141-root.json
- 20:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T321391)', diff saved to https://phabricator.wikimedia.org/P43022 and previous config saved to /var/cache/conftool/dbconfig/20230111-202641-marostegui.json
- 20:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T321391)', diff saved to https://phabricator.wikimedia.org/P43021 and previous config saved to /var/cache/conftool/dbconfig/20230111-202417-marostegui.json
- 20:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2146.codfw.wmnet with reason: Maintenance
- 20:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2146.codfw.wmnet with reason: Maintenance
- 20:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T321391)', diff saved to https://phabricator.wikimedia.org/P43020 and previous config saved to /var/cache/conftool/dbconfig/20230111-202345-marostegui.json
- 20:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43019 and previous config saved to /var/cache/conftool/dbconfig/20230111-201636-root.json
- 20:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P43018 and previous config saved to /var/cache/conftool/dbconfig/20230111-200838-marostegui.json
- 20:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43017 and previous config saved to /var/cache/conftool/dbconfig/20230111-200131-root.json
- 19:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P43016 and previous config saved to /var/cache/conftool/dbconfig/20230111-195332-marostegui.json
- 19:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43015 and previous config saved to /var/cache/conftool/dbconfig/20230111-194626-root.json
- 19:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T321391)', diff saved to https://phabricator.wikimedia.org/P43014 and previous config saved to /var/cache/conftool/dbconfig/20230111-193825-marostegui.json
- 19:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T321391)', diff saved to https://phabricator.wikimedia.org/P43013 and previous config saved to /var/cache/conftool/dbconfig/20230111-193601-marostegui.json
- 19:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2145.codfw.wmnet with reason: Maintenance
- 19:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2145.codfw.wmnet with reason: Maintenance
- 19:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
- 19:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
- 19:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T321391)', diff saved to https://phabricator.wikimedia.org/P43012 and previous config saved to /var/cache/conftool/dbconfig/20230111-193506-marostegui.json
- 19:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43011 and previous config saved to /var/cache/conftool/dbconfig/20230111-193121-root.json
- 19:20 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
- 19:20 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
- 19:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P43010 and previous config saved to /var/cache/conftool/dbconfig/20230111-192000-marostegui.json
- 19:19 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 19:19 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 19:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43009 and previous config saved to /var/cache/conftool/dbconfig/20230111-191616-root.json
- 19:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P43008 and previous config saved to /var/cache/conftool/dbconfig/20230111-190453-marostegui.json
- 19:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 1%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43007 and previous config saved to /var/cache/conftool/dbconfig/20230111-190111-root.json
- 18:57 marostegui: dbmaint deploy schema change with replication on s3 eqiad T321391
- 18:52 brett: Removing legacy vips from dns servers - T239993
- 18:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T321391)', diff saved to https://phabricator.wikimedia.org/P43006 and previous config saved to /var/cache/conftool/dbconfig/20230111-184946-marostegui.json
- 18:47 marostegui: dbmaint deploy schema change with replication on s2 eqiad T321391
- 18:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T321391)', diff saved to https://phabricator.wikimedia.org/P43005 and previous config saved to /var/cache/conftool/dbconfig/20230111-184723-marostegui.json
- 18:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2130.codfw.wmnet with reason: Maintenance
- 18:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2130.codfw.wmnet with reason: Maintenance
- 18:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T321391)', diff saved to https://phabricator.wikimedia.org/P43004 and previous config saved to /var/cache/conftool/dbconfig/20230111-184701-marostegui.json
- 18:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P43003 and previous config saved to /var/cache/conftool/dbconfig/20230111-184051-root.json
- 18:36 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@5a19b9d]: drop-snapshots: Accept snapshot= partition from any level (duration: 02m 33s)
- 18:33 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@5a19b9d]: drop-snapshots: Accept snapshot= partition from any level
- 18:33 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 18:32 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 18:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P43002 and previous config saved to /var/cache/conftool/dbconfig/20230111-183155-marostegui.json
- 18:30 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 18:30 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 18:28 bblack: repool eqsin edge DC
- 18:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P43001 and previous config saved to /var/cache/conftool/dbconfig/20230111-182546-root.json
- 18:22 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
- 18:22 btullis@cumin1001: Added views for new wiki: blkwiki T310872
- 18:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P43000 and previous config saved to /var/cache/conftool/dbconfig/20230111-181648-marostegui.json
- 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42999 and previous config saved to /var/cache/conftool/dbconfig/20230111-181041-root.json
- 18:09 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
- 18:09 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 18:08 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 18:08 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 18:07 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 18:02 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 18:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T321391)', diff saved to https://phabricator.wikimedia.org/P42998 and previous config saved to /var/cache/conftool/dbconfig/20230111-180142-marostegui.json
- 18:01 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
- 17:59 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
- 17:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T321391)', diff saved to https://phabricator.wikimedia.org/P42997 and previous config saved to /var/cache/conftool/dbconfig/20230111-175919-marostegui.json
- 17:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2116.codfw.wmnet with reason: Maintenance
- 17:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2116.codfw.wmnet with reason: Maintenance
- 17:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112 (T321391)', diff saved to https://phabricator.wikimedia.org/P42996 and previous config saved to /var/cache/conftool/dbconfig/20230111-175857-marostegui.json
- 17:58 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
- 17:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker[1080,1084].eqiad.wmnet with reason: Shutting down to enable RAID battery replacement
- 17:55 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-worker[1080,1084].eqiad.wmnet with reason: Shutting down to enable RAID battery replacement
- 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42995 and previous config saved to /var/cache/conftool/dbconfig/20230111-175536-root.json
- 17:50 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
- 17:50 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
- 17:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112', diff saved to https://phabricator.wikimedia.org/P42994 and previous config saved to /var/cache/conftool/dbconfig/20230111-174351-marostegui.json
- 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 10%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42993 and previous config saved to /var/cache/conftool/dbconfig/20230111-174031-root.json
- 17:40 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
- 17:39 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
- 17:29 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
- 17:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112', diff saved to https://phabricator.wikimedia.org/P42992 and previous config saved to /var/cache/conftool/dbconfig/20230111-172844-marostegui.json
- 17:28 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 17:28 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 5%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42991 and previous config saved to /var/cache/conftool/dbconfig/20230111-172526-root.json
- 17:21 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 17:21 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 17:21 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 17:20 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 17:18 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 17:18 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112 (T321391)', diff saved to https://phabricator.wikimedia.org/P42989 and previous config saved to /var/cache/conftool/dbconfig/20230111-171338-marostegui.json
- 17:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2112 (T321391)', diff saved to https://phabricator.wikimedia.org/P42988 and previous config saved to /var/cache/conftool/dbconfig/20230111-171114-marostegui.json
- 17:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2112.codfw.wmnet with reason: Maintenance
- 17:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2112.codfw.wmnet with reason: Maintenance
- 17:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2102.codfw.wmnet with reason: Maintenance
- 17:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2102.codfw.wmnet with reason: Maintenance
- 17:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 1%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42987 and previous config saved to /var/cache/conftool/dbconfig/20230111-171021-root.json
- 17:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance
- 17:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance
- 17:04 marostegui: dbmaint deploy schema change with replication on s7 eqiad T321391
- 17:03 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 17:03 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 16:38 marostegui: dbmaint deploy schema change with replication on s5 eqiad T321391
- 16:31 marostegui: dbmaint deploy schema change with replication on s4 eqiad T321391
- 16:25 marostegui: dbmaint deploy schema change with replication on s8 eqiad T321391
- 16:22 marostegui: dbmaint deploy schema change with replication on s6 eqiad T321391
- 16:06 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:06 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force update after eqsin outage is over - volans@cumin1001"
- 16:05 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force update after eqsin outage is over - volans@cumin1001"
- 16:03 volans@cumin1001: START - Cookbook sre.dns.netbox
- 16:01 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host mc1038.eqiad.wmnet with OS bullseye
- 16:00 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 15:53 zabe@deploy1002: Finished scap: T233004 (duration: 07m 54s)
- 15:45 zabe@deploy1002: Started scap: T233004
- 15:38 zabe@deploy1002: backport aborted: (duration: 04m 25s)
- 15:38 zabe@deploy1002: sync-world aborted: Backport for Start reading from cul_actor everywhere (T233004) (duration: 04m 00s)
- 15:36 zabe@deploy1002: zabe and zabe: Backport for Start reading from cul_actor everywhere (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
- 15:34 zabe@deploy1002: Started scap: Backport for Start reading from cul_actor everywhere (T233004)
- 15:31 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 15:21 marostegui: Stop mariadb on db1106 to reclone db1206 (there will be lag on s1 on wikireplicas) T326669
- 15:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106', diff saved to https://phabricator.wikimedia.org/P42982 and previous config saved to /var/cache/conftool/dbconfig/20230111-151712-marostegui.json
- 14:56 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 14:47 Lucas_WMDE: UTC afternoon backport+config window done
- 14:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1005.eqiad.wmnet with OS bullseye
- 14:46 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
- 14:46 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.18/extensions/Wikibase/repo/tests/jest/wikibase.vector.searchClient.spec.js: Backport: Add missing parentheses to vector search match text (T326633) (2/2) (duration: 06m 46s)
- 14:42 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
- 14:39 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.18/extensions/Wikibase/repo/resources/wikibase.vector.searchClient.js: Backport: Add missing parentheses to vector search match text (T326633) (1/2) (duration: 07m 09s)
- 14:28 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Fix test constructing HTMLFormField without parent (T326621) (duration: 08m 38s)
- 14:25 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1005.eqiad.wmnet with reason: host reimage
- 14:22 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1005.eqiad.wmnet with reason: host reimage
- 14:21 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and lucaswerkmeister-wmde: Backport for Fix test constructing HTMLFormField without parent (T326621) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
- 14:19 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Fix test constructing HTMLFormField without parent (T326621)
- 14:14 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)
- 14:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1001.eqiad.wmnet
- 14:10 moritzm: installing postgresql 11 security updates on maps/eqiad
- 14:06 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bullseye
- 14:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1004.eqiad.wmnet with OS bullseye
- 14:02 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
- 14:01 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1001.eqiad.wmnet
- 13:55 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
- 13:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 37002
- 13:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 37002
- 13:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3302
- 13:45 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
- 13:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3302
- 13:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9584
- 13:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9584
- 13:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 35753
- 13:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 35753
- 13:38 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1004.eqiad.wmnet with reason: host reimage
- 13:35 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1004.eqiad.wmnet with reason: host reimage
- 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast6002.wikimedia.org
- 13:21 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
- 13:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
- 13:12 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) bast6002.wikimedia.org on all recursors
- 13:11 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast6002.wikimedia.org on all recursors
- 13:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast6002.wikimedia.org - jmm@cumin2002"
- 13:11 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast6002.wikimedia.org - jmm@cumin2002"
- 13:07 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1038.eqiad.wmnet with OS bullseye
- 13:03 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc1038.eqiad.wmnet with OS bullseye
- 13:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 13:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast6002.wikimedia.org
- 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast4004.wikimedia.org
- 12:42 moritzm: installing postgresql 11 security updates on maps/codfw
- 12:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8849
- 12:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8849
- 12:35 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) bast4004.wikimedia.org on all recursors
- 12:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast4004.wikimedia.org on all recursors
- 12:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast4004.wikimedia.org - jmm@cumin2002"
- 12:33 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast4004.wikimedia.org - jmm@cumin2002"
- 12:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 56630
- 12:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 56630
- 12:24 btullis@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.add-wiki (exit_code=99)
- 12:24 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
- 12:18 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 12:18 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast4004.wikimedia.org
- 12:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 12:13 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 12:10 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1004.eqiad.wmnet with OS bullseye
- 12:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1003.eqiad.wmnet with OS bullseye
- 12:10 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
- 12:08 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
- 11:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1003.eqiad.wmnet with reason: host reimage
- 11:51 claime: repooled mw1486 in api_appserver eqiad after hardware investigation - T326425
- 11:50 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1003.eqiad.wmnet with reason: host reimage
- 11:50 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1486.eqiad.wmnet
- 11:50 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw1486.eqiad.wmnet
- 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast3006.wikimedia.org
- 11:47 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,name=mw1486.eqiad.wmnet
- 11:38 cgoubert@cumin1001: conftool action : set/pooled=yes:weight=10; selector: cluster=aux-k8s,service=kubesvc
- 11:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
- 11:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
- 11:30 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) bast3006.wikimedia.org on all recursors
- 11:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast3006.wikimedia.org on all recursors
- 11:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:29 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast3006.wikimedia.org - jmm@cumin2002"
- 11:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast3006.wikimedia.org - jmm@cumin2002"
- 11:22 btullis@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.add-wiki (exit_code=99)
- 11:22 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
- 11:21 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1038.eqiad.wmnet with OS bullseye
- 11:19 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 11:19 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast3006.wikimedia.org
- 11:16 btullis@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.add-wiki (exit_code=99)
- 11:15 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
- 11:15 btullis@cumin1001: END (FAIL) - Cookbook sre.druid.reboot-workers (exit_code=99) for Druid test cluster: Reboot Druid nodes
- 11:12 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1003.eqiad.wmnet with OS bullseye
- 10:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1001.eqiad.wmnet with OS bullseye
- 10:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
- 10:34 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
- 10:31 zabe@deploy1002: Finished scap: Backport for Simplify expensive check (T326690), Start reading from cuc_actor on test wikis (T233004) (duration: 09m 34s)
- 10:25 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw1486.eqiad.wmnet with reason: hardware troubleshooting
- 10:24 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw1486.eqiad.wmnet with reason: hardware troubleshooting
- 10:23 btullis@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid test cluster: Reboot Druid nodes
- 10:23 zabe@deploy1002: zabe and zabe: Backport for Simplify expensive check (T326690), Start reading from cuc_actor on test wikis (T233004) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 10:21 zabe@deploy1002: Started scap: Backport for Simplify expensive check (T326690), Start reading from cuc_actor on test wikis (T233004)
- 10:18 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1001.eqiad.wmnet with OS bullseye
- 10:16 moritzm: installing postgresql-11 security updates
- 10:02 XioNoX: asw1-eqsin> request system reboot all-members - T316532
- 09:49 moritzm: installing python3.7 security updates
- 08:31 kartik@deploy1002: Finished scap: Backport for CX: Fix transformation of TranslationUnitDTO to custom array (T326278) (duration: 11m 45s)
- 08:21 kartik@deploy1002: kartik and kartik: Backport for CX: Fix transformation of TranslationUnitDTO to custom array (T326278) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 08:20 kartik@deploy1002: Started scap: Backport for CX: Fix transformation of TranslationUnitDTO to custom array (T326278)
- 05:55 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1003.eqiad.wmnet
- 05:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2003.codfw.wmnet
- 05:48 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2003.codfw.wmnet
- 05:48 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1003.eqiad.wmnet
2023-01-10
- 23:58 krinkle@deploy1002: Finished deploy [integration/docroot@b7c82a3]: (no justification provided) (duration: 00m 15s)
- 23:58 krinkle@deploy1002: Started deploy [integration/docroot@b7c82a3]: (no justification provided)
- 23:46 mutante: cumin2002 - sudo systemctl status httpbb_hourly_appserver
- 23:30 zabe@deploy1002: Finished scap: Backport for Start writing to rev_comment_id on test wikis (T299954) (duration: 09m 39s)
- 23:22 zabe@deploy1002: zabe and zabe: Backport for Start writing to rev_comment_id on test wikis (T299954) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
- 23:21 zabe@deploy1002: Started scap: Backport for Start writing to rev_comment_id on test wikis (T299954)
- 22:42 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.18 refs T325581
- 22:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: plugin upgrade - ryankemper@cumin1001 - T324247
- 22:28 jhuneidi@deploy1002: Pruned MediaWiki: 1.40.0-wmf.14, 1.40.0-wmf.13 (duration: 02m 35s)
- 22:21 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.18 refs T325581 (duration: 45m 04s)
- 22:10 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 22:09 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 22:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1206 T325046', diff saved to https://phabricator.wikimedia.org/P42980 and previous config saved to /var/cache/conftool/dbconfig/20230110-220942-marostegui.json
- 22:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2002.codfw.wmnet
- 22:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1002.eqiad.wmnet
- 21:54 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2002.codfw.wmnet
- 21:54 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1002.eqiad.wmnet
- 21:54 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 21:52 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 21:36 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.18 refs T325581
- 21:27 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1001.eqiad.wmnet
- 21:27 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2001.codfw.wmnet
- 21:20 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1001.eqiad.wmnet
- 21:19 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2001.codfw.wmnet
- 21:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P42979 and previous config saved to /var/cache/conftool/dbconfig/20230110-211826-root.json
- 21:18 zabe@deploy1002: Finished scap: Backport for Use new DiscussionTools heading markup on group2 wikis except enwiki (T314714), Start reading from cul_actor on group1 wikis (T233004) (duration: 10m 08s)
- 21:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1040.eqiad.wmnet
- 21:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2055.codfw.wmnet
- 21:11 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2055.codfw.wmnet
- 21:11 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1040.eqiad.wmnet
- 21:09 zabe@deploy1002: zabe and zabe and matmarex: Backport for Use new DiscussionTools heading markup on group2 wikis except enwiki (T314714), Start reading from cul_actor on group1 wikis (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 21:08 zabe@deploy1002: Started scap: Backport for Use new DiscussionTools heading markup on group2 wikis except enwiki (T314714), Start reading from cul_actor on group1 wikis (T233004)
- 21:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P42978 and previous config saved to /var/cache/conftool/dbconfig/20230110-210321-root.json
- 20:55 mutante: repooling eqsin
- 20:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P42977 and previous config saved to /var/cache/conftool/dbconfig/20230110-204816-root.json
- 20:37 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 20:37 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 20:33 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 20:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P42976 and previous config saved to /var/cache/conftool/dbconfig/20230110-203311-root.json
- 20:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 20:31 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 20:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 20:29 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 20:28 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 20:28 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 20:28 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 20:26 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 20:26 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 20:18 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: plugin upgrade - ryankemper@cumin1001 - T324247
- 20:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P42975 and previous config saved to /var/cache/conftool/dbconfig/20230110-201807-ladsgroup.json
- 20:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P42974 and previous config saved to /var/cache/conftool/dbconfig/20230110-201806-root.json
- 20:08 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
- 20:08 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
- 20:08 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 20:08 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 20:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1038.eqiad.wmnet
- 20:07 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 20:06 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 20:05 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2054.codfw.wmnet
- 20:04 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 20:04 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 20:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P42972 and previous config saved to /var/cache/conftool/dbconfig/20230110-200302-ladsgroup.json
- 20:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P42971 and previous config saved to /var/cache/conftool/dbconfig/20230110-200301-root.json
- 20:02 ayounsi@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 01m 42s)
- 20:01 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 20:01 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 20:00 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
- 20:00 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1038.eqiad.wmnet
- 19:58 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2054.codfw.wmnet
- 19:52 ayounsi@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 01m 06s)
- 19:51 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
- 19:49 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
- 19:49 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
- 19:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P42970 and previous config saved to /var/cache/conftool/dbconfig/20230110-194757-ladsgroup.json
- 19:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P42969 and previous config saved to /var/cache/conftool/dbconfig/20230110-194756-root.json
- 19:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P42968 and previous config saved to /var/cache/conftool/dbconfig/20230110-194750-ladsgroup.json
- 19:43 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 19:42 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 19:39 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 19:38 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 19:38 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 19:38 dancy@deploy1002: Installation of scap version "4.32.0" completed for 1 hosts
- 19:37 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 19:37 dancy@deploy1002: Installing scap version "4.32.0" for 1 hosts
- 19:35 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 19:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P42965 and previous config saved to /var/cache/conftool/dbconfig/20230110-193253-ladsgroup.json
- 19:32 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 19:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P42964 and previous config saved to /var/cache/conftool/dbconfig/20230110-193245-ladsgroup.json
- 19:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 19:31 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 19:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 19:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
- 19:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
- 19:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 19:31 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 19:30 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 19:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1158 maint', diff saved to https://phabricator.wikimedia.org/P42963 and previous config saved to /var/cache/conftool/dbconfig/20230110-192929-ladsgroup.json
- 19:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P42962 and previous config saved to /var/cache/conftool/dbconfig/20230110-191740-ladsgroup.json
- 19:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2053.codfw.wmnet
- 19:08 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2053.codfw.wmnet
- 19:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P42958 and previous config saved to /var/cache/conftool/dbconfig/20230110-190235-ladsgroup.json
- 19:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
- 19:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
- 18:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2052.codfw.wmnet
- 18:44 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2052.codfw.wmnet
- 18:38 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2002.codfw.wmnet with OS bullseye
- 18:35 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2001.codfw.wmnet with OS bullseye
- 18:29 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes-staging,service=kubesvc
- 18:23 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagemaster2001.codfw.wmnet with OS bullseye
- 18:23 jayme@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=kubernetes-staging,service=kubesvc
- 18:21 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
- 18:20 jayme@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=kubernetes-staging,service=kubesvc
- 18:19 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
- 18:16 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
- 18:16 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
- 18:09 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2001.codfw.wmnet with reason: host reimage
- 18:06 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2001.codfw.wmnet with reason: host reimage
- 18:01 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubestage2002.codfw.wmnet with OS bullseye
- 18:01 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubestage2001.codfw.wmnet with OS bullseye
- 17:55 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagemaster2001.codfw.wmnet with OS bullseye
- 17:51 zabe: run populateCulActor on all wikis # T325484
- 17:48 claime: Finished rolling reboots of eqiad appservers
- 17:48 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
- 17:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
- 17:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
- 17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1130 maint', diff saved to https://phabricator.wikimedia.org/P42956 and previous config saved to /var/cache/conftool/dbconfig/20230110-173807-ladsgroup.json
- 17:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 T325652', diff saved to https://phabricator.wikimedia.org/P42955 and previous config saved to /var/cache/conftool/dbconfig/20230110-173027-marostegui.json
- 17:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P42954 and previous config saved to /var/cache/conftool/dbconfig/20230110-173002-ladsgroup.json
- 17:29 ayounsi@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 00m 11s)
- 17:28 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
- 17:28 ayounsi@deploy1002: deploy aborted: help (duration: 00m 01s)
- 17:28 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: help
- 17:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P42953 and previous config saved to /var/cache/conftool/dbconfig/20230110-171457-ladsgroup.json
- 17:14 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 17:10 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 17:03 ayounsi@deploy1002: deploy aborted: netbox-next to 3.2.9 (duration: 00m 07s)
- 17:03 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
- 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P42952 and previous config saved to /var/cache/conftool/dbconfig/20230110-165952-ladsgroup.json
- 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: After the incident', diff saved to https://phabricator.wikimedia.org/P42951 and previous config saved to /var/cache/conftool/dbconfig/20230110-165406-root.json
- 16:48 bblack: depooling eqsin from DNS
- 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P42950 and previous config saved to /var/cache/conftool/dbconfig/20230110-164447-ladsgroup.json
- 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: After the incident', diff saved to https://phabricator.wikimedia.org/P42949 and previous config saved to /var/cache/conftool/dbconfig/20230110-163901-root.json
- 16:36 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagetcd2003.codfw.wmnet with OS bullseye
- 16:24 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagetcd2001.codfw.wmnet with OS bullseye
- 16:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: After the incident', diff saved to https://phabricator.wikimedia.org/P42948 and previous config saved to /var/cache/conftool/dbconfig/20230110-162356-root.json
- 16:23 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd2003.codfw.wmnet with reason: host reimage
- 16:21 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd2003.codfw.wmnet with reason: host reimage
- 16:14 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagetcd2002.codfw.wmnet with OS bullseye
- 16:10 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: After the incident', diff saved to https://phabricator.wikimedia.org/P42947 and previous config saved to /var/cache/conftool/dbconfig/20230110-160851-root.json
- 16:08 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 16:08 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd2003.codfw.wmnet with OS bullseye
- 16:04 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd2002.codfw.wmnet with reason: host reimage
- 16:04 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
- 16:03 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
- 16:01 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd2002.codfw.wmnet with reason: host reimage
- 15:59 SandraEbele: reran failed pageview-druid-hourly-coord oozie job for 2023-1-10-10.
- 15:59 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:58 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 15:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw[1373,1384-1385,1387].eqiad.wmnet
- 15:55 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw[1373,1384-1385,1387].eqiad.wmnet
- 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 10%: After the incident', diff saved to https://phabricator.wikimedia.org/P42946 and previous config saved to /var/cache/conftool/dbconfig/20230110-155346-root.json
- 15:52 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd2002.codfw.wmnet with OS bullseye
- 15:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 5%: After the incident', diff saved to https://phabricator.wikimedia.org/P42945 and previous config saved to /var/cache/conftool/dbconfig/20230110-153841-root.json
- 15:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2051.codfw.wmnet
- 15:30 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 15:29 claime: Restarting rolling reboots of eqiad appservers
- 15:28 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2051.codfw.wmnet
- 15:25 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:25 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 15:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 1%: After the incident', diff saved to https://phabricator.wikimedia.org/P42944 and previous config saved to /var/cache/conftool/dbconfig/20230110-152336-root.json
- 15:21 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host search-loader2001.codfw.wmnet
- 15:17 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host search-loader2001.codfw.wmnet
- 15:14 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd2001.codfw.wmnet with reason: host reimage
- 15:11 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd2001.codfw.wmnet with reason: host reimage
- 15:09 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2050.codfw.wmnet
- 15:02 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd2001.codfw.wmnet with OS bullseye
- 15:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2037.codfw.wmnet
- 15:01 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:01 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2037.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 14:56 XioNoX: start VC link maintenance in eqiad - T325803
- 14:55 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagetcd2001.codfw.wmnet with OS bullseye
- 14:55 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd2001.codfw.wmnet with OS bullseye
- 14:53 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host search-loader1001.eqiad.wmnet
- 14:49 zabe: UTC afternoon deploys done
- 14:49 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host search-loader1001.eqiad.wmnet
- 14:48 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2037.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 14:47 zabe@deploy1002: Finished scap: Backport for Start reading from cul_actor on remaining test wikis and group0 wikis (T233004) (duration: 08m 59s)
- 14:46 jiji@cumin1001: START - Cookbook sre.dns.netbox
- 14:40 zabe@deploy1002: zabe and zabe: Backport for Start reading from cul_actor on remaining test wikis and group0 wikis (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 14:38 zabe@deploy1002: Started scap: Backport for Start reading from cul_actor on remaining test wikis and group0 wikis (T233004)
- 14:36 zabe: run populateCulActor on group0 wikis # T325484
- 14:35 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2050.codfw.wmnet
- 14:35 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2037.codfw.wmnet
- 14:34 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host apifeatureusage2001.codfw.wmnet
- 14:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2036.codfw.wmnet
- 14:33 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:33 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2036.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 14:28 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2036.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 14:28 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagetcd2001.codfw.wmnet with OS bullseye
- 14:28 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd2001.codfw.wmnet with OS bullseye
- 14:26 jiji@cumin1001: START - Cookbook sre.dns.netbox
- 14:25 zabe@deploy1002: Finished scap: Backport for [config]: GDI Safety Survey Wave 4 (T325136) (duration: 17m 42s)
- 14:21 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host apifeatureusage2001.codfw.wmnet
- 14:19 claime: Pausing reboots of eqiad appservers for deployments
- 14:18 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw[1369-1372].eqiad.wmnet
- 14:18 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw[1369-1372].eqiad.wmnet
- 14:14 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apifeatureusage1001.eqiad.wmnet
- 14:11 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2036.codfw.wmnet
- 14:10 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
- 14:09 zabe@deploy1002: zabe and essexigyan: Backport for [config]: GDI Safety Survey Wave 4 (T325136) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 14:07 zabe@deploy1002: Started scap: Backport for [config]: GDI Safety Survey Wave 4 (T325136)
- 14:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: Reinitialize staging-codfw with k8s 1.23
- 14:06 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host apifeatureusage1001.eqiad.wmnet
- 14:06 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: Reinitialize staging-codfw with k8s 1.23
- 14:03 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2035.codfw.wmnet
- 14:03 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:03 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2035.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 13:49 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2035.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 13:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1002.eqiad.wmnet with OS bullseye
- 13:46 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
- 13:46 jiji@cumin1001: START - Cookbook sre.dns.netbox
- 13:44 godog: delete grafana dashboards from "sre dashboards for deletion" folder - T178690
- 13:43 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2049.codfw.wmnet
- 13:37 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2035.codfw.wmnet
- 13:36 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2049.codfw.wmnet
- 13:34 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
- 13:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2001.wikimedia.org
- 13:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2001.wikimedia.org
- 13:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1002.eqiad.wmnet with reason: host reimage
- 13:16 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1002.eqiad.wmnet with reason: host reimage
- 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetdb-test2001.codfw.wmnet
- 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetdb-test2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 12:59 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1002.eqiad.wmnet with OS bullseye
- 12:59 btullis@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cephosd1002.eqiad.wmnet with OS bullseye
- 12:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetdb-test2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 12:53 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 12:50 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 12:50 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
- 12:50 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 12:50 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts puppetdb-test2001.codfw.wmnet
- 12:49 claime: Starting rolling reboot of eqiad appservers
- 12:47 btullis@cumin1001: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid analytics cluster: Reboot Druid nodes
- 12:36 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1002.eqiad.wmnet with OS bullseye
- 12:34 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd1002.eqiad.wmnet with OS bullseye
- 12:31 oblivian@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
- 12:31 oblivian@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
- 12:31 oblivian@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
- 12:31 oblivian@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
- 12:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2048.codfw.wmnet
- 12:19 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2048.codfw.wmnet
- 12:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2034.codfw.wmnet
- 12:18 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:18 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2034.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 12:12 claime: Finished rolling reboot of eqiad jobrunners
- 12:07 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 12:06 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 12:06 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 12:05 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 12:02 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2034.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 11:59 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
- 11:58 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 11:57 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 11:57 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 11:56 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 11:53 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 11:52 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 11:48 jiji@cumin1001: START - Cookbook sre.dns.netbox
- 11:35 btullis@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid analytics cluster: Reboot Druid nodes
- 11:33 btullis@cumin1001: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid public cluster: Reboot Druid nodes
- 11:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2047.codfw.wmnet
- 11:00 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2047.codfw.wmnet
- 11:00 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2034.codfw.wmnet
- 10:31 godog: upgrade thanos to 0.30.1 on thanos-fe2* - T303154
- 10:24 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1002.eqiad.wmnet with OS bullseye
- 10:23 btullis@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid public cluster: Reboot Druid nodes
- 10:21 claime: Starting rolling reboot of eqiad jobrunners
- 10:21 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 10:18 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd1002.eqiad.wmnet with OS bullseye
- 10:14 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2046.codfw.wmnet
- 10:14 claime: repooled parse1002.eqiad.wmnet - T326119
- 10:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1002.eqiad.wmnet
- 10:13 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1002.eqiad.wmnet
- 10:07 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2046.codfw.wmnet
- 10:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2033.codfw.wmnet
- 10:06 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:06 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2033.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 10:02 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2033.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 09:59 cgoubert@cumin1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1002.eqiad.wmnet
- 09:55 godog: upgrade thanos to 0.30.1 on prometheus hosts - T303154
- 09:53 moritzm: installing systemd bugfix updates from Bullseye point release
- 09:45 aqu@deploy1002: Finished deploy [airflow-dags/analytics@9568478]: Fix bug fix in HDFS usage pipeline [airflow-dags@9568478] (duration: 00m 13s)
- 09:45 aqu@deploy1002: Started deploy [airflow-dags/analytics@9568478]: Fix bug fix in HDFS usage pipeline [airflow-dags@9568478]
- 09:43 godog: upgrade thanos to 0.30.1 on thanos-fe100[2-3] - T303154
- 09:34 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@9568478]: Fix bug fix in HDFS usage pipeline TEST [airflow-dags@9568478] (duration: 00m 11s)
- 09:34 jiji@cumin1001: START - Cookbook sre.dns.netbox
- 09:34 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@9568478]: Fix bug fix in HDFS usage pipeline TEST [airflow-dags@9568478]
- 09:25 XioNoX: repool ulsfo (maintenance cancelled) - T316532
- 09:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2045.codfw.wmnet
- 09:22 taavi: added zabe to wmf-deployment gerrit group T326327
- 09:19 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2033.codfw.wmnet
- 09:18 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2045.codfw.wmnet
- 09:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2032.codfw.wmnet
- 09:17 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:17 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2032.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 09:15 kart_: Done: UTC morning backport window
- 09:14 kartik@deploy1002: Finished scap: Backport for CX: Fix transformation of TranslationUnitDTO to custom array (T326278) (duration: 09m 20s)
- 09:07 kartik@deploy1002: kartik and kartik: Backport for CX: Fix transformation of TranslationUnitDTO to custom array (T326278) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 09:05 kartik@deploy1002: Started scap: Backport for CX: Fix transformation of TranslationUnitDTO to custom array (T326278)
- 08:58 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2032.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 08:56 godog: upgrade thanos to 0.30.1 on thanos-fe1001 - T303154
- 08:54 godog: upgrade thanos to 0.30.1 on prometheus2006 - T303154
- 08:49 kartik@deploy1002: Finished scap: Backport for CX: Fix usage of categories translation unit as array (T326278) (duration: 12m 08s)
- 08:38 kartik@deploy1002: kartik and kartik: Backport for CX: Fix usage of categories translation unit as array (T326278) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 08:37 kartik@deploy1002: Started scap: Backport for CX: Fix usage of categories translation unit as array (T326278)
- 08:20 kartik@deploy1002: Finished scap: Backport for ContentTranslation: Increase MT threshold for publishing in cswiki by 20% (T324721) (duration: 17m 21s)
- 08:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
- 08:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
- 08:08 kartik@deploy1002: kartik and kartik: Backport for ContentTranslation: Increase MT threshold for publishing in cswiki by 20% (T324721) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 08:03 kartik@deploy1002: Started scap: Backport for ContentTranslation: Increase MT threshold for publishing in cswiki by 20% (T324721)
- 08:02 jiji@cumin1001: START - Cookbook sre.dns.netbox
- 07:45 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2032.codfw.wmnet
- 07:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2031.codfw.wmnet
- 07:37 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 07:37 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2031.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 07:36 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2031.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 07:33 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host mc2044.codfw.wmnet
- 07:28 XioNoX: depool ulsfo for network maintenance - T316532
- 07:27 jiji@cumin1001: START - Cookbook sre.dns.netbox
- 07:22 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2031.codfw.wmnet
- 07:22 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2044.codfw.wmnet
- 07:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 07:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: check if dns update is needed after change of rec-dns-lb IPs status - ayounsi@cumin1001"
- 07:14 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: check if dns update is needed after change of rec-dns-lb IPs status - ayounsi@cumin1001"
- 07:11 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
- 07:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
- 07:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
- 07:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1130 T326133', diff saved to https://phabricator.wikimedia.org/P42941 and previous config saved to /var/cache/conftool/dbconfig/20230110-070628-ladsgroup.json
- 07:03 XioNoX: remove static routes for legacy dns-rec-lb IPs - T239993
- 07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1100 to s5 primary and set section read-write T326133', diff saved to https://phabricator.wikimedia.org/P42940 and previous config saved to /var/cache/conftool/dbconfig/20230110-070223-ladsgroup.json
- 07:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s5 eqiad as read-only for maintenance - T326133', diff saved to https://phabricator.wikimedia.org/P42939 and previous config saved to /var/cache/conftool/dbconfig/20230110-070152-ladsgroup.json
- 07:01 Amir1: Starting s5 eqiad failover from db1130 to db1100 - T326133
- 06:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1100 with weight 0 T326133', diff saved to https://phabricator.wikimedia.org/P42938 and previous config saved to /var/cache/conftool/dbconfig/20230110-062309-ladsgroup.json
- 06:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T326133
- 06:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T326133
- 05:39 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Sync idm-test1001 - slyngshede@cumin1001"
- 05:38 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Sync idm-test1001 - slyngshede@cumin1001"
- 03:14 eileen: civicrm upgraded from 391e8482 to 9afd2789
- 03:12 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: plugin upgrade - ryankemper@cumin1001 - T324247
- 02:46 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: plugin upgrade - ryankemper@cumin1001 - T324247
- 02:41 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: plugin upgrade - ryankemper@cumin1001 - T324247
- 02:08 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: plugin upgrade - ryankemper@cumin1001 - T324247
- 01:50 krinkle@deploy1002: Finished deploy [integration/docroot@f59119c]: (no justification provided) (duration: 00m 14s)
- 01:50 krinkle@deploy1002: Started deploy [integration/docroot@f59119c]: (no justification provided)
- 01:28 eileen: civicrm upgraded from e3405a4e to 391e8482
- 00:48 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: plugin upgrade - bking@cumin1001 - T324247
2023-01-09
- 22:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2043.codfw.wmnet
- 22:33 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: plugin upgrade - bking@cumin1001 - T324247
- 22:32 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: plugin upgrade - bking@cumin1001 - T324247
- 22:28 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2043.codfw.wmnet
- 22:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2030.codfw.wmnet
- 22:25 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:25 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2030.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 22:15 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2030.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 22:11 jiji@cumin1001: START - Cookbook sre.dns.netbox
- 22:05 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2030.codfw.wmnet
- 22:03 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2029.codfw.wmnet
- 22:03 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:03 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2029.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 22:00 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2029.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 21:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2042.codfw.wmnet
- 21:52 kindrobot: close UTC late backport window
- 21:50 jiji@cumin1001: START - Cookbook sre.dns.netbox
- 21:47 kindrobot@deploy1002: Sync cancelled.
- 21:47 kindrobot@deploy1002: kindrobot and trainbranchbot: Backport for Revert "[config]: Deploy GDI Safety Survey Wave 4" synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 21:47 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2042.codfw.wmnet
- 21:46 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: plugin upgrade - bking@cumin1001 - T324247
- 21:45 kindrobot@deploy1002: Started scap: Backport for Revert "[config]: Deploy GDI Safety Survey Wave 4"
- 21:39 kindrobot@deploy1002: Sync cancelled.
- 21:38 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2029.codfw.wmnet
- 21:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2027.codfw.wmnet
- 21:37 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:37 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2027.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 21:34 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade - bking@cumin1001 - T324247
- 21:29 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2027.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 21:27 jiji@cumin1001: START - Cookbook sre.dns.netbox
- 21:26 kindrobot@deploy1002: kindrobot and essexigyan: Backport for [config]: Deploy GDI Safety Survey Wave 4 (T325136) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 21:24 kindrobot@deploy1002: Started scap: Backport for [config]: Deploy GDI Safety Survey Wave 4 (T325136)
- 21:21 kindrobot: starting UTC late backport window
- 21:21 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2027.codfw.wmnet
- 21:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2026.codfw.wmnet
- 21:18 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:18 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2026.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 21:09 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2026.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 21:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143', diff saved to https://phabricator.wikimedia.org/P42936 and previous config saved to /var/cache/conftool/dbconfig/20230109-210940-marostegui.json
- 21:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2041.codfw.wmnet
- 21:03 jiji@cumin1001: START - Cookbook sre.dns.netbox
- 20:57 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2041.codfw.wmnet
- 20:57 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2026.codfw.wmnet
- 20:52 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade - bking@cumin1001 - T324247
- 20:52 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
- 20:44 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
- 20:44 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
- 20:44 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
- 20:36 Amir1: deleting global usage coming from commons in commons (T322588)
- 20:36 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
- 20:35 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
- 20:34 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
- 20:33 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
- 20:25 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
- 20:24 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
- 20:21 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
- 20:20 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
- 20:20 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
- 20:20 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
- 19:37 bblack: cp5032: set param transit_buffer=1M via varnishadm
- 19:33 bblack: cp5032: set param transit_buffer=4M via varnishadm
- 19:26 bblack: cp5032: set param transit_buffer=1M via varnishadm
- 19:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2025.codfw.wmnet
- 19:22 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:22 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2025.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 19:15 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2025.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 19:11 jiji@cumin1001: START - Cookbook sre.dns.netbox
- 19:05 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2025.codfw.wmnet
- 19:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2024.codfw.wmnet
- 19:04 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:04 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2024.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 19:00 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2024.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 18:57 jiji@cumin1001: START - Cookbook sre.dns.netbox
- 18:48 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2024.codfw.wmnet
- 18:43 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2023.codfw.wmnet
- 18:43 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:43 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2023.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 18:41 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2023.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 18:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2040.codfw.wmnet
- 18:36 jiji@cumin1001: START - Cookbook sre.dns.netbox
- 18:30 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2040.codfw.wmnet
- 18:30 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2023.codfw.wmnet
- 18:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2022.codfw.wmnet
- 18:07 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:07 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2022.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 18:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 18:02 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2022.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 18:00 jiji@cumin1001: START - Cookbook sre.dns.netbox
- 17:56 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2022.codfw.wmnet
- 17:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2039.codfw.wmnet
- 17:47 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2039.codfw.wmnet
- 17:46 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc2021.codfw.wmnet
- 17:46 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:46 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2021.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 17:42 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 17:41 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 17:41 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 17:41 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 17:36 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2021.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 17:35 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 17:35 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 17:34 claime: Finished codfw jobrunner rolling reboot
- 17:32 jiji@cumin1001: START - Cookbook sre.dns.netbox
- 17:31 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
- 16:59 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2021.codfw.wmnet
- 16:49 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 16:48 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
- 16:46 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc2020.codfw.wmnet
- 16:46 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:46 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2020.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 16:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2038.codfw.wmnet
- 16:40 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2038.codfw.wmnet
- 16:40 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2020.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 16:32 jiji@cumin1001: START - Cookbook sre.dns.netbox
- 16:11 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2020.codfw.wmnet
- 16:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2019.codfw.wmnet
- 16:11 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:11 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2019.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 16:08 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2019.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
- 16:04 XioNoX: start VC link maintenance in eqiad - T325803
- 16:03 jiji@cumin1001: START - Cookbook sre.dns.netbox
- 15:58 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2019.codfw.wmnet
- 15:37 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 15:37 claime: Starting codfw jobrunner rolling reboot
- 15:35 Lucas_WMDE: UTC afternoon backport+config window done
- 15:34 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for CX: Allow composer/installers plugin (duration: 10m 03s)
- 15:29 claime: Not starting codfw jobrunner rolling reboot, deploy in progress
- 15:28 claime: Starting codfw jobrunner rolling reboot
- 15:26 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and kartik: Backport for CX: Allow composer/installers plugin synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 15:24 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for CX: Allow composer/installers plugin
- 15:17 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on maps2009.codfw.wmnet,maps1009.eqiad.wmnet with reason: Removing redis service
- 15:17 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on maps2009.codfw.wmnet,maps1009.eqiad.wmnet with reason: Removing redis service
- 15:11 effie: disable puppet on all 'P:mediawiki::mcrouter_wancache' hosts to merge 875894
- 15:09 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for extwiki: Install SandboxLink extension (T326450) (duration: 08m 37s)
- 15:09 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry2004.codfw.wmnet
- 15:04 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry2004.codfw.wmnet
- 15:02 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for extwiki: Install SandboxLink extension (T326450) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 15:00 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for extwiki: Install SandboxLink extension (T326450)
- 15:00 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry2003.codfw.wmnet
- 14:59 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ echo 'https://en.wikipedia.org/static/images/project-logos/jawikisource.png' | mwscript purgeList.php # T326488
- 14:56 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for jawikisource: Update project logo and wordmark (T326488) (duration: 09m 24s)
- 14:55 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry2003.codfw.wmnet
- 14:52 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry1004.eqiad.wmnet
- 14:48 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for jawikisource: Update project logo and wordmark (T326488) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 14:47 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry1004.eqiad.wmnet
- 14:47 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for jawikisource: Update project logo and wordmark (T326488)
- 14:45 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for arwiki: Create extendedmover group (T326434) (duration: 08m 56s)
- 14:38 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for arwiki: Create extendedmover group (T326434) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
- 14:36 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for arwiki: Create extendedmover group (T326434)
- 14:31 godog: upgrade thanos to 0.30.1 on prometheus2005 - T303154
- 14:27 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for mediawikiwiki: Disable Flow on new pages by default (T325907) (duration: 18m 19s)
- 14:19 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for mediawikiwiki: Disable Flow on new pages by default (T325907) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 14:09 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for mediawikiwiki: Disable Flow on new pages by default (T325907)
- 13:55 moritzm: installing systemd bugfix updates from Bullseye point release
- 13:41 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry1003.eqiad.wmnet
- 13:36 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry1003.eqiad.wmnet
- 13:35 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
- 13:35 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=eqiad
- 12:53 hnowlan@deploy1002: Finished deploy [restbase/deploy@bcb0a69]: New wikis T321284 T321290 T321296 T326140 (duration: 18m 56s)
- 12:34 hnowlan@deploy1002: Started deploy [restbase/deploy@bcb0a69]: New wikis T321284 T321290 T321296 T326140
- 12:18 vgutierrez: repool cp5025
- 11:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15954
- 11:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15954
- 11:29 vgutierrez: restart purged on cp5025
- 11:28 vgutierrez: depool cp5025 due to purging issues
- 11:23 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host chartmuseum1001.eqiad.wmnet
- 11:19 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host chartmuseum1001.eqiad.wmnet
- 11:06 XioNoX: repool ulsfo - T316532
- 11:01 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
- 10:55 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 10:55 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
- 10:54 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=eqiad
- 10:54 claime: Starting codfw appserver rolling reboot
- 10:54 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=codfw
- 10:54 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
- 10:54 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host chartmuseum2001.codfw.wmnet
- 10:51 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dragonfly-supernode1001.eqiad.wmnet
- 10:49 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host chartmuseum2001.codfw.wmnet
- 10:49 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode1001.eqiad.wmnet
- 10:48 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dragonfly-supernode2001.codfw.wmnet
- 10:46 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
- 10:46 effie: switching maps to eqiad
- 10:45 moritzm: installing avahi security updates
- 10:44 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode2001.codfw.wmnet
- 10:41 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=codfw
- 09:35 dcausse: restarting blazegraph on wdqs1006 (BlazegraphFreeAllocatorsDecreasingRapidly)
- 09:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
- 09:04 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
- 08:58 moritzm: installing glibc security updates
- 08:56 XioNoX: depool ulsfo for network maintenance - T316532
- 08:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 327700
- 08:26 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 327700
- 08:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 48237
- 08:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 48237
- 08:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32035
- 08:21 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idm-test1001.wikimedia.org
- 08:21 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32035
- 08:12 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idm-test1001.wikimedia.org on all recursors
- 08:12 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache idm-test1001.wikimedia.org on all recursors
- 08:12 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:12 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idm-test1001.wikimedia.org - slyngshede@cumin1001"
- 08:08 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idm-test1001.wikimedia.org - slyngshede@cumin1001"
- 08:06 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
- 08:06 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host idm-test1001.wikimedia.org
2023-01-06
- 18:57 mutante: systemctl start docker-gc on all gitlab-runners via cumin T310593
- 18:56 mutante: gitlab-runner1002 - systemctl start docker-gc; run puppet on all gitlab-runners T310593
- 18:49 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: debugging
- 18:49 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on 6 hosts with reason: debugging
- 18:36 sukhe: pool cp5032 [bullseye upgrade completed]: T325797
- 18:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5032.eqsin.wmnet,service=ats-be
- 18:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5032.eqsin.wmnet,service=cdn
- 18:20 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on mw1486.eqiad.wmnet with reason: downtimed, hw failure: T326425
- 18:20 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on mw1486.eqiad.wmnet with reason: downtimed, hw failure: T326425
- 18:13 Krinkle: krinkle@cloudweb1003$ Run `UPDATE actor SET actor_user=31136 WHERE actor_id=14640;` to partially fix T326431
- 17:58 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5032.eqsin.wmnet with OS bullseye
- 17:29 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5032.eqsin.wmnet with reason: host reimage
- 17:26 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5032.eqsin.wmnet with reason: host reimage
- 16:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5032.eqsin.wmnet with OS bullseye
- 16:53 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5032.eqsin.wmnet with OS bullseye
- 16:26 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5032.eqsin.wmnet with OS bullseye
- 16:18 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5032.eqsin.wmnet with OS bullseye
- 16:05 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on parse1002.eqiad.wmnet with reason: CPU1 machine check error
- 16:05 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on parse1002.eqiad.wmnet with reason: CPU1 machine check error
- 15:54 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1486.eqiad.wmnet
- 15:53 claime: depooling mw1486.eqiad.wmnet for hardware troubleshooting
- 15:31 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5032.eqsin.wmnet with OS bullseye
- 15:30 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5032.eqsin.wmnet with OS bullseye
- 15:10 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5032.eqsin.wmnet with OS bullseye
- 15:08 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts cp5032.eqsin.wmnet
- 15:08 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp5032.eqsin.wmnet
- 15:07 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5032.eqsin.wmnet,service=ats-be
- 15:07 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5032.eqsin.wmnet,service=cdn
- 15:07 sukhe: depool cp5032 for bullseye upgrade (starting with NIC firmware upgrade): T325797
- 14:42 jbond: remove bgpalerter from apt
- 14:06 reedy@deploy1002: Synchronized php-1.40.0-wmf.17/extensions/SecurePoll/cli/wm-scripts/ucoc2023/populateEditCount.php: T326408 (duration: 07m 09s)
- 12:42 stevemunene@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
- 12:36 tzatziki: running extensions/SecurePoll/cli/wm-scripts/ucoc2023/ucoc2023_tables.sql on each wiki
- 12:29 stevemunene@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
- 11:38 jbond: upload bgpalerter to bullseye-wikimedia
- 11:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2113.codfw.wmnet with reason: Maintenance
- 11:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2113.codfw.wmnet with reason: Maintenance
- 11:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1130.eqiad.wmnet with reason: Maintenance
- 11:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1130.eqiad.wmnet with reason: Maintenance
- 10:10 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 21245
- 10:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 21245
- 09:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36994
- 09:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 36994
- 09:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 266925
- 09:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 266925
- 09:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9038
- 09:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9038
- 09:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 5713
- 09:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 5713
- 09:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 37473
- 09:03 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 37473
- 09:03 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 4788
- 09:02 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4788
- 09:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 32035
- 09:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 32035
- 09:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15954
- 09:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15954
- 09:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 60427
- 09:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 60427
- 09:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 58717
- 09:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 58717
- 09:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45489
- 08:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 45489
- 08:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 24482
- 08:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 24482
- 08:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9119
- 08:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9119
- 08:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049
- 08:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 64049
- 08:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 263237
- 08:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 263237
- 08:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 51185
- 08:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 51185
- 08:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 201746
- 08:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 201746
- 08:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 62597
- 08:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 62597
- 08:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 327700
- 08:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 327700
- 08:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 56630
- 08:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 56630
- 08:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 21245
- 08:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 21245
- 08:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 37282
- 08:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 37282
- 08:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 37558
- 08:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 37558
- 08:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13113
- 08:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 13113
- 08:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 41095
- 08:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 41095
- 08:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61573
- 08:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 61573
- 08:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 21320
- 08:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 21320
- 08:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 39405
- 08:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 39405
- 08:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 48237
- 08:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 48237
- 08:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 47794
- 08:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 47794
- 08:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 22822
- 08:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 22822
- 08:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 58715
- 08:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 58715
- 08:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 51254
- 08:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 51254
- 08:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 35432
- 08:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 35432
- 08:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 132602
- 08:41 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 132602
- 08:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 42473
- 08:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 42473
- 08:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16347
- 08:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16347
- 08:05 XioNoX: drmrs offload Vodafone from Tata - T324955
- 01:08 urbanecm@deploy1002: Finished scap: Backport for Revert "GlobalRename: Convert DB selects to use SelectQueryBuilder" (T326377 T312394) (duration: 08m 48s)
- 01:01 urbanecm@deploy1002: urbanecm and urbanecm: Backport for Revert "GlobalRename: Convert DB selects to use SelectQueryBuilder" (T326377 T312394) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 00:59 urbanecm@deploy1002: Started scap: Backport for Revert "GlobalRename: Convert DB selects to use SelectQueryBuilder" (T326377 T312394)
- 00:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T326156)', diff saved to https://phabricator.wikimedia.org/P42928 and previous config saved to /var/cache/conftool/dbconfig/20230106-004102-ladsgroup.json
- 00:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42927 and previous config saved to /var/cache/conftool/dbconfig/20230106-002556-ladsgroup.json
- 00:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42926 and previous config saved to /var/cache/conftool/dbconfig/20230106-001049-ladsgroup.json
2023-01-05
- 23:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T326156)', diff saved to https://phabricator.wikimedia.org/P42925 and previous config saved to /var/cache/conftool/dbconfig/20230105-235543-ladsgroup.json
- 23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T326156)', diff saved to https://phabricator.wikimedia.org/P42924 and previous config saved to /var/cache/conftool/dbconfig/20230105-235325-ladsgroup.json
- 23:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2178.codfw.wmnet with reason: Maintenance
- 23:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2178.codfw.wmnet with reason: Maintenance
- 23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42923 and previous config saved to /var/cache/conftool/dbconfig/20230105-235304-ladsgroup.json
- 23:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P42922 and previous config saved to /var/cache/conftool/dbconfig/20230105-233758-ladsgroup.json
- 23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P42921 and previous config saved to /var/cache/conftool/dbconfig/20230105-232251-ladsgroup.json
- 23:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42920 and previous config saved to /var/cache/conftool/dbconfig/20230105-230745-ladsgroup.json
- 23:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42919 and previous config saved to /var/cache/conftool/dbconfig/20230105-230629-ladsgroup.json
- 23:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2171.codfw.wmnet with reason: Maintenance
- 23:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2171.codfw.wmnet with reason: Maintenance
- 23:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T326156)', diff saved to https://phabricator.wikimedia.org/P42918 and previous config saved to /var/cache/conftool/dbconfig/20230105-230607-ladsgroup.json
- 22:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P42917 and previous config saved to /var/cache/conftool/dbconfig/20230105-225101-ladsgroup.json
- 22:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P42916 and previous config saved to /var/cache/conftool/dbconfig/20230105-223554-ladsgroup.json
- 22:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T326156)', diff saved to https://phabricator.wikimedia.org/P42915 and previous config saved to /var/cache/conftool/dbconfig/20230105-222048-ladsgroup.json
- 22:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T326156)', diff saved to https://phabricator.wikimedia.org/P42914 and previous config saved to /var/cache/conftool/dbconfig/20230105-221932-ladsgroup.json
- 22:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2157.codfw.wmnet with reason: Maintenance
- 22:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2157.codfw.wmnet with reason: Maintenance
- 22:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42913 and previous config saved to /var/cache/conftool/dbconfig/20230105-221911-ladsgroup.json
- 22:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P42912 and previous config saved to /var/cache/conftool/dbconfig/20230105-220404-ladsgroup.json
- 21:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P42911 and previous config saved to /var/cache/conftool/dbconfig/20230105-214858-ladsgroup.json
- 21:43 TheresNoTime: closing UTC late backport window
- 21:42 samtar@deploy1002: Finished scap: Backport for Turn off wgNavigationTimingOversampleFactor campaigns (T286703) (duration: 08m 45s)
- 21:35 samtar@deploy1002: samtar and krinkle: Backport for Turn off wgNavigationTimingOversampleFactor campaigns (T286703) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 21:33 samtar@deploy1002: Started scap: Backport for Turn off wgNavigationTimingOversampleFactor campaigns (T286703)
- 21:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42910 and previous config saved to /var/cache/conftool/dbconfig/20230105-213351-ladsgroup.json
- 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42909 and previous config saved to /var/cache/conftool/dbconfig/20230105-213235-ladsgroup.json
- 21:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2137.codfw.wmnet with reason: Maintenance
- 21:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2137.codfw.wmnet with reason: Maintenance
- 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T326156)', diff saved to https://phabricator.wikimedia.org/P42908 and previous config saved to /var/cache/conftool/dbconfig/20230105-213214-ladsgroup.json
- 21:31 samtar@deploy1002: Finished scap: Backport for actions: Actually store CommentFormatter in McrUndoAction (T326336) (duration: 10m 31s)
- 21:23 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 21:23 samtar@deploy1002: samtar and zabe: Backport for actions: Actually store CommentFormatter in McrUndoAction (T326336) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 21:21 samtar@deploy1002: Started scap: Backport for actions: Actually store CommentFormatter in McrUndoAction (T326336)
- 21:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P42907 and previous config saved to /var/cache/conftool/dbconfig/20230105-211707-ladsgroup.json
- 21:16 samtar@deploy1002: Finished scap: Backport for Start writing to cuc_comment_id everywhere (T233004) (duration: 10m 07s)
- 21:08 samtar@deploy1002: samtar and zabe: Backport for Start writing to cuc_comment_id everywhere (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 21:06 samtar@deploy1002: Started scap: Backport for Start writing to cuc_comment_id everywhere (T233004)
- 21:04 samtar@deploy1002: backport aborted: (duration: 01m 22s)
- 21:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P42906 and previous config saved to /var/cache/conftool/dbconfig/20230105-210201-ladsgroup.json
- 20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T326156)', diff saved to https://phabricator.wikimedia.org/P42905 and previous config saved to /var/cache/conftool/dbconfig/20230105-204654-ladsgroup.json
- 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T326156)', diff saved to https://phabricator.wikimedia.org/P42904 and previous config saved to /var/cache/conftool/dbconfig/20230105-204438-ladsgroup.json
- 20:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2094.codfw.wmnet with reason: Maintenance
- 20:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2094.codfw.wmnet with reason: Maintenance
- 20:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2128.codfw.wmnet with reason: Maintenance
- 20:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2128.codfw.wmnet with reason: Maintenance
- 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T326156)', diff saved to https://phabricator.wikimedia.org/P42903 and previous config saved to /var/cache/conftool/dbconfig/20230105-204403-ladsgroup.json
- 20:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P42902 and previous config saved to /var/cache/conftool/dbconfig/20230105-202856-ladsgroup.json
- 20:17 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@9568478]: Bumping platform_eng airflow instance to latest (duration: 00m 09s)
- 20:17 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@9568478]: Bumping platform_eng airflow instance to latest
- 20:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P42901 and previous config saved to /var/cache/conftool/dbconfig/20230105-201350-ladsgroup.json
- 19:59 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.17 refs T325580
- 19:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T326156)', diff saved to https://phabricator.wikimedia.org/P42900 and previous config saved to /var/cache/conftool/dbconfig/20230105-195843-ladsgroup.json
- 19:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T326156)', diff saved to https://phabricator.wikimedia.org/P42899 and previous config saved to /var/cache/conftool/dbconfig/20230105-195627-ladsgroup.json
- 19:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2123.codfw.wmnet with reason: Maintenance
- 19:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2123.codfw.wmnet with reason: Maintenance
- 19:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T326156)', diff saved to https://phabricator.wikimedia.org/P42898 and previous config saved to /var/cache/conftool/dbconfig/20230105-195606-ladsgroup.json
- 19:48 taavi@deploy1002: Finished scap: Backport for actions: Pass CommentFormatter to McrRestoreAction (T326275) (duration: 10m 11s)
- 19:41 taavi@deploy1002: taavi and zabe: Backport for actions: Pass CommentFormatter to McrRestoreAction (T326275) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
- 19:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P42897 and previous config saved to /var/cache/conftool/dbconfig/20230105-194059-ladsgroup.json
- 19:38 sukhe: reprepro -C main include bullseye-wikimedia varnish_6.0.10-1wm3_amd64.changes: T325797
- 19:37 taavi@deploy1002: Started scap: Backport for actions: Pass CommentFormatter to McrRestoreAction (T326275)
- 19:31 Amir1: creating new cu tables
- 19:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P42896 and previous config saved to /var/cache/conftool/dbconfig/20230105-192553-ladsgroup.json
- 19:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T326156)', diff saved to https://phabricator.wikimedia.org/P42895 and previous config saved to /var/cache/conftool/dbconfig/20230105-191046-ladsgroup.json
- 19:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T326156)', diff saved to https://phabricator.wikimedia.org/P42894 and previous config saved to /var/cache/conftool/dbconfig/20230105-190830-ladsgroup.json
- 19:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2111.codfw.wmnet with reason: Maintenance
- 19:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2111.codfw.wmnet with reason: Maintenance
- 19:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2101.codfw.wmnet with reason: Maintenance
- 19:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2101.codfw.wmnet with reason: Maintenance
- 19:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
- 19:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
- 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T326156)', diff saved to https://phabricator.wikimedia.org/P42893 and previous config saved to /var/cache/conftool/dbconfig/20230105-190724-ladsgroup.json
- 18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P42892 and previous config saved to /var/cache/conftool/dbconfig/20230105-185217-ladsgroup.json
- 18:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P42891 and previous config saved to /var/cache/conftool/dbconfig/20230105-183711-ladsgroup.json
- 18:22 taavi: delete some nostalgiawiki pages using maintenance/deleteBatch.php for T326334
- 18:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T326156)', diff saved to https://phabricator.wikimedia.org/P42890 and previous config saved to /var/cache/conftool/dbconfig/20230105-182204-ladsgroup.json
- 18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T326156)', diff saved to https://phabricator.wikimedia.org/P42889 and previous config saved to /var/cache/conftool/dbconfig/20230105-181949-ladsgroup.json
- 18:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1200.eqiad.wmnet with reason: Maintenance
- 18:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1200.eqiad.wmnet with reason: Maintenance
- 18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T326156)', diff saved to https://phabricator.wikimedia.org/P42888 and previous config saved to /var/cache/conftool/dbconfig/20230105-181928-ladsgroup.json
- 18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P42887 and previous config saved to /var/cache/conftool/dbconfig/20230105-180421-ladsgroup.json
- 17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P42886 and previous config saved to /var/cache/conftool/dbconfig/20230105-174915-ladsgroup.json
- 17:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T326156)', diff saved to https://phabricator.wikimedia.org/P42885 and previous config saved to /var/cache/conftool/dbconfig/20230105-173408-ladsgroup.json
- 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T326156)', diff saved to https://phabricator.wikimedia.org/P42884 and previous config saved to /var/cache/conftool/dbconfig/20230105-173154-ladsgroup.json
- 17:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1185.eqiad.wmnet with reason: Maintenance
- 17:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1185.eqiad.wmnet with reason: Maintenance
- 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T326156)', diff saved to https://phabricator.wikimedia.org/P42883 and previous config saved to /var/cache/conftool/dbconfig/20230105-173133-ladsgroup.json
- 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P42882 and previous config saved to /var/cache/conftool/dbconfig/20230105-171626-ladsgroup.json
- 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P42880 and previous config saved to /var/cache/conftool/dbconfig/20230105-170119-ladsgroup.json
- 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T326156)', diff saved to https://phabricator.wikimedia.org/P42878 and previous config saved to /var/cache/conftool/dbconfig/20230105-164612-ladsgroup.json
- 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T326156)', diff saved to https://phabricator.wikimedia.org/P42877 and previous config saved to /var/cache/conftool/dbconfig/20230105-164358-ladsgroup.json
- 16:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 16:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1161.eqiad.wmnet with reason: Maintenance
- 16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1161.eqiad.wmnet with reason: Maintenance
- 16:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42876 and previous config saved to /var/cache/conftool/dbconfig/20230105-164258-ladsgroup.json
- 16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P42875 and previous config saved to /var/cache/conftool/dbconfig/20230105-162751-ladsgroup.json
- 16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P42874 and previous config saved to /var/cache/conftool/dbconfig/20230105-161245-ladsgroup.json
- 16:05 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 16:04 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 16:04 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 16:03 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 15:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42873 and previous config saved to /var/cache/conftool/dbconfig/20230105-155738-ladsgroup.json
- 15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42872 and previous config saved to /var/cache/conftool/dbconfig/20230105-155524-ladsgroup.json
- 15:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1144.eqiad.wmnet with reason: Maintenance
- 15:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1144.eqiad.wmnet with reason: Maintenance
- 15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42871 and previous config saved to /var/cache/conftool/dbconfig/20230105-155503-ladsgroup.json
- 15:52 matthiasmullie: UTC afternoon backports done
- 15:51 mlitn@deploy1002: Finished scap: Backport for Fix URL construction (duration: 12m 21s)
- 15:41 mlitn@deploy1002: mlitn and mlitn: Backport for Fix URL construction synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
- 15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P42870 and previous config saved to /var/cache/conftool/dbconfig/20230105-153956-ladsgroup.json
- 15:39 mlitn@deploy1002: Started scap: Backport for Fix URL construction
- 15:37 mlitn@deploy1002: Finished scap: Backport for Fix URL construction (duration: 08m 04s)
- 15:31 mlitn@deploy1002: mlitn and mlitn: Backport for Fix URL construction synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 15:29 mlitn@deploy1002: Started scap: Backport for Fix URL construction
- 15:26 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 15:26 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P42869 and previous config saved to /var/cache/conftool/dbconfig/20230105-152447-ladsgroup.json
- 15:22 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
- 15:14 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 15:14 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
- 15:10 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 15:10 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 15:10 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 15:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42868 and previous config saved to /var/cache/conftool/dbconfig/20230105-150939-ladsgroup.json
- 15:09 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42867 and previous config saved to /var/cache/conftool/dbconfig/20230105-150825-ladsgroup.json
- 15:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1113.eqiad.wmnet with reason: Maintenance
- 15:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1113.eqiad.wmnet with reason: Maintenance
- 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T326156)', diff saved to https://phabricator.wikimedia.org/P42866 and previous config saved to /var/cache/conftool/dbconfig/20230105-150804-ladsgroup.json
- 14:58 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 14:58 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
- 14:56 claime: hard resetting mw1486
- 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P42865 and previous config saved to /var/cache/conftool/dbconfig/20230105-145257-ladsgroup.json
- 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P42864 and previous config saved to /var/cache/conftool/dbconfig/20230105-143751-ladsgroup.json
- 14:30 mlitn@deploy1002: Finished scap: Backport for Also get central description (T325831) (duration: 08m 32s)
- 14:23 mlitn@deploy1002: mlitn and mlitn: Backport for Also get central description (T325831) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T326156)', diff saved to https://phabricator.wikimedia.org/P42862 and previous config saved to /var/cache/conftool/dbconfig/20230105-142244-ladsgroup.json
- 14:21 mlitn@deploy1002: Started scap: Backport for Also get central description (T325831)
- 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T326156)', diff saved to https://phabricator.wikimedia.org/P42861 and previous config saved to /var/cache/conftool/dbconfig/20230105-142029-ladsgroup.json
- 14:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1110.eqiad.wmnet with reason: Maintenance
- 14:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1110.eqiad.wmnet with reason: Maintenance
- 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T326156)', diff saved to https://phabricator.wikimedia.org/P42860 and previous config saved to /var/cache/conftool/dbconfig/20230105-142008-ladsgroup.json
- 14:17 mlitn@deploy1002: Finished scap: Backport for Also get central description (T325831) (duration: 07m 57s)
- 14:11 mlitn@deploy1002: mlitn and mlitn: Backport for Also get central description (T325831) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 14:09 mlitn@deploy1002: Started scap: Backport for Also get central description (T325831)
- 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P42859 and previous config saved to /var/cache/conftool/dbconfig/20230105-140501-ladsgroup.json
- 13:58 Amir1: start of externallinks migration in elwiki (and rest of large wikis in s3) (T326314)
- 13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P42858 and previous config saved to /var/cache/conftool/dbconfig/20230105-134955-ladsgroup.json
- 13:46 ladsgroup@deploy1002: Finished scap: Backport for Enable write both for externallinks in ten largest s3 wikis (T321662) (duration: 08m 54s)
- 13:42 urbanecm: aswikiquote: Run importDump.php to import an XML dump (per new wiki importers request, running into issues with a largish page)
- 13:39 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Enable write both for externallinks in ten largest s3 wikis (T321662) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 13:38 XioNoX: start [eqiad] faulty VC optics maintenance - T325803
- 13:37 ladsgroup@deploy1002: Started scap: Backport for Enable write both for externallinks in ten largest s3 wikis (T321662)
- 13:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T326156)', diff saved to https://phabricator.wikimedia.org/P42857 and previous config saved to /var/cache/conftool/dbconfig/20230105-133448-ladsgroup.json
- 13:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T326156)', diff saved to https://phabricator.wikimedia.org/P42856 and previous config saved to /var/cache/conftool/dbconfig/20230105-133234-ladsgroup.json
- 13:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1100.eqiad.wmnet with reason: Maintenance
- 13:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1100.eqiad.wmnet with reason: Maintenance
- 13:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42855 and previous config saved to /var/cache/conftool/dbconfig/20230105-133211-ladsgroup.json
- 13:30 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 13:29 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 13:21 effie: enable puppet on all mw servers
- 13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P42854 and previous config saved to /var/cache/conftool/dbconfig/20230105-131705-ladsgroup.json
- 13:03 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 13:03 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 13:03 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 13:03 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 13:03 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 13:02 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 13:02 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 13:02 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 13:02 oblivian@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
- 13:02 oblivian@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
- 13:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P42853 and previous config saved to /var/cache/conftool/dbconfig/20230105-130158-ladsgroup.json
- 13:02 oblivian@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
- 13:01 oblivian@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
- 13:01 oblivian@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
- 13:01 oblivian@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
- 13:01 oblivian@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
- 13:01 oblivian@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
- 13:01 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 13:01 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 13:01 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 13:00 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 13:00 hashar: Restarted Gerrit for a plugin update
- 12:58 hashar@deploy1002: Finished deploy [gerrit/gerrit@b1ae5b4]: wm-checks-api: fix PCC handling of empty messages (duration: 00m 08s)
- 12:58 hashar@deploy1002: Started deploy [gerrit/gerrit@b1ae5b4]: wm-checks-api: fix PCC handling of empty messages
- 12:52 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 12:49 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 12:49 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 12:48 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 12:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42852 and previous config saved to /var/cache/conftool/dbconfig/20230105-124651-ladsgroup.json
- 12:45 hashar@deploy1002: Finished deploy [gerrit/gerrit@b1ae5b4]: wm-checks-api: fix PCC handling of empty messages (duration: 00m 10s)
- 12:45 hashar@deploy1002: Started deploy [gerrit/gerrit@b1ae5b4]: wm-checks-api: fix PCC handling of empty messages
- 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42851 and previous config saved to /var/cache/conftool/dbconfig/20230105-124437-ladsgroup.json
- 12:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1096.eqiad.wmnet with reason: Maintenance
- 12:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1096.eqiad.wmnet with reason: Maintenance
- 12:44 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 12:42 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 12:42 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 12:31 ladsgroup: Deployed security patch for T233004 T326293
- 12:02 hashar: gerrit: running `copy-approvals` script to prepare for Gerrit 3.6 upgrade (T309870): `ssh -p 29418 gerrit.wikimedia.org gerrit copy-approvals --verbose`
- 11:59 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 11:58 hashar: Restarting Gerrit
- 11:57 hashar@deploy1002: Finished deploy [gerrit/gerrit@32f984a]: wm-checks-api: add support for Puppet Catalogue Compiler (duration: 00m 09s)
- 11:57 hashar@deploy1002: Started deploy [gerrit/gerrit@32f984a]: wm-checks-api: add support for Puppet Catalogue Compiler
- 11:57 hashar: Stopping Gerrit for plugin deployment
- 11:45 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
- 11:40 effie: disabling puppet on all hosts running mcrouter to merge 860102
- 11:24 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=mwdebug,name=eqiad
- 11:23 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 11:23 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=mwdebug,name=eqiad
- 11:23 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 11:22 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=mwdebug,name=codfw
- 11:20 hashar@deploy1002: Finished deploy [gerrit/gerrit@32f984a]: wm-checks-api: add support for Puppet Catalogue Compiler (duration: 00m 10s)
- 11:20 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 11:20 hashar@deploy1002: Started deploy [gerrit/gerrit@32f984a]: wm-checks-api: add support for Puppet Catalogue Compiler
- 11:19 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 11:19 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=mwdebug,name=codfw
- 11:14 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 11:13 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 11:13 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 11:13 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 11:13 cgoubert@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 11:13 cgoubert@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 11:12 cgoubert@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 11:12 cgoubert@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: After cloning db1176', diff saved to https://phabricator.wikimedia.org/P42850 and previous config saved to /var/cache/conftool/dbconfig/20230105-105808-root.json
- 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: After cloning db1176', diff saved to https://phabricator.wikimedia.org/P42849 and previous config saved to /var/cache/conftool/dbconfig/20230105-104303-root.json
- 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: After cloning db1176', diff saved to https://phabricator.wikimedia.org/P42848 and previous config saved to /var/cache/conftool/dbconfig/20230105-102758-root.json
- 10:26 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 10:26 claime: Rolling reboot of api_appserver hosts in eqiad
- 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 100%: Pooling in s6', diff saved to https://phabricator.wikimedia.org/P42847 and previous config saved to /var/cache/conftool/dbconfig/20230105-102357-root.json
- 10:22 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
- 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: After cloning db1176', diff saved to https://phabricator.wikimedia.org/P42846 and previous config saved to /var/cache/conftool/dbconfig/20230105-101253-root.json
- 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 75%: Pooling in s6', diff saved to https://phabricator.wikimedia.org/P42845 and previous config saved to /var/cache/conftool/dbconfig/20230105-100852-root.json
- 10:07 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 10:06 claime: Restarting rolling reboot of api_appserver hosts in codfw
- 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: After cloning db1176', diff saved to https://phabricator.wikimedia.org/P42844 and previous config saved to /var/cache/conftool/dbconfig/20230105-095748-root.json
- 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 60%: Pooling in s6', diff saved to https://phabricator.wikimedia.org/P42843 and previous config saved to /var/cache/conftool/dbconfig/20230105-095347-root.json
- 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: After cloning db1176', diff saved to https://phabricator.wikimedia.org/P42841 and previous config saved to /var/cache/conftool/dbconfig/20230105-094243-root.json
- 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 50%: Pooling in s6', diff saved to https://phabricator.wikimedia.org/P42840 and previous config saved to /var/cache/conftool/dbconfig/20230105-093842-root.json
- 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 1%: After cloning db1176', diff saved to https://phabricator.wikimedia.org/P42839 and previous config saved to /var/cache/conftool/dbconfig/20230105-092738-root.json
- 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 40%: Pooling in s6', diff saved to https://phabricator.wikimedia.org/P42838 and previous config saved to /var/cache/conftool/dbconfig/20230105-092336-root.json
- 09:14 XioNoX: turn up BGP to NTT in drmrs - T314929
- 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 25%: Pooling in s6', diff saved to https://phabricator.wikimedia.org/P42837 and previous config saved to /var/cache/conftool/dbconfig/20230105-090831-root.json
- 08:56 hashar@deploy1002: Finished scap: Backport for [SearchVue] Enable extension on ptwiki, ruwiki & idwiki (T310367) (duration: 11m 38s)
- 08:46 hashar@deploy1002: hashar and mlitn: Backport for [SearchVue] Enable extension on ptwiki, ruwiki & idwiki (T310367) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 08:44 hashar@deploy1002: Started scap: Backport for [SearchVue] Enable extension on ptwiki, ruwiki & idwiki (T310367)
- 07:58 moritzm: installing glibc security updates on bullseye
- 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db2151 in s6 T326206', diff saved to https://phabricator.wikimedia.org/P42836 and previous config saved to /var/cache/conftool/dbconfig/20230105-075046-marostegui.json
- 07:28 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 07:27 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 07:26 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 07:25 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 to clone db1176 T326211', diff saved to https://phabricator.wikimedia.org/P42833 and previous config saved to /var/cache/conftool/dbconfig/20230105-064153-marostegui.json
- 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2151 for the first time in s6 T326206', diff saved to https://phabricator.wikimedia.org/P42832 and previous config saved to /var/cache/conftool/dbconfig/20230105-063937-marostegui.json
- 06:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1157.eqiad.wmnet with reason: Maintenance
- 06:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1157.eqiad.wmnet with reason: Maintenance
2023-01-04
- 23:01 mutante: deploy2002 - re-arming keyholder T324014
- 23:00 mutante: deploy1002 - re-arming keyholder T324014
- 22:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 22:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 22:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T326011)', diff saved to https://phabricator.wikimedia.org/P42831 and previous config saved to /var/cache/conftool/dbconfig/20230104-223545-marostegui.json
- 22:27 kindrobot: finished UTC late backport window
- 22:27 kindrobot@deploy1002: Finished scap: Backport for Fix underlinkedness rescore logic (T301096), Fix underlinkedness rescore logic (T301096) (duration: 15m 20s)
- 22:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P42828 and previous config saved to /var/cache/conftool/dbconfig/20230104-222038-marostegui.json
- 22:13 kindrobot@deploy1002: kindrobot and tgr: Backport for Fix underlinkedness rescore logic (T301096), Fix underlinkedness rescore logic (T301096) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
- 22:11 kindrobot@deploy1002: Started scap: Backport for Fix underlinkedness rescore logic (T301096), Fix underlinkedness rescore logic (T301096)
- 22:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P42827 and previous config saved to /var/cache/conftool/dbconfig/20230104-220532-marostegui.json
- 21:51 kindrobot@deploy1002: backport aborted: (duration: 02m 12s)
- 21:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T326011)', diff saved to https://phabricator.wikimedia.org/P42826 and previous config saved to /var/cache/conftool/dbconfig/20230104-215025-marostegui.json
- 21:48 taavi: mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki mediawikiwiki "African Wikimedia Technical Community/Project Scope" "Africa Wikimedia Technical Community/Project Scope" "Taavi" --reason "per request phab:T318292" # T318292
- 21:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T326011)', diff saved to https://phabricator.wikimedia.org/P42825 and previous config saved to /var/cache/conftool/dbconfig/20230104-214616-marostegui.json
- 21:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1198.eqiad.wmnet with reason: Maintenance
- 21:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1198.eqiad.wmnet with reason: Maintenance
- 21:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T326011)', diff saved to https://phabricator.wikimedia.org/P42824 and previous config saved to /var/cache/conftool/dbconfig/20230104-214555-marostegui.json
- 21:44 kindrobot@deploy1002: Finished scap: Backport for Add namespace to gorwiktionary (T326253) (duration: 11m 26s)
- 21:35 kindrobot@deploy1002: kindrobot and jhsoby: Backport for Add namespace to gorwiktionary (T326253) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 21:33 kindrobot@deploy1002: Started scap: Backport for Add namespace to gorwiktionary (T326253)
- 21:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P42823 and previous config saved to /var/cache/conftool/dbconfig/20230104-213049-marostegui.json
- 21:28 kindrobot@deploy1002: Finished scap: Backport for Start writing to cuc_comment_id on group0 and group1 wikis (T233004) (duration: 17m 28s)
- 21:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P42820 and previous config saved to /var/cache/conftool/dbconfig/20230104-211542-marostegui.json
- 21:12 kindrobot@deploy1002: kindrobot and zabe: Backport for Start writing to cuc_comment_id on group0 and group1 wikis (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
- 21:10 kindrobot@deploy1002: Started scap: Backport for Start writing to cuc_comment_id on group0 and group1 wikis (T233004)
- 21:05 kindrobot: starting UTC late backport window
- 21:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T326011)', diff saved to https://phabricator.wikimedia.org/P42819 and previous config saved to /var/cache/conftool/dbconfig/20230104-210036-marostegui.json
- 20:58 Amir1: running refreshGlobalimagelinks.php on all wikis (T322588)
- 20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T326011)', diff saved to https://phabricator.wikimedia.org/P42818 and previous config saved to /var/cache/conftool/dbconfig/20230104-205628-marostegui.json
- 20:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1189.eqiad.wmnet with reason: Maintenance
- 20:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1189.eqiad.wmnet with reason: Maintenance
- 20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T326011)', diff saved to https://phabricator.wikimedia.org/P42817 and previous config saved to /var/cache/conftool/dbconfig/20230104-205607-marostegui.json
- 20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P42816 and previous config saved to /var/cache/conftool/dbconfig/20230104-204100-marostegui.json
- 20:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P42815 and previous config saved to /var/cache/conftool/dbconfig/20230104-202554-marostegui.json
- 20:14 cstone: payments-wiki upgraded from ede93d62 to f075991f
- 20:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T326011)', diff saved to https://phabricator.wikimedia.org/P42814 and previous config saved to /var/cache/conftool/dbconfig/20230104-201047-marostegui.json
- 20:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T326011)', diff saved to https://phabricator.wikimedia.org/P42813 and previous config saved to /var/cache/conftool/dbconfig/20230104-200638-marostegui.json
- 20:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1179.eqiad.wmnet with reason: Maintenance
- 20:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1179.eqiad.wmnet with reason: Maintenance
- 20:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T326011)', diff saved to https://phabricator.wikimedia.org/P42812 and previous config saved to /var/cache/conftool/dbconfig/20230104-200617-marostegui.json
- 19:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P42811 and previous config saved to /var/cache/conftool/dbconfig/20230104-195110-marostegui.json
- 19:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P42810 and previous config saved to /var/cache/conftool/dbconfig/20230104-193604-marostegui.json
- 19:32 dduvall@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.17 refs T325580 (duration: 06m 58s)
- 19:25 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.17 refs T325580
- 19:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T326011)', diff saved to https://phabricator.wikimedia.org/P42809 and previous config saved to /var/cache/conftool/dbconfig/20230104-192057-marostegui.json
- 19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T326011)', diff saved to https://phabricator.wikimedia.org/P42808 and previous config saved to /var/cache/conftool/dbconfig/20230104-191648-marostegui.json
- 19:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1175.eqiad.wmnet with reason: Maintenance
- 19:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1175.eqiad.wmnet with reason: Maintenance
- 19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T326011)', diff saved to https://phabricator.wikimedia.org/P42807 and previous config saved to /var/cache/conftool/dbconfig/20230104-191627-marostegui.json
- 19:07 dancy@deploy1002: Installing scap version "4.32.0" for 560 hosts
- 19:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P42806 and previous config saved to /var/cache/conftool/dbconfig/20230104-190121-marostegui.json
- 18:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P42805 and previous config saved to /var/cache/conftool/dbconfig/20230104-184614-marostegui.json
- 18:40 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@84f5f50]: (no justification provided) (duration: 00m 05s)
- 18:40 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@84f5f50]: (no justification provided)
- 18:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T326011)', diff saved to https://phabricator.wikimedia.org/P42804 and previous config saved to /var/cache/conftool/dbconfig/20230104-183108-marostegui.json
- 18:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T326011)', diff saved to https://phabricator.wikimedia.org/P42803 and previous config saved to /var/cache/conftool/dbconfig/20230104-182700-marostegui.json
- 18:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1166.eqiad.wmnet with reason: Maintenance
- 18:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1166.eqiad.wmnet with reason: Maintenance
- 18:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1145.eqiad.wmnet with reason: Maintenance
- 18:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1145.eqiad.wmnet with reason: Maintenance
- 18:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T326011)', diff saved to https://phabricator.wikimedia.org/P42802 and previous config saved to /var/cache/conftool/dbconfig/20230104-182425-marostegui.json
- 18:15 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: pushing wmf-puppet-dashboard updates for enc git handling (after remembering to update the submodules) (duration: 00m 54s)
- 18:14 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: pushing wmf-puppet-dashboard updates for enc git handling (after remembering to update the submodules)
- 18:13 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: pushing wmf-puppet-dashboard updates for enc git handling (duration: 03m 54s)
- 18:09 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: pushing wmf-puppet-dashboard updates for enc git handling
- 18:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P42801 and previous config saved to /var/cache/conftool/dbconfig/20230104-180918-marostegui.json
- 18:00 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1002.eqiad.wmnet with OS bullseye
- 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P42800 and previous config saved to /var/cache/conftool/dbconfig/20230104-175412-marostegui.json
- 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T326011)', diff saved to https://phabricator.wikimedia.org/P42799 and previous config saved to /var/cache/conftool/dbconfig/20230104-173905-marostegui.json
- 17:37 dancy@deploy1002: Installing scap version "4.31.1" for 560 hosts
- 17:36 dancy@deploy1002: Finished scap: testing (duration: 07m 50s)
- 17:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T326011)', diff saved to https://phabricator.wikimedia.org/P42798 and previous config saved to /var/cache/conftool/dbconfig/20230104-173455-marostegui.json
- 17:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1123.eqiad.wmnet with reason: Maintenance
- 17:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1123.eqiad.wmnet with reason: Maintenance
- 17:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T326011)', diff saved to https://phabricator.wikimedia.org/P42797 and previous config saved to /var/cache/conftool/dbconfig/20230104-173434-marostegui.json
- 17:28 dancy@deploy1002: Started scap: testing
- 17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P42796 and previous config saved to /var/cache/conftool/dbconfig/20230104-171928-marostegui.json
- 17:10 mutante: new Wikipedia (and other projects) language added: guc - https://en.wikipedia.org/wiki/Wayuu_language - https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Wayuu T321880
- 17:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P42795 and previous config saved to /var/cache/conftool/dbconfig/20230104-170421-marostegui.json
- 17:02 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 17:00 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 16:55 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@84f5f50]: Bumping platform_eng airflow instance to latest (duration: 00m 17s)
- 16:54 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@84f5f50]: Bumping platform_eng airflow instance to latest
- 16:49 dancy@deploy1002: Installing scap version "4.30.3-1" for 560 hosts
- 16:49 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 16:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T326011)', diff saved to https://phabricator.wikimedia.org/P42794 and previous config saved to /var/cache/conftool/dbconfig/20230104-164915-marostegui.json
- 16:48 dancy@deploy1002: Finished scap: testing (duration: 13m 16s)
- 16:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T326011)', diff saved to https://phabricator.wikimedia.org/P42793 and previous config saved to /var/cache/conftool/dbconfig/20230104-164504-marostegui.json
- 16:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 16:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 16:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1112.eqiad.wmnet with reason: Maintenance
- 16:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1112.eqiad.wmnet with reason: Maintenance
- 16:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1102.eqiad.wmnet with reason: Maintenance
- 16:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1102.eqiad.wmnet with reason: Maintenance
- 16:41 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 16:41 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 16:41 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 16:37 dancy@deploy1002: Started scap: testing
- 16:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2105.codfw.wmnet with reason: Maintenance
- 16:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2105.codfw.wmnet with reason: Maintenance
- 16:33 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 4:00:00 on db2105.codfw.wmnet with reason: Maintenance
- 16:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2105.codfw.wmnet with reason: Maintenance
- 16:30 dancy@deploy1002: Installing scap version "4.31.0" for 560 hosts
- 16:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T326011)', diff saved to https://phabricator.wikimedia.org/P42792 and previous config saved to /var/cache/conftool/dbconfig/20230104-162828-marostegui.json
- 16:29 dancy@deploy1002: sync-world aborted: (no justification provided) (duration: 00m 13s)
- 16:27 dancy@deploy1002: Started scap: (no justification provided)
- 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P42791 and previous config saved to /var/cache/conftool/dbconfig/20230104-161321-marostegui.json
- 15:59 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,name=mw2402.*
- 15:59 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,name=mw2401.*
- 15:59 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,name=mw2400.*
- 15:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P42790 and previous config saved to /var/cache/conftool/dbconfig/20230104-155815-marostegui.json
- 15:51 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
- 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T326011)', diff saved to https://phabricator.wikimedia.org/P42789 and previous config saved to /var/cache/conftool/dbconfig/20230104-154308-marostegui.json
- 15:34 moritzm: installing glibc security updates on bullseye
- 15:34 moritzm: installing glibc security updates
- 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T326011)', diff saved to https://phabricator.wikimedia.org/P42788 and previous config saved to /var/cache/conftool/dbconfig/20230104-153435-marostegui.json
- 15:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2177.codfw.wmnet with reason: Maintenance
- 15:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2177.codfw.wmnet with reason: Maintenance
- 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T326011)', diff saved to https://phabricator.wikimedia.org/P42787 and previous config saved to /var/cache/conftool/dbconfig/20230104-153413-marostegui.json
- 15:33 ladsgroup@deploy1002: Finished scap: Backport for Disable LoadMonitor in CLI (T322156) (duration: 09m 48s)
- 15:32 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 15:32 claime: Restarting rolling reboot of api_appserver hosts in codfw
- 15:25 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Disable LoadMonitor in CLI (T322156) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 15:23 ladsgroup@deploy1002: Started scap: Backport for Disable LoadMonitor in CLI (T322156)
- 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P42786 and previous config saved to /var/cache/conftool/dbconfig/20230104-151907-marostegui.json
- 15:06 marostegui: dbmaint deploy schema change on s5 eqiad T326224
- 15:05 marostegui: dbmaint deploy schema change on s3 eqiad T326224
- 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P42785 and previous config saved to /var/cache/conftool/dbconfig/20230104-150400-marostegui.json
- 15:00 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1001.eqiad.wmnet with OS bullseye
- 15:00 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
- 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T326011)', diff saved to https://phabricator.wikimedia.org/P42784 and previous config saved to /var/cache/conftool/dbconfig/20230104-144853-marostegui.json
- 14:46 marostegui: dbmaint deploy schema change on s3 eqiad T326222
- 14:44 marostegui: dbmaint deploy schema change on s5 eqiad T326222
- 14:42 XioNoX: fix inconsistent mtu between cr1-eqiad<->lsw1-f1 - T315838
- 14:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T326011)', diff saved to https://phabricator.wikimedia.org/P42783 and previous config saved to /var/cache/conftool/dbconfig/20230104-144025-marostegui.json
- 14:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2094.codfw.wmnet with reason: Maintenance
- 14:40 urbanecm: UTC afternoon B&C window done
- 14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2094.codfw.wmnet with reason: Maintenance
- 14:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2156.codfw.wmnet with reason: Maintenance
- 14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2156.codfw.wmnet with reason: Maintenance
- 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T326011)', diff saved to https://phabricator.wikimedia.org/P42782 and previous config saved to /var/cache/conftool/dbconfig/20230104-143949-marostegui.json
- 14:38 marostegui: dbmaint deploy schema change on s3 eqiad T326223
- 14:38 urbanecm@deploy1002: Finished scap: Backport for Start reading from cul_actor on testwiki (T233004), aswikiquote: Set timezone to Asia/Kolkata (T321246) (duration: 09m 50s)
- 14:37 marostegui: dbmaint deploy schema change on s5 eqiad T326223
- 14:32 XioNoX: fix inconsistent mtu on mr1-eqiad - T315838
- 14:30 urbanecm@deploy1002: urbanecm and urbanecm and zabe: Backport for Start reading from cul_actor on testwiki (T233004), aswikiquote: Set timezone to Asia/Kolkata (T321246) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
- 14:28 urbanecm@deploy1002: Started scap: Backport for Start reading from cul_actor on testwiki (T233004), aswikiquote: Set timezone to Asia/Kolkata (T321246)
- 14:27 urbanecm@deploy1002: Finished scap: Backport for plwiki: Add editcontentmodel to interface-admin (T325819), Mark active sections even when their headings are in wrapper elements (T318044 T324869) (duration: 09m 32s)
- 14:27 XioNoX: fix inconsistent mtu on mr1-codfw - T315838
- 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P42781 and previous config saved to /var/cache/conftool/dbconfig/20230104-142442-marostegui.json
- 14:24 marostegui: dbmaint deploy schema change on s7 eqiad T326227
- 14:22 XioNoX: fix inconsistent mtu on mr1-eqsin - T315838
- 14:19 urbanecm@deploy1002: urbanecm and stang and matmarex: Backport for plwiki: Add editcontentmodel to interface-admin (T325819), Mark active sections even when their headings are in wrapper elements (T318044 T324869) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 14:18 urbanecm@deploy1002: Started scap: Backport for plwiki: Add editcontentmodel to interface-admin (T325819), Mark active sections even when their headings are in wrapper elements (T318044 T324869)
- 14:16 urbanecm@deploy1002: backport aborted: (duration: 00m 07s)
- 14:16 urbanecm@deploy1002: Finished scap: Backport for Revert "trwiki: Add 20 years celebration logos" (T325823), kuwiki: Install SandboxLink (T325469) (duration: 09m 37s)
- 14:16 marostegui: Sanitize new wikis T326138 T321294 T321288 T321256
- 14:15 XioNoX: fix inconsistent mtu on mr1-esams - T315838
- 14:14 marostegui: dbmaint deploy schema change on s7 eqiad T326228
- 14:13 marostegui: dbmaint deploy schema change on s7 eqiad T326226
- 14:11 marostegui: dbmaint deploy schema change on s8 eqiad T326221
- 14:11 marostegui: dbmaint deploy schema change on s7 eqiad T326221
- 14:11 marostegui: dbmaint deploy schema change on s6 eqiad T326221
- 14:11 marostegui: dbmaint deploy schema change on s5 eqiad T326221
- 14:11 marostegui: dbmaint deploy schema change on s4 eqiad T326221
- 14:11 marostegui: dbmaint deploy schema change on s3 eqiad T326221
- 14:11 marostegui: dbmaint deploy schema change on s2 eqiad T326221
- 14:11 marostegui: dbmaint deploy schema change on s1 eqiad T326221
- 14:10 marostegui: dbmaint deploy schema change on s7 eqiad T326225
- 14:10 marostegui: dbmaint deploy schema change on s7 T326225
- 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on puppetdb2002.codfw.wmnet with reason: maintenance
- 14:09 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on puppetdb2002.codfw.wmnet with reason: maintenance
- 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P42780 and previous config saved to /var/cache/conftool/dbconfig/20230104-140936-marostegui.json
- 14:08 urbanecm@deploy1002: urbanecm and stang: Backport for Revert "trwiki: Add 20 years celebration logos" (T325823), kuwiki: Install SandboxLink (T325469) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 14:06 urbanecm@deploy1002: Started scap: Backport for Revert "trwiki: Add 20 years celebration logos" (T325823), kuwiki: Install SandboxLink (T325469)
- 14:04 XioNoX: fix inconsistent mtu on mr1-ulsfo - T315838
- 14:02 marostegui: dbmaint deploy schema change on s3 T326221
- 14:02 moritzm: updating buster nodes running 5.10 to 5.10.158-2~deb10u1 (only rollout of the new kernel, no reboots)
- 14:02 urbanecm@deploy1002: Finished scap: Backport for Update interwiki cache (duration: 08m 00s)
- 13:58 marostegui: dbmaint deploy schema change on s7 T326221
- 13:57 marostegui: dbmaint deploy schema change on s8 T326221
- 13:57 marostegui: dbmaint deploy schema change on s6 T326221
- 13:56 marostegui: dbmaint deploy schema change on s5 T326221
- 13:55 marostegui: dbmaint deploy schema change on s4 T326221
- 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T326011)', diff saved to https://phabricator.wikimedia.org/P42779 and previous config saved to /var/cache/conftool/dbconfig/20230104-135429-marostegui.json
- 13:54 urbanecm@deploy1002: Started scap: Backport for Update interwiki cache
- 13:54 marostegui: dbmaint deploy schema change on s2 T326221
- 13:53 marostegui: dbmaint deploy schema change on s1 T326221
- 13:52 urbanecm@deploy1002: Finished scap: Creating gorwiktionary (T326137), fixing aswikiquote logo (T321246) (duration: 07m 52s)
- 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T326011)', diff saved to https://phabricator.wikimedia.org/P42778 and previous config saved to /var/cache/conftool/dbconfig/20230104-134544-marostegui.json
- 13:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2149.codfw.wmnet with reason: Maintenance
- 13:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2149.codfw.wmnet with reason: Maintenance
- 13:45 XioNoX: repool esams-eqiad link for mtu change - T315838
- 13:44 urbanecm@deploy1002: Started scap: Creating gorwiktionary (T326137), fixing aswikiquote logo (T321246)
- 13:41 XioNoX: drain esams-eqiad link for mtu change - T315838
- 13:39 urbanecm@deploy1002: Finished scap: Backport for Add messages for Gorontalo Wiktionary (gorwiktionary) (T326137), Add messages for Gorontalo Wiktionary (gorwiktionary) (T326137) (duration: 38m 23s)
- 13:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2139.codfw.wmnet with reason: Maintenance
- 13:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2139.codfw.wmnet with reason: Maintenance
- 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T326011)', diff saved to https://phabricator.wikimedia.org/P42777 and previous config saved to /var/cache/conftool/dbconfig/20230104-133830-marostegui.json
- 13:33 XioNoX: fix mismatched MTU on pfw3-codfw - T315838
- 13:31 urbanecm: New wiki creation will run over by a couple of minutes
- 13:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P42776 and previous config saved to /var/cache/conftool/dbconfig/20230104-132323-marostegui.json
- 13:15 XioNoX: fix mismatched MTU on cloudsw switches - T315838
- 13:11 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
- 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P42775 and previous config saved to /var/cache/conftool/dbconfig/20230104-130816-marostegui.json
- 13:00 urbanecm@deploy1002: Started scap: Backport for Add messages for Gorontalo Wiktionary (gorwiktionary) (T326137), Add messages for Gorontalo Wiktionary (gorwiktionary) (T326137)
- 12:58 urbanecm@deploy1002: Finished scap: Creating shnwikibooks (T321248) (duration: 07m 38s)
- 12:56 moritzm: installing emacs security updates
- 12:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
- 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 100%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42774 and previous config saved to /var/cache/conftool/dbconfig/20230104-125330-root.json
- 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T326011)', diff saved to https://phabricator.wikimedia.org/P42773 and previous config saved to /var/cache/conftool/dbconfig/20230104-125310-marostegui.json
- 12:51 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
- 12:50 urbanecm@deploy1002: Started scap: Creating shnwikibooks (T321248)
- 12:48 urbanecm@deploy1002: Finished scap: Creating guwwikiquote (T321247) (duration: 07m 44s)
- 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T326011)', diff saved to https://phabricator.wikimedia.org/P42772 and previous config saved to /var/cache/conftool/dbconfig/20230104-124424-marostegui.json
- 12:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2127.codfw.wmnet with reason: Maintenance
- 12:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2127.codfw.wmnet with reason: Maintenance
- 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T326011)', diff saved to https://phabricator.wikimedia.org/P42771 and previous config saved to /var/cache/conftool/dbconfig/20230104-124403-marostegui.json
- 12:41 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 12:41 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 12:41 urbanecm@deploy1002: Started scap: Creating guwwikiquote (T321247)
- 12:40 claime: Rolling reboot of api_appserver hosts in codfw paused for https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230104T1200
- 12:38 urbanecm@deploy1002: Finished scap: Creating aswikiquote (T321246) (duration: 07m 49s)
- 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 75%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42770 and previous config saved to /var/cache/conftool/dbconfig/20230104-123825-root.json
- 12:35 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1001.eqiad.wmnet with OS bullseye
- 12:31 urbanecm@deploy1002: Started scap: Creating aswikiquote (T321246)
- 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P42769 and previous config saved to /var/cache/conftool/dbconfig/20230104-122857-marostegui.json
- 12:27 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
- 12:26 urbanecm@deploy1002: Finished scap: Backport for Add namespace translations in Wayuu (T321881), Add namespace translations in Wayuu (T321881) (duration: 10m 36s)
- 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 50%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42768 and previous config saved to /var/cache/conftool/dbconfig/20230104-122320-root.json
- 12:18 urbanecm@deploy1002: urbanecm and urbanecm: Backport for Add namespace translations in Wayuu (T321881), Add namespace translations in Wayuu (T321881) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 12:16 urbanecm@deploy1002: Started scap: Backport for Add namespace translations in Wayuu (T321881), Add namespace translations in Wayuu (T321881)
- 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P42767 and previous config saved to /var/cache/conftool/dbconfig/20230104-121350-marostegui.json
- 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 25%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42766 and previous config saved to /var/cache/conftool/dbconfig/20230104-120815-root.json
- 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T326011)', diff saved to https://phabricator.wikimedia.org/P42765 and previous config saved to /var/cache/conftool/dbconfig/20230104-115844-marostegui.json
- 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 10%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42764 and previous config saved to /var/cache/conftool/dbconfig/20230104-115310-root.json
- 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T326011)', diff saved to https://phabricator.wikimedia.org/P42763 and previous config saved to /var/cache/conftool/dbconfig/20230104-115011-marostegui.json
- 11:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2109.codfw.wmnet with reason: Maintenance
- 11:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2109.codfw.wmnet with reason: Maintenance
- 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 5%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42761 and previous config saved to /var/cache/conftool/dbconfig/20230104-113805-root.json
- 11:33 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host puppetdb2003.codfw.wmnet
- 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2151 to dbctl depooled T326206', diff saved to https://phabricator.wikimedia.org/P42759 and previous config saved to /var/cache/conftool/dbconfig/20230104-112801-marostegui.json
- 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 1%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42758 and previous config saved to /var/cache/conftool/dbconfig/20230104-112300-root.json
- 11:02 vgutierrez: testing HAProxy 2.4.20 in cp4037 and cp4045
- 10:56 vgutierrez: (apt1001) import HAProxy 2.4.20 from third-party repo for buster and bullseye
- 10:49 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AKhatun out of all services on: 1098 hosts
- 10:48 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AKhatun out of all services on: 1098 hosts
- 10:48 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AKhatun out of all services on: 894 hosts
- 10:47 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AKhatun out of all services on: 894 hosts
- 10:37 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 10:37 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2124 T326206', diff saved to https://phabricator.wikimedia.org/P42756 and previous config saved to /var/cache/conftool/dbconfig/20230104-103109-marostegui.json
- 10:29 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 10:29 claime: Rolling reboot of api_appserver hosts in codfw
- 10:24 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
- 10:14 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 10:14 claime: Rolling reboot of mwdebug hosts in eqiad
- 10:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
- 10:04 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 10:04 marostegui: dbmaint eqiad deploy schema change on s5 T326011
- 10:04 claime: Rolling reboot of mwdebug hosts in codfw
- 10:04 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 10:04 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 10:04 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 10:03 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 10:03 filippo@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
- 10:03 filippo@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
- 10:03 filippo@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
- 10:03 filippo@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
- 10:03 filippo@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
- 10:03 filippo@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
- 10:03 filippo@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
- 10:03 filippo@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
- 10:03 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 10:03 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 10:03 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 10:03 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 10:03 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 10:03 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 10:03 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 10:02 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 10:02 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 10:01 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 10:01 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 10:00 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 09:53 effie: Upload imposm3_0.11.1-1 to buster-wikimedia - T325293
- 09:48 XioNoX: drmrs: offload traffic from Tata - T324955
- 09:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 56286
- 09:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 56286
- 09:37 marostegui: dbmaint codfw deploy schema change on s5 T326011
- 09:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetdb2003.codfw.wmnet
- 09:29 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
- 09:08 matthiasmullie: UTC morning backports done
- 09:07 mlitn@deploy1002: Finished scap: Backport for Squashed diff to catch up to wmf/1.40.0-wmf.17 (duration: 08m 13s)
- 09:01 mlitn@deploy1002: mlitn and mlitn: Backport for Squashed diff to catch up to wmf/1.40.0-wmf.17 synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
- 09:00 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host puppetdb1003.eqiad.wmnet
- 08:59 mlitn@deploy1002: Started scap: Backport for Squashed diff to catch up to wmf/1.40.0-wmf.17
- 08:57 mlitn@deploy1002: Finished scap: Backport for Change IW breakpoint to be enabled on smaller screen (T321377) (duration: 08m 56s)
- 08:50 mlitn@deploy1002: mlitn and mlitn: Backport for Change IW breakpoint to be enabled on smaller screen (T321377) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 08:48 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
- 08:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetdb1003.eqiad.wmnet
- 08:48 mlitn@deploy1002: Started scap: Backport for Change IW breakpoint to be enabled on smaller screen (T321377)
- 08:32 mlitn@deploy1002: Finished scap: Backport for Always show search results at full width (T321377) (duration: 08m 22s)
- 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 100%: After testing', diff saved to https://phabricator.wikimedia.org/P42755 and previous config saved to /var/cache/conftool/dbconfig/20230104-082942-root.json
- 08:26 marostegui: dbmaint codfw deploy schema change on s8 T326011
- 08:26 marostegui: dbmaint eqiad deploy schema change on s8 T326011
- 08:26 marostegui: dbmaint eqiad deploy schema change on s4 T326011
- 08:26 marostegui: dbmaint codfw deploy schema change on s4 T326011
- 08:26 marostegui: dbmaint codfw deploy schema change on s4 T255174
- 08:26 marostegui: dbmaint eqiad deploy schema change on s4 T255174
- 08:25 mlitn@deploy1002: mlitn and mlitn: Backport for Always show search results at full width (T321377) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
- 08:23 mlitn@deploy1002: Started scap: Backport for Always show search results at full width (T321377)
- 08:22 marostegui: dbmaint eqiad deploy schema change on s8 T255174
- 08:20 marostegui: dbmaint codfw deploy schema change on s8 T255174
- 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 75%: After testing', diff saved to https://phabricator.wikimedia.org/P42754 and previous config saved to /var/cache/conftool/dbconfig/20230104-081437-root.json
- 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 50%: After testing', diff saved to https://phabricator.wikimedia.org/P42753 and previous config saved to /var/cache/conftool/dbconfig/20230104-075932-root.json
- 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 25%: After testing', diff saved to https://phabricator.wikimedia.org/P42752 and previous config saved to /var/cache/conftool/dbconfig/20230104-074427-root.json
- 07:38 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 07:38 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 07:38 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 07:38 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 07:38 marostegui: Switch x1 back to RBR T255174
- 07:35 marostegui: dbmaint codfw deploy schema change on x1 T255174
- 07:35 marostegui: dbmaint eqiad deploy schema change on x1 T255174
- 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 10%: After testing', diff saved to https://phabricator.wikimedia.org/P42751 and previous config saved to /var/cache/conftool/dbconfig/20230104-072922-root.json
- 07:20 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 07:20 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 07:19 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 07:19 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 5%: After testing', diff saved to https://phabricator.wikimedia.org/P42750 and previous config saved to /var/cache/conftool/dbconfig/20230104-071417-root.json
- 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 1%: After testing', diff saved to https://phabricator.wikimedia.org/P42749 and previous config saved to /var/cache/conftool/dbconfig/20230104-065912-root.json
2023-01-03
- 22:47 eileen: config revision changed from 34754c69 to 03c4d7a6
- 22:33 eileen: config revision changed from 5c73975a to 34754c69
- 21:55 mutante: gitlab-runner* - correction: allowing connections TO kubestagemaster.svc.eqiad.wmnet port 6443 FROM trusted runners, of course - T325385
- 21:53 mutante: gitlab-runner* - allowing kubestagemaster.svc.eqiad.wmnet to connect to port 6443, run puppet via cumin, deploy gerrit:868737 - T325385
- 21:47 taavi: UTC late backports done
- 21:46 taavi@deploy1002: Finished scap: Backport for Specify Citoid RESTBase URL separately (T325425), Use new DiscussionTools heading markup on group1 wikis (T314714) (duration: 12m 12s)
- 21:35 taavi@deploy1002: taavi and matmarex: Backport for Specify Citoid RESTBase URL separately (T325425), Use new DiscussionTools heading markup on group1 wikis (T314714) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 21:34 taavi@deploy1002: Started scap: Backport for Specify Citoid RESTBase URL separately (T325425), Use new DiscussionTools heading markup on group1 wikis (T314714)
- 21:30 taavi@deploy1002: Finished scap: Backport for Start writing to cuc_comment_id on test wikis (T233004) (duration: 12m 54s)
- 21:19 taavi@deploy1002: taavi and zabe: Backport for Start writing to cuc_comment_id on test wikis (T233004) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 21:17 taavi@deploy1002: Started scap: Backport for Start writing to cuc_comment_id on test wikis (T233004)
- 21:15 taavi@deploy1002: Finished scap: Backport for Stop setting $wgActorTableSchemaMigrationStage (T215466), Pin $wgCommentTempTableSchemaMigrationStage to default value (T299954), Pin cu_changes comment migration to old schema (T233004) (duration: 08m 49s)
- 21:08 taavi@deploy1002: taavi and zabe: Backport for Stop setting $wgActorTableSchemaMigrationStage (T215466), Pin $wgCommentTempTableSchemaMigrationStage to default value (T299954), Pin cu_changes comment migration to old schema (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 21:06 taavi@deploy1002: Started scap: Backport for Stop setting $wgActorTableSchemaMigrationStage (T215466), Pin $wgCommentTempTableSchemaMigrationStage to default value (T299954), Pin cu_changes comment migration to old schema (T233004)
- 19:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.17 refs T325580
- 19:18 dduvall@deploy1002: deploy-promote aborted: (duration: 08m 55s)
- 19:13 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd1001.eqiad.wmnet with OS bullseye
- 17:37 claime: Finished parse reboots in eqiad
- 17:36 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
- 17:30 sukhe: sudo cumin -b 1 -s 5 'A:codfw and P{O:swift::proxy}' 'depool && sleep 3 && systemctl restart swift-proxy && sleep 3 && pool'
- 16:40 ejegg: fundraising EOY receipt calculation finished, restarted scheduled jobs
- 16:21 ejegg: fundraising scheduled jobs disabled for EOY receipt calculation
- 15:37 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1001.eqiad.wmnet with OS bullseye
- 15:30 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd1001.eqiad.wmnet with OS bullseye
- 15:14 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 15:13 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
- 15:13 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 15:13 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
- 15:13 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 15:11 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
- 15:10 andrewbogott: upgrading and rebooting wikitech-static
- 15:07 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 15:06 claime: Starting rolling reboot of parse* hosts in eqiad
- 15:05 taavi: UTC afternoon backports done
- 15:04 taavi@deploy1002: Finished scap: Backport for SecurePoll: Add files for UCoC 2023 vote (T324793), ucoc2023: Update populateEditCount to count Flow edits (T324793), ucoc2023: Update populateEditCount to count Flow edits (T324793) (duration: 08m 10s)
- 15:00 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts graphite1004.eqiad.wmnet
- 14:59 filippo@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:59 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: graphite1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
- 14:58 taavi@deploy1002: taavi and taavi: Backport for SecurePoll: Add files for UCoC 2023 vote (T324793), ucoc2023: Update populateEditCount to count Flow edits (T324793), ucoc2023: Update populateEditCount to count Flow edits (T324793) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
- 14:56 taavi@deploy1002: Started scap: Backport for SecurePoll: Add files for UCoC 2023 vote (T324793), ucoc2023: Update populateEditCount to count Flow edits (T324793), ucoc2023: Update populateEditCount to count Flow edits (T324793)
- 14:53 taavi@deploy1002: Finished scap: Backport for Revert "Revert "Start mobile DiscussionTools A/B test"" (T321961) (duration: 09m 13s)
- 14:48 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: graphite1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
- 14:45 taavi@deploy1002: taavi and matmarex: Backport for Revert "Revert "Start mobile DiscussionTools A/B test"" (T321961) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 14:44 filippo@cumin1001: START - Cookbook sre.dns.netbox
- 14:44 taavi@deploy1002: Started scap: Backport for Revert "Revert "Start mobile DiscussionTools A/B test"" (T321961)
- 14:41 taavi@deploy1002: Finished scap: Backport for Log token for the DiscussionTools mobile a/b test (T321961), Log bucket/token for the DiscussionTools mobile a/b test (T321961), a/b test anonymous ID was being reset because of cookie prefixes (T321961), Log bucket/token for the DiscussionTools mobile a/b test (T321961) (duration: 08m 31s)
- 14:39 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts graphite1004.eqiad.wmnet
- 14:34 taavi@deploy1002: taavi and matmarex: Backport for Log token for the DiscussionTools mobile a/b test (T321961), Log bucket/token for the DiscussionTools mobile a/b test (T321961), a/b test anonymous ID was being reset because of cookie prefixes (T321961), Log bucket/token for the DiscussionTools mobile a/b test (T321961) synced to the testservers:
- 14:33 taavi@deploy1002: Started scap: Backport for Log token for the DiscussionTools mobile a/b test (T321961), Log bucket/token for the DiscussionTools mobile a/b test (T321961), a/b test anonymous ID was being reset because of cookie prefixes (T321961), Log bucket/token for the DiscussionTools mobile a/b test (T321961)
- 14:13 oblivian@deploy1002: Finished scap: Backport for etcd: use the v3-style SRV record (T320397) (duration: 07m 58s)
- 14:07 oblivian@deploy1002: oblivian and oblivian: Backport for etcd: use the v3-style SRV record (T320397) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
- 14:05 oblivian@deploy1002: Started scap: Backport for etcd: use the v3-style SRV record (T320397)
- 13:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
- 13:46 moritzm: installing libksba security updates
- 13:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host phab1004.eqiad.wmnet
- 13:19 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host phab1004.eqiad.wmnet
- 12:33 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: pushing wmf-puppet-dashboard updates for enc git handling (duration: 02m 49s)
- 12:30 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: pushing wmf-puppet-dashboard updates for enc git handling
- 12:28 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): pushing wmf-puppet-dashboard updates for enc git handling (duration: 01m 12s)
- 12:27 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): pushing wmf-puppet-dashboard updates for enc git handling
- 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2131', diff saved to https://phabricator.wikimedia.org/P42744 and previous config saved to /var/cache/conftool/dbconfig/20230103-114030-marostegui.json
- 11:35 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 11:34 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
- 11:34 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 11:33 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
- 11:30 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint2001.wikimedia.org
- 11:26 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 11:25 claime: Starting rolling reboot of parse* hosts in codfw
- 11:06 hashar: contint2001: starting Jenkins manually
- 11:04 marostegui: Change x1 binlog format to STATEMENT T255174
- 11:00 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker[1080,1084].eqiad.wmnet with reason: Shutting down to enable RAID battery replacement
- 10:59 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-worker[1080,1084].eqiad.wmnet with reason: Shutting down to enable RAID battery replacement
- 10:59 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host contint2001.wikimedia.org
- 10:58 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint2002.wikimedia.org
- 10:53 marostegui: Restart eqiad sanitarium T326105
- 10:53 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host contint2002.wikimedia.org
- 10:50 marostegui: Restart codfw sanitarium masters T326105
- 10:49 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint1002.wikimedia.org
- 10:43 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host contint1002.wikimedia.org
- 10:37 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on parse1002.eqiad.wmnet with reason: CPU1 machine check error
- 10:36 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on parse1002.eqiad.wmnet with reason: CPU1 machine check error
- 10:36 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gerrit1001.wikimedia.org
- 10:31 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gerrit1001.wikimedia.org
- 10:25 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gerrit2002.wikimedia.org
- 10:18 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gerrit2002.wikimedia.org
- 09:27 vgutierrez: restarting varnish on cp5032 to clear VarnishChildRestarted alert - T325797
- 08:19 kartik@deploy1002: Finished scap: Backport for Content Translation: Move ttwiki out of Beta (T319177) (duration: 16m 09s)
- 08:16 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=parse1002.eqiad.wmnet
- 08:12 moritzm: installing Linux 4.19.269 on Buster hosts
- 08:12 phedenskog@deploy1002: Finished deploy [performance/navtiming@4f8c010]: (no justification provided) (duration: 00m 08s)
- 08:12 phedenskog@deploy1002: Started deploy [performance/navtiming@4f8c010]: (no justification provided)
- 08:05 kartik@deploy1002: kartik and kartik: Backport for Content Translation: Move ttwiki out of Beta (T319177) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
- 08:03 kartik@deploy1002: Started scap: Backport for Content Translation: Move ttwiki out of Beta (T319177)
- 04:58 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.17 refs T325580 (duration: 55m 31s)
- 04:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.17 refs T325580
2023-01-02
- 10:04 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host otrs1001.eqiad.wmnet
- 10:00 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host otrs1001.eqiad.wmnet