Server Admin Log/Archive 62

From Wikitech

2023-01-31

  • 23:51 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3055.esams.wmnet with OS bullseye
  • 23:45 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3055.esams.wmnet with OS bullseye
  • 23:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3054.esams.wmnet with reason: host reimage
  • 23:35 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3055.esams.wmnet with OS bullseye
  • 23:34 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3054.esams.wmnet with reason: host reimage
  • 23:13 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3054.esams.wmnet with OS bullseye
  • 22:54 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp2040.codfw.wmnet
  • 22:53 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2040.codfw.wmnet with OS bullseye
  • 22:35 zabe@deploy1002: Finished scap: Backport for Stop writing to cuc_user and cuc_user_text in group0 wikis (T233004), Stop writing to cuc_comment in testwiki (T233004) (duration: 07m 34s)
  • 22:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2040.codfw.wmnet with reason: host reimage
  • 22:32 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2040.codfw.wmnet with reason: host reimage
  • 22:30 zabe@deploy1002: zabe: Backport for Stop writing to cuc_user and cuc_user_text in group0 wikis (T233004), Stop writing to cuc_comment in testwiki (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 22:28 zabe@deploy1002: Started scap: Backport for Stop writing to cuc_user and cuc_user_text in group0 wikis (T233004), Stop writing to cuc_comment in testwiki (T233004)
  • 22:26 zabe@deploy1002: Finished scap: Backport for Restrict flow-edit-title to autoconfirmed on mediawikiwiki (T328097) (duration: 08m 43s)
  • 22:19 zabe@deploy1002: zabe and bawolff: Backport for Restrict flow-edit-title to autoconfirmed on mediawikiwiki (T328097) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 22:17 zabe@deploy1002: Started scap: Backport for Restrict flow-edit-title to autoconfirmed on mediawikiwiki (T328097)
  • 22:13 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2040.codfw.wmnet with OS bullseye
  • 22:13 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp2038.codfw.wmnet
  • 22:07 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2038.codfw.wmnet with OS bullseye
  • 22:07 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5020.eqsin.wmnet,service=ats-be
  • 22:07 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5020.eqsin.wmnet,service=cdn
  • 22:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5020.eqsin.wmnet with OS bullseye
  • 21:47 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2038.codfw.wmnet with reason: host reimage
  • 21:44 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cassandra-dev2002.codfw.wmnet
  • 21:44 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2038.codfw.wmnet with reason: host reimage
  • 21:39 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host cassandra-dev2002.codfw.wmnet
  • 21:36 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev2002.codfw.wmnet: Trying to induce errors - eevans@cumin1001
  • 21:35 kindrobot: close UTC late backport window. Did not deploy bawolff 884142 as I ran out of time. zabe may reopen the window in around 30 minutes to finish it out
  • 21:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5020.eqsin.wmnet with reason: host reimage
  • 21:33 kindrobot@deploy1002: Finished scap: Backport for Enable ClientPreferences for group0 (T327979) (duration: 10m 17s)
  • 21:31 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5020.eqsin.wmnet with reason: host reimage
  • 21:29 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev2002.codfw.wmnet: Trying to induce errors - eevans@cumin1001
  • 21:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2038.codfw.wmnet with OS bullseye
  • 21:25 kindrobot@deploy1002: kindrobot and nray: Backport for Enable ClientPreferences for group0 (T327979) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 21:24 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp2039.codfw.wmnet
  • 21:24 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp2036.codfw.wmnet
  • 21:23 kindrobot@deploy1002: Started scap: Backport for Enable ClientPreferences for group0 (T327979)
  • 21:17 kindrobot@deploy1002: Finished scap: Backport for Enable Linter write namespace, tag and template for group0 and group1 (T299612) (duration: 13m 20s)
  • 21:06 kindrobot@deploy1002: sbailey and kindrobot: Backport for Enable Linter write namespace, tag and template for group0 and group1 (T299612) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:04 kindrobot@deploy1002: Started scap: Backport for Enable Linter write namespace, tag and template for group0 and group1 (T299612)
  • 21:04 jgleeson: smashpig updated from d1434aeb to 683df497
  • 21:03 kindrobot: start UTC late backport window
  • 20:58 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
  • 20:57 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp5020.eqsin.wmnet with OS bullseye
  • 20:52 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2036.codfw.wmnet with OS bullseye
  • 20:47 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2039.codfw.wmnet with OS bullseye
  • 20:45 zabe: start running "foreachwikiindblist s5.dblist migrateRevisionCommentTemp.php --sleep 2" in screen # T275246
  • 20:33 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2036.codfw.wmnet with reason: host reimage
  • 20:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2036.codfw.wmnet with reason: host reimage
  • 20:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage
  • 20:25 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage
  • 20:11 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2036.codfw.wmnet with OS bullseye
  • 20:09 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5029.eqsin.wmnet,service=ats-be
  • 20:09 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5029.eqsin.wmnet,service=cdn
  • 20:06 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2039.codfw.wmnet with OS bullseye
  • 20:05 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp2037.codfw.wmnet
  • 20:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5029.eqsin.wmnet with OS bullseye
  • 20:03 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2037.codfw.wmnet with OS bullseye
  • 20:00 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
  • 19:59 sukhe: sudo rm /etc/dhcp/automation/ttyS1-115200/cp5020.conf
  • 19:58 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp5020.eqsin.wmnet with OS bullseye
  • 19:58 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
  • 19:43 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2037.codfw.wmnet with reason: host reimage
  • 19:40 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2037.codfw.wmnet with reason: host reimage
  • 19:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5029.eqsin.wmnet with reason: host reimage
  • 19:30 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5029.eqsin.wmnet with reason: host reimage
  • 19:21 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2037.codfw.wmnet with OS bullseye
  • 19:16 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
  • 19:16 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5020.eqsin.wmnet with OS bullseye
  • 19:12 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.21 refs T325584
  • 18:53 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=ats-be
  • 18:53 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=cdn
  • 18:44 mutante: gitlab-prod-1001.devtools (cloud) - rebooted VM ; ip addr del 172.16.7.146/32 dev eth0 - T318521
  • 18:42 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
  • 18:42 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5020.eqsin.wmnet with OS bullseye
  • 18:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2034.codfw.wmnet with OS bullseye
  • 18:26 mutante: gitlab-prod-1001.devtools (cloud) - ip addr del 172.16.7.146/21 dev eth0 - T318521
  • 18:25 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:25 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:24 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1075']
  • 18:24 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1075']
  • 18:22 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1075.eqiad.wmnet']
  • 18:22 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1075.eqiad.wmnet']
  • 18:21 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp1075.eqiad.wmnet
  • 18:21 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp1075.eqiad.wmnet
  • 18:20 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
  • 18:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=ats-be
  • 18:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=cdn
  • 18:19 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5020.eqsin.wmnet with OS bullseye
  • 18:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2034.codfw.wmnet with reason: host reimage
  • 18:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2034.codfw.wmnet with reason: host reimage
  • 18:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
  • 17:55 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cp5029.eqsin.wmnet with OS bullseye
  • 17:53 sukhe: depool cp1075.eqiad.wmnet for iDRAC firmware testing: T321309
  • 17:52 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,service=ats-be
  • 17:52 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,service=cdn
  • 17:50 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2034.codfw.wmnet with OS bullseye
  • 17:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp5019.eqsin.wmnet
  • 17:47 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for cp5019.eqsin.wmnet
  • 17:39 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp1090.eqiad.wmnet
  • 17:38 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp1090.eqiad.wmnet
  • 17:38 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp1076.eqiad.wmnet
  • 17:37 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp1076.eqiad.wmnet
  • 17:35 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5018.eqsin.wmnet,service=ats-be
  • 17:35 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5018.eqsin.wmnet,service=cdn
  • 17:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5018.eqsin.wmnet with OS bullseye
  • 17:33 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp2032.codfw.wmnet
  • 17:31 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5028.eqsin.wmnet,service=ats-be
  • 17:31 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5028.eqsin.wmnet,service=cdn
  • 17:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5028.eqsin.wmnet with OS bullseye
  • 17:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp5029.eqsin.wmnet with reason: testing reimaging cookbook stalling failure
  • 17:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on cp5029.eqsin.wmnet with reason: testing reimaging cookbook stalling failure
  • 17:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5019.eqsin.wmnet,service=ats-be
  • 17:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5019.eqsin.wmnet,service=cdn
  • 17:29 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5029.eqsin.wmnet,service=ats-be
  • 17:29 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5029.eqsin.wmnet,service=cdn
  • 17:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2032.codfw.wmnet with OS bullseye
  • 17:14 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cp5019.eqsin.wmnet
  • 17:08 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2032.codfw.wmnet with reason: host reimage
  • 17:05 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2032.codfw.wmnet with reason: host reimage
  • 17:03 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cp5019.eqsin.wmnet
  • 16:59 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5028.eqsin.wmnet with reason: host reimage
  • 16:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5018.eqsin.wmnet with reason: host reimage
  • 16:54 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5028.eqsin.wmnet with reason: host reimage
  • 16:54 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5018.eqsin.wmnet with reason: host reimage
  • 16:52 cwhite@deploy1002: Finished deploy [releng/phatality@e0bb573]: (no justification provided) (duration: 00m 11s)
  • 16:52 cwhite@deploy1002: Started deploy [releng/phatality@e0bb573]: (no justification provided)
  • 16:52 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 16:52 cwhite@deploy1002: Finished deploy [releng/phatality@e0bb573]: (no justification provided) (duration: 00m 10s)
  • 16:52 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 16:52 cwhite@deploy1002: Started deploy [releng/phatality@e0bb573]: (no justification provided)
  • 16:49 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2032.codfw.wmnet with OS bullseye
  • 16:49 mutante: mw2271 - re-enabling disabled puppet
  • 16:49 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2032.codfw.wmnet with OS bullseye
  • 16:46 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 16:46 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 16:45 cwhite@cumin2002: conftool action : set/weight=10; selector: name=logstash2032.codfw.wmnet
  • 16:44 cwhite@cumin2002: conftool action : set/weight=10; selector: name=logstash1032.eqiad.wmnet
  • 16:43 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 16:41 jayme@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 16:40 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 16:38 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:37 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 16:37 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:36 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on cp5019.eqsin.wmnet with reason: testing reimaging cookbook stalling failure
  • 16:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 5:00:00 on cp5019.eqsin.wmnet with reason: testing reimaging cookbook stalling failure
  • 16:29 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Grants:Programs/Wikimedia Community Fund" "Grants:Programs/Wikimedia Community Fund/General Support Fund" "Zabe" --reason "per request T328456" --skip-subpages # T328456
  • 16:29 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5019.eqsin.wmnet,service=ats-be
  • 16:29 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5019.eqsin.wmnet,service=cdn
  • 16:28 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1004.eqiad.wmnet with OS bullseye
  • 16:20 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:19 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 16:18 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5018.eqsin.wmnet with OS bullseye
  • 16:18 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5028.eqsin.wmnet with OS bullseye
  • 16:18 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5028.eqsin.wmnet with OS bullseye
  • 16:18 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5018.eqsin.wmnet with OS bullseye
  • 16:14 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1003.eqiad.wmnet with OS bullseye
  • 16:09 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1004.eqiad.wmnet with reason: host reimage
  • 16:06 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1004.eqiad.wmnet with reason: host reimage
  • 16:01 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2032.codfw.wmnet with OS bullseye
  • 16:01 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 16:00 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 16:00 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: sync
  • 16:00 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: sync
  • 15:56 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5028.eqsin.wmnet with OS bullseye
  • 15:56 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5018.eqsin.wmnet with OS bullseye
  • 15:55 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet,service=ats-be
  • 15:55 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet,service=cdn
  • 15:54 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubestage1004.eqiad.wmnet with OS bullseye
  • 15:49 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubestagemaster1001.eqiad.wmnet with OS bullseye
  • 15:40 ladsgroup@deploy1002: Finished scap: Backport for Set 'groupLoadsBySection' for s11 (T326980) (duration: 09m 49s)
  • 15:38 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
  • 15:35 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
  • 15:34 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster1001.eqiad.wmnet with reason: host reimage
  • 15:32 ladsgroup@deploy1002: ladsgroup and zabe: Backport for Set 'groupLoadsBySection' for s11 (T326980) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 15:31 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster1001.eqiad.wmnet with reason: host reimage
  • 15:30 ladsgroup@deploy1002: Started scap: Backport for Set 'groupLoadsBySection' for s11 (T326980)
  • 15:24 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2035.codfw.wmnet with OS bullseye
  • 15:23 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubestage1003.eqiad.wmnet with OS bullseye
  • 15:20 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagemaster1001.eqiad.wmnet with OS bullseye
  • 15:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2035.codfw.wmnet with reason: host reimage
  • 15:01 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubestagetcd1005.eqiad.wmnet with OS bullseye
  • 15:01 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2035.codfw.wmnet with reason: host reimage
  • 14:56 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubestagetcd1004.eqiad.wmnet with OS bullseye
  • 14:56 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubestagetcd1006.eqiad.wmnet with OS bullseye
  • 14:48 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd1005.eqiad.wmnet with reason: host reimage
  • 14:46 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:46 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:46 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd1004.eqiad.wmnet with reason: host reimage
  • 14:44 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:44 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:43 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd1006.eqiad.wmnet with reason: host reimage
  • 14:42 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2035.codfw.wmnet with OS bullseye
  • 14:41 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd1005.eqiad.wmnet with reason: host reimage
  • 14:41 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd1004.eqiad.wmnet with reason: host reimage
  • 14:41 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd1006.eqiad.wmnet with reason: host reimage
  • 14:34 urbanecm@deploy1002: Finished scap: Backport for Disable write old for CheckUserLog reason field for testwiki (T233004), Remove redundant definition of wgCheckUserEnableSpecialInvestigate, Bump parsoid parser cache writes to 25%. (T320534) (duration: 07m 23s)
  • 14:33 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd1006.eqiad.wmnet with OS bullseye
  • 14:33 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd1005.eqiad.wmnet with OS bullseye
  • 14:32 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd1004.eqiad.wmnet with OS bullseye
  • 14:28 urbanecm@deploy1002: dreamyjazz and urbanecm and daniel: Backport for Disable write old for CheckUserLog reason field for testwiki (T233004), Remove redundant definition of wgCheckUserEnableSpecialInvestigate, Bump parsoid parser cache writes to 25%. (T320534) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwde
  • 14:26 urbanecm@deploy1002: Started scap: Backport for Disable write old for CheckUserLog reason field for testwiki (T233004), Remove redundant definition of wgCheckUserEnableSpecialInvestigate, Bump parsoid parser cache writes to 25%. (T320534)
  • 14:20 urbanecm@deploy1002: Finished scap: Backport for Disable write old for CheckUserLog reason field for testwiki (T233004), Remove redundant definition of wgCheckUserEnableSpecialInvestigate, Bump parsoid parser cache writes to 25%. (T320534) (duration: 16m 33s)
  • 14:08 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 14:07 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 14:06 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 14:05 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 14:05 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 14:05 urbanecm@deploy1002: urbanecm and dreamyjazz and daniel: Backport for Disable write old for CheckUserLog reason field for testwiki (T233004), Remove redundant definition of wgCheckUserEnableSpecialInvestigate, Bump parsoid parser cache writes to 25%. (T320534) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwde
  • 14:05 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 14:04 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: Reinitialize staging-eqiad with k8s 1.23
  • 14:04 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: Reinitialize staging-eqiad with k8s 1.23
  • 14:03 urbanecm@deploy1002: Started scap: Backport for Disable write old for CheckUserLog reason field for testwiki (T233004), Remove redundant definition of wgCheckUserEnableSpecialInvestigate, Bump parsoid parser cache writes to 25%. (T320534)
  • 14:01 urbanecm@deploy1002: Backport cancelled.
  • 12:36 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (eqiad): Disable traffic mirroring from codfw to eqiad (duration: 00m 35s)
  • 12:36 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (eqiad): Disable traffic mirroring from codfw to eqiad
  • 11:51 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@42a07d3] (eqiad): Disable traffic mirroring from codfw to eqiad (duration: 00m 35s)
  • 11:50 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@42a07d3] (eqiad): Disable traffic mirroring from codfw to eqiad
  • 11:21 moritzm: installing bind9 security updates (client-side tools/libs only)
  • 10:57 jayme@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=k8s-ingress-staging
  • 10:57 jayme@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=k8s-ingress-staging
  • 10:18 jayme: switching active kubernetes staging cluster from eqiad to codfw - T327664
  • 09:20 marostegui: dbmaint Install MariaDB 10.6 on db2093 (db_inventory) T328408
  • 09:05 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 09:00 zabe@deploy1002: Finished scap: Backport for Stop writing to cuc_user and cuc_user_text in testwiki (T233004) (duration: 08m 11s)
  • 09:00 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 08:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 08:54 elukey: roll restart kafka on kafka-logging1001 to pick up new pki certs
  • 08:53 zabe@deploy1002: zabe: Backport for Stop writing to cuc_user and cuc_user_text in testwiki (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 08:51 zabe@deploy1002: Started scap: Backport for Stop writing to cuc_user and cuc_user_text in testwiki (T233004)
  • 08:49 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:45 elukey: restore previously removed password for keystore to kafka-logging clusters
  • 08:39 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 08:36 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 07:56 moritzm: installing bash bugfix updates from Bullseye point release
  • 07:22 marostegui: dbmaint Schema change on s3 eqiad T328373
  • 07:22 marostegui: dbmaint Schema change on s1 eqiad T328373
  • 07:10 marostegui: Failover m2 from db1164 to db1195 - T328253
  • 07:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2133,2160].codfw.wmnet,db[1117,1164,1195].eqiad.wmnet with reason: Primary switchover m2 T328253
  • 07:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2133,2160].codfw.wmnet,db[1117,1164,1195].eqiad.wmnet with reason: Primary switchover m2 T328253
  • 07:03 marostegui: dbmaint Schema change on s5 eqiad T328373
  • 06:59 marostegui: dbmaint Schema change on s7 eqiad T328373
  • 06:57 marostegui: dbmaint Schema change on s4 eqiad T328373
  • 06:52 marostegui: dbmaint Schema change on s8 eqiad T328373
  • 05:02 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.19 (duration: 02m 15s)
  • 05:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 05:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 04:55 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.21 refs T325584 (duration: 52m 56s)
  • 04:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.21 refs T325584
  • 02:44 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3053.esams.wmnet,service=ats-be
  • 02:43 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3053.esams.wmnet,service=cdn
  • 02:28 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3053.esams.wmnet with OS bullseye
  • 02:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3053.esams.wmnet with reason: host reimage
  • 01:59 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3053.esams.wmnet with reason: host reimage
  • 01:37 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3053.esams.wmnet with OS bullseye
  • 01:33 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp3053.esams.wmnet']
  • 01:31 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3053.esams.wmnet']
  • 00:50 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp5027.eqsin.wmnet
  • 00:42 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5027.eqsin.wmnet with OS bullseye
  • 00:14 mutante: etherpad - maintenance downtime for about 5 minutes to test monitoring
  • 00:09 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5027.eqsin.wmnet with reason: host reimage
  • 00:06 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5027.eqsin.wmnet with reason: host reimage

2023-01-30

  • 23:30 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5027.eqsin.wmnet with OS bullseye
  • 23:29 brett@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp3053.esams.wmnet with OS bullseye
  • 23:07 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp3053.esams.wmnet with OS bullseye
  • 22:58 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4052.ulsfo.wmnet
  • 22:50 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3053.esams.wmnet with OS bullseye
  • 22:38 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp3053.esams.wmnet with OS bullseye
  • 22:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2030.codfw.wmnet,service=ats-be
  • 22:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2030.codfw.wmnet,service=cdn
  • 22:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2030.codfw.wmnet with OS bullseye
  • 21:56 urbanecm@deploy1002: Finished scap: Backport for Try to determine what's adding to Parsoid init times (T328201), Update interwiki cache (duration: 12m 24s)
  • 21:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2030.codfw.wmnet with reason: host reimage
  • 21:46 urbanecm@deploy1002: arlolra and urbanecm: Backport for Try to determine what's adding to Parsoid init times (T328201), Update interwiki cache synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 21:44 urbanecm@deploy1002: Started scap: Backport for Try to determine what's adding to Parsoid init times (T328201), Update interwiki cache
  • 21:43 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2030.codfw.wmnet with reason: host reimage
  • 21:42 urbanecm@deploy1002: Finished scap: Backport for GrowthExperiments: Update campaign configuration (T321370) (duration: 08m 47s)
  • 21:35 urbanecm@deploy1002: tgr and urbanecm: Backport for GrowthExperiments: Update campaign configuration (T321370) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 21:34 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase2020.codfw.wmnet: Replace Cassandra keys & certs - eevans@cumin1001
  • 21:34 urbanecm@deploy1002: Started scap: Backport for GrowthExperiments: Update campaign configuration (T321370)
  • 21:33 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4052.ulsfo.wmnet
  • 21:31 urbanecm@deploy1002: Finished scap: Backport for Enable WelcomeSurvey at viwiki (T325376), Fix grid blowout with limited width turned off (T327423), Support new style of table of contents (T327942) (duration: 09m 52s)
  • 21:26 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS bullseye
  • 21:25 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2030.codfw.wmnet with OS bullseye
  • 21:24 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2020.codfw.wmnet: Replace Cassandra keys & certs - eevans@cumin1001
  • 21:23 urbanecm@deploy1002: tgr and urbanecm and jdlrobson and legoktm: Backport for Enable WelcomeSurvey at viwiki (T325376), Fix grid blowout with limited width turned off (T327423), Support new style of table of contents (T327942) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:21 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase2019.codfw.wmnet: Replace Cassandra keys & certs - eevans@cumin1001
  • 21:21 urbanecm@deploy1002: Started scap: Backport for Enable WelcomeSurvey at viwiki (T325376), Fix grid blowout with limited width turned off (T327423), Support new style of table of contents (T327942)
  • 21:21 urbanecm@deploy1002: Finished scap: Backport for InitialiseSettings: add zhwiki to wgPageAssessmentsSubprojects (T326387) (duration: 19m 51s)
  • 21:11 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2019.codfw.wmnet: Replace Cassandra keys & certs - eevans@cumin1001
  • 21:03 urbanecm@deploy1002: urbanecm and musikanimal: Backport for InitialiseSettings: add zhwiki to wgPageAssessmentsSubprojects (T326387) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 21:02 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2033.codfw.wmnet,service=ats-be
  • 21:02 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2033.codfw.wmnet,service=cdn
  • 21:01 urbanecm@deploy1002: Started scap: Backport for InitialiseSettings: add zhwiki to wgPageAssessmentsSubprojects (T326387)
  • 20:59 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
  • 20:56 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
  • 20:51 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:50 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 20:35 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bullseye
  • 20:35 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bullseye
  • 20:26 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2033.codfw.wmnet with OS bullseye
  • 20:23 zabe@deploy1002: Finished scap: Backport for slwiki: Raise AF emergency disable treshold+count (T328366) (duration: 07m 32s)
  • 20:17 zabe@deploy1002: zabe: Backport for slwiki: Raise AF emergency disable treshold+count (T328366) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:16 zabe@deploy1002: Started scap: Backport for slwiki: Raise AF emergency disable treshold+count (T328366)
  • 20:15 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bullseye
  • 20:14 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4044.ulsfo.wmnet
  • 20:12 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS bullseye
  • 20:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2033.codfw.wmnet with reason: host reimage
  • 20:03 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2033.codfw.wmnet with reason: host reimage
  • 19:57 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3052.esams.wmnet,service=ats-be
  • 19:57 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3052.esams.wmnet,service=cdn
  • 19:50 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
  • 19:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3052.esams.wmnet with OS bullseye
  • 19:47 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
  • 19:44 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2033.codfw.wmnet with OS bullseye
  • 19:36 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 19:35 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 19:26 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS bullseye
  • 19:26 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4044.ulsfo.wmnet with OS bullseye
  • 19:25 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3052.esams.wmnet with reason: host reimage
  • 19:22 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3052.esams.wmnet with reason: host reimage
  • 19:21 cstone: payments-wiki upgraded from 653c7cc8 to f20a2208
  • 19:16 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS bullseye
  • 19:15 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4051.ulsfo.wmnet
  • 19:01 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3052.esams.wmnet with OS bullseye
  • 18:46 sukhe@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cp3052.esams.wmnet']
  • 18:46 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3052.esams.wmnet']
  • 18:46 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp3052.esams.wmnet']
  • 18:45 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3052.esams.wmnet']
  • 18:45 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3052.esams.wmnet with OS bullseye
  • 18:38 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3052.esams.wmnet with OS bullseye
  • 18:37 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp3052.esams.wmnet with OS bullseye
  • 18:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3052.esams.wmnet']
  • 18:37 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3052.esams.wmnet']
  • 18:34 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4051.ulsfo.wmnet with OS bullseye
  • 18:29 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
  • 18:29 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
  • 18:19 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3052.esams.wmnet with OS bullseye
  • 18:19 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3052.esams.wmnet with OS bullseye
  • 18:10 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3052.esams.wmnet with OS bullseye
  • 18:08 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp3052.esams.wmnet
  • 18:07 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4051.ulsfo.wmnet with reason: host reimage
  • 18:04 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4051.ulsfo.wmnet with reason: host reimage
  • 18:01 urbanecm@deploy1002: Finished scap: Backport for [Growth] Remove wgGERecentChangesUnstarredMenteesFilterEnabled (duration: 07m 59s)
  • 17:53 urbanecm@deploy1002: Started scap: Backport for [Growth] Remove wgGERecentChangesUnstarredMenteesFilterEnabled
  • 17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43517 and previous config saved to /var/cache/conftool/dbconfig/20230130-174957-ladsgroup.json
  • 17:49 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp3052.esams.wmnet
  • 17:43 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4051.ulsfo.wmnet with OS bullseye
  • 17:43 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4051.ulsfo.wmnet with OS bullseye
  • 17:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5026.eqsin.wmnet,service=ats-be
  • 17:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5026.eqsin.wmnet,service=cdn
  • 17:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P43516 and previous config saved to /var/cache/conftool/dbconfig/20230130-173450-ladsgroup.json
  • 17:34 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:34 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3050.esams.wmnet,service=ats-be
  • 17:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3050.esams.wmnet,service=cdn
  • 17:31 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4051.ulsfo.wmnet with OS bullseye
  • 17:31 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4043.ulsfo.wmnet
  • 17:27 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4043.ulsfo.wmnet with OS bullseye
  • 17:24 inflatador: bking@build2001 rebuilding docker images for 884351 complete
  • 17:22 inflatador: bking@build2001 rebuilding docker images for 884351
  • 17:21 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5026.eqsin.wmnet with OS bullseye
  • 17:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P43515 and previous config saved to /var/cache/conftool/dbconfig/20230130-171944-ladsgroup.json
  • 17:12 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3050.esams.wmnet with OS bullseye
  • 17:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43514 and previous config saved to /var/cache/conftool/dbconfig/20230130-170437-ladsgroup.json
  • 16:59 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4043.ulsfo.wmnet with reason: host reimage
  • 16:56 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4043.ulsfo.wmnet with reason: host reimage
  • 16:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43513 and previous config saved to /var/cache/conftool/dbconfig/20230130-165359-ladsgroup.json
  • 16:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 16:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 16:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43512 and previous config saved to /var/cache/conftool/dbconfig/20230130-165348-ladsgroup.json
  • 16:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5026.eqsin.wmnet with reason: host reimage
  • 16:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3050.esams.wmnet with reason: host reimage
  • 16:44 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5026.eqsin.wmnet with reason: host reimage
  • 16:44 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3050.esams.wmnet with reason: host reimage
  • 16:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P43511 and previous config saved to /var/cache/conftool/dbconfig/20230130-163842-ladsgroup.json
  • 16:35 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4043.ulsfo.wmnet with OS bullseye
  • 16:35 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4043.ulsfo.wmnet with OS bullseye
  • 16:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1084.eqiad.wmnet
  • 16:25 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4043.ulsfo.wmnet with OS bullseye
  • 16:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1084.eqiad.wmnet
  • 16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P43510 and previous config saved to /var/cache/conftool/dbconfig/20230130-162336-ladsgroup.json
  • 16:22 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3051.esams.wmnet,service=ats-be
  • 16:22 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3051.esams.wmnet,service=cdn
  • 16:22 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2029.codfw.wmnet,service=ats-be
  • 16:22 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2029.codfw.wmnet,service=cdn
  • 16:21 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3050.esams.wmnet with OS bullseye
  • 16:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp3050.esams.wmnet
  • 16:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2029.codfw.wmnet with OS bullseye
  • 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43509 and previous config saved to /var/cache/conftool/dbconfig/20230130-161324-root.json
  • 16:11 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp3050.esams.wmnet
  • 16:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp3050.esams.wmnet
  • 16:10 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp3050.esams.wmnet
  • 16:10 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3050.esams.wmnet,service=ats-be
  • 16:10 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3050.esams.wmnet,service=cdn
  • 16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43508 and previous config saved to /var/cache/conftool/dbconfig/20230130-160829-ladsgroup.json
  • 16:06 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5026.eqsin.wmnet with OS bullseye
  • 16:05 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5026.eqsin.wmnet with OS bullseye
  • 16:03 sukhe: racreset cp3050.esams.wmnet: firmware cookbook iDRAC upgrade test
  • 16:03 moritzm: upgrading idp-test to latest Java security update
  • 15:59 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp3050.esams.wmnet
  • 15:59 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp3050.esams.wmnet
  • 15:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43507 and previous config saved to /var/cache/conftool/dbconfig/20230130-155819-root.json
  • 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43506 and previous config saved to /var/cache/conftool/dbconfig/20230130-155802-ladsgroup.json
  • 15:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 15:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 15:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 15:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 15:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43505 and previous config saved to /var/cache/conftool/dbconfig/20230130-155747-ladsgroup.json
  • 15:54 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5026.eqsin.wmnet with OS bullseye
  • 15:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2029.codfw.wmnet with reason: host reimage
  • 15:48 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2029.codfw.wmnet with reason: host reimage
  • 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43504 and previous config saved to /var/cache/conftool/dbconfig/20230130-154314-root.json
  • 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P43503 and previous config saved to /var/cache/conftool/dbconfig/20230130-154241-ladsgroup.json
  • 15:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3051.esams.wmnet with OS bullseye
  • 15:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2029.codfw.wmnet with OS bullseye
  • 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43502 and previous config saved to /var/cache/conftool/dbconfig/20230130-152809-root.json
  • 15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P43501 and previous config saved to /var/cache/conftool/dbconfig/20230130-152734-ladsgroup.json
  • 15:14 marostegui: Retrospective: Starting s4 codfw failover from db2110 to db2140 - T328022
  • 15:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43500 and previous config saved to /var/cache/conftool/dbconfig/20230130-151304-root.json
  • 15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43499 and previous config saved to /var/cache/conftool/dbconfig/20230130-151228-ladsgroup.json
  • 15:07 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3051.esams.wmnet with reason: host reimage
  • 15:04 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3051.esams.wmnet with reason: host reimage
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43498 and previous config saved to /var/cache/conftool/dbconfig/20230130-150132-ladsgroup.json
  • 15:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 15:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 14:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43497 and previous config saved to /var/cache/conftool/dbconfig/20230130-145759-root.json
  • 14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2110 T328022', diff saved to https://phabricator.wikimedia.org/P43496 and previous config saved to /var/cache/conftool/dbconfig/20230130-145508-root.json
  • 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2140 to s4 primary T328022', diff saved to https://phabricator.wikimedia.org/P43495 and previous config saved to /var/cache/conftool/dbconfig/20230130-145421-root.json
  • 14:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 14:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43494 and previous config saved to /var/cache/conftool/dbconfig/20230130-145229-ladsgroup.json
  • 14:47 moritzm: updating puppetdb 7 hosts to 7.12.1 T321783
  • 14:46 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Enable Linter write namespace, tag and template from core, group0 (T299612) (duration: 11m 11s)
  • 14:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3051.esams.wmnet with OS bullseye
  • 14:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P43493 and previous config saved to /var/cache/conftool/dbconfig/20230130-144213-ladsgroup.json
  • 14:38 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P43492 and previous config saved to /var/cache/conftool/dbconfig/20230130-143723-ladsgroup.json
  • 14:36 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and sbailey: Backport for Enable Linter write namespace, tag and template from core, group0 (T299612) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 14:35 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Enable Linter write namespace, tag and template from core, group0 (T299612)
  • 14:33 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Revert "Remove references to mediawiki.Uri" (T328143), Revert "Rewrite mw.libs.ve.getTargetDataFromHref with URL API" (T328143) (duration: 12m 07s)
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P43491 and previous config saved to /var/cache/conftool/dbconfig/20230130-142708-ladsgroup.json
  • 14:22 lucaswerkmeister-wmde@deploy1002: matmarex and lucaswerkmeister-wmde: Backport for Revert "Remove references to mediawiki.Uri" (T328143), Revert "Rewrite mw.libs.ve.getTargetDataFromHref with URL API" (T328143) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P43490 and previous config saved to /var/cache/conftool/dbconfig/20230130-142216-ladsgroup.json
  • 14:21 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Revert "Remove references to mediawiki.Uri" (T328143), Revert "Rewrite mw.libs.ve.getTargetDataFromHref with URL API" (T328143)
  • 14:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
  • 14:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2140 with weight 0 T328022', diff saved to https://phabricator.wikimedia.org/P43489 and previous config saved to /var/cache/conftool/dbconfig/20230130-141822-root.json
  • 14:18 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
  • 14:17 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
  • 14:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P43488 and previous config saved to /var/cache/conftool/dbconfig/20230130-141203-ladsgroup.json
  • 14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43487 and previous config saved to /var/cache/conftool/dbconfig/20230130-140710-ladsgroup.json
  • 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P43486 and previous config saved to /var/cache/conftool/dbconfig/20230130-135659-ladsgroup.json
  • 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43485 and previous config saved to /var/cache/conftool/dbconfig/20230130-135632-ladsgroup.json
  • 13:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 13:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 13:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 13:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P43484 and previous config saved to /var/cache/conftool/dbconfig/20230130-134406-ladsgroup.json
  • 13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 13:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 13:31 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad (duration: 01m 23s)
  • 13:29 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad
  • 13:29 godog: bounce logstash on logstash1025 -- GC unhappy causing kafka lag
  • 13:29 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (eqiad): Disable traffic mirroring from codfw to eqiad (duration: 01m 13s)
  • 13:28 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (eqiad): Disable traffic mirroring from codfw to eqiad
  • 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43483 and previous config saved to /var/cache/conftool/dbconfig/20230130-132701-ladsgroup.json
  • 13:23 awight@deploy1002: Finished scap: Backport for Revert "Enable kartographer external data parse time fetch for all wikis" (T323113) (duration: 08m 34s)
  • 13:21 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (eqiad): Disable traffic mirroring from codfw to eqiad (duration: 00m 11s)
  • 13:21 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (eqiad): Disable traffic mirroring from codfw to eqiad
  • 13:21 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad (duration: 00m 22s)
  • 13:20 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad
  • 13:16 awight@deploy1002: awight: Backport for Revert "Enable kartographer external data parse time fetch for all wikis" (T323113) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:14 awight@deploy1002: Started scap: Backport for Revert "Enable kartographer external data parse time fetch for all wikis" (T323113)
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P43482 and previous config saved to /var/cache/conftool/dbconfig/20230130-131155-ladsgroup.json
  • 13:00 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 12:59 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 12:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast3004.wikimedia.org
  • 12:58 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:58 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 12:57 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P43481 and previous config saved to /var/cache/conftool/dbconfig/20230130-125648-ladsgroup.json
  • 12:56 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 12:55 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 12:55 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 12:55 awight@deploy1002: scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki=aawiki --force-version "1.40.0-wmf.20" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.2oaGSEpQR1"' returned non-zero exit status 255. (duration: 00m 00s)
  • 12:55 awight@deploy1002: Started scap: Backport for Revert "Enable kartographer external data parse time fetch for all wikis" (T323113)
  • 12:46 awight@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f]: Roll back kartotherian (duration: 01m 27s)
  • 12:45 awight@deploy1002: Started deploy [kartotherian/deploy@5c58f8f]: Roll back kartotherian
  • 12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43479 and previous config saved to /var/cache/conftool/dbconfig/20230130-124142-ladsgroup.json
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T328255)', diff saved to https://phabricator.wikimedia.org/P43478 and previous config saved to /var/cache/conftool/dbconfig/20230130-123004-ladsgroup.json
  • 12:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 12:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43477 and previous config saved to /var/cache/conftool/dbconfig/20230130-122943-ladsgroup.json
  • 12:25 awight@deploy1002: Finished deploy [kartotherian/deploy@42a07d3]: Disable traffic mirroring from codfw to eqiad (duration: 02m 44s)
  • 12:25 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 12:23 awight@deploy1002: Started deploy [kartotherian/deploy@42a07d3]: Disable traffic mirroring from codfw to eqiad
  • 12:22 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P43476 and previous config saved to /var/cache/conftool/dbconfig/20230130-121437-ladsgroup.json
  • 12:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast3004.wikimedia.org
  • 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P43475 and previous config saved to /var/cache/conftool/dbconfig/20230130-115930-ladsgroup.json
  • 11:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast6001.wikimedia.org
  • 11:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast6001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast6001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:54 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:49 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast6001.wikimedia.org
  • 11:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 42473
  • 11:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 42473
  • 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43474 and previous config saved to /var/cache/conftool/dbconfig/20230130-114424-ladsgroup.json
  • 11:42 moritzm: installing install4002 T327867
  • 11:42 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1005.eqiad.wmnet
  • 11:41 Amir1: dropping old wikiadmin user (T326802)
  • 11:35 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1005.eqiad.wmnet
  • 11:35 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1004.eqiad.wmnet
  • 11:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T328255)', diff saved to https://phabricator.wikimedia.org/P43473 and previous config saved to /var/cache/conftool/dbconfig/20230130-113319-ladsgroup.json
  • 11:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 11:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 11:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 11:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43472 and previous config saved to /var/cache/conftool/dbconfig/20230130-113254-ladsgroup.json
  • 11:28 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1004.eqiad.wmnet
  • 11:24 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1003.eqiad.wmnet
  • 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install4002.wikimedia.org
  • 11:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P43471 and previous config saved to /var/cache/conftool/dbconfig/20230130-111748-ladsgroup.json
  • 11:17 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1003.eqiad.wmnet
  • 11:11 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host htmldumper1001.eqiad.wmnet
  • 11:09 phedenskog@deploy1002: Finished deploy [performance/navtiming@4e5ff3f]: (no justification provided) (duration: 00m 05s)
  • 11:09 phedenskog@deploy1002: Started deploy [performance/navtiming@4e5ff3f]: (no justification provided)
  • 11:05 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host htmldumper1001.eqiad.wmnet
  • 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install4002.wikimedia.org on all recursors
  • 11:04 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install4002.wikimedia.org on all recursors
  • 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install4002.wikimedia.org - jmm@cumin2002"
  • 11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install4002.wikimedia.org - jmm@cumin2002"
  • 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P43470 and previous config saved to /var/cache/conftool/dbconfig/20230130-110241-ladsgroup.json
  • 11:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install4002.wikimedia.org
  • 10:49 ladsgroup@deploy1002: Finished scap: Backport for Enable write both for externallinks except s4, s7, s8 (T321662) (duration: 13m 10s)
  • 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43468 and previous config saved to /var/cache/conftool/dbconfig/20230130-104735-ladsgroup.json
  • 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast4003.wikimedia.org
  • 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast4003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:40 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast4003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:37 ladsgroup@deploy1002: ladsgroup: Backport for Enable write both for externallinks except s4, s7, s8 (T321662) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 10:36 ladsgroup@deploy1002: Started scap: Backport for Enable write both for externallinks except s4, s7, s8 (T321662)
  • 10:36 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T328255)', diff saved to https://phabricator.wikimedia.org/P43467 and previous config saved to /var/cache/conftool/dbconfig/20230130-103540-ladsgroup.json
  • 10:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 10:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 10:30 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast4003.wikimedia.org
  • 10:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 10:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 10:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43466 and previous config saved to /var/cache/conftool/dbconfig/20230130-102500-ladsgroup.json
  • 10:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593
  • 10:17 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts bast4003.wikimedia.org
  • 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast4003.wikimedia.org
  • 10:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 14593
  • 10:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 49544
  • 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P43465 and previous config saved to /var/cache/conftool/dbconfig/20230130-100954-ladsgroup.json
  • 10:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 49544
  • 10:00 awight@deploy1002: Finished scap: Backport for Enable kartographer external data parse time fetch for all wikis (T326317) (duration: 07m 53s)
  • 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P43464 and previous config saved to /var/cache/conftool/dbconfig/20230130-095447-ladsgroup.json
  • 09:54 awight@deploy1002: lilients and awight: Backport for Enable kartographer external data parse time fetch for all wikis (T326317) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 09:52 awight@deploy1002: Started scap: Backport for Enable kartographer external data parse time fetch for all wikis (T326317)
  • 09:52 XioNoX: push pfw policies - T328085
  • 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43463 and previous config saved to /var/cache/conftool/dbconfig/20230130-093941-ladsgroup.json
  • 09:29 jynus: disabling puppet on dbprov2004 to reorganize partitions T327155
  • 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T328255)', diff saved to https://phabricator.wikimedia.org/P43462 and previous config saved to /var/cache/conftool/dbconfig/20230130-092804-ladsgroup.json
  • 09:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 09:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 09:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P43461 and previous config saved to /var/cache/conftool/dbconfig/20230130-092732-ladsgroup.json
  • 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P43460 and previous config saved to /var/cache/conftool/dbconfig/20230130-091225-ladsgroup.json
  • 08:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P43459 and previous config saved to /var/cache/conftool/dbconfig/20230130-085719-ladsgroup.json
  • 08:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T318605)', diff saved to https://phabricator.wikimedia.org/P43458 and previous config saved to /var/cache/conftool/dbconfig/20230130-085530-ladsgroup.json
  • 08:48 moritzm: installing install1004 T327867
  • 08:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P43457 and previous config saved to /var/cache/conftool/dbconfig/20230130-084213-ladsgroup.json
  • 08:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P43456 and previous config saved to /var/cache/conftool/dbconfig/20230130-084024-ladsgroup.json
  • 08:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2105 (T328255)', diff saved to https://phabricator.wikimedia.org/P43455 and previous config saved to /var/cache/conftool/dbconfig/20230130-083034-ladsgroup.json
  • 08:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 08:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P43454 and previous config saved to /var/cache/conftool/dbconfig/20230130-082517-ladsgroup.json
  • 08:19 zabe: Deployed security patch for T278365
  • 08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T318605)', diff saved to https://phabricator.wikimedia.org/P43452 and previous config saved to /var/cache/conftool/dbconfig/20230130-081011-ladsgroup.json
  • 07:54 phedenskog@deploy1002: Finished deploy [performance/navtiming@bfbd6d7]: (no justification provided) (duration: 00m 05s)
  • 07:54 phedenskog@deploy1002: Started deploy [performance/navtiming@bfbd6d7]: (no justification provided)
  • 07:50 moritzm: installing install2004 T327867
  • 07:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T318605)', diff saved to https://phabricator.wikimedia.org/P43451 and previous config saved to /var/cache/conftool/dbconfig/20230130-074502-ladsgroup.json
  • 07:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T318605)', diff saved to https://phabricator.wikimedia.org/P43450 and previous config saved to /var/cache/conftool/dbconfig/20230130-073827-ladsgroup.json
  • 07:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 07:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 07:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T318605)', diff saved to https://phabricator.wikimedia.org/P43449 and previous config saved to /var/cache/conftool/dbconfig/20230130-073806-ladsgroup.json
  • 07:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P43448 and previous config saved to /var/cache/conftool/dbconfig/20230130-072956-ladsgroup.json
  • 07:26 marostegui: dbmaint Schema change on s7 eqiad T328236
  • 07:25 marostegui: dbmaint Schema change on s2 eqiad T328236
  • 07:25 marostegui: dbmaint Schema change on s1 eqiad T328236
  • 07:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P43447 and previous config saved to /var/cache/conftool/dbconfig/20230130-072300-ladsgroup.json
  • 07:21 marostegui: dbmaint Schema change on s1 eqiad T328236
  • 07:17 marostegui: dbmaint Schema change on s4 eqiad T328236
  • 07:16 marostegui: dbmaint Schema change on s6 eqiad T328236
  • 07:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P43446 and previous config saved to /var/cache/conftool/dbconfig/20230130-071450-ladsgroup.json
  • 07:11 marostegui: dbmaint Schema change on s5 eqiad T328236
  • 07:10 marostegui: dbmaint Schema change on s8 eqiad T328236
  • 07:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P43445 and previous config saved to /var/cache/conftool/dbconfig/20230130-070753-ladsgroup.json
  • 07:05 marostegui: dbmaint Schema change on s3 eqiad T328086
  • 07:02 marostegui: dbmaint Schema change on s1 eqiad T328086
  • 07:01 marostegui: dbmaint Schema change on s4 eqiad T328086
  • 06:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T318605)', diff saved to https://phabricator.wikimedia.org/P43444 and previous config saved to /var/cache/conftool/dbconfig/20230130-065943-ladsgroup.json
  • 06:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T318605)', diff saved to https://phabricator.wikimedia.org/P43443 and previous config saved to /var/cache/conftool/dbconfig/20230130-065247-ladsgroup.json
  • 06:51 marostegui: dbmaint Schema change on s5 eqiad T328086
  • 06:45 marostegui: dbmaint Schema change on s2 eqiad T328086
  • 06:43 marostegui: dbmaint Schema change on s7 eqiad T328086
  • 06:41 marostegui: dbmaint Schema change on s8 eqiad T328086
  • 06:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
  • 06:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T328022
  • 06:34 marostegui: dbmaint Schema change on s6 eqiad T328086
  • 06:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2140 (T318605)', diff saved to https://phabricator.wikimedia.org/P43441 and previous config saved to /var/cache/conftool/dbconfig/20230130-061534-ladsgroup.json
  • 06:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 06:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 06:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T318605)', diff saved to https://phabricator.wikimedia.org/P43440 and previous config saved to /var/cache/conftool/dbconfig/20230130-061401-ladsgroup.json
  • 06:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 06:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 05:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T318605)', diff saved to https://phabricator.wikimedia.org/P43439 and previous config saved to /var/cache/conftool/dbconfig/20230130-053033-ladsgroup.json
  • 05:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 05:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 05:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 05:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance

2023-01-29

  • 14:46 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1002.eqiad.wmnet
  • 14:40 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1002.eqiad.wmnet
  • 14:39 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1008.eqiad.wmnet
  • 14:33 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1008.eqiad.wmnet

2023-01-28

  • 00:36 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet
  • 00:35 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet
  • 00:17 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS bullseye

2023-01-27

  • 23:55 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
  • 23:52 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
  • 23:31 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS bullseye
  • 23:31 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4050.ulsfo.wmnet with OS bullseye
  • 23:22 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS bullseye
  • 23:21 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4042.ulsfo.wmnet
  • 22:46 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS bullseye
  • 22:24 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
  • 22:20 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
  • 22:11 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include bullseye-wikimedia /home/rzl/httpbb/bullseye/httpbb_0.0.2-1+deb11u1_amd64.changes # T328162
  • 22:11 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include buster-wikimedia /home/rzl/httpbb/buster/httpbb_0.0.2-1_amd64.changes # T328162
  • 22:00 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS bullseye
  • 21:59 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4042.ulsfo.wmnet with OS bullseye
  • 21:51 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS bullseye
  • 21:49 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4049.ulsfo.wmnet
  • 20:56 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4049.ulsfo.wmnet with OS bullseye
  • 20:29 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4049.ulsfo.wmnet with reason: host reimage
  • 20:26 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4049.ulsfo.wmnet with reason: host reimage
  • 20:05 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4049.ulsfo.wmnet with OS bullseye
  • 20:02 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4049.ulsfo.wmnet with OS bullseye
  • 19:38 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4049.ulsfo.wmnet with OS bullseye
  • 19:38 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4049.ulsfo.wmnet with OS bullseye
  • 19:32 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4049.ulsfo.wmnet with OS bullseye
  • 19:31 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4041.ulsfo.wmnet
  • 19:31 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp404.ulsfo.wmnet
  • 19:28 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4041.ulsfo.wmnet with OS bullseye
  • 19:02 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4041.ulsfo.wmnet with reason: host reimage
  • 18:57 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4041.ulsfo.wmnet with reason: host reimage
  • 18:37 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4041.ulsfo.wmnet with OS bullseye
  • 18:37 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4041.ulsfo.wmnet with OS bullseye
  • 18:25 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4041.ulsfo.wmnet with OS bullseye
  • 18:24 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4048.ulsfo.wmnet
  • 18:14 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS bullseye
  • 17:52 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage
  • 17:49 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage
  • 17:38 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@907fe2a]: (no justification provided) (duration: 00m 14s)
  • 17:38 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@907fe2a]: (no justification provided)
  • 17:28 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS bullseye
  • 17:28 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4048.ulsfo.wmnet with OS bullseye
  • 17:15 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS bullseye
  • 15:50 dancy@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 00m 04s)
  • 15:50 dancy@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
  • 15:42 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4040.ulsfo.wmnet,service=ats-be
  • 15:42 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4040.ulsfo.wmnet,service=cdn
  • 15:39 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS bullseye
  • 15:31 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2027.codfw.wmnet,service=ats-be
  • 15:31 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2027.codfw.wmnet,service=cdn
  • 15:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2027.codfw.wmnet with OS bullseye
  • 15:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage
  • 15:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage
  • 15:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2027.codfw.wmnet with reason: host reimage
  • 14:58 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2027.codfw.wmnet with reason: host reimage
  • 14:55 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 14:55 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 14:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS bullseye
  • 14:46 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 14:45 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
  • 14:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
  • 14:41 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 14:40 moritzm: installing install3002 T327867
  • 14:39 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 14:34 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 14:34 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 14:27 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts clouddb2001-dev.codfw.wmnet
  • 14:27 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:26 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 14:22 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts clouddb2001-dev.codfw.wmnet
  • 14:20 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts clouddb2001-dev.codfw.wmnet
  • 14:20 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:20 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: clouddb2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 14:17 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: clouddb2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1001"
  • 14:13 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 14:10 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts clouddb2001-dev.codfw.wmnet
  • 13:46 moritzm: installing install5002 T327867
  • 13:08 moritzm: installing install6002 T327867
  • 12:47 hashar: gerrit1001 running Puppet to deploy https://gerrit.wikimedia.org/r/883965 and restarting Apache 2 to change the `Listen` statements # T326125
  • 12:42 hashar: Rebooting gerrit2002
  • 12:38 hashar: Stopped Puppet on gerrit1001 to prevent auto deployment of https://gerrit.wikimedia.org/r/883965
  • 12:25 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2001-dev.codfw.wmnet with OS bullseye
  • 12:25 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
  • 12:23 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
  • 12:03 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@9690bf9]: (no justification provided) (duration: 00m 15s)
  • 12:03 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@9690bf9]: (no justification provided)
  • 12:01 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 12:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 138915
  • 12:00 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 138915
  • 11:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9318
  • 11:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9318
  • 11:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 55821
  • 11:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 55821
  • 11:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 398143
  • 11:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 398143
  • 11:57 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2001-dev.codfw.wmnet with reason: host reimage
  • 11:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 26077
  • 11:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 26077
  • 11:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 50266
  • 11:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 50266
  • 11:54 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2001-dev.codfw.wmnet with reason: host reimage
  • 11:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 14593
  • 11:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 14593
  • 11:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 56898
  • 11:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 56898
  • 11:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8368
  • 11:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8368
  • 11:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8560
  • 11:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8560
  • 11:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34309
  • 11:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 34309
  • 11:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12033
  • 11:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12033
  • 11:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 62537
  • 11:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 62537
  • 11:41 XioNoX: restart keyholder on deploy1002
  • 11:41 stevemunene@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 11:40 stevemunene@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
  • 11:38 stevemunene@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 11:36 stevemunene@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 11:27 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:26 ayounsi@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 00m 56s)
  • 11:25 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2001-dev.codfw.wmnet with OS bullseye
  • 11:25 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
  • 11:24 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb2001-dev.codfw.wmnet with OS bullseye
  • 11:24 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:15 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2001-dev.codfw.wmnet with OS bullseye
  • 11:15 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudlb2001-dev.mgmt.codfw.wmnet on all recursors
  • 11:15 aborrero@cumin2002: START - Cookbook sre.dns.wipe-cache cloudlb2001-dev.mgmt.codfw.wmnet on all recursors
  • 11:15 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on an-worker1087.eqiad.wmnet with reason: Shutting down an-worker1087 to allow for RAID BBU replacement
  • 11:14 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on an-worker1087.eqiad.wmnet with reason: Shutting down an-worker1087 to allow for RAID BBU replacement
  • 11:13 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2001-dev.codfw.wmnet with OS bullseye
  • 11:12 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2001-dev.codfw.wmnet with OS bullseye
  • 11:12 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:12 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Sync for cloudlb2001-dev - aborrero@cumin2002"
  • 11:11 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Sync for cloudlb2001-dev - aborrero@cumin2002"
  • 11:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-corp1001.wikimedia.org
  • 11:10 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:09 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:08 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 11:08 aborrero@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 11:05 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-corp1001.wikimedia.org
  • 11:04 stevemunene@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: apply on main
  • 11:04 stevemunene@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
  • 11:03 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 11:01 stevemunene@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: apply on main
  • 11:01 stevemunene@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 10:53 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts ldap-corp1001.wikimedia.org
  • 10:52 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-corp1001.wikimedia.org
  • 10:45 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:45 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Sync for cloudlb2001-dev - aborrero@cumin2002"
  • 10:38 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Sync for cloudlb2001-dev - aborrero@cumin2002"
  • 10:37 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: apply on main
  • 10:37 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:26 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 10:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-corp2001.wikimedia.org
  • 10:23 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 10:19 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:15 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-corp2001.wikimedia.org
  • 09:40 moritzm: disabling old bastions bast3005/bast4003/bast5002/bast6001, use bast3006/bast4004/bast5003/bast6002 instead
  • 08:23 marostegui: Apply schema change on labtestwiki (clouddb2002-dev) T328086
  • 08:22 marostegui: Apply schema change on db1106 (s1 enwiki) T328086
  • 08:06 elukey: restart kube-apiserver on ml-staging-ctrl2* nodes as an attempt to mitigate some LIST API high latency
  • 07:41 elukey: restart kube-apiserver on ml-serve-ctrl2* nodes as an attempt to mitigate some 504 API response errors
  • 01:15 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev2*: Applying configuration change to cassandra-dev cluster - eevans@cumin1001
  • 01:11 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4047.ulsfo.wmnet
  • 01:10 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4047.ulsfo.wmnet with OS bullseye
  • 00:56 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev2*: Applying configuration change to cassandra-dev cluster - eevans@cumin1001
  • 00:49 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4047.ulsfo.wmnet with reason: host reimage
  • 00:45 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4047.ulsfo.wmnet with reason: host reimage
  • 00:33 zabe@deploy1002: Finished scap: Backport for Stop setting cul_actor migration var (T233004) (duration: 07m 36s)
  • 00:27 zabe@deploy1002: zabe: Backport for Stop setting cul_actor migration var (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 00:26 zabe@deploy1002: Started scap: Backport for Stop setting cul_actor migration var (T233004)
  • 00:25 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS bullseye
  • 00:24 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4047.ulsfo.wmnet with OS bullseye
  • 00:16 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS bullseye
  • 00:15 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4047.ulsfo.wmnet with OS bullseye
  • 00:11 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS bullseye
  • 00:10 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4047.ulsfo.wmnet with OS bullseye

2023-01-26

  • 23:59 zabe@deploy1002: Finished scap: Backport for Add a project logo on gorwiktionary (T327987) (duration: 34m 42s)
  • 23:54 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS bullseye
  • 23:52 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4039.ulsfo.wmnet
  • 23:51 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4039.ulsfo.wmnet with OS bullseye
  • 23:28 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4039.ulsfo.wmnet with reason: host reimage
  • 23:26 zabe@deploy1002: zabe and superpes: Backport for Add a project logo on gorwiktionary (T327987) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 23:25 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4039.ulsfo.wmnet with reason: host reimage
  • 23:24 zabe@deploy1002: Started scap: Backport for Add a project logo on gorwiktionary (T327987)
  • 23:13 sbassett@deploy1002: Synchronized private/PrivateSettings.php: T326691 - remove mitigation and monitor (duration: 06m 52s)
  • 23:04 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4039.ulsfo.wmnet with OS bullseye
  • 23:04 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4039.ulsfo.wmnet with OS bullseye
  • 23:03 zabe@deploy1002: Finished scap: Backport for Pin CheckUserEventTablesMigrationStage to read and write old (T324907) (duration: 08m 36s)
  • 22:56 zabe@deploy1002: dreamyjazz and zabe: Backport for Pin CheckUserEventTablesMigrationStage to read and write old (T324907) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 22:54 zabe@deploy1002: Started scap: Backport for Pin CheckUserEventTablesMigrationStage to read and write old (T324907)
  • 22:45 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4039.ulsfo.wmnet with OS bullseye
  • 22:44 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4046.ulsfo.wmnet
  • 22:44 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS bullseye
  • 22:23 zabe: running migrateRevisionCommentTemp.php in cebwiki in screen with --sleep 2 # T275246
  • 22:22 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage
  • 22:18 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage
  • 21:58 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS bullseye
  • 21:47 thcipriani@deploy1002: Finished scap: Backport for Increase threshold for table of contents collapsing (T328045), Remove redundant block for search descriptions (T324859) (duration: 08m 49s)
  • 21:40 thcipriani@deploy1002: thcipriani and jdlrobson: Backport for Increase threshold for table of contents collapsing (T328045), Remove redundant block for search descriptions (T324859) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:39 thcipriani@deploy1002: Started scap: Backport for Increase threshold for table of contents collapsing (T328045), Remove redundant block for search descriptions (T324859)
  • 21:36 thcipriani@deploy1002: Finished scap: Backport for ApiDiscussionToolsEdit: Unwrap Parsoid sections before parsing (T327704) (duration: 08m 43s)
  • 21:35 brett@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4046.ulsfo.wmnet with OS bullseye
  • 21:34 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS bullseye
  • 21:33 brett@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4046.ulsfo.wmnet with OS bullseye
  • 21:33 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS bullseye
  • 21:33 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4046.ulsfo.wmnet with OS bullseye
  • 21:29 thcipriani@deploy1002: matmarex and thcipriani: Backport for ApiDiscussionToolsEdit: Unwrap Parsoid sections before parsing (T327704) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 21:27 thcipriani@deploy1002: Started scap: Backport for ApiDiscussionToolsEdit: Unwrap Parsoid sections before parsing (T327704)
  • 21:25 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS bullseye
  • 21:25 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet
  • 21:24 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS bullseye
  • 21:20 thcipriani@deploy1002: Finished scap: Backport for Enable write new for CheckUserLog comment fields everywhere (T233004) (duration: 11m 18s)
  • 21:11 thcipriani@deploy1002: thcipriani and dreamyjazz: Backport for Enable write new for CheckUserLog comment fields everywhere (T233004) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 21:09 thcipriani@deploy1002: Started scap: Backport for Enable write new for CheckUserLog comment fields everywhere (T233004)
  • 21:01 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
  • 20:56 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
  • 20:36 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
  • 20:13 ryankemper: `ryankemper@thanos-fe1001:~$ sudo run-puppet-agent` following merge of wdqs recording rule patch: https://gerrit.wikimedia.org/r/c/operations/puppet/+/883610
  • 20:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on cp2027.codfw.wmnet with reason: reimaging
  • 20:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on cp2027.codfw.wmnet with reason: reimaging
  • 20:05 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2027.codfw.wmnet with OS bullseye
  • 19:56 brett@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4038.ulsfo.wmnet with OS bullseye
  • 19:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cp2027.codfw.wmnet with reason: reimaging
  • 19:10 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on cp2027.codfw.wmnet with reason: reimaging
  • 19:09 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.20 refs T325583
  • 19:00 brennen: 1.40.0-wmf.20 train (T325583): no current blockers, rolling to all wikis.
  • 18:59 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
  • 18:57 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6008.drmrs.wmnet
  • 18:46 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6008.drmrs.wmnet with OS bullseye
  • 18:20 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6008.drmrs.wmnet with reason: host reimage
  • 18:17 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6008.drmrs.wmnet with reason: host reimage
  • 18:17 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 18:16 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 18:16 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 18:15 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 18:15 cgoubert@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 18:15 cgoubert@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 18:15 cgoubert@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 18:15 cgoubert@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 18:15 cgoubert@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 18:15 cgoubert@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 18:15 cgoubert@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 18:15 cgoubert@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 18:14 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 18:14 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 18:14 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 18:13 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 18:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 18:12 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 18:12 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 18:11 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 18:11 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 18:10 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 18:10 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 18:09 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 17:59 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6008.drmrs.wmnet with OS bullseye
  • 17:55 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6016.drmrs.wmnet
  • 17:49 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6016.drmrs.wmnet with OS bullseye
  • 17:30 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1015.eqiad.wmnet
  • 17:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43427 and previous config saved to /var/cache/conftool/dbconfig/20230126-172806-root.json
  • 17:27 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
  • 17:24 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1015.eqiad.wmnet
  • 17:24 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
  • 17:22 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1014.eqiad.wmnet
  • 17:19 dancy@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 00m 11s)
  • 17:19 dancy@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
  • 17:16 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1014.eqiad.wmnet
  • 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43426 and previous config saved to /var/cache/conftool/dbconfig/20230126-171302-root.json
  • 17:12 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1013.eqiad.wmnet
  • 17:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2027.codfw.wmnet with reason: host reimage
  • 17:07 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2027.codfw.wmnet with reason: host reimage
  • 17:06 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1013.eqiad.wmnet
  • 17:06 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6016.drmrs.wmnet with OS bullseye
  • 17:05 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6016.drmrs.wmnet
  • 17:05 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 17:05 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 17:04 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6007.drmrs.wmnet
  • 17:03 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6007.drmrs.wmnet with OS bullseye
  • 17:02 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1012.eqiad.wmnet
  • 16:59 cgoubert@deploy1002: Synchronized tox.ini: Rebuilding mediawiki-webserver (duration: 07m 19s)
  • 16:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43425 and previous config saved to /var/cache/conftool/dbconfig/20230126-165757-root.json
  • 16:56 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1012.eqiad.wmnet
  • 16:53 claime: Running scap sync-file -D php_fpm_restart_script:/bin/true tox.ini "Rebuilding mediawiki-webserver image" - T326794
  • 16:51 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
  • 16:49 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['cp2027']
  • 16:48 sukhe: correcting earlier log: pooling lvs2007 after T326564
  • 16:48 sukhe: pooling lvs2009 after T326564
  • 16:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43424 and previous config saved to /var/cache/conftool/dbconfig/20230126-164252-root.json
  • 16:41 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6007.drmrs.wmnet with reason: host reimage
  • 16:41 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2027']
  • 16:38 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2027.codfw.wmnet with OS bullseye
  • 16:38 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6007.drmrs.wmnet with reason: host reimage
  • 16:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1084.eqiad.wmnet
  • 16:31 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1011.eqiad.wmnet
  • 16:28 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
  • 16:27 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1084.eqiad.wmnet
  • 16:27 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2027.codfw.wmnet with OS bullseye
  • 16:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43423 and previous config saved to /var/cache/conftool/dbconfig/20230126-162747-root.json
  • 16:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
  • 16:26 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1080.eqiad.wmnet
  • 16:24 aborrero@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudlb1001-dev
  • 16:23 aborrero@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb1001-dev
  • 16:23 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1011.eqiad.wmnet
  • 16:21 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1010.eqiad.wmnet
  • 16:21 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 16:20 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 16:20 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:20 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 16:19 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1080.eqiad.wmnet
  • 16:19 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 16:19 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 16:18 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6007.drmrs.wmnet with OS bullseye
  • 16:14 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1010.eqiad.wmnet
  • 16:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp3051.esams.wmnet with reason: extending downtime: T323717
  • 16:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp3051.esams.wmnet with reason: extending downtime: T323717
  • 16:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43422 and previous config saved to /var/cache/conftool/dbconfig/20230126-161242-root.json
  • 16:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2161 T328024', diff saved to https://phabricator.wikimedia.org/P43421 and previous config saved to /var/cache/conftool/dbconfig/20230126-161137-root.json
  • 16:11 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2165 to s8 primary T328024', diff saved to https://phabricator.wikimedia.org/P43420 and previous config saved to /var/cache/conftool/dbconfig/20230126-161058-marostegui.json
  • 16:10 marostegui: Starting s8 codfw failover from db2161 to db2165 - T328024
  • 16:09 moritzm: installing distro-info-data updates from Bullseye point release
  • 16:08 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudgw2001-dev.codfw.wmnet
  • 16:08 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:08 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
  • 16:06 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
  • 16:05 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1009.eqiad.wmnet
  • 15:55 jbond: enable-puppet post deploy requestctl ferm change gerrit:883935
  • 15:55 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 15:51 hashar: Restarting CI Jenkins for upgrade
  • 15:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 35 hosts with reason: Primary switchover s8 T328024
  • 15:50 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2165 with weight 0 T328024', diff saved to https://phabricator.wikimedia.org/P43419 and previous config saved to /var/cache/conftool/dbconfig/20230126-155000-root.json
  • 15:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 35 hosts with reason: Primary switchover s8 T328024
  • 15:49 aborrero@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudgw2001-dev.codfw.wmnet
  • 15:46 hashar: Restart Jenkins for upgrade
  • 15:39 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1009.eqiad.wmnet
  • 15:30 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2027.codfw.wmnet with OS bullseye
  • 15:30 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
  • 15:30 sukhe: install2003: rm /etc/dhcp/automation/ttyS1-115200/cp2027.conf
  • 15:29 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2027.codfw.wmnet with OS bullseye
  • 15:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
  • 15:27 sukhe: poweroff lvs2007: T326564
  • 15:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43418 and previous config saved to /var/cache/conftool/dbconfig/20230126-152329-root.json
  • 15:12 jbond: disable-puppet deploy requestctl ferm change gerrit:883935
  • 15:09 sukhe: stop pybal on lvs2007: T326564
  • 15:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on lvs2007.codfw.wmnet with reason: powering off for T326564
  • 15:09 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on lvs2007.codfw.wmnet with reason: powering off for T326564
  • 15:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43417 and previous config saved to /var/cache/conftool/dbconfig/20230126-150824-root.json
  • 15:04 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2027.codfw.wmnet with OS bullseye
  • 15:04 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
  • 15:02 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2027.codfw.wmnet with OS bullseye
  • 15:02 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
  • 14:55 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43415 and previous config saved to /var/cache/conftool/dbconfig/20230126-145319-root.json
  • 14:40 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:40 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-run DNS cookbook after updating zone files - remove esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
  • 14:40 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 14:39 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-run DNS cookbook after updating zone files - remove esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
  • 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43414 and previous config saved to /var/cache/conftool/dbconfig/20230126-143814-root.json
  • 14:37 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 14:37 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:37 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-run DNS cookbook after updating zone files - remove esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
  • 14:37 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 14:36 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-run DNS cookbook after updating zone files - remove esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
  • 14:34 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 14:32 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:31 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 14:31 ladsgroup@deploy1002: Synchronized private/PrivateSettings.php: Rotating wikiadmin password (T326802) (duration: 07m 04s)
  • 14:27 moritzm: installing containerd security updates
  • 14:23 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43413 and previous config saved to /var/cache/conftool/dbconfig/20230126-142309-root.json
  • 14:16 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:15 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Enable write new for CheckUserLog comment fields on group 0 and 1 (T233004) (duration: 09m 16s)
  • 14:11 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 14:11 jbond: disable puppet fleet wide to roll out etcd ferm change gerrit:883888
  • 14:11 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 14:09 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43412 and previous config saved to /var/cache/conftool/dbconfig/20230126-140804-root.json
  • 14:07 lucaswerkmeister-wmde@deploy1002: dreamyjazz and lucaswerkmeister-wmde: Backport for Enable write new for CheckUserLog comment fields on group 0 and 1 (T233004) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2123 T328023', diff saved to https://phabricator.wikimedia.org/P43411 and previous config saved to /var/cache/conftool/dbconfig/20230126-140716-root.json
  • 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2113 to s5 primary T328023', diff saved to https://phabricator.wikimedia.org/P43410 and previous config saved to /var/cache/conftool/dbconfig/20230126-140630-root.json
  • 14:06 marostegui: Starting s5 codfw failover from db2123 to db2113 - T328023
  • 14:06 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Enable write new for CheckUserLog comment fields on group 0 and 1 (T233004)
  • 14:00 moritzm: restarting etherpad-lite to pick up nodejs security update
  • 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Remove vslow from db2113, future s5 codfw master T328023', diff saved to https://phabricator.wikimedia.org/P43409 and previous config saved to /var/cache/conftool/dbconfig/20230126-135509-marostegui.json
  • 13:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T328023
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2113 with weight 0 T328023', diff saved to https://phabricator.wikimedia.org/P43408 and previous config saved to /var/cache/conftool/dbconfig/20230126-135215-root.json
  • 13:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T328023
  • 13:45 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:45 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:44 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:38 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:38 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove DNS records for removed esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
  • 13:37 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove DNS records for removed esams eqiad GRE tunnel link IPs. - cmooney@cumin1001"
  • 13:32 ladsgroup@deploy1002: Finished scap: Backport for Change time zone setting on gorwiktionary (T327986) (duration: 12m 02s)
  • 13:32 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:25 moritzm: restarting turnilo for nodejs security update
  • 13:22 ladsgroup@deploy1002: superpes and ladsgroup: Backport for Change time zone setting on gorwiktionary (T327986) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 13:20 ladsgroup@deploy1002: Started scap: Backport for Change time zone setting on gorwiktionary (T327986)
  • 13:10 moritzm: installing nodejs security updates on bullseye
  • 13:09 hashar: Rebooting gerrit2002.wikimedia.org host to validate that the Apache 2 service starts AFTER the network comes online | T326125
  • 13:06 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:04 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop analytics cluster
  • 12:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flerovium.eqiad.wmnet
  • 12:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cp3051.esams.wmnet with reason: T323717
  • 12:42 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on cp3051.esams.wmnet with reason: T323717
  • 12:42 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3051.esams.wmnet,service=ats-be
  • 12:42 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3051.esams.wmnet,service=cdn
  • 12:41 sukhe: depool cp3051.esams.wmnet for firmware update testing: T323717
  • 12:41 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host flerovium.eqiad.wmnet
  • 12:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host furud.codfw.wmnet
  • 12:29 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-proxies (exit_code=0) rolling restart_daemons on A:eqiad and not A:thanos-fe and A:swift-fe or A:thanos-fe
  • 12:15 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host furud.codfw.wmnet
  • 12:10 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 12:10 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-proxies rolling restart_daemons on A:eqiad and not A:thanos-fe and A:swift-fe or A:thanos-fe
  • 12:03 jbond: enable profile::base::firewall::defs_from_etcd: true globally
  • 11:56 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd-client-ssl._tcp.wikimedia.org on all recursors
  • 11:56 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd-client-ssl._tcp.wikimedia.org on all recursors
  • 11:49 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
  • 11:49 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes1010.eqiad.wmnet
  • 11:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts flowspec1001
  • 11:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flowspec1001 decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1001"
  • 11:46 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flowspec1001 decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1001"
  • 11:44 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 11:40 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts flowspec1001
  • 11:36 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-aux
  • 11:29 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 11:29 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
  • 11:28 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43405 and previous config saved to /var/cache/conftool/dbconfig/20230126-110822-root.json
  • 11:03 hashar: Restarted Apache 2 on gerrit.wikimedia.org
  • 10:55 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 10:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename aux-k8s-ingress service to k8s-ingress-aux - cgoubert@cumin1001"
  • 10:54 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 10:54 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43404 and previous config saved to /var/cache/conftool/dbconfig/20230126-105317-root.json
  • 10:53 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 10:46 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename aux-k8s-ingress service to k8s-ingress-aux - cgoubert@cumin1001"
  • 10:45 moritzm: installing postgresql-13 security updates
  • 10:43 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 10:42 joal@deploy1002: Finished deploy [airflow-dags/analytics@e52205b]: (no justification provided) (duration: 00m 11s)
  • 10:42 joal@deploy1002: Started deploy [airflow-dags/analytics@e52205b]: (no justification provided)
  • 10:41 claime: cgoubert@authdns1001:~$ sudo -i authdns-update
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43403 and previous config saved to /var/cache/conftool/dbconfig/20230126-103812-root.json
  • 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43402 and previous config saved to /var/cache/conftool/dbconfig/20230126-103448-root.json
  • 10:32 joal@deploy1002: Finished deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - third after failure [analytics/refinery@8ed8435] (duration: 01m 16s)
  • 10:31 joal@deploy1002: Started deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - third after failure [analytics/refinery@8ed8435]
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43401 and previous config saved to /var/cache/conftool/dbconfig/20230126-102307-root.json
  • 10:21 joal@deploy1002: Finished deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - Second after failure [analytics/refinery@8ed8435] (duration: 00m 04s)
  • 10:21 joal@deploy1002: Started deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST - Second after failure [analytics/refinery@8ed8435]
  • 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43400 and previous config saved to /var/cache/conftool/dbconfig/20230126-101943-root.json
  • 10:08 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 10:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43399 and previous config saved to /var/cache/conftool/dbconfig/20230126-100802-root.json
  • 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43398 and previous config saved to /var/cache/conftool/dbconfig/20230126-100438-root.json
  • 09:59 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 09:58 joal@deploy1002: Finished deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@8ed8435] (duration: 01m 08s)
  • 09:57 joal@deploy1002: Started deploy [analytics/refinery@8ed8435] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@8ed8435]
  • 09:57 joal@deploy1002: Finished deploy [analytics/refinery@8ed8435] (thin): Regular analytics weekly train THIN [analytics/refinery@8ed8435] (duration: 00m 05s)
  • 09:57 joal@deploy1002: Started deploy [analytics/refinery@8ed8435] (thin): Regular analytics weekly train THIN [analytics/refinery@8ed8435]
  • 09:56 joal@deploy1002: Finished deploy [analytics/refinery@8ed8435]: Regular analytics weekly train [analytics/refinery@8ed8435] (duration: 07m 00s)
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43397 and previous config saved to /var/cache/conftool/dbconfig/20230126-095257-root.json
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43396 and previous config saved to /var/cache/conftool/dbconfig/20230126-095205-root.json
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43395 and previous config saved to /var/cache/conftool/dbconfig/20230126-094933-root.json
  • 09:49 joal@deploy1002: Started deploy [analytics/refinery@8ed8435]: Regular analytics weekly train [analytics/refinery@8ed8435]
  • 09:48 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 09:48 jbond@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 09:47 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 09:47 jbond@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 09:47 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 09:47 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 09:46 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1002.eqiad.wmnet
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43394 and previous config saved to /var/cache/conftool/dbconfig/20230126-093700-root.json
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43393 and previous config saved to /var/cache/conftool/dbconfig/20230126-093620-root.json
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43392 and previous config saved to /var/cache/conftool/dbconfig/20230126-093428-root.json
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43391 and previous config saved to /var/cache/conftool/dbconfig/20230126-093303-root.json
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2144 to x2 primary T313811', diff saved to https://phabricator.wikimedia.org/P43390 and previous config saved to /var/cache/conftool/dbconfig/20230126-092512-root.json
  • 09:24 marostegui: Starting x2 codfw failover from db2142 to db2144 - T328001
  • 09:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover x2 T328001
  • 09:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover x2 T328001
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43389 and previous config saved to /var/cache/conftool/dbconfig/20230126-092155-root.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43388 and previous config saved to /var/cache/conftool/dbconfig/20230126-092115-root.json
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43387 and previous config saved to /var/cache/conftool/dbconfig/20230126-091923-root.json
  • 09:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover x2 T328001
  • 09:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover x2 T328001
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43386 and previous config saved to /var/cache/conftool/dbconfig/20230126-091758-root.json
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43385 and previous config saved to /var/cache/conftool/dbconfig/20230126-090650-root.json
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43384 and previous config saved to /var/cache/conftool/dbconfig/20230126-090610-root.json
  • 09:05 phedenskog@deploy1002: Finished deploy [performance/navtiming@e5fdd6e]: (no justification provided) (duration: 00m 06s)
  • 09:05 phedenskog@deploy1002: Started deploy [performance/navtiming@e5fdd6e]: (no justification provided)
  • 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 1%: After switchover', diff saved to https://phabricator.wikimedia.org/P43383 and previous config saved to /var/cache/conftool/dbconfig/20230126-090418-root.json
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2121 T328000', diff saved to https://phabricator.wikimedia.org/P43382 and previous config saved to /var/cache/conftool/dbconfig/20230126-090302-root.json
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43381 and previous config saved to /var/cache/conftool/dbconfig/20230126-090253-root.json
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2118 to s7 primary T328000', diff saved to https://phabricator.wikimedia.org/P43380 and previous config saved to /var/cache/conftool/dbconfig/20230126-090212-root.json
  • 09:02 marostegui: Starting s7 codfw failover from db2121 to db2118 - T328000
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43379 and previous config saved to /var/cache/conftool/dbconfig/20230126-085145-root.json
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43378 and previous config saved to /var/cache/conftool/dbconfig/20230126-085105-root.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43377 and previous config saved to /var/cache/conftool/dbconfig/20230126-084748-root.json
  • 08:44 moritzm: added Eoghan to pwstore
  • 08:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s7 T328000
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2118 with weight 0 T328000', diff saved to https://phabricator.wikimedia.org/P43376 and previous config saved to /var/cache/conftool/dbconfig/20230126-084112-root.json
  • 08:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s7 T328000
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2105 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43375 and previous config saved to /var/cache/conftool/dbconfig/20230126-083640-root.json
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43374 and previous config saved to /var/cache/conftool/dbconfig/20230126-083600-root.json
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2105 T327999', diff saved to https://phabricator.wikimedia.org/P43373 and previous config saved to /var/cache/conftool/dbconfig/20230126-083543-root.json
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2127 to s3 primary T327999', diff saved to https://phabricator.wikimedia.org/P43372 and previous config saved to /var/cache/conftool/dbconfig/20230126-083459-root.json
  • 08:34 marostegui: Starting s3 codfw failover from db2105 to db2127 - T327999
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43371 and previous config saved to /var/cache/conftool/dbconfig/20230126-083243-root.json
  • 08:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s3 T327999
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2127 with weight 0 T327999', diff saved to https://phabricator.wikimedia.org/P43370 and previous config saved to /var/cache/conftool/dbconfig/20230126-082432-root.json
  • 08:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 23 hosts with reason: Primary switchover s3 T327999
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43369 and previous config saved to /var/cache/conftool/dbconfig/20230126-082055-root.json
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 100%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43368 and previous config saved to /var/cache/conftool/dbconfig/20230126-082038-root.json
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 T327998', diff saved to https://phabricator.wikimedia.org/P43367 and previous config saved to /var/cache/conftool/dbconfig/20230126-081916-root.json
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2107 to s2 primary T327998', diff saved to https://phabricator.wikimedia.org/P43366 and previous config saved to /var/cache/conftool/dbconfig/20230126-081818-root.json
  • 08:17 marostegui: Starting s2 codfw failover from db2104 to db2107 - T327998
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43365 and previous config saved to /var/cache/conftool/dbconfig/20230126-081738-root.json
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 75%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43364 and previous config saved to /var/cache/conftool/dbconfig/20230126-080533-root.json
  • 08:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s2 T327998
  • 08:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s2 T327998
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2107 with weight 0 T327998', diff saved to https://phabricator.wikimedia.org/P43363 and previous config saved to /var/cache/conftool/dbconfig/20230126-080427-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 1%: After switchover', diff saved to https://phabricator.wikimedia.org/P43362 and previous config saved to /var/cache/conftool/dbconfig/20230126-080233-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2103 T327997', diff saved to https://phabricator.wikimedia.org/P43361 and previous config saved to /var/cache/conftool/dbconfig/20230126-080159-root.json
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2112 to s1 primary T327997', diff saved to https://phabricator.wikimedia.org/P43360 and previous config saved to /var/cache/conftool/dbconfig/20230126-080033-root.json
  • 08:00 marostegui: Starting s1 codfw failover from db2103 to db2112 - T327997
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 50%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43359 and previous config saved to /var/cache/conftool/dbconfig/20230126-075028-root.json
  • 07:49 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2012.*
  • 07:49 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2011.*
  • 07:49 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2010.*
  • 07:48 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=inactive; selector: name=wdqs2009.*
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2112 with weight 0 T327997', diff saved to https://phabricator.wikimedia.org/P43358 and previous config saved to /var/cache/conftool/dbconfig/20230126-073616-root.json
  • 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 38 hosts with reason: Primary switchover s1 T327997
  • 07:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 38 hosts with reason: Primary switchover s1 T327997
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 25%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43357 and previous config saved to /var/cache/conftool/dbconfig/20230126-073523-root.json
  • 07:25 marostegui@deploy1002: Finished scap: Backport for ProductionServices.php: Depool pc2011 (T327925) (duration: 11m 19s)
  • 07:25 dcausse: T322869: depooling wdqs2009, wdqs2010, wdqs2011, wdqs2012; these hosts should not serve user traffic yet, as they don't have the database loaded
  • 07:23 marostegui: Failover m1 from db1195 to db1176 - T327800
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 10%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43356 and previous config saved to /var/cache/conftool/dbconfig/20230126-072017-root.json
  • 07:18 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1001.eqiad.wmnet with reason: m1 switchover
  • 07:17 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1001.eqiad.wmnet with reason: m1 switchover
  • 07:17 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backupmon1001.eqiad.wmnet with reason: m1 switchover
  • 07:17 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backupmon1001.eqiad.wmnet with reason: m1 switchover
  • 07:16 marostegui@deploy1002: marostegui: Backport for ProductionServices.php: Depool pc2011 (T327925) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 07:14 marostegui@deploy1002: Started scap: Backport for ProductionServices.php: Depool pc2011 (T327925)
  • 07:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2132,2160].codfw.wmnet,db[1117,1176,1195].eqiad.wmnet with reason: Primary switchover m1 T327800
  • 07:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2132,2160].codfw.wmnet,db[1117,1176,1195].eqiad.wmnet with reason: Primary switchover m1 T327800
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 5%: After DIMM replacement', diff saved to https://phabricator.wikimedia.org/P43354 and previous config saved to /var/cache/conftool/dbconfig/20230126-070512-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Add some weight to db1103', diff saved to https://phabricator.wikimedia.org/P43353 and previous config saved to /var/cache/conftool/dbconfig/20230126-070220-marostegui.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 T327861', diff saved to https://phabricator.wikimedia.org/P43352 and previous config saved to /var/cache/conftool/dbconfig/20230126-070158-root.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1103 to x1 primary and set section read-write T327861', diff saved to https://phabricator.wikimedia.org/P43351 and previous config saved to /var/cache/conftool/dbconfig/20230126-070035-marostegui.json
  • 07:00 marostegui: Starting x1 eqiad failover from db1120 to db1103 - T327861
  • 06:48 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6015.drmrs.wmnet
  • 06:48 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6015.drmrs.wmnet with OS bullseye
  • 06:32 ladsgroup@deploy1002: Synchronized private/PrivateSettings.php: Rotating wikiuser password (T326802) (duration: 07m 23s)
  • 06:20 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
  • 06:18 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1103 with weight 0 T327861', diff saved to https://phabricator.wikimedia.org/P43350 and previous config saved to /var/cache/conftool/dbconfig/20230126-061751-root.json
  • 06:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Primary switchover x1 T327861
  • 06:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Primary switchover x1 T327861
  • 05:57 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6015.drmrs.wmnet with OS bullseye
  • 05:53 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6006.drmrs.wmnet
  • 05:53 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6006.drmrs.wmnet with OS bullseye
  • 05:32 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6006.drmrs.wmnet with reason: host reimage
  • 05:28 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6006.drmrs.wmnet with reason: host reimage
  • 05:10 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6006.drmrs.wmnet with OS bullseye
  • 05:09 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6014.drmrs.wmnet
  • 05:07 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6014.drmrs.wmnet with OS bullseye
  • 04:45 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6014.drmrs.wmnet with reason: host reimage
  • 04:42 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6014.drmrs.wmnet with reason: host reimage
  • 04:24 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6014.drmrs.wmnet with OS bullseye
  • 04:22 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6005.drmrs.wmnet
  • 04:17 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6005.drmrs.wmnet with OS bullseye
  • 03:52 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6005.drmrs.wmnet with reason: host reimage
  • 03:49 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6005.drmrs.wmnet with reason: host reimage
  • 03:29 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6005.drmrs.wmnet with OS bullseye
  • 03:27 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6013.drmrs.wmnet
  • 03:27 ejegg: payments-wiki upgraded from 08b8c3bc to 82d89841
  • 03:26 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6013.drmrs.wmnet with OS bullseye
  • 03:04 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6013.drmrs.wmnet with reason: host reimage
  • 03:01 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6013.drmrs.wmnet with reason: host reimage
  • 02:41 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6013.drmrs.wmnet with OS bullseye
  • 02:30 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2027.codfw.wmnet with OS bullseye
  • 02:17 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
  • 02:17 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2027.codfw.wmnet with OS bullseye
  • 01:58 ejegg: restarted fundraising scheduled jobs after queue server reboot
  • 01:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS bullseye
  • 01:49 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2028.codfw.wmnet,service=ats-be
  • 01:49 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2028.codfw.wmnet,service=cdn
  • 01:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp2027.codfw.wmnet with reason: firmware test
  • 01:48 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp2027.codfw.wmnet with reason: firmware test
  • 01:46 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2027.codfw.wmnet,service=ats-be
  • 01:46 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2027.codfw.wmnet,service=cdn
  • 01:46 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2028.codfw.wmnet with OS bullseye
  • 01:24 ejegg: payments-wiki upgraded from 15395d05 to 08b8c3bc (upgraded from MW 1.35 to MW 1.39)
  • 01:23 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2028.codfw.wmnet with reason: host reimage
  • 01:20 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2028.codfw.wmnet with reason: host reimage
  • 01:19 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev2*: Enable internode encryption - eevans@cumin1001
  • 01:14 ejegg: disabled fundraising scheduled jobs for queue server reboot
  • 01:05 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2028.codfw.wmnet with OS bullseye
  • 01:03 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
  • 01:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2028.codfw.wmnet
  • 01:00 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev2*: Enable internode encryption - eevans@cumin1001
  • 01:00 ejegg: turned pending transaction resolvers back on after civi deploy
  • 00:51 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2028.codfw.wmnet
  • 00:50 ejegg: civicrm upgraded from 3e6b21b6 to b5d6a790
  • 00:50 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
  • 00:49 sukhe: depool cp2028 for testing firmware update cookbook: T321309
  • 00:49 ejegg: disabled pending transaction resolvers for civi deploy
  • 00:48 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2028.codfw.wmnet,service=ats-be
  • 00:48 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2028.codfw.wmnet,service=cdn

2023-01-25

  • 23:57 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6004.drmrs.wmnet
  • 23:57 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6004.drmrs.wmnet with OS bullseye
  • 23:36 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6004.drmrs.wmnet with reason: host reimage
  • 23:33 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6004.drmrs.wmnet with reason: host reimage
  • 23:29 zabe@deploy1002: Finished scap: (no justification provided) (duration: 07m 34s)
  • 23:21 zabe@deploy1002: Started scap: (no justification provided)
  • 23:20 zabe@deploy1002: Backport cancelled.
  • 23:14 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6004.drmrs.wmnet with OS bullseye
  • 23:13 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6012.drmrs.wmnet
  • 23:07 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6012.drmrs.wmnet with OS bullseye
  • 22:43 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6012.drmrs.wmnet with reason: host reimage
  • 22:40 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6012.drmrs.wmnet with reason: host reimage
  • 22:21 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6012.drmrs.wmnet with OS bullseye
  • 22:14 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6003.drmrs.wmnet
  • 21:49 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
  • 21:49 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
  • 21:44 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
  • 21:44 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
  • 21:34 samtar@deploy1002: Finished scap: Backport for Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714), Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714) (duration: 09m 27s)
  • 21:26 samtar@deploy1002: jdrewniak and samtar: Backport for Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714), Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714) synced to the testservers: mwdebug2002.cod
  • 21:25 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 21:24 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 21:24 samtar@deploy1002: Started scap: Backport for Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714), Define grid template row for .mw-body grid container to ensure the grid cell containing the content will expand in height when needed (T327714)
  • 21:06 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
  • 20:59 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
  • 20:59 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6003.drmrs.wmnet with OS bullseye
  • 20:59 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts cp2028.codfw.wmnet
  • 20:58 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
  • 20:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
  • 20:49 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
  • 20:49 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
  • 20:49 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
  • 20:49 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2028.codfw.wmnet
  • 20:49 ejegg: updated employers.csv on paymentswiki
  • 20:49 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2028.codfw.wmnet
  • 20:33 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6003.drmrs.wmnet with reason: host reimage
  • 20:32 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka jumbo-eqiad cluster: Reboot kafka nodes
  • 20:30 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6003.drmrs.wmnet with reason: host reimage
  • 20:10 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6003.drmrs.wmnet with OS bullseye
  • 20:00 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6011.drmrs.wmnet
  • 19:58 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6011.drmrs.wmnet with OS bullseye
  • 19:52 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host centrallog1002.eqiad.wmnet with OS bullseye
  • 19:38 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
  • 19:36 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
  • 19:33 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on centrallog1002.eqiad.wmnet with reason: host reimage
  • 19:33 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
  • 19:21 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye
  • 19:17 brennen@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.20 refs T325583 (duration: 07m 04s)
  • 19:12 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6011.drmrs.wmnet with OS bullseye
  • 19:10 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.20 refs T325583
  • 19:06 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6002.drmrs.wmnet
  • 19:01 brennen: 1.40.0-wmf.20 train (T325583): no blockers, rolling to group1.
  • 19:00 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host centrallog1002.eqiad.wmnet with OS bullseye
  • 19:00 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye
  • 18:59 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6002.drmrs.wmnet with OS bullseye
  • 18:37 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6002.drmrs.wmnet with reason: host reimage
  • 18:35 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 18:34 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6002.drmrs.wmnet with reason: host reimage
  • 18:33 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 18:33 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 18:32 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 18:14 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6002.drmrs.wmnet with OS bullseye
  • 18:11 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 18:11 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 18:11 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 18:10 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 18:05 brett@cumin1001: conftool action : set/pooled=yes; selector: name=cp6010.drmrs.wmnet
  • 17:58 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6010.drmrs.wmnet with OS bullseye
  • 17:32 mutante: removing racktables.wikimedia.org from DNS - that's it for this ancient service T327405
  • 16:57 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=ats-be
  • 16:57 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=cdn
  • 16:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2031.codfw.wmnet with OS bullseye
  • 16:50 btullis@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka jumbo-eqiad cluster: Reboot kafka nodes
  • 16:46 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6010.drmrs.wmnet with reason: host reimage
  • 16:43 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6010.drmrs.wmnet with reason: host reimage
  • 16:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet,service=ats-be
  • 16:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet,service=cdn
  • 16:33 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS bullseye
  • 16:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2031.codfw.wmnet with reason: host reimage
  • 16:28 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2031.codfw.wmnet with reason: host reimage
  • 16:24 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp6010.drmrs.wmnet with OS bullseye
  • 16:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1002.eqiad.wmnet
  • 16:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
  • 16:09 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2031.codfw.wmnet with OS bullseye
  • 16:08 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1002.eqiad.wmnet
  • 16:08 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
  • 16:04 btullis@cumin1001: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
  • 16:03 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 15:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2031']
  • 15:56 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031']
  • 15:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['cp2031']
  • 15:53 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 15:50 robh: db1139 ilom wins/netbios disabled and ilom reset T327877
  • 15:48 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
  • 15:47 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS bullseye
  • 15:46 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031']
  • 15:45 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2031']
  • 15:45 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031']
  • 15:44 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2031.codfw.wmnet']
  • 15:44 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2031.codfw.wmnet']
  • 15:43 robh: netbios wins disabled on db1140 ilom and ilom reset T327877
  • 15:43 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2031.codfw.wmnet with OS bullseye
  • 15:38 papaul: ongoing maintenance on fasw-c-eqiad
  • 15:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1001.eqiad.wmnet
  • 15:33 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2031.codfw.wmnet with OS bullseye
  • 15:33 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2031.codfw.wmnet with OS bullseye
  • 15:29 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1001.eqiad.wmnet
  • 15:23 btullis@cumin1001: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
  • 15:21 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
  • 15:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1003.eqiad.wmnet
  • 15:17 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet,service=ats-be
  • 15:17 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet,service=cdn
  • 15:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4045.ulsfo.wmnet with OS bullseye
  • 15:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1003.eqiad.wmnet
  • 15:13 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 15:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 15:12 urbanecm@deploy1002: Finished scap: triggering i18n refresh for T327824 (duration: 07m 57s)
  • 15:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp2031.codfw.wmnet with OS bullseye
  • 15:04 urbanecm@deploy1002: Started scap: triggering i18n refresh for T327824
  • 15:04 urbanecm@deploy1002: Finished scap: Backport for Enable the Wikibase REST API on Wikidata (T324999) (duration: 08m 43s)
  • 15:02 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet,service=ats-be
  • 15:02 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet,service=cdn
  • 15:01 urbanecm: Overrunning B&C window
  • 14:57 urbanecm@deploy1002: urbanecm and migr: Backport for Enable the Wikibase REST API on Wikidata (T324999) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4037.ulsfo.wmnet with OS bullseye
  • 14:55 urbanecm@deploy1002: Started scap: Backport for Enable the Wikibase REST API on Wikidata (T324999)
  • 14:53 btullis@cumin1001: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
  • 14:53 urbanecm@deploy1002: Finished scap: Backport for REST: Use error log level for unexpected errors (T327490), User impact: amend incorrect parameter for the single day streak text (T327824) (duration: 32m 21s)
  • 14:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
  • 14:50 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
  • 14:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install6002.wikimedia.org
  • 14:40 urbanecm@deploy1002: jakob and sgimeno and urbanecm: Backport for REST: Use error log level for unexpected errors (T327490), User impact: amend incorrect parameter for the single day streak text (T327824) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
  • 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install6002.wikimedia.org on all recursors
  • 14:30 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install6002.wikimedia.org on all recursors
  • 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install6002.wikimedia.org - jmm@cumin2002"
  • 14:30 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
  • 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 14:29 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install6002.wikimedia.org - jmm@cumin2002"
  • 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 14:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:28 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
  • 14:25 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install6002.wikimedia.org
  • 14:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5002.wikimedia.org
  • 14:21 urbanecm@deploy1002: Started scap: Backport for REST: Use error log level for unexpected errors (T327490), User impact: amend incorrect parameter for the single day streak text (T327824)
  • 14:16 urbanecm@deploy1002: Finished scap: Backport for Enable Draft namespace on Serbo-Croatian Wikipedia (T327864) (duration: 12m 59s)
  • 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5002.wikimedia.org on all recursors
  • 14:09 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5002.wikimedia.org on all recursors
  • 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5002.wikimedia.org - jmm@cumin2002"
  • 14:08 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5002.wikimedia.org - jmm@cumin2002"
  • 14:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
  • 14:05 urbanecm@deploy1002: aleksandar and urbanecm: Backport for Enable Draft namespace on Serbo-Croatian Wikipedia (T327864) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 14:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:04 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5002.wikimedia.org
  • 14:03 urbanecm@deploy1002: Started scap: Backport for Enable Draft namespace on Serbo-Croatian Wikipedia (T327864)
  • 13:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install4002.wikimedia.org
  • 13:51 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install4002.wikimedia.org
  • 13:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install4002.wikimedia.org
  • 13:50 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install4002.wikimedia.org
  • 13:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install3002.wikimedia.org
  • 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install3002.wikimedia.org on all recursors
  • 13:31 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install3002.wikimedia.org on all recursors
  • 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install3002.wikimedia.org - jmm@cumin2002"
  • 13:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install3002.wikimedia.org - jmm@cumin2002"
  • 13:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install3002.wikimedia.org
  • 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install2004.wikimedia.org
  • 13:11 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4037.ulsfo.wmnet with OS bullseye
  • 13:04 jbond: puppet now using vendored version of augeas-core https://gerrit.wikimedia.org/r/c/operations/puppet/+/883233
  • 13:04 jbond: enable puppet fleet wide to post deploy gerrit:883233
  • 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install2004.wikimedia.org on all recursors
  • 13:00 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install2004.wikimedia.org on all recursors
  • 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install2004.wikimedia.org - jmm@cumin2002"
  • 12:59 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install2004.wikimedia.org - jmm@cumin2002"
  • 12:54 jbond: disable puppet fleet wide to deploy gerrit:883233
  • 12:54 jnuche@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 00m 21s)
  • 12:54 jnuche@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
  • 12:45 moritzm: restarting Exim on MXes to pick up new libtasn
  • 12:43 filippo@cumin1001: conftool action : set/pooled=no; selector: service=thanos-web,name=thanos-fe2003.codfw.wmnet
  • 12:43 filippo@cumin1001: conftool action : set/pooled=no; selector: service=thanos-web,name=thanos-fe2002.codfw.wmnet
  • 12:43 filippo@cumin1001: conftool action : set/pooled=no; selector: service=thanos-web,name=thanos-fe1003.eqiad.wmnet
  • 12:42 filippo@cumin1001: conftool action : set/pooled=no; selector: service=thanos-web,name=thanos-fe1002.eqiad.wmnet
  • 12:41 moritzm: restarting slapd on r/w servers to pick up new libtasn
  • 12:37 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 12:37 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install2004.wikimedia.org
  • 12:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install1004.wikimedia.org
  • 12:15 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
  • 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install1004.wikimedia.org on all recursors
  • 12:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install1004.wikimedia.org on all recursors
  • 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install1004.wikimedia.org - jmm@cumin2002"
  • 12:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install1004.wikimedia.org - jmm@cumin2002"
  • 12:12 moritzm: installing libtasn security updates on buster
  • 11:58 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install1004.wikimedia.org
  • 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testreduce1001.eqiad.wmnet
  • 11:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testreduce1001.eqiad.wmnet
  • 11:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host scandium.eqiad.wmnet
  • 11:34 Lucas_WMDE: Updated the Wikidata property suggester with data from 20230102's JSON dump (T325942)
  • 11:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host scandium.eqiad.wmnet
  • 11:27 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 11:16 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 11:12 hnowlan: restarting lvs on lvs1019 for thumbor healthcheck change
  • 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43344 and previous config saved to /var/cache/conftool/dbconfig/20230125-111059-root.json
  • 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: After recloning', diff saved to https://phabricator.wikimedia.org/P43343 and previous config saved to /var/cache/conftool/dbconfig/20230125-110924-root.json
  • 11:08 hnowlan: restarting lvs on lvs2009 for thumbor healthcheck change
  • 11:00 hnowlan: restarting lvs on lvs1020 for thumbor healthcheck change
  • 11:00 hnowlan: restarting lvs on lvs1010 for thumbor healthcheck change
  • 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43342 and previous config saved to /var/cache/conftool/dbconfig/20230125-105554-root.json
  • 10:54 hnowlan: restarting lvs on lvs2010 for thumbor healthcheck change
  • 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: After recloning', diff saved to https://phabricator.wikimedia.org/P43341 and previous config saved to /var/cache/conftool/dbconfig/20230125-105443-root.json
  • 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: After recloning', diff saved to https://phabricator.wikimedia.org/P43340 and previous config saved to /var/cache/conftool/dbconfig/20230125-105419-root.json
  • 10:49 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 10:48 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 10:48 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 10:43 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43338 and previous config saved to /var/cache/conftool/dbconfig/20230125-104049-root.json
  • 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: After recloning', diff saved to https://phabricator.wikimedia.org/P43337 and previous config saved to /var/cache/conftool/dbconfig/20230125-103938-root.json
  • 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: After recloning', diff saved to https://phabricator.wikimedia.org/P43336 and previous config saved to /var/cache/conftool/dbconfig/20230125-103914-root.json
  • 10:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43335 and previous config saved to /var/cache/conftool/dbconfig/20230125-102544-root.json
  • 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: After recloning', diff saved to https://phabricator.wikimedia.org/P43334 and previous config saved to /var/cache/conftool/dbconfig/20230125-102433-root.json
  • 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: After recloning', diff saved to https://phabricator.wikimedia.org/P43333 and previous config saved to /var/cache/conftool/dbconfig/20230125-102409-root.json
  • 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43332 and previous config saved to /var/cache/conftool/dbconfig/20230125-101039-root.json
  • 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: After recloning', diff saved to https://phabricator.wikimedia.org/P43331 and previous config saved to /var/cache/conftool/dbconfig/20230125-100928-root.json
  • 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: After recloning', diff saved to https://phabricator.wikimedia.org/P43330 and previous config saved to /var/cache/conftool/dbconfig/20230125-100904-root.json
  • 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43329 and previous config saved to /var/cache/conftool/dbconfig/20230125-095534-root.json
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: After recloning', diff saved to https://phabricator.wikimedia.org/P43328 and previous config saved to /var/cache/conftool/dbconfig/20230125-095423-root.json
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: After recloning', diff saved to https://phabricator.wikimedia.org/P43327 and previous config saved to /var/cache/conftool/dbconfig/20230125-095400-root.json
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 1%: After cloning db1198', diff saved to https://phabricator.wikimedia.org/P43326 and previous config saved to /var/cache/conftool/dbconfig/20230125-094029-root.json
  • 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: After recloning', diff saved to https://phabricator.wikimedia.org/P43325 and previous config saved to /var/cache/conftool/dbconfig/20230125-093918-root.json
  • 09:30 Emperor: rolling depool & update of thanos front-ends T327871
  • 08:40 XioNoX: bump SGIX max prefix limit
  • 08:13 ladsgroup@deploy1002: Finished scap: Backport for Add sandbox link to Serbo-Croatian Wikipedia (T327833) (duration: 10m 13s)
  • 08:05 ladsgroup@deploy1002: ladsgroup and aleksandar: Backport for Add sandbox link to Serbo-Croatian Wikipedia (T327833) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 08:03 ladsgroup@deploy1002: Started scap: Backport for Add sandbox link to Serbo-Croatian Wikipedia (T327833)
  • 07:49 marostegui: Cloning db1196 from db1206 (lag will appear on s1 wiki replicas) T327859
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1206 to clone db1196 T327859', diff saved to https://phabricator.wikimedia.org/P43322 and previous config saved to /var/cache/conftool/dbconfig/20230125-074601-marostegui.json
  • 07:34 phedenskog@deploy1002: Finished deploy [performance/navtiming@bfff15d]: (no justification provided) (duration: 00m 05s)
  • 07:34 phedenskog@deploy1002: Started deploy [performance/navtiming@bfff15d]: (no justification provided)
  • 07:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 33
  • 07:31 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 33
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 to clone db1198', diff saved to https://phabricator.wikimedia.org/P43320 and previous config saved to /var/cache/conftool/dbconfig/20230125-072033-marostegui.json
  • 07:08 AndyRussG: updated payments (config only) revision 15395d05, config 418160e9
  • 04:10 eileen: config revision changed from dc0a0d3a to 089d0acb
  • 04:01 eileen: civicrm upgraded from 9197ca29 to 3e6b21b6
  • 03:27 eileen: civicrm upgraded from f6093fb2 to 9197ca29
  • 03:05 eileen: config revision changed from 3f641fce to dc0a0d3a
  • 01:17 legoktm: adjusting Gerrit group "Campaigns Team" so it is not recursively a member of itself
  • 00:10 denisse@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host centrallog1002.eqiad.wmnet with OS bullseye
  • 00:10 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host centrallog1002.eqiad.wmnet with OS bullseye

2023-01-24

  • 23:10 zabe@deploy1002: Finished scap: Backport for Start reading from rev_comment_id on testcommonswiki (T299954) (duration: 08m 02s)
  • 23:04 zabe@deploy1002: zabe: Backport for Start reading from rev_comment_id on testcommonswiki (T299954) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 23:02 zabe@deploy1002: Started scap: Backport for Start reading from rev_comment_id on testcommonswiki (T299954)
  • 22:47 TheresNoTime: closing UTC late backport window
  • 22:47 samtar@deploy1002: Finished scap: Backport for Add temporary extra grid-area for content translation extension (T327715), Add temporary extra grid-area for content translation extension (T327715) (duration: 09m 04s)
  • 22:39 samtar@deploy1002: jdrewniak and samtar: Backport for Add temporary extra grid-area for content translation extension (T327715), Add temporary extra grid-area for content translation extension (T327715) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 22:37 samtar@deploy1002: Started scap: Backport for Add temporary extra grid-area for content translation extension (T327715), Add temporary extra grid-area for content translation extension (T327715)
  • 22:30 samtar@deploy1002: Finished scap: Backport for [BETA CLUSTER] Don't try to load Kartographer on Wikifunctions at all (T327724), newiki: Fix wgAddGroups/wgRemoveGroups setting (T327114) (duration: 07m 59s)
  • 22:23 samtar@deploy1002: jforrester and samtar and stang: Backport for [BETA CLUSTER] Don't try to load Kartographer on Wikifunctions at all (T327724), newiki: Fix wgAddGroups/wgRemoveGroups setting (T327114) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 22:22 samtar@deploy1002: Started scap: Backport for [BETA CLUSTER] Don't try to load Kartographer on Wikifunctions at all (T327724), newiki: Fix wgAddGroups/wgRemoveGroups setting (T327114)
  • 22:20 samtar@deploy1002: Finished scap: Backport for newiki: Add new permissions to group reviewer (T327114) (duration: 09m 02s)
  • 22:19 mutante: DNS - adding new project language "gur" (Gurenɛ) - Gurenɛ is a major language of northern Ghana and the predominant language of the Upper East Region of Ghana. It is also widely spoken in Burkina Faso. T327813
  • 22:13 samtar@deploy1002: samtar and stang: Backport for newiki: Add new permissions to group reviewer (T327114) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 22:11 samtar@deploy1002: Started scap: Backport for newiki: Add new permissions to group reviewer (T327114)
  • 22:08 samtar@deploy1002: Finished scap: Backport for Fix Wikitext editor preview layout in Vector 2022 (T327778), Fix Wikitext editor preview layout in Vector 2022 (T327778) (duration: 09m 36s)
  • 22:06 TheresNoTime: extending UTC late backport window due to late start
  • 22:04 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6001.drmrs.wmnet,service=ats-be
  • 22:04 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6001.drmrs.wmnet,service=cdn
  • 22:01 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6001.drmrs.wmnet with OS bullseye
  • 22:00 samtar@deploy1002: samtar and jdrewniak: Backport for Fix Wikitext editor preview layout in Vector 2022 (T327778), Fix Wikitext editor preview layout in Vector 2022 (T327778) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:59 samtar@deploy1002: Started scap: Backport for Fix Wikitext editor preview layout in Vector 2022 (T327778), Fix Wikitext editor preview layout in Vector 2022 (T327778)
  • 21:56 samtar@deploy1002: Finished scap: Backport for Work around sticky-positioned layers disabling subpixel rendering (T327460) (duration: 13m 31s)
  • 21:45 samtar@deploy1002: nray and samtar: Backport for Work around sticky-positioned layers disabling subpixel rendering (T327460) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1009.eqiad.wmnet with OS bullseye
  • 21:44 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 21:43 samtar@deploy1002: Started scap: Backport for Work around sticky-positioned layers disabling subpixel rendering (T327460)
  • 21:43 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 21:38 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6001.drmrs.wmnet with reason: host reimage
  • 21:38 zabe: running migrateRevisionCommentTemp.php on testcommonswiki (s4) with --sleep 10 # T275246
  • 21:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6001.drmrs.wmnet with reason: host reimage
  • 21:32 samtar@deploy1002: backport aborted: (duration: 06m 28s)
  • 21:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1009.eqiad.wmnet with reason: host reimage
  • 21:25 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1009.eqiad.wmnet with reason: host reimage
  • 21:15 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6001.drmrs.wmnet with OS bullseye
  • 21:05 eevans@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
  • 21:03 TheresNoTime: holding UTC late backport window for outage, T327815
  • 21:01 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host sessionstore1001.eqiad.wmnet
  • 20:50 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 20:50 urandom: rebooting sessionstore1001.eqiad.wmnet -- T325132
  • 20:49 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host sessionstore1001.eqiad.wmnet
  • 20:49 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 20:39 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2027.codfw.wmnet
  • 20:32 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2027.codfw.wmnet
  • 20:31 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5025.eqsin.wmnet,service=ats-be
  • 20:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2026.codfw.wmnet
  • 20:31 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5025.eqsin.wmnet,service=cdn
  • 20:29 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet
  • 20:29 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5025.eqsin.wmnet with OS bullseye
  • 20:28 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet
  • 20:24 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2026.codfw.wmnet
  • 20:20 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2023.codfw.wmnet
  • 20:20 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6009.drmrs.wmnet,service=ats-be
  • 20:19 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6009.drmrs.wmnet,service=cdn
  • 20:18 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet,service=cdn
  • 20:18 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet,service=ats-be
  • 20:16 bblack: pool cp5032
  • 20:16 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=5017.eqsin.wmnet,service=ats-be
  • 20:16 mutante: contint2001 - restarted zuul
  • 20:16 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=5017.eqsin.wmnet,service=cdn
  • 20:14 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=ats-be
  • 20:14 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=cdn
  • 20:14 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2041.codfw.wmnet,service=ats-be
  • 20:14 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2041.codfw.wmnet,service=cdn
  • 20:12 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2023.codfw.wmnet
  • 20:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6009.drmrs.wmnet with OS bullseye
  • 20:09 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet,service=ats-be
  • 20:08 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet,service=cdn
  • 20:08 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2018.codfw.wmnet
  • 20:08 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2041.codfw.wmnet,service=ats-be
  • 20:08 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2041.codfw.wmnet,service=cdn
  • 20:05 brett@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5017.eqsin.wmnet with OS bullseye
  • 20:00 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2018.codfw.wmnet
  • 19:58 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2017.codfw.wmnet
  • 19:56 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5025.eqsin.wmnet with reason: host reimage
  • 19:54 sukhe: reprepro -C main include bullseye-wikimedia libvmod-netmapper_1.9-3_amd64.changes: T326634
  • 19:53 sukhe: reprepro -C main include bullseye-wikimedia libvmod-re2_1.5.3-4_amd64.changes: T326634
  • 19:53 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5025.eqsin.wmnet with reason: host reimage
  • 19:51 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2017.codfw.wmnet
  • 19:47 sukhe: reprepro -C main include bullseye-wikimedia libvmod-querysort_0.4_amd64.changes: T326634
  • 19:46 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2012.codfw.wmnet
  • 19:40 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 19:39 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2012.codfw.wmnet
  • 19:39 urandom: rebooting restbase cassandra nodes, row d -- T325132
  • 19:33 bblack: cp5032: restart varnish-frontend
  • 19:30 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2025.codfw.wmnet
  • 19:28 sukhe: reprepro -C main include bullseye-wikimedia varnish-modules_0.15.0-3_amd64.changes: T326634
  • 19:27 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1011.eqiad.wmnet with reason: host reimage
  • 19:24 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1011.eqiad.wmnet with reason: host reimage
  • 19:22 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2025.codfw.wmnet
  • 19:19 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5025.eqsin.wmnet with OS bullseye
  • 19:19 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5025.eqsin.wmnet with OS bullseye
  • 19:10 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.20 refs T325583
  • 19:06 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host druid1011.eqiad.wmnet with OS bullseye
  • 19:05 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1010.eqiad.wmnet with OS bullseye
  • 19:05 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 19:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6009.drmrs.wmnet with reason: host reimage
  • 19:01 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6009.drmrs.wmnet with reason: host reimage
  • 18:55 jynus: deploy new dump grants for analytics dbs at db1108 T327155
  • 18:43 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5025.eqsin.wmnet with OS bullseye
  • 18:40 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6009.drmrs.wmnet with OS bullseye
  • 18:17 brett@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5017.eqsin.wmnet with reason: host reimage
  • 18:14 brett@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5017.eqsin.wmnet with reason: host reimage
  • 18:12 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2022.codfw.wmnet
  • 18:05 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2022.codfw.wmnet
  • 17:44 bblack: cp5032: upgrading packages (varnish, trafficserver)
  • 17:40 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host restbase2020.codfw.wmnet
  • 17:37 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5017.eqsin.wmnet with OS bullseye
  • 17:36 brett@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5017.eqsin.wmnet with OS bullseye
  • 17:28 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2020.codfw.wmnet
  • 17:21 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2016.codfw.wmnet
  • 17:19 thcipriani: restarting ci jenkins for updates
  • 17:13 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2016.codfw.wmnet
  • 17:13 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2015.codfw.wmnet
  • 17:10 brett@cumin1001: START - Cookbook sre.hosts.reimage for host cp5017.eqsin.wmnet with OS bullseye
  • 17:04 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2015.codfw.wmnet
  • 17:04 urandom: rebooting restbase cassandra nodes, row c -- T325132
  • 16:29 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 16:29 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 16:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2042.codfw.wmnet with OS bullseye
  • 16:23 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 16:23 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 16:23 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 16:23 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 16:22 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 16:22 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 16:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2042.codfw.wmnet with reason: host reimage
  • 16:10 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 16:10 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 16:09 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2042.codfw.wmnet with reason: host reimage
  • 15:54 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 15:53 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2042.codfw.wmnet with OS bullseye
  • 15:43 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 15:31 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 15:30 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: sync on main
  • 15:26 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 15:17 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad (duration: 01m 40s)
  • 15:15 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5c58f8f] (codfw): Disable traffic mirroring from codfw to eqiad
  • 15:12 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@15e6aa7] (codfw): Revert "codfw: Disable traffic mirroring" (duration: 00m 33s)
  • 15:11 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@15e6aa7] (codfw): Revert "codfw: Disable traffic mirroring"
  • 14:58 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:58 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 14:57 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: sync on main
  • 14:55 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 14:52 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 14:52 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: sync on main
  • 14:51 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 14:41 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:41 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1010.eqiad.wmnet with reason: host reimage
  • 14:39 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 14:38 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1010.eqiad.wmnet with reason: host reimage
  • 14:36 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:36 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force update after switch upgrade - volans@cumin1001"
  • 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 14:35 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force update after switch upgrade - volans@cumin1001"
  • 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 14:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:34 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 14:33 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 14:29 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 14:29 effie: switch maps (kartotherian) from eqiad to codfw (attempt #2)
  • 14:28 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 14:28 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:25 TheresNoTime: close UTC afternoon backport window
  • 14:24 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 14:20 XioNoX: repool ulsfo (maintenance over)
  • 14:20 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host druid1010.eqiad.wmnet with OS bullseye
  • 14:17 samtar@deploy1002: Finished scap: Backport for Increase PC writes from parsoid API to 10% (T320534) (duration: 07m 41s)
  • 14:14 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:11 samtar@deploy1002: daniel and samtar: Backport for Increase PC writes from parsoid API to 10% (T320534) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:09 samtar@deploy1002: Started scap: Backport for Increase PC writes from parsoid API to 10% (T320534)
  • 13:50 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 13:44 XioNoX: reboot ulsfo switches for software upgrade
  • 13:40 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:38 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:36 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:34 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping1002.eqiad.wmnet
  • 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 13:29 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 13:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:18 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping1002.eqiad.wmnet
  • 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping2002.codfw.wmnet
  • 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 13:14 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 13:11 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 13:11 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:10 topranks: enabling tunnel services on cr2-eqdfw fpc 0 pic 1
  • 13:08 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:04 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping2002.codfw.wmnet
  • 12:56 zabe@deploy1002: Finished scap: Backport for Remove PoolCounter from extension-list (T327336) (duration: 44m 09s)
  • 12:51 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 12:51 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 12:50 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-proxies (exit_code=0) rolling restart_daemons on A:eqiad and A:swift-fe or A:thanos-fe
  • 12:48 XioNoX: restart ulsfo switches for network maintenance
  • 12:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 36 hosts with reason: network maintenance
  • 12:43 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 36 hosts with reason: network maintenance
  • 12:40 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-proxies rolling restart_daemons on A:eqiad and A:swift-fe or A:thanos-fe
  • 12:38 zabe@deploy1002: zabe: Backport for Remove PoolCounter from extension-list (T327336) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 12:21 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thumbor2004.codfw.wmnet
  • 12:12 zabe@deploy1002: Started scap: Backport for Remove PoolCounter from extension-list (T327336)
  • 11:54 volans: uploaded python3-gjson_1.0.0 to apt.wikimedia.org bullseye-wikimedia,unstable-wikimedia
  • 11:49 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43311 and previous config saved to /var/cache/conftool/dbconfig/20230124-114255-root.json
  • 11:39 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:36 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
  • 11:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping3002.esams.wmnet
  • 11:35 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping3002.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:33 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:33 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping3002.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:28 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43310 and previous config saved to /var/cache/conftool/dbconfig/20230124-112750-root.json
  • 11:26 zabe@deploy1002: Finished scap: Backport for Stop loading PoolCounter extension (T327336) (duration: 09m 19s)
  • 11:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1176.eqiad.wmnet with OS bullseye
  • 11:23 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:22 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping3002.esams.wmnet
  • 11:19 zabe@deploy1002: zabe: Backport for Stop loading PoolCounter extension (T327336) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 11:17 zabe@deploy1002: Started scap: Backport for Stop loading PoolCounter extension (T327336)
  • 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43308 and previous config saved to /var/cache/conftool/dbconfig/20230124-111245-root.json
  • 11:11 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:11 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1176.eqiad.wmnet with reason: host reimage
  • 11:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1176.eqiad.wmnet with reason: host reimage
  • 11:03 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 11:03 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 11:03 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 11:02 effie: depooling maps (kartotherian) from codfw, leaving eqiad as pooled
  • 11:00 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:59 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
  • 10:58 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:58 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 10:58 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43306 and previous config saved to /var/cache/conftool/dbconfig/20230124-105740-root.json
  • 10:55 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1176.eqiad.wmnet with OS bullseye
  • 10:52 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 10:49 XioNoX: depool ulsfo for network maintenance - T316532
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1106 to dbctl in s1 T326116', diff saved to https://phabricator.wikimedia.org/P43305 and previous config saved to /var/cache/conftool/dbconfig/20230124-104336-marostegui.json
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43304 and previous config saved to /var/cache/conftool/dbconfig/20230124-104235-root.json
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1176 from s1 T326116', diff saved to https://phabricator.wikimedia.org/P43303 and previous config saved to /var/cache/conftool/dbconfig/20230124-104219-root.json
  • 10:33 vgutierrez: repool cp4046
  • 10:32 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 10:31 vgutierrez: restarting varnish on cp4046
  • 10:30 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:29 vgutierrez: depool cp4046
  • 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43302 and previous config saved to /var/cache/conftool/dbconfig/20230124-102730-root.json
  • 10:25 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 10:22 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 10:19 moritzm: rolling Apache/FPM restarts on mw canaries to pick up libtasn1-6 security update
  • 10:19 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2165 T327754', diff saved to https://phabricator.wikimedia.org/P43301 and previous config saved to /var/cache/conftool/dbconfig/20230124-101825-root.json
  • 10:17 effie: depooling maps from eqiad && pooling maps on codfw
  • 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2161 to s8 primary T327754', diff saved to https://phabricator.wikimedia.org/P43300 and previous config saved to /var/cache/conftool/dbconfig/20230124-101727-root.json
  • 10:14 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:14 marostegui: Starting s8 codfw failover from db2165 to db2161 - T327754
  • 10:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2041.codfw.wmnet with OS bullseye
  • 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43299 and previous config saved to /var/cache/conftool/dbconfig/20230124-101025-root.json
  • 09:59 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: apply on main
  • 09:59 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2041.codfw.wmnet with reason: host reimage
  • 09:55 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2041.codfw.wmnet with reason: host reimage
  • 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43298 and previous config saved to /var/cache/conftool/dbconfig/20230124-095520-root.json
  • 09:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 35 hosts with reason: Primary switchover s8 T327754
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2161 with weight 0 T327754', diff saved to https://phabricator.wikimedia.org/P43297 and previous config saved to /var/cache/conftool/dbconfig/20230124-095235-marostegui.json
  • 09:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 35 hosts with reason: Primary switchover s8 T327754
  • 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P43296 and previous config saved to /var/cache/conftool/dbconfig/20230124-094725-root.json
  • 09:41 moritzm: installing libtasn1-6 security updates on buster
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43295 and previous config saved to /var/cache/conftool/dbconfig/20230124-094016-root.json
  • 09:39 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 09:39 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2041.codfw.wmnet with OS bullseye
  • 09:39 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P43294 and previous config saved to /var/cache/conftool/dbconfig/20230124-093220-root.json
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43293 and previous config saved to /var/cache/conftool/dbconfig/20230124-092511-root.json
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P43292 and previous config saved to /var/cache/conftool/dbconfig/20230124-091715-root.json
  • 09:14 kart_: Done: UTC morning backport window
  • 09:13 kartik@deploy1002: Finished scap: Backport for Remove Kartographer versioned mapdata flags (T326288) (duration: 09m 44s)
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43291 and previous config saved to /var/cache/conftool/dbconfig/20230124-091006-root.json
  • 09:05 kartik@deploy1002: awight and kartik: Backport for Remove Kartographer versioned mapdata flags (T326288) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 09:03 kartik@deploy1002: Started scap: Backport for Remove Kartographer versioned mapdata flags (T326288)
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P43290 and previous config saved to /var/cache/conftool/dbconfig/20230124-090210-root.json
  • 09:01 kartik@deploy1002: Finished scap: Backport for Deprecate the EnableMapFrame feature flag (T326288) (duration: 10m 42s)
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2096 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43289 and previous config saved to /var/cache/conftool/dbconfig/20230124-085501-root.json
  • 08:52 kartik@deploy1002: awight and kartik: Backport for Deprecate the EnableMapFrame feature flag (T326288) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 08:50 kartik@deploy1002: Started scap: Backport for Deprecate the EnableMapFrame feature flag (T326288)
  • 08:48 kartik@deploy1002: Finished scap: Backport for Enable write new for CheckUserLog comment fields on testwikis (T233004) (duration: 15m 20s)
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P43288 and previous config saved to /var/cache/conftool/dbconfig/20230124-084705-root.json
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Add some weight to db2115 in x1 codfw', diff saved to https://phabricator.wikimedia.org/P43287 and previous config saved to /var/cache/conftool/dbconfig/20230124-084552-marostegui.json
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2096 T327745', diff saved to https://phabricator.wikimedia.org/P43286 and previous config saved to /var/cache/conftool/dbconfig/20230124-084508-marostegui.json
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2115 to x1 codfw T327745', diff saved to https://phabricator.wikimedia.org/P43285 and previous config saved to /var/cache/conftool/dbconfig/20230124-084206-marostegui.json
  • 08:39 marostegui: Starting x1 codfw failover from db2096 to db2115 - T327745
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2115 with weight 0 T327745', diff saved to https://phabricator.wikimedia.org/P43284 and previous config saved to /var/cache/conftool/dbconfig/20230124-083643-marostegui.json
  • 08:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Primary switchover x1 T327745
  • 08:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Primary switchover x1 T327745
  • 08:35 kartik@deploy1002: dreamyjazz and kartik: Backport for Enable write new for CheckUserLog comment fields on testwikis (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 08:34 phedenskog@deploy1002: Finished deploy [performance/navtiming@8c87ca6]: (no justification provided) (duration: 00m 06s)
  • 08:34 phedenskog@deploy1002: Started deploy [performance/navtiming@8c87ca6]: (no justification provided)
  • 08:33 kartik@deploy1002: Started scap: Backport for Enable write new for CheckUserLog comment fields on testwikis (T233004)
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P43283 and previous config saved to /var/cache/conftool/dbconfig/20230124-083200-root.json
  • 08:28 kartik@deploy1002: Finished scap: Backport for Add "Page Frame" to DiscussionTools beta feature on almost all wikis (T323727) (duration: 09m 09s)
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2110 from API T327739', diff saved to https://phabricator.wikimedia.org/P43282 and previous config saved to /var/cache/conftool/dbconfig/20230124-082440-marostegui.json
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2140 T327739', diff saved to https://phabricator.wikimedia.org/P43281 and previous config saved to /var/cache/conftool/dbconfig/20230124-082138-marostegui.json
  • 08:21 kartik@deploy1002: kartik and matmarex: Backport for Add "Page Frame" to DiscussionTools beta feature on almost all wikis (T323727) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2110 to s4 primary T327739', diff saved to https://phabricator.wikimedia.org/P43280 and previous config saved to /var/cache/conftool/dbconfig/20230124-082025-root.json
  • 08:19 kartik@deploy1002: Started scap: Backport for Add "Page Frame" to DiscussionTools beta feature on almost all wikis (T323727)
  • 08:18 marostegui: Starting s4 codfw failover from db2140 to db2110 - T327739
  • 08:16 kartik@deploy1002: Finished scap: Backport for Content Translation: Add campaign for Wiki Loves Living Heritage (T327587) (duration: 10m 25s)
  • 08:07 kartik@deploy1002: kartik: Backport for Content Translation: Add campaign for Wiki Loves Living Heritage (T327587) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 08:05 kartik@deploy1002: Started scap: Backport for Content Translation: Add campaign for Wiki Loves Living Heritage (T327587)
  • 07:59 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T327739
  • 07:58 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T327739
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2110 with weight 0 T327739', diff saved to https://phabricator.wikimedia.org/P43279 and previous config saved to /var/cache/conftool/dbconfig/20230124-075824-root.json
  • 07:50 moritzm: installing Linux 5.10.162 on Bullseye hosts
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1106 from dbctl T327616', diff saved to https://phabricator.wikimedia.org/P43278 and previous config saved to /var/cache/conftool/dbconfig/20230124-074323-marostegui.json
  • 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 (T322618)', diff saved to https://phabricator.wikimedia.org/P43277 and previous config saved to /var/cache/conftool/dbconfig/20230124-064905-ladsgroup.json
  • 06:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 (T322618)', diff saved to https://phabricator.wikimedia.org/P43276 and previous config saved to /var/cache/conftool/dbconfig/20230124-064554-ladsgroup.json
  • 06:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118', diff saved to https://phabricator.wikimedia.org/P43275 and previous config saved to /var/cache/conftool/dbconfig/20230124-063358-ladsgroup.json
  • 06:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43274 and previous config saved to /var/cache/conftool/dbconfig/20230124-063048-ladsgroup.json
  • 06:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118', diff saved to https://phabricator.wikimedia.org/P43273 and previous config saved to /var/cache/conftool/dbconfig/20230124-061852-ladsgroup.json
  • 06:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43272 and previous config saved to /var/cache/conftool/dbconfig/20230124-061541-ladsgroup.json
  • 06:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 (T322618)', diff saved to https://phabricator.wikimedia.org/P43271 and previous config saved to /var/cache/conftool/dbconfig/20230124-060345-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2118 (T322618)', diff saved to https://phabricator.wikimedia.org/P43270 and previous config saved to /var/cache/conftool/dbconfig/20230124-060129-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 06:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 (T322618)', diff saved to https://phabricator.wikimedia.org/P43269 and previous config saved to /var/cache/conftool/dbconfig/20230124-060035-ladsgroup.json
  • 05:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2107 (T322618)', diff saved to https://phabricator.wikimedia.org/P43268 and previous config saved to /var/cache/conftool/dbconfig/20230124-055816-ladsgroup.json
  • 05:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 05:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 04:57 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.18 (duration: 02m 07s)
  • 04:55 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.20 refs T325583 (duration: 53m 01s)
  • 04:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.20 refs T325583
  • 03:30 AndyRussG: payments-wiki upgraded from 3d882ac7 to 15395d05
  • 02:35 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2024.codfw.wmnet
  • 02:27 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2024.codfw.wmnet
  • 02:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2021.codfw.wmnet
  • 02:18 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2021.codfw.wmnet
  • 02:16 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host restbase2019.codfw.wmnet
  • 02:04 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2019.codfw.wmnet
  • 02:03 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2014.codfw.wmnet
  • 01:55 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2014.codfw.wmnet
  • 01:51 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2013.codfw.wmnet
  • 01:44 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2013.codfw.wmnet
  • 01:36 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1033.eqiad.wmnet
  • 01:26 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1033.eqiad.wmnet
  • 01:26 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1032.eqiad.wmnet
  • 01:18 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1032.eqiad.wmnet
  • 01:16 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1031.eqiad.wmnet
  • 01:06 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1031.eqiad.wmnet
  • 01:03 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1030.eqiad.wmnet
  • 00:55 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1030.eqiad.wmnet
  • 00:55 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1027.eqiad.wmnet
  • 00:47 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1027.eqiad.wmnet
  • 00:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1026.eqiad.wmnet
  • 00:38 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1026.eqiad.wmnet
  • 00:36 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1025.eqiad.wmnet
  • 00:28 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1025.eqiad.wmnet
  • 00:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1018.eqiad.wmnet
  • 00:18 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1018.eqiad.wmnet
  • 00:14 zabe@deploy1002: Finished scap: Backport for Use core's PoolCounterClient (T327336) (duration: 12m 47s)
  • 00:03 zabe@deploy1002: zabe: Backport for Use core's PoolCounterClient (T327336) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 00:01 zabe@deploy1002: Started scap: Backport for Use core's PoolCounterClient (T327336)

2023-01-23

  • 23:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1029.eqiad.wmnet
  • 23:24 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1029.eqiad.wmnet
  • 23:24 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1024.eqiad.wmnet
  • 23:17 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1024.eqiad.wmnet
  • 23:16 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1023.eqiad.wmnet
  • 23:07 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1023.eqiad.wmnet
  • 22:59 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1022.eqiad.wmnet
  • 22:57 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 22:57 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 22:57 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 22:56 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@544f5f3]: 0.3.119 (duration: 07m 30s)
  • 22:52 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1022.eqiad.wmnet
  • 22:49 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.119` on canary `wdqs1003`; proceeding to rest of fleet
  • 22:48 ryankemper@deploy1002: Started deploy [wdqs/wdqs@544f5f3]: 0.3.119
  • 22:46 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.119`. Pre-deploy tests passing on canary `wdqs1003`
  • 22:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1017.eqiad.wmnet
  • 22:37 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1017.eqiad.wmnet
  • 22:31 maryum: Deployed patch for T285159
  • 21:42 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1028.eqiad.wmnet
  • 21:40 zabe@deploy1002: Finished scap: Backport for throttle: Remove expired rule (duration: 07m 45s)
  • 21:35 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1028.eqiad.wmnet
  • 21:34 zabe@deploy1002: zabe: Backport for throttle: Remove expired rule synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:32 zabe@deploy1002: Started scap: Backport for throttle: Remove expired rule
  • 21:29 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1021.eqiad.wmnet
  • 21:22 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1021.eqiad.wmnet
  • 21:12 kindrobot: close UTC late backport window
  • 21:12 kindrobot@deploy1002: Finished scap: Backport for Enable Page Tools for logged-in users on enwiki (T327686) (duration: 09m 00s)
  • 21:04 kindrobot@deploy1002: jdrewniak and kindrobot: Backport for Enable Page Tools for logged-in users on enwiki (T327686) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:03 kindrobot@deploy1002: Started scap: Backport for Enable Page Tools for logged-in users on enwiki (T327686)
  • 21:01 kindrobot: start UTC late backport window
  • 20:56 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
  • 20:56 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
  • 20:45 taavi: restart T315510 on group1 after mwmaint restart, currently running on wikidatawiki
  • 19:48 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1020.eqiad.wmnet
  • 19:41 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1020.eqiad.wmnet
  • 19:37 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1019.eqiad.wmnet
  • 19:30 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1019.eqiad.wmnet
  • 19:24 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1016.eqiad.wmnet
  • 19:18 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
  • 19:17 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host restbase1016.eqiad.wmnet
  • 19:17 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
  • 19:16 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host restbase1016.eqiad.wmnet
  • 19:16 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1016.eqiad.wmnet
  • 18:48 mutante: miscweb1002 - unload CAS apache module and config; apt-get remove libapache2-mod-auth-cas
  • 18:19 mutante: miscweb2002 - unlink /etc/apache2/mods-enabled/auth_cas.conf - unlink /etc/apache2/mods-enabled/auth_cas.load - apt-get remove libapache2-mod-auth-cas - T327405
  • 18:08 mutante: miscweb2002 - unlink /etc/apache2/mods-enabled/auth_cas.conf - unlink /etc/apache2/mods-enabled/auth_cas.load
  • 18:05 mutante: miscweb1002 - disabling puppet because latest merge would break apache if it runs, debugging in progress on inactive miscweb2002
  • 18:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on miscweb2002.codfw.wmnet with reason: debugging on inactive server
  • 18:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on miscweb2002.codfw.wmnet with reason: debugging on inactive server
  • 17:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P43265 and previous config saved to /var/cache/conftool/dbconfig/20230123-175241-ladsgroup.json
  • 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P43264 and previous config saved to /var/cache/conftool/dbconfig/20230123-173736-ladsgroup.json
  • 17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P43263 and previous config saved to /var/cache/conftool/dbconfig/20230123-172231-ladsgroup.json
  • 17:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P43262 and previous config saved to /var/cache/conftool/dbconfig/20230123-170726-ladsgroup.json
  • 17:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 16:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 16:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 16:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 16:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 16:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 16:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 16:48 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 48s)
  • 16:42 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 48s)
  • 16:41 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
  • 16:41 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
  • 16:40 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:40 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:35 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:35 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 100%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43261 and previous config saved to /var/cache/conftool/dbconfig/20230123-163207-root.json
  • 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43260 and previous config saved to /var/cache/conftool/dbconfig/20230123-163138-root.json
  • 16:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 75%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43259 and previous config saved to /var/cache/conftool/dbconfig/20230123-161702-root.json
  • 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43258 and previous config saved to /var/cache/conftool/dbconfig/20230123-161633-root.json
  • 16:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 50%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43257 and previous config saved to /var/cache/conftool/dbconfig/20230123-160157-root.json
  • 16:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43256 and previous config saved to /var/cache/conftool/dbconfig/20230123-160126-root.json
  • 15:53 sukhe: reprepro -C main include bullseye-wikimedia varnish_6.0.11-1wm1_amd64.changes: T326634
  • 15:50 urbanecm: Deploy security patch for T327613
  • 15:48 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 15:48 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 25%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43255 and previous config saved to /var/cache/conftool/dbconfig/20230123-154652-root.json
  • 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43254 and previous config saved to /var/cache/conftool/dbconfig/20230123-154621-root.json
  • 15:44 papaul: ongoing maintenance on fasw-codfw
  • 15:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 10%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43253 and previous config saved to /var/cache/conftool/dbconfig/20230123-153147-root.json
  • 15:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 10%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43252 and previous config saved to /var/cache/conftool/dbconfig/20230123-153116-root.json
  • 15:17 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.1.4-1wm1_amd64.changes: T325563
  • 15:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 5%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43251 and previous config saved to /var/cache/conftool/dbconfig/20230123-151642-root.json
  • 15:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 5%: After adding a column', diff saved to https://phabricator.wikimedia.org/P43250 and previous config saved to /var/cache/conftool/dbconfig/20230123-151611-root.json
  • 15:09 taavi@deploy1002: Finished scap: Backport for Revert "Enable Linter write namespace tag and template using core config" (duration: 07m 28s)
  • 15:03 taavi@deploy1002: taavi and trainbranchbot: Backport for Revert "Enable Linter write namespace tag and template using core config" synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 15:02 taavi@deploy1002: Started scap: Backport for Revert "Enable Linter write namespace tag and template using core config"
  • 15:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170:3317', diff saved to https://phabricator.wikimedia.org/P43248 and previous config saved to /var/cache/conftool/dbconfig/20230123-150110-marostegui.json
  • 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312', diff saved to https://phabricator.wikimedia.org/P43247 and previous config saved to /var/cache/conftool/dbconfig/20230123-150018-marostegui.json
  • 15:00 taavi@deploy1002: Finished scap: Backport for Enable Linter write namespace tag and template using core config (T299612) (duration: 07m 56s)
  • 14:59 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 14:59 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 14:53 taavi@deploy1002: taavi and sbailey: Backport for Enable Linter write namespace tag and template using core config (T299612) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 14:52 taavi@deploy1002: Started scap: Backport for Enable Linter write namespace tag and template using core config (T299612)
  • 14:46 taavi@deploy1002: Finished scap: Backport for SpecialUserrights: Allow updating the expiry of user groups (T327605) (duration: 08m 48s)
  • 14:42 sukhe: rolling out pybal 1.15.10: T321191
  • 14:39 taavi@deploy1002: taavi and func: Backport for SpecialUserrights: Allow updating the expiry of user groups (T327605) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 14:37 taavi@deploy1002: Started scap: Backport for SpecialUserrights: Allow updating the expiry of user groups (T327605)
  • 14:37 taavi@deploy1002: Finished scap: Backport for zhwiki: Install PageAssessments (T326387) (duration: 11m 24s)
  • 14:27 taavi@deploy1002: stang and taavi: Backport for zhwiki: Install PageAssessments (T326387) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:26 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
  • 14:26 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
  • 14:25 taavi@deploy1002: Started scap: Backport for zhwiki: Install PageAssessments (T326387)
  • 14:25 taavi@deploy1002: Finished scap: Backport for bnwikiquote: Update logo (T323131), shnwikibooks: Add project logo (T327380) (duration: 09m 22s)
  • 14:25 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 14:25 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 14:20 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:20 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:18 taavi: mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=zhwiki pageassessments # T326387
  • 14:17 taavi@deploy1002: taavi and stang: Backport for bnwikiquote: Update logo (T323131), shnwikibooks: Add project logo (T327380) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 14:16 taavi@deploy1002: Started scap: Backport for bnwikiquote: Update logo (T323131), shnwikibooks: Add project logo (T327380)
  • 12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 (T323827)', diff saved to https://phabricator.wikimedia.org/P43246 and previous config saved to /var/cache/conftool/dbconfig/20230123-124532-ladsgroup.json
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43245 and previous config saved to /var/cache/conftool/dbconfig/20230123-123025-ladsgroup.json
  • 12:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P43242 and previous config saved to /var/cache/conftool/dbconfig/20230123-121519-ladsgroup.json
  • 12:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 12:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 12:06 marostegui: dbmaint Reboot db2135 (m5 codfw master)
  • 12:06 marostegui: dbmaint Reboot db2134 (m3 codfw master)
  • 12:05 Emperor: removing /usr/local/bin/prometheus-puppet-agent-stats from prometheus crontab on snapshot1014
  • 12:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 (T323827)', diff saved to https://phabricator.wikimedia.org/P43241 and previous config saved to /var/cache/conftool/dbconfig/20230123-120012-ladsgroup.json
  • 11:58 marostegui: dbmaint Reboot db2133 (m2 codfw master)
  • 11:57 marostegui: dbmaint Reboot db2132 (m1 codfw master)
  • 11:57 marostegui: Reboot db2132 (m1 codfw master)
  • 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2113 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P43239 and previous config saved to /var/cache/conftool/dbconfig/20230123-113506-ladsgroup.json
  • 11:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 11:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 11:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 11:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2114 T327644', diff saved to https://phabricator.wikimedia.org/P43236 and previous config saved to /var/cache/conftool/dbconfig/20230123-112134-ladsgroup.json
  • 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2113 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P43235 and previous config saved to /var/cache/conftool/dbconfig/20230123-112001-ladsgroup.json
  • 11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2129 to s6 primary T327644', diff saved to https://phabricator.wikimedia.org/P43234 and previous config saved to /var/cache/conftool/dbconfig/20230123-111813-ladsgroup.json
  • 11:17 Amir1: Starting s6 codfw failover from db2114 to db2129 - T327644
  • 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2107 (T323827)', diff saved to https://phabricator.wikimedia.org/P43233 and previous config saved to /var/cache/conftool/dbconfig/20230123-111147-ladsgroup.json
  • 11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 11:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2113 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P43232 and previous config saved to /var/cache/conftool/dbconfig/20230123-110456-ladsgroup.json
  • 10:55 XioNoX: update management routers ACLs to add new bast hosts
  • 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2129 with weight 0 T327644', diff saved to https://phabricator.wikimedia.org/P43231 and previous config saved to /var/cache/conftool/dbconfig/20230123-105520-ladsgroup.json
  • 10:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s6 T327644
  • 10:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s6 T327644
  • 10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2113 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P43230 and previous config saved to /var/cache/conftool/dbconfig/20230123-104951-ladsgroup.json
  • 10:48 vgutierrez: rolling upgrade to HAProxy 2.4.20 on ulsfo
  • 10:40 btullis@deploy1002: Finished deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided) (duration: 00m 06s)
  • 10:40 btullis@deploy1002: Started deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided)
  • 10:40 btullis@deploy1002: Finished deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided) (duration: 00m 20s)
  • 10:40 btullis@deploy1002: Started deploy [analytics/superset/deploy@4ba1cb1]: (no justification provided)
  • 10:39 btullis@deploy1002: Installation of scap version "4.33.1" completed for 1 hosts
  • 10:39 btullis@deploy1002: Installing scap version "4.33.1" for 1 hosts
  • 10:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-tool1010.eqiad.wmnet with OS bullseye
  • 10:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-tool1010.eqiad.wmnet with reason: host reimage
  • 10:16 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-tool1010.eqiad.wmnet with reason: host reimage
  • 10:07 ladsgroup@deploy1002: Finished scap: Backport for Remove Flow as default in techconductwiki (duration: 07m 51s)
  • 10:03 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-tool1010.eqiad.wmnet with OS bullseye
  • 10:01 ladsgroup@deploy1002: ladsgroup: Backport for Remove Flow as default in techconductwiki synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 09:59 ladsgroup@deploy1002: Started scap: Backport for Remove Flow as default in techconductwiki
  • 09:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 09:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 09:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 09:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 09:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 09:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 09:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 09:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 09:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 08:49 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:49 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-create ripe-atlas-esams records as the host is back up - volans@cumin1001"
  • 08:48 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-create ripe-atlas-esams records as the host is back up - volans@cumin1001"
  • 08:46 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 08:45 zabe@deploy1002: Finished scap: Backport for Remove oversight group from privileged groups (T112147), Start reading from cuc_comment_id on wikidatawiki (T233004) (duration: 07m 48s)
  • 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 to vslow and dump group T326669', diff saved to https://phabricator.wikimedia.org/P43229 and previous config saved to /var/cache/conftool/dbconfig/20230123-084326-marostegui.json
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1206 to vslow and dump group T326669', diff saved to https://phabricator.wikimedia.org/P43228 and previous config saved to /var/cache/conftool/dbconfig/20230123-084239-marostegui.json
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43227 and previous config saved to /var/cache/conftool/dbconfig/20230123-084055-root.json
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43226 and previous config saved to /var/cache/conftool/dbconfig/20230123-084045-root.json
  • 08:39 zabe@deploy1002: zabe: Backport for Remove oversight group from privileged groups (T112147), Start reading from cuc_comment_id on wikidatawiki (T233004) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 08:37 zabe@deploy1002: Started scap: Backport for Remove oversight group from privileged groups (T112147), Start reading from cuc_comment_id on wikidatawiki (T233004)
  • 08:37 ayounsi@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 01m 08s)
  • 08:36 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
  • 08:30 ladsgroup@deploy1002: Finished scap: Backport for Tweaks for new heading HTML structure (T327328 T327469) (duration: 17m 12s)
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43225 and previous config saved to /var/cache/conftool/dbconfig/20230123-082550-root.json
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43224 and previous config saved to /var/cache/conftool/dbconfig/20230123-082540-root.json
  • 08:22 ladsgroup@deploy1002: ladsgroup and matmarex: Backport for Tweaks for new heading HTML structure (T327328 T327469) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 08:12 ladsgroup@deploy1002: Started scap: Backport for Tweaks for new heading HTML structure (T327328 T327469)
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43223 and previous config saved to /var/cache/conftool/dbconfig/20230123-081045-root.json
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43222 and previous config saved to /var/cache/conftool/dbconfig/20230123-081035-root.json
  • 08:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P43221 and previous config saved to /var/cache/conftool/dbconfig/20230123-080824-ladsgroup.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43220 and previous config saved to /var/cache/conftool/dbconfig/20230123-075540-root.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43219 and previous config saved to /var/cache/conftool/dbconfig/20230123-075530-root.json
  • 07:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P43218 and previous config saved to /var/cache/conftool/dbconfig/20230123-075319-ladsgroup.json
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43217 and previous config saved to /var/cache/conftool/dbconfig/20230123-074035-root.json
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 10%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43216 and previous config saved to /var/cache/conftool/dbconfig/20230123-074025-root.json
  • 07:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P43215 and previous config saved to /var/cache/conftool/dbconfig/20230123-073814-ladsgroup.json
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43214 and previous config saved to /var/cache/conftool/dbconfig/20230123-072530-root.json
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 5%: After changing s1 sanitarium master', diff saved to https://phabricator.wikimedia.org/P43213 and previous config saved to /var/cache/conftool/dbconfig/20230123-072520-root.json
  • 07:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P43212 and previous config saved to /var/cache/conftool/dbconfig/20230123-072309-ladsgroup.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 db1206 T326669', diff saved to https://phabricator.wikimedia.org/P43211 and previous config saved to /var/cache/conftool/dbconfig/20230123-071323-marostegui.json
  • 07:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 07:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 07:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 07:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 06:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 06:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 06:23 kart_: Updated cxserver to 2023-01-20-051603-production (T323840, T326236)
  • 06:19 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 06:18 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 06:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 06:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 06:17 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 06:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 06:16 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 06:12 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 06:12 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 05:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 05:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 05:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 05:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 04:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2113 T327611', diff saved to https://phabricator.wikimedia.org/P43210 and previous config saved to /var/cache/conftool/dbconfig/20230123-045939-ladsgroup.json
  • 04:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2123 to s5 primary T327611', diff saved to https://phabricator.wikimedia.org/P43209 and previous config saved to /var/cache/conftool/dbconfig/20230123-045740-ladsgroup.json
  • 04:57 Amir1: Starting s5 codfw failover from db2113 to db2123 - T327611
  • 04:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 04:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2123 with weight 0 T327611', diff saved to https://phabricator.wikimedia.org/P43208 and previous config saved to /var/cache/conftool/dbconfig/20230123-043324-ladsgroup.json
  • 04:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T327611
  • 04:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T327611
  • 04:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 04:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 03:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 03:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 03:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2107 T327609', diff saved to https://phabricator.wikimedia.org/P43207 and previous config saved to /var/cache/conftool/dbconfig/20230123-035458-ladsgroup.json
  • 03:52 Amir1: Starting s2 codfw failover from db2107 to db2104 - T327609

2023-01-20

  • 18:22 jynus: deploying new grants for backups on m1 T327155
  • 16:15 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 16:15 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 16:15 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 16:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 16:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 16:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 16:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:28 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:27 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ping host - jmm@cumin2002"
  • 13:08 moritzm: installing node-minimatch security updates
  • 13:01 moritzm: installing libxstream-java security updates
  • 13:00 sukhe: reprepro --ignore=wrongdistribution -C main include bullseye-wikimedia cadvisor_0.44.0+ds1-1~wmf1_amd64.changes: T325557
  • 12:45 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ping host - jmm@cumin2002"
  • 12:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2040.codfw.wmnet with OS bullseye
  • 12:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2040.codfw.wmnet with reason: host reimage
  • 12:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2040.codfw.wmnet with reason: host reimage
  • 12:17 moritzm: installing ping1003 T273509
  • 12:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2040.codfw.wmnet with OS bullseye
  • 12:03 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 12:02 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 10:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ping host - jmm@cumin2002"
  • 10:49 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ping host - jmm@cumin2002"
  • 10:32 elukey: restart kubelet on ml-staging200* nodes (some fs-inotify-related issues with the istio-proxy of newly created containers)
  • 10:27 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:13 moritzm: installing emacs security updates on bullseye
  • 10:13 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 10:12 moritzm: imported jenkins 2.375-2 to thirdparty/ci T326531
  • 10:00 jnuche@deploy1002: Installation of scap version "4.33.1" completed for 1 hosts
  • 10:00 jnuche@deploy1002: Installing scap version "4.33.1" for 1 hosts
  • 08:59 moritzm: installing ping2003 T273509
  • 08:10 elukey: restart kubelet on kubernetes2007 - node reported issues with it, marked as "notready" by the control plane
  • 07:58 elukey: `apt-get clean` on doh4001 to free space (root partition almost filled)
  • 01:55 ejegg: payments-wiki upgraded from 3cf03933 to 3d882ac7
  • 01:12 ejegg: payments-wiki upgraded from fcb9ab60 to 3cf03933

2023-01-19

  • 21:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2039.codfw.wmnet with OS bullseye
  • 21:42 jdrewniak@deploy1002: Finished scap: Backport for Enable Page tools on viwiki and itwiki (T327348) (duration: 10m 38s)
  • 21:33 jdrewniak@deploy1002: jdlrobson and jdrewniak: Backport for Enable Page tools on viwiki and itwiki (T327348) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 21:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2039.codfw.wmnet with reason: host reimage
  • 21:31 jdrewniak@deploy1002: Started scap: Backport for Enable Page tools on viwiki and itwiki (T327348)
  • 21:27 jdrewniak@deploy1002: Finished scap: Backport for Fix grid blowout with limited width turned off (T327423) (duration: 08m 26s)
  • 21:27 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2039.codfw.wmnet with reason: host reimage
  • 21:20 cwhite@deploy1002: Finished deploy [releng/phatality@e0bb573]: (no justification provided) (duration: 00m 13s)
  • 21:20 cwhite@deploy1002: Started deploy [releng/phatality@e0bb573]: (no justification provided)
  • 21:20 jdrewniak@deploy1002: jdlrobson and jdrewniak: Backport for Fix grid blowout with limited width turned off (T327423) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 21:18 jdrewniak@deploy1002: Started scap: Backport for Fix grid blowout with limited width turned off (T327423)
  • 21:11 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2039.codfw.wmnet with OS bullseye
  • 20:13 zabe@deploy1002: Finished scap: fix k8s drift (duration: 08m 02s)
  • 20:05 zabe@deploy1002: Started scap: fix k8s drift
  • 20:02 zabe@deploy1002: Finished scap: Backport for Start reading from cuc_comment_id everywhere except wikidatawiki (T233004) (duration: 14m 01s)
  • 19:49 zabe@deploy1002: zabe: Backport for Start reading from cuc_comment_id everywhere except wikidatawiki (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 19:48 zabe@deploy1002: Started scap: Backport for Start reading from cuc_comment_id everywhere except wikidatawiki (T233004)
  • 18:36 zabe: re-start populateCucComment on wikidatawiki post-mwmaint-reboot in screen with --sleep 2, will take ~30 hours # T233004
  • 18:17 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 18:17 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 18:16 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 18:16 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 18:13 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 18:12 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 18:08 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 18:08 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 18:06 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 18:05 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 18:02 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 18:01 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 17:36 Amir1: bash Krinkle> Vatican Interim Papacy Runbook, § 5.1: Notify Wikipedia about incoming traffic.
  • 17:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2038.codfw.wmnet with OS bullseye
  • 17:13 zabe@deploy1002: Finished scap: T233004 (duration: 18m 50s)
  • 17:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2038.codfw.wmnet with reason: host reimage
  • 16:58 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2038.codfw.wmnet with reason: host reimage
  • 16:54 zabe@deploy1002: Started scap: T233004
  • 16:54 zabe@deploy1002: backport aborted: (duration: 15m 22s)
  • 16:48 godog: roll-restart opensearch-dashboards in logstash collectors eqiad - T327161
  • 16:44 zabe@deploy1002: Started scap: Backport for Add ability to start from cuc_id to populateCucComment (T233004)
  • 16:42 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2038.codfw.wmnet with OS bullseye
  • 16:27 moritzm: installing cryptsetup updates for bullseye
  • 16:18 jmm@cumin2002: END (FAIL) - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors (exit_code=1) rolling restart_daemons on A:logstash-collector
  • 16:13 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1009']
  • 16:11 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1009']
  • 16:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:08 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors rolling restart_daemons on A:logstash-collector
  • 16:06 jclark@cumin1001: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:55 sukhe: update pybal to 1.15.10 on lvs4010: T321191
  • 15:45 effie: enable puppet on C:memcached hosts
  • 15:42 godog: bounce opensearch on logstash102[34] - T327161
  • 15:30 sukhe: reprepro -C main include buster-wikimedia pybal_1.15.10_amd64.changes: T321191
  • 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P43194 and previous config saved to /var/cache/conftool/dbconfig/20230119-151917-ladsgroup.json
  • 15:17 effie: disable puppet on all C:memcached servers to deploy 812173
  • 15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P43193 and previous config saved to /var/cache/conftool/dbconfig/20230119-150412-ladsgroup.json
  • 14:57 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P43192 and previous config saved to /var/cache/conftool/dbconfig/20230119-144907-ladsgroup.json
  • 14:47 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 14:40 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P43191 and previous config saved to /var/cache/conftool/dbconfig/20230119-143402-ladsgroup.json
  • 14:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 14:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 14:32 zabe: run populateCucComment on group2 wikis # T327290
  • 14:30 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 14:09 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 13:58 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:27 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host maps2009.codfw.wmnet
  • 12:19 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host maps2009.codfw.wmnet
  • 12:06 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 12:06 moritzm: stopping/masking slapd on ldap-corp1001/ldap-corp2001 T323820
  • 11:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1054.eqiad.wmnet with OS bullseye
  • 11:30 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 11:29 hnowlan: rebooting maps-codfw for updates
  • 11:29 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host maps1009.eqiad.wmnet
  • 11:24 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts webperf2004.codfw.wmnet
  • 11:24 filippo@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:24 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf2004.codfw.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
  • 11:22 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host maps1009.eqiad.wmnet
  • 11:20 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 11:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1054.eqiad.wmnet with reason: host reimage
  • 11:18 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf2004.codfw.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
  • 11:17 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1054.eqiad.wmnet with reason: host reimage
  • 11:13 filippo@cumin1001: START - Cookbook sre.dns.netbox
  • 11:09 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts webperf2004.codfw.wmnet
  • 11:08 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts webperf1004.eqiad.wmnet
  • 11:08 filippo@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:08 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
  • 11:06 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: webperf1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - filippo@cumin1001"
  • 11:06 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1054.eqiad.wmnet with OS bullseye
  • 11:02 filippo@cumin1001: START - Cookbook sre.dns.netbox
  • 10:58 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts webperf1004.eqiad.wmnet
  • 10:44 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 10:44 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 10:44 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 10:44 hnowlan: rebooting maps-eqiad for updates
  • 10:27 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 10:27 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 10:27 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 10:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 10:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:27 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 10:27 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 10:27 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 10:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 10:27 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 10:27 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 10:27 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 10:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 10:24 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on webperf2004.codfw.wmnet with reason: decom
  • 10:24 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on webperf2004.codfw.wmnet with reason: decom
  • 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ping host - jmm@cumin2002"
  • 10:17 claime: Restarted maintenance scripts on mwmaint1002.eqiad.wmnet
  • 10:17 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ping host - jmm@cumin2002"
  • 10:17 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 10:15 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 10:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwmaint1002.eqiad.wmnet
  • 10:07 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwmaint1002.eqiad.wmnet
  • 10:06 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 10:06 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 10:05 claime: Stopping maintenance scripts on mwmaint1002.eqiad.wmnet for reboot
  • 09:55 moritzm: installing ping3003 T273509
  • 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ldap-corp[1001,2001].wikimedia.org with reason: Decommissioning
  • 09:27 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on ldap-corp[1001,2001].wikimedia.org with reason: Decommissioning
  • 09:24 jnuche@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.19 refs T325582
  • 09:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 09:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 09:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 09:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 08:26 moritzm: installing sudo security updates
  • 07:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 07:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 06:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 06:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 06:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 06:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 06:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 06:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 06:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 06:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 06:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2118 T327372', diff saved to https://phabricator.wikimedia.org/P43190 and previous config saved to /var/cache/conftool/dbconfig/20230119-060449-ladsgroup.json
  • 06:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2121 to s7 primary T327372', diff saved to https://phabricator.wikimedia.org/P43189 and previous config saved to /var/cache/conftool/dbconfig/20230119-060316-ladsgroup.json
  • 06:02 Amir1: Starting s7 codfw failover from db2118 to db2121 - T327372
  • 05:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2121 with weight 0 T327372', diff saved to https://phabricator.wikimedia.org/P43188 and previous config saved to /var/cache/conftool/dbconfig/20230119-054243-ladsgroup.json
  • 05:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s7 T327372
  • 05:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 30 hosts with reason: Primary switchover s7 T327372

2023-01-18

  • 23:47 zabe: run populateCucComment.php on all group0 and group1 wikis # T327290
  • 23:42 cstone: civicrm upgraded from 164270b0 to f6093fb2
  • 22:35 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: raise heap memory to 12G - bking@cumin1001 - T323646
  • 22:03 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: raise heap memory to 12G - bking@cumin1001 - T323646
  • 21:50 kindrobot: close UTC late backport window
  • 21:50 kindrobot@deploy1002: Finished scap: Backport for [config]: Undeploy GDI Safety Survey Wave 4 (T327296) (duration: 10m 45s)
  • 21:41 kindrobot@deploy1002: essexigyan and kindrobot: Backport for [config]: Undeploy GDI Safety Survey Wave 4 (T327296) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:39 kindrobot@deploy1002: Started scap: Backport for [config]: Undeploy GDI Safety Survey Wave 4 (T327296)
  • 21:36 kindrobot@deploy1002: Finished scap: Backport for Bump English Wikipedia event logging from 0.5 to 1% (T326892), Legacy Vector is not a responsive skin (T327256) (duration: 13m 01s)
  • 21:25 kindrobot@deploy1002: kindrobot and jdlrobson: Backport for Bump English Wikipedia event logging from 0.5 to 1% (T326892), Legacy Vector is not a responsive skin (T327256) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:23 kindrobot@deploy1002: Started scap: Backport for Bump English Wikipedia event logging from 0.5 to 1% (T326892), Legacy Vector is not a responsive skin (T327256)
  • 21:08 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1037.eqiad.wmnet with OS bullseye
  • 21:05 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1036.eqiad.wmnet with OS bullseye
  • 21:03 kindrobot: start UTC late backport window
  • 20:54 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1037.eqiad.wmnet with reason: host reimage
  • 20:51 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1036.eqiad.wmnet with reason: host reimage
  • 20:49 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1037.eqiad.wmnet with reason: host reimage
  • 20:48 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1036.eqiad.wmnet with reason: host reimage
  • 20:36 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1037.eqiad.wmnet with OS bullseye
  • 20:35 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1036.eqiad.wmnet with OS bullseye
  • 20:34 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
  • 20:34 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
  • 19:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1037.eqiad.wmnet with OS buster
  • 19:54 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:52 bblack: db1129 and lvs1017: removed misconfigured IP address in wrong vlan from eno1 and /e/n/i
  • 19:51 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1036.eqiad.wmnet with OS buster
  • 19:47 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:40 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1037.eqiad.wmnet with reason: host reimage
  • 19:32 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1037.eqiad.wmnet with reason: host reimage
  • 19:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1036.eqiad.wmnet with reason: host reimage
  • 19:23 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1036.eqiad.wmnet with reason: host reimage
  • 19:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1037.eqiad.wmnet with OS buster
  • 18:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1036.eqiad.wmnet with OS buster
  • 18:21 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Enable the REST API on test-wikidata (T324999) (duration: 09m 38s)
  • 18:14 lucaswerkmeister-wmde@deploy1002: migr and lucaswerkmeister-wmde: Backport for Enable the REST API on test-wikidata (T324999) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 18:12 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Enable the REST API on test-wikidata (T324999)
  • 17:55 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
  • 17:55 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
  • 17:44 jnuche@deploy1002: Installation of scap version "4.33.0" completed for 560 hosts
  • 17:44 jnuche@deploy1002: Installing scap version "4.33.0" for 560 hosts
  • 17:42 jnuche@deploy1002: install-world aborted: (duration: 07m 17s)
  • 17:42 btullis@deploy1002: Installation of scap version "4.33.0" completed for 1 hosts
  • 17:41 btullis@deploy1002: Installing scap version "4.33.0" for 1 hosts
  • 17:35 jnuche@deploy1002: Installing scap version "4.33.0" for 561 hosts
  • 17:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['logstash1037']
  • 17:10 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1037']
  • 17:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logstash1037']
  • 17:09 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1037']
  • 17:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['logstash1036']
  • 16:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logstash1036']
  • 16:45 jnuche@deploy1002: Installation of scap version "4.33.0" completed for 1 hosts
  • 16:45 jnuche@deploy1002: Installing scap version "4.33.0" for 1 hosts
  • 16:39 jdrewniak@deploy1002: Finished scap: Backport for [100%] English Wikipedia uses Vector 2022 skin (duration: 09m 27s)
  • 16:31 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [100%] English Wikipedia uses Vector 2022 skin synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 16:29 jdrewniak@deploy1002: Started scap: Backport for [100%] English Wikipedia uses Vector 2022 skin
  • 16:20 jdrewniak@deploy1002: Finished scap: Backport for [75%] English Wikipedia uses Vector 2022 skin (T326892) (duration: 09m 24s)
  • 16:13 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [75%] English Wikipedia uses Vector 2022 skin (T326892) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 16:11 jdrewniak@deploy1002: Started scap: Backport for [75%] English Wikipedia uses Vector 2022 skin (T326892)
  • 16:06 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
  • 16:06 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
  • 15:58 jdrewniak@deploy1002: Finished scap: Backport for [50%] English Wikipedia uses Vector 2022 skin, adds instrumentation (T326892) (duration: 08m 52s)
  • 15:51 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [50%] English Wikipedia uses Vector 2022 skin, adds instrumentation (T326892) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 15:49 jdrewniak@deploy1002: Started scap: Backport for [50%] English Wikipedia uses Vector 2022 skin, adds instrumentation (T326892)
  • 15:44 jdrewniak@deploy1002: Finished scap: Backport for [25%] English Wikipedia uses Vector 2022 skin (T326892) (duration: 09m 06s)
  • 15:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1052.eqiad.wmnet with OS bullseye
  • 15:37 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:37 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:36 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [25%] English Wikipedia uses Vector 2022 skin (T326892) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 15:35 jdrewniak@deploy1002: Started scap: Backport for [25%] English Wikipedia uses Vector 2022 skin (T326892)
  • 15:31 urandom: re-enabling Cassandra hinted-handoff for codfw -- T327001
  • 15:29 jdrewniak@deploy1002: Finished scap: Backport for [10%] English Wikipedia uses Vector 2022 skin (T326892) (duration: 11m 30s)
  • 15:21 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1052.eqiad.wmnet with reason: host reimage
  • 15:19 jdrewniak@deploy1002: jdrewniak and jdlrobson: Backport for [10%] English Wikipedia uses Vector 2022 skin (T326892) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 15:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1052.eqiad.wmnet with reason: host reimage
  • 15:17 jdrewniak@deploy1002: Started scap: Backport for [10%] English Wikipedia uses Vector 2022 skin (T326892)
  • 15:14 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Revert gallery changes in 1.40.0-wmf.18 & .19 (T326990) (duration: 09m 11s)
  • 15:13 bblack: cp2031: rebooting to gather more information (still downtimed + depooled)
  • 15:07 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1052.eqiad.wmnet with OS bullseye
  • 15:06 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and matmarex: Backport for Revert gallery changes in 1.40.0-wmf.18 & .19 (T326990) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 15:05 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Revert gallery changes in 1.40.0-wmf.18 & .19 (T326990)
  • 15:04 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Revert gallery changes in 1.40.0-wmf.18 (T326990) (duration: 13m 04s)
  • 15:01 bblack: cp2031: rebooting to gather more information (still downtimed + depooled)
  • 14:57 moritzm: uploaded python-jose 3.3.0+dfsg-4~wmf11u1 to apt.wikimedia.org (needed by python-social-auth/Bitu)
  • 14:53 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and matmarex: Backport for Revert gallery changes in 1.40.0-wmf.18 (T326990) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:51 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Revert gallery changes in 1.40.0-wmf.18 (T326990)
  • 14:46 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Revert "Breaking upgrade: mapdata" (T327151) (duration: 10m 33s)
  • 14:37 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and wmde-fisch: Backport for Revert "Breaking upgrade: mapdata" (T327151) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 14:35 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Revert "Breaking upgrade: mapdata" (T327151)
  • 14:34 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Write to cul_reason[_plaintext]_id everywhere (T233004) (duration: 19m 54s)
  • 14:23 moritzm: installing mod-wsgi security updates
  • 14:16 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and dreamyjazz: Backport for Write to cul_reason[_plaintext]_id everywhere (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:14 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Write to cul_reason[_plaintext]_id everywhere (T233004)
  • 13:17 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on webperf1004.eqiad.wmnet with reason: decom
  • 13:16 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on webperf1004.eqiad.wmnet with reason: decom
  • 12:20 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0)
  • 11:54 volans: upgraded cumin on cumin1001 to 4.2.0-1+deb11u1
  • 11:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on 10 hosts with reason: Still not ready to add these new presto servers to the cluster - btullis
  • 11:47 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on 10 hosts with reason: Still not ready to add these new presto servers to the cluster - btullis
  • 11:42 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
  • 11:27 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0)
  • 11:16 volans@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 11:16 volans@cumin1001: START - Cookbook sre.network.cf
  • 11:15 volans@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 11:15 volans@cumin1001: START - Cookbook sre.network.cf
  • 11:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1050.eqiad.wmnet with OS bullseye
  • 11:11 volans@cumin2002: END (FAIL) - Cookbook sre.network.cf (exit_code=1)
  • 11:11 volans@cumin2002: START - Cookbook sre.network.cf
  • 11:10 volans@cumin1001: END (FAIL) - Cookbook sre.network.cf (exit_code=1)
  • 11:10 volans@cumin1001: START - Cookbook sre.network.cf
  • 11:10 volans@cumin1001: END (FAIL) - Cookbook sre.network.cf (exit_code=1)
  • 11:10 volans@cumin1001: START - Cookbook sre.network.cf
  • 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1176 T326116', diff saved to https://phabricator.wikimedia.org/P43185 and previous config saved to /var/cache/conftool/dbconfig/20230118-110716-marostegui.json
  • 10:59 volans@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 10:59 volans@cumin1001: START - Cookbook sre.network.cf
  • 10:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1050.eqiad.wmnet with reason: host reimage
  • 10:54 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1050.eqiad.wmnet with reason: host reimage
  • 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1176 to LB with just 1% weight T326116', diff saved to https://phabricator.wikimedia.org/P43184 and previous config saved to /var/cache/conftool/dbconfig/20230118-105106-marostegui.json
  • 10:49 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
  • 10:48 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)
  • 10:43 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1050.eqiad.wmnet with OS bullseye
  • 10:21 zabe@deploy1002: Finished scap: Backport for Start reading from cuc_comment_id from a few wikis (T233004) (duration: 09m 17s)
  • 10:14 zabe@deploy1002: zabe and zabe: Backport for Start reading from cuc_comment_id from a few wikis (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 10:12 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
  • 10:12 zabe@deploy1002: Started scap: Backport for Start reading from cuc_comment_id from a few wikis (T233004)
  • 09:51 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 09:51 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 09:49 godog: start migration from webperf1004 to arclamp1001 - T319434
  • 09:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp2001.codfw.wmnet
  • 09:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp1001.eqiad.wmnet
  • 09:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host arclamp2001.codfw.wmnet
  • 09:33 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)
  • 09:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host arclamp1001.eqiad.wmnet
  • 09:24 jnuche@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.19 refs T325582 (duration: 08m 20s)
  • 09:15 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.19 refs T325582
  • 08:54 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
  • 08:34 mvernon@cumin1001: conftool action : set/pooled=yes; selector: name=thanos-fe2002.codfw.wmnet
  • 08:34 mvernon@cumin1001: conftool action : set/pooled=yes; selector: name=ms-fe2010.codfw.wmnet
  • 08:32 mvernon@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-query,name=codfw
  • 08:32 mvernon@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-swift,name=codfw
  • 08:32 mvernon@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=codfw
  • 08:30 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)
  • 07:56 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
  • 02:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable
  • 02:37 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable
  • 02:36 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=ats-be
  • 02:36 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=cdn
  • 01:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=ats-be
  • 01:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=cdn
  • 01:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2031.codfw.wmnet
  • 01:06 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2031.codfw.wmnet
  • 01:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable
  • 01:02 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp2031.codfw.wmnet with reason: downtimed, host unreachable
  • 01:02 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=ats-be
  • 01:02 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=cdn
  • 00:28 zabe: enwiki: rename the "discretionary sanctions alert" tag to "contentious topics alert" # T327118
  • 00:26 zabe@deploy1002: Finished scap: Backport for Add script to rename a change tag in wmf prod (T327118) (duration: 08m 29s)
  • 00:20 zabe@deploy1002: zabe and zabe: Backport for Add script to rename a change tag in wmf prod (T327118) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 00:18 zabe@deploy1002: Started scap: Backport for Add script to rename a change tag in wmf prod (T327118)
  • 00:08 zabe: mwscript extensions/TimedMediaHandler/maintenance/requeueTranscodes.php --wiki=testwiki --key=180p.vp9.webm # T312153
  • 00:07 zabe: mwscript extensions/TimedMediaHandler/maintenance/requeueTranscodes.php --wiki=testwiki --key=120p.vp9.webm # T312153

2023-01-17

2023-01-16

  • 17:39 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=kubernetes101[1234].eqiad.wmnet
  • 17:07 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[1234].eqiad.wmnet
  • 17:06 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 17:04 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 17:04 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 16:53 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 16:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1044.eqiad.wmnet with OS bullseye
  • 16:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1044.eqiad.wmnet with reason: host reimage
  • 16:35 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1044.eqiad.wmnet with reason: host reimage
  • 16:23 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1044.eqiad.wmnet with OS bullseye
  • 16:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1042.eqiad.wmnet with OS bullseye
  • 16:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1042.eqiad.wmnet with reason: host reimage
  • 15:59 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1042.eqiad.wmnet with reason: host reimage
  • 15:47 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1042.eqiad.wmnet with OS bullseye
  • 13:35 XioNoX: disable one of 3 cr1-cr2 eqiad links - T304712
  • 13:34 XioNoX: repool eqiad-eqord link - T304712
  • 12:56 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[1234].eqiad.wmnet
  • 12:55 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes100[5-9].eqiad.wmnet
  • 12:50 XioNoX: drain eqiad-eqord link - T304712
  • 12:47 hnowlan@puppetmaster1001: conftool action : set/weight=10:pooled=yes; selector: service=thumbor,name=kubernetes100[5-9].eqiad.wmnet
  • 12:43 Amir1: power cycled db1198
  • 12:36 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes100[5-9].eqiad.wmnet
  • 12:35 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes101[5-9].eqiad.wmnet
  • 12:35 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes102[012].eqiad.wmnet
  • 12:34 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes102.eqiad.wmnet
  • 12:05 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes101[123].eqiad.wmnet
  • 12:02 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[123].eqiad.wmnet
  • 11:51 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 11:49 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 11:48 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 11:38 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 11:32 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
  • 11:25 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 11:15 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 10:59 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
  • 10:58 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 10:58 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 10:57 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 10:56 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 10:55 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 10:54 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 10:48 moritzm: installing libtasn1-6 security updates on Bullseye
  • 10:36 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons.
  • 08:55 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons.
  • 08:46 elukey: powercycle an-worker1125 - soft lockup traces registered in the tty, host frozen
  • 08:14 oblivian@deploy1002: Synchronized README: test null deployment for T327041 (duration: 07m 12s)
  • 08:09 Emperor: stopped swift_rclone_sync on ms-be1069
  • 07:48 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=parse20(0[6-9]|10).codfw.wmnet
  • 07:44 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw23([12][0-9]|3[0-4]).codfw.wmnet
  • 07:41 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw22(59|6[0-9]|70).codfw.wmnet
  • 07:26 _joe_: restarting pybal on lvs2009
  • 07:10 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=(mw.*|appservers|api)-ro,name=codfw
  • 07:10 _joe_: depooling mediawiki in codfw
  • 06:47 XioNoX: add 2001:67c:930::/48 to network:external in data.yaml
  • 06:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maint
  • 06:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maint
  • 06:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1198 maint', diff saved to https://phabricator.wikimedia.org/P43157 and previous config saved to /var/cache/conftool/dbconfig/20230116-062211-ladsgroup.json
  • 02:25 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,service=parsoid-php
  • 02:05 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=appserver,service=nginx
  • 02:01 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=api_appserver,service=nginx
  • 01:51 ladsgroup@cumin1001: conftool action : set/pooled=yes; selector: name=mw2283.codfw.wmnet
  • 01:35 Amir1: rolling restart of php-fpm across the fleet
  • 01:30 thcipriani: 01:29:56 php-fpm-restart: 100% (in-flight: 0; ok: 184; fail: 112; left: 0)
  • 01:29 thcipriani@deploy1002: Finished scap: Backport for LanguageDropdown: Check if the page is in talk namespaces instead (T316559 T326788) (duration: 24m 47s)
  • 01:15 thcipriani@deploy1002: thcipriani and func: Backport for LanguageDropdown: Check if the page is in talk namespaces instead (T316559 T326788) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 01:05 thcipriani@deploy1002: Started scap: Backport for LanguageDropdown: Check if the page is in talk namespaces instead (T316559 T326788)

2023-01-14

  • 09:46 godog: issue 'request system reboot member 2' - T327001
  • 09:20 mvernon@cumin2002: conftool action : set/pooled=no; selector: name=thanos-fe2002.codfw.wmnet
  • 09:19 Emperor: depool thanos-fe2002 T327001
  • 09:19 mvernon@cumin2002: conftool action : set/pooled=no; selector: name=ms-fe2010.codfw.wmnet
  • 09:19 Emperor: depool ms-fe2010 T327001

2023-01-13

  • 23:39 mutante: people2002 - systemctl reset-failed after removing auto_restart_rsync timers
  • 22:26 mutante: mirror1001 - systemctl start update-ubuntu-mirror (sometimes sync fails)
  • 20:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1011']
  • 20:58 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1011']
  • 20:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['druid1011']
  • 20:49 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1011']
  • 20:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['druid1011']
  • 20:37 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1011']
  • 20:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1010']
  • 20:36 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1010']
  • 20:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['druid1010']
  • 20:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['druid1009']
  • 20:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1009']
  • 20:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['druid1009']
  • 20:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1010']
  • 20:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['druid1009']
  • 20:04 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aphlict2001.codfw.wmnet
  • 19:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-mariadb1002.eqiad.wmnet with OS bullseye
  • 19:58 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:54 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aphlict2001.codfw.wmnet on all recursors
  • 19:54 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache aphlict2001.codfw.wmnet on all recursors
  • 19:54 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:54 dzahn@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aphlict2001.codfw.wmnet - dzahn@cumin2002"
  • 19:52 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aphlict2001.codfw.wmnet - dzahn@cumin2002"
  • 19:51 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:49 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 19:49 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host aphlict2001.codfw.wmnet
  • 19:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-mariadb1001.eqiad.wmnet with OS bullseye
  • 19:40 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:38 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-mariadb1002.eqiad.wmnet with reason: host reimage
  • 19:34 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-mariadb1002.eqiad.wmnet with reason: host reimage
  • 19:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-mariadb1001.eqiad.wmnet with reason: host reimage
  • 19:22 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-mariadb1001.eqiad.wmnet with reason: host reimage
  • 19:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host an-mariadb1002.eqiad.wmnet with OS bullseye
  • 18:25 zabe: mwscript extensions/GlobalBlocking/maintenance/FixBlockerUsername.php --wiki metawiki "Green Giant" "Cromium" # T298707
  • 17:34 thcipriani@deploy1002: Finished scap: Backport for TranslationNotificationsSubmitJob: Ensure LanguageSet is in proper format (T63125) (duration: 13m 25s)
  • 17:22 thcipriani@deploy1002: thcipriani and abi: Backport for TranslationNotificationsSubmitJob: Ensure LanguageSet is in proper format (T63125) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 17:20 thcipriani@deploy1002: Started scap: Backport for TranslationNotificationsSubmitJob: Ensure LanguageSet is in proper format (T63125)
  • 15:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-coord1004.eqiad.wmnet with OS bullseye
  • 15:34 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:31 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:24 jynus: restarted again update-ubuntu-mirror on mirror1001 due to remote server concurrency issues
  • 15:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new bastion - jmm@cumin2002"
  • 15:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host an-mariadb1001.eqiad.wmnet with OS bullseye
  • 15:19 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new bastion - jmm@cumin2002"
  • 15:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-coord1003.eqiad.wmnet with OS bullseye
  • 15:18 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-coord1004.eqiad.wmnet with reason: host reimage
  • 15:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-coord1004.eqiad.wmnet with reason: host reimage
  • 15:11 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host an-coord1004.eqiad.wmnet with OS bullseye
  • 14:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-coord1003.eqiad.wmnet with reason: host reimage
  • 14:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-coord1003.eqiad.wmnet with reason: host reimage
  • 14:49 volans: uploaded cumin_4.2.0 to apt.wikimedia.org bullseye-wikimedia
  • 14:34 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host an-coord1003.eqiad.wmnet with OS bullseye
  • 12:48 moritzm: installing bast6002 T324974
  • 12:39 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab2002.wikimedia.org with reason: troubleshoot backup restore on gitlab replica
  • 12:38 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab2002.wikimedia.org with reason: troubleshoot backup restore on gitlab replica
  • 11:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new bastions - jmm@cumin2002"
  • 11:45 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new bastions - jmm@cumin2002"
  • 10:53 moritzm: installing bast5003 T324974
  • 10:49 jynus: restarting update-ubuntu-mirror on mirror1001 due to remote server concurrency issues
  • 09:41 moritzm: installing bast4004 T324974
  • 09:06 moritzm: installing bast3006 T324974
  • 02:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1011.mgmt.eqiad.wmnet with reboot policy FORCED
  • 02:09 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1011.mgmt.eqiad.wmnet with reboot policy FORCED
  • 02:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1010.mgmt.eqiad.wmnet with reboot policy FORCED
  • 02:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:52 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:36 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1010.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:36 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['an-mariadb1002']
  • 01:26 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-mariadb1002']
  • 01:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['an-mariadb1001']
  • 01:25 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-mariadb1001']
  • 01:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['an-mariadb1002']
  • 01:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['an-mariadb1001']
  • 01:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-mariadb1002']
  • 01:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-mariadb1001']
  • 01:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['an-coord1004']
  • 01:03 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-coord1004']
  • 01:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts ['an-coord1003']
  • 01:02 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-coord1003']
  • 00:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['an-coord1004']
  • 00:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=True) upgrade firmware for hosts ['an-coord1003']
  • 00:41 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-coord1004']
  • 00:40 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-coord1003']
  • 00:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-mariadb1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-mariadb1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:16 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host an-mariadb1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:15 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host an-mariadb1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-coord1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-coord1003.mgmt.eqiad.wmnet with reboot policy FORCED

2023-01-12

  • 23:53 zabe: start running cuc_comment_id population script on rest of sections in screens with --sleep 2 # T233004
  • 23:50 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host an-coord1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:44 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host an-coord1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:13 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@99a3e6f]: import_cirrus_index: use spark3 (duration: 02m 31s)
  • 23:10 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@99a3e6f]: import_cirrus_index: use spark3
  • 23:08 sbassett: Deployed (temporary) security mitigations for T326691
  • 22:45 mutante: people2002 - apt-get remove --purge rsync
  • 22:08 zabe: start of "foreachwikiindblist s3.dblist extensions/CheckUser/maintenance/populateCucComment.php" in a screen in mwmaint1002 # T233004
  • 22:07 thcipriani: end UTC late backport
  • 22:06 thcipriani@deploy1002: Finished scap: Backport for cirrus: Divert requests with x-public-cloud set to a dedicated pool counter (T326757), cirrus: Disable incoming link counting (T317023) (duration: 09m 23s)
  • 21:59 krinkle@deploy1002: Finished deploy [performance/navtiming@172cc22]: (no justification provided) (duration: 00m 08s)
  • 21:59 krinkle@deploy1002: Started deploy [performance/navtiming@172cc22]: (no justification provided)
  • 21:59 Krinkle: krinkle@deploy1002$ `scap install-world -v --limit-hosts` for webperf1003.eqiad and webperf2003.codfw, ref T326668
  • 21:58 krinkle@deploy1002: Installation of scap version "4.32.0" completed for 1 hosts
  • 21:58 krinkle@deploy1002: Installing scap version "4.32.0" for 1 hosts
  • 21:58 thcipriani@deploy1002: thcipriani and ebernhardson: Backport for cirrus: Divert requests with x-public-cloud set to a dedicated pool counter (T326757), cirrus: Disable incoming link counting (T317023) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 21:58 krinkle@deploy1002: Installation of scap version "4.32.0" completed for 1 hosts
  • 21:58 krinkle@deploy1002: Installing scap version "4.32.0" for 1 hosts
  • 21:57 thcipriani@deploy1002: Started scap: Backport for cirrus: Divert requests with x-public-cloud set to a dedicated pool counter (T326757), cirrus: Disable incoming link counting (T317023)
  • 21:56 zabe: run populateCucComment.php on testwiki # T233004
  • 21:48 thcipriani@deploy1002: Finished scap: Backport for nlwiki: Add block right to checkuser group (T326355) (duration: 09m 04s)
  • 21:41 thcipriani@deploy1002: thcipriani and stang: Backport for nlwiki: Add block right to checkuser group (T326355) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:39 thcipriani@deploy1002: Started scap: Backport for nlwiki: Add block right to checkuser group (T326355)
  • 21:37 thcipriani@deploy1002: Finished scap: Backport for looksLikeAutomation: Allow flagging requests from arbitrary headers (T326757) (duration: 09m 10s)
  • 21:30 thcipriani@deploy1002: thcipriani and ebernhardson: Backport for looksLikeAutomation: Allow flagging requests from arbitrary headers (T326757) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:28 thcipriani@deploy1002: Started scap: Backport for looksLikeAutomation: Allow flagging requests from arbitrary headers (T326757)
  • 21:27 thcipriani@deploy1002: Finished scap: Backport for etwikiquote: Switch logo variant back (T313698) (duration: 09m 25s)
  • 21:21 ejegg: restarted fundraising scheduled jobs
  • 21:19 ejegg: civicrm upgraded from 9afd2789 to 7ecb5038
  • 21:19 thcipriani@deploy1002: thcipriani and stang: Backport for etwikiquote: Switch logo variant back (T313698) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 21:17 thcipriani@deploy1002: Started scap: Backport for etwikiquote: Switch logo variant back (T313698)
  • 21:16 thcipriani@deploy1002: Finished scap: Backport for Remove Beta Feature for Realtime Preview and enable on plwiki (T323033) (duration: 10m 43s)
  • 21:07 thcipriani@deploy1002: thcipriani and samwilson: Backport for Remove Beta Feature for Realtime Preview and enable on plwiki (T323033) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 21:05 thcipriani@deploy1002: Started scap: Backport for Remove Beta Feature for Realtime Preview and enable on plwiki (T323033)
  • 20:43 ejegg: rolled back CiviCRM to 9afd2789
  • 20:31 ejegg: civicrm upgraded from 9afd2789 to 7ecb5038
  • 20:29 ejegg: disabled fundraising scheduled jobs for civi deploy
  • 20:08 brett: Setting thread_pool_max for varnish-frontend to 12000
  • 19:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1176 T326116', diff saved to https://phabricator.wikimedia.org/P43148 and previous config saved to /var/cache/conftool/dbconfig/20230112-195922-marostegui.json
  • 19:56 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1176 to LB with just 1% weight T326116', diff saved to https://phabricator.wikimedia.org/P43147 and previous config saved to /var/cache/conftool/dbconfig/20230112-195651-marostegui.json
  • 19:55 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1176 (mariadb 11) to dbctl, depooled T326116', diff saved to https://phabricator.wikimedia.org/P43146 and previous config saved to /var/cache/conftool/dbconfig/20230112-195514-marostegui.json
  • 19:11 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.18 refs T325581
  • 18:36 mutante: stat1008 - systemctl reset-failed - clears Icinga alerts from failed things of the past
  • 18:35 mutante: stat1007 - systemctl reset-failed - clears Icinga alerts
  • 18:19 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mc2040.codfw.wmnet with reason: hardware troubleshooting
  • 18:18 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mc2040.codfw.wmnet with reason: hardware troubleshooting
  • 17:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
  • 17:45 mutante: powercycling mc2040 via mgmt console
  • 17:34 ejegg: civicrm rolled back from 7ecb5038 to 9afd2789
  • 17:08 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 17:08 btullis@cumin1001: Added views for new wiki: aswikiquote T321294
  • 17:05 ejegg: civicrm upgraded from 9afd2789 to 7ecb5038
  • 16:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
  • 16:48 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 16:48 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 16:47 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 16:43 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 16:34 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 16:31 zabe@deploy1002: Finished scap: Backport for Stop writing to cul_user and cul_user_text on a few wikis (T233004), Start writing to rev_comment_id on group1 wikis (T299954) (duration: 09m 49s)
  • 16:23 zabe@deploy1002: zabe and zabe: Backport for Stop writing to cul_user and cul_user_text on a few wikis (T233004), Start writing to rev_comment_id on group1 wikis (T299954) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 16:21 zabe@deploy1002: Started scap: Backport for Stop writing to cul_user and cul_user_text on a few wikis (T233004), Start writing to rev_comment_id on group1 wikis (T299954)
  • 16:14 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 16:08 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 16:08 btullis@cumin1001: Added views for new wiki: bjnwiktionary T312214
  • 15:47 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
  • 15:46 hnowlan@puppetmaster1001: conftool action : set/weight=8; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
  • 15:44 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 15:36 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 15:36 btullis@cumin1001: Added views for new wiki: shnwikibooks T321256
  • 15:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes1014.eqiad.wmnet
  • 15:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 15:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 15:28 effie: Planet import in codfw (on maps2009) started at 15:26 UTC - T314472
  • 15:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1041.eqiad.wmnet
  • 15:11 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dborch1001.wikimedia.org
  • 15:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dborch1001.wikimedia.org
  • 15:05 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1041.eqiad.wmnet
  • 14:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe2002.codfw.wmnet
  • 14:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 14:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T321391)', diff saved to https://phabricator.wikimedia.org/P43138 and previous config saved to /var/cache/conftool/dbconfig/20230112-145441-marostegui.json
  • 14:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-fe2002.codfw.wmnet
  • 14:50 moritzm: installing postgresql-11 security updates on puppetdb1002
  • 14:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe1002.eqiad.wmnet
  • 14:42 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 14:42 btullis@cumin1001: Added views for new wiki: guwwikiquote T321288
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P43137 and previous config saved to /var/cache/conftool/dbconfig/20230112-143934-marostegui.json
  • 14:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-fe1002.eqiad.wmnet
  • 14:37 moritzm: installing sqlite3 security updates on buster
  • 14:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1040.eqiad.wmnet with OS bullseye
  • 14:34 taavi: UTC afternoon backports done
  • 14:28 taavi@deploy1002: Finished scap: Backport for Track callers of parseRevisionParsoidHtml. (duration: 09m 34s)
  • 14:26 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0)
  • 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P43136 and previous config saved to /var/cache/conftool/dbconfig/20230112-142428-marostegui.json
  • 14:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1001.wikimedia.org
  • 14:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1040.eqiad.wmnet with reason: host reimage
  • 14:20 taavi@deploy1002: taavi and matmarex: Backport for Track callers of parseRevisionParsoidHtml. synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 14:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1001.wikimedia.org
  • 14:18 taavi@deploy1002: Started scap: Backport for Track callers of parseRevisionParsoidHtml.
  • 14:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1040.eqiad.wmnet with reason: host reimage
  • 14:17 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 14:16 taavi@deploy1002: Finished scap: Backport for Allow administrators to revoke autopatroller rights on sh.WP (T325938) (duration: 13m 30s)
  • 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T321391)', diff saved to https://phabricator.wikimedia.org/P43135 and previous config saved to /var/cache/conftool/dbconfig/20230112-140921-marostegui.json
  • 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1206 (T321391)', diff saved to https://phabricator.wikimedia.org/P43134 and previous config saved to /var/cache/conftool/dbconfig/20230112-140659-marostegui.json
  • 14:06 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1040.eqiad.wmnet with OS bullseye
  • 14:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 14:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T321391)', diff saved to https://phabricator.wikimedia.org/P43133 and previous config saved to /var/cache/conftool/dbconfig/20230112-140649-marostegui.json
  • 14:05 taavi@deploy1002: taavi and aleksandar: Backport for Allow administrators to revoke autopatroller rights on sh.WP (T325938) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 14:03 taavi@deploy1002: Started scap: Backport for Allow administrators to revoke autopatroller rights on sh.WP (T325938)
  • 13:53 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
  • 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P43132 and previous config saved to /var/cache/conftool/dbconfig/20230112-135143-marostegui.json
  • 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P43131 and previous config saved to /var/cache/conftool/dbconfig/20230112-133636-marostegui.json
  • 13:30 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 13:29 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 13:28 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 13:28 ladsgroup@deploy1002: Finished scap: Backport for Remove obsolete MWMinimalScriptInit and MEDIAWIKI_MAINT_INIT_ONLY. (duration: 21m 44s)
  • 13:26 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 13:26 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T321391)', diff saved to https://phabricator.wikimedia.org/P43130 and previous config saved to /var/cache/conftool/dbconfig/20230112-132130-marostegui.json
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T321391)', diff saved to https://phabricator.wikimedia.org/P43129 and previous config saved to /var/cache/conftool/dbconfig/20230112-131908-marostegui.json
  • 13:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 13:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T321391)', diff saved to https://phabricator.wikimedia.org/P43128 and previous config saved to /var/cache/conftool/dbconfig/20230112-131847-marostegui.json
  • 13:16 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 13:13 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 13:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 13:08 ladsgroup@deploy1002: ladsgroup and daniel: Backport for Remove obsolete MWMinimalScriptInit and MEDIAWIKI_MAINT_INIT_ONLY. synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:06 ladsgroup@deploy1002: Started scap: Backport for Remove obsolete MWMinimalScriptInit and MEDIAWIKI_MAINT_INIT_ONLY.
  • 13:05 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 13:05 btullis@cumin1001: Added views for new wiki: gorwiktionary T326138
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P43127 and previous config saved to /var/cache/conftool/dbconfig/20230112-130341-marostegui.json
  • 12:58 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 12:56 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 12:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P43125 and previous config saved to /var/cache/conftool/dbconfig/20230112-124834-marostegui.json
  • 12:41 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 12:41 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T321391)', diff saved to https://phabricator.wikimedia.org/P43123 and previous config saved to /var/cache/conftool/dbconfig/20230112-123328-marostegui.json
  • 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T321391)', diff saved to https://phabricator.wikimedia.org/P43122 and previous config saved to /var/cache/conftool/dbconfig/20230112-123106-marostegui.json
  • 12:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 12:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T321391)', diff saved to https://phabricator.wikimedia.org/P43121 and previous config saved to /var/cache/conftool/dbconfig/20230112-123045-marostegui.json
  • 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P43120 and previous config saved to /var/cache/conftool/dbconfig/20230112-121538-marostegui.json
  • 12:13 XioNoX: repool esams
  • 12:10 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:09 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:09 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:09 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 12:08 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:08 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 12:08 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:08 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P43119 and previous config saved to /var/cache/conftool/dbconfig/20230112-120032-marostegui.json
  • 11:54 XioNoX: re-seating cr2-esams fpc0 linecard - T318783
  • 11:52 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T321391)', diff saved to https://phabricator.wikimedia.org/P43116 and previous config saved to /var/cache/conftool/dbconfig/20230112-114524-marostegui.json
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T321391)', diff saved to https://phabricator.wikimedia.org/P43115 and previous config saved to /var/cache/conftool/dbconfig/20230112-114302-marostegui.json
  • 11:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 11:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 11:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1176.eqiad.wmnet with reason: Maintenance
  • 11:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1176.eqiad.wmnet with reason: Maintenance
  • 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T321391)', diff saved to https://phabricator.wikimedia.org/P43114 and previous config saved to /var/cache/conftool/dbconfig/20230112-114212-marostegui.json
  • 11:41 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 11:39 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 11:37 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 11:29 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 11:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P43113 and previous config saved to /var/cache/conftool/dbconfig/20230112-112705-marostegui.json
  • 11:24 urbanecm@deploy1002: Finished scap: Backport for throttle: Add new rule for cswiki course (T326792) (duration: 07m 47s)
  • 11:17 urbanecm@deploy1002: Started scap: Backport for throttle: Add new rule for cswiki course (T326792)
  • 11:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 25885
  • 11:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 25885
  • 11:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3303
  • 11:13 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3303
  • 11:12 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 3302
  • 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P43112 and previous config saved to /var/cache/conftool/dbconfig/20230112-111159-marostegui.json
  • 11:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3302
  • 11:11 zabe: mwscript extensions/GlobalBlocking/maintenance/FixBlockerUsername.php --wiki metawiki "Defender" "Elton" # T298707
  • 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T321391)', diff saved to https://phabricator.wikimedia.org/P43111 and previous config saved to /var/cache/conftool/dbconfig/20230112-105652-marostegui.json
  • 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T321391)', diff saved to https://phabricator.wikimedia.org/P43110 and previous config saved to /var/cache/conftool/dbconfig/20230112-105430-marostegui.json
  • 10:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 10:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T321391)', diff saved to https://phabricator.wikimedia.org/P43109 and previous config saved to /var/cache/conftool/dbconfig/20230112-105358-marostegui.json
  • 10:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 36 hosts
  • 10:49 ayounsi@cumin1001: START - Cookbook sre.hosts.remove-downtime for 36 hosts
  • 10:41 hashar@deploy1002: Finished deploy [integration/docroot@577d68a]: zuul: Link to report_url if available (duration: 00m 14s)
  • 10:41 hashar@deploy1002: Started deploy [integration/docroot@577d68a]: zuul: Link to report_url if available
  • 10:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8674
  • 10:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8674
  • 10:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8932
  • 10:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8932
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P43108 and previous config saved to /var/cache/conftool/dbconfig/20230112-103852-marostegui.json
  • 10:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-presto1001.eqiad.wmnet
  • 10:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-presto1001.eqiad.wmnet
  • 10:24 XioNoX: rollback redirect ns2 to authdns1001 - T316532
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P43107 and previous config saved to /var/cache/conftool/dbconfig/20230112-102345-marostegui.json
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T321391)', diff saved to https://phabricator.wikimedia.org/P43106 and previous config saved to /var/cache/conftool/dbconfig/20230112-100839-marostegui.json
  • 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T321391)', diff saved to https://phabricator.wikimedia.org/P43105 and previous config saved to /var/cache/conftool/dbconfig/20230112-100616-marostegui.json
  • 10:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 10:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 10:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T321391)', diff saved to https://phabricator.wikimedia.org/P43104 and previous config saved to /var/cache/conftool/dbconfig/20230112-100456-marostegui.json
  • 10:01 XioNoX: reboot asw2-esams for upgrade - T316532
  • 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping3003.esams.wmnet
  • 09:58 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 09:56 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwmaint2002.codfw.wmnet
  • 09:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping3003.esams.wmnet on all recursors
  • 09:54 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ping3003.esams.wmnet on all recursors
  • 09:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping3003.esams.wmnet - jmm@cumin2002"
  • 09:53 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping3003.esams.wmnet - jmm@cumin2002"
  • 09:50 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:50 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ping3003.esams.wmnet
  • 09:50 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwmaint2002.codfw.wmnet
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P43103 and previous config saved to /var/cache/conftool/dbconfig/20230112-094950-marostegui.json
  • 09:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping2003.codfw.wmnet
  • 09:47 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 09:47 btullis@cumin1001: Added views for new wiki: pcmwiki T310879
  • 09:46 XioNoX: redirect ns2 to authdns1001 - T316532
  • 09:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping2003.codfw.wmnet on all recursors
  • 09:43 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ping2003.codfw.wmnet on all recursors
  • 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping2003.codfw.wmnet - jmm@cumin2002"
  • 09:41 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping2003.codfw.wmnet - jmm@cumin2002"
  • 09:39 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:39 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ping2003.codfw.wmnet
  • 09:37 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P43102 and previous config saved to /var/cache/conftool/dbconfig/20230112-093443-marostegui.json
  • 09:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 36 hosts with reason: network maintenance
  • 09:31 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 36 hosts with reason: network maintenance
  • 09:25 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mc1039.eqiad.wmnet
  • 09:24 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1039.eqiad.wmnet
  • 09:24 jiji@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host mc1039.eqiad.wmnet
  • 09:22 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T321391)', diff saved to https://phabricator.wikimedia.org/P43101 and previous config saved to /var/cache/conftool/dbconfig/20230112-091937-marostegui.json
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T321391)', diff saved to https://phabricator.wikimedia.org/P43100 and previous config saved to /var/cache/conftool/dbconfig/20230112-091716-marostegui.json
  • 09:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 09:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T321391)', diff saved to https://phabricator.wikimedia.org/P43099 and previous config saved to /var/cache/conftool/dbconfig/20230112-091654-marostegui.json
  • 09:11 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1039.eqiad.wmnet
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P43098 and previous config saved to /var/cache/conftool/dbconfig/20230112-090148-marostegui.json
  • 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping1003.eqiad.wmnet
  • 08:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping1003.eqiad.wmnet on all recursors
  • 08:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ping1003.eqiad.wmnet on all recursors
  • 08:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping1003.eqiad.wmnet - jmm@cumin2002"
  • 08:55 phedenskog@deploy1002: Finished deploy [performance/navtiming@172cc22]: (no justification provided) (duration: 00m 22s)
  • 08:54 phedenskog@deploy1002: Started deploy [performance/navtiming@172cc22]: (no justification provided)
  • 08:54 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping1003.eqiad.wmnet - jmm@cumin2002"
  • 08:54 phedenskog@deploy1002: Finished deploy [performance/navtiming@172cc22]: (no justification provided) (duration: 00m 17s)
  • 08:53 phedenskog@deploy1002: Started deploy [performance/navtiming@172cc22]: (no justification provided)
  • 08:50 XioNoX: depool esams for network maintenance - T316532
  • 08:50 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:50 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ping1003.eqiad.wmnet
  • 08:49 zabe: deployed updated patch for T311337
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P43097 and previous config saved to /var/cache/conftool/dbconfig/20230112-084641-marostegui.json
  • 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast5003.wikimedia.org
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T321391)', diff saved to https://phabricator.wikimedia.org/P43096 and previous config saved to /var/cache/conftool/dbconfig/20230112-083135-marostegui.json
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T321391)', diff saved to https://phabricator.wikimedia.org/P43095 and previous config saved to /var/cache/conftool/dbconfig/20230112-082813-marostegui.json
  • 08:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 08:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T321391)', diff saved to https://phabricator.wikimedia.org/P43094 and previous config saved to /var/cache/conftool/dbconfig/20230112-082752-marostegui.json
  • 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bast5003.wikimedia.org on all recursors
  • 08:17 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast5003.wikimedia.org on all recursors
  • 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast5003.wikimedia.org - jmm@cumin2002"
  • 08:16 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast5003.wikimedia.org - jmm@cumin2002"
  • 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P43093 and previous config saved to /var/cache/conftool/dbconfig/20230112-081245-marostegui.json
  • 07:59 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast5003.wikimedia.org
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P43092 and previous config saved to /var/cache/conftool/dbconfig/20230112-075739-marostegui.json
  • 07:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 9584
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T321391)', diff saved to https://phabricator.wikimedia.org/P43091 and previous config saved to /var/cache/conftool/dbconfig/20230112-074232-marostegui.json
  • 07:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 9584
  • 07:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 37002
  • 07:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 37002
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1132 (T321391)', diff saved to https://phabricator.wikimedia.org/P43090 and previous config saved to /var/cache/conftool/dbconfig/20230112-074010-marostegui.json
  • 07:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 07:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T321391)', diff saved to https://phabricator.wikimedia.org/P43089 and previous config saved to /var/cache/conftool/dbconfig/20230112-073949-marostegui.json
  • 07:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 112
  • 07:38 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 112
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P43088 and previous config saved to /var/cache/conftool/dbconfig/20230112-072443-marostegui.json
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P43087 and previous config saved to /var/cache/conftool/dbconfig/20230112-070936-marostegui.json
  • 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T321391)', diff saved to https://phabricator.wikimedia.org/P43086 and previous config saved to /var/cache/conftool/dbconfig/20230112-065430-marostegui.json
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T321391)', diff saved to https://phabricator.wikimedia.org/P43085 and previous config saved to /var/cache/conftool/dbconfig/20230112-065208-marostegui.json
  • 06:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 06:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T321391)', diff saved to https://phabricator.wikimedia.org/P43084 and previous config saved to /var/cache/conftool/dbconfig/20230112-065147-marostegui.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P43083 and previous config saved to /var/cache/conftool/dbconfig/20230112-063640-marostegui.json
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P43082 and previous config saved to /var/cache/conftool/dbconfig/20230112-062134-marostegui.json
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T321391)', diff saved to https://phabricator.wikimedia.org/P43081 and previous config saved to /var/cache/conftool/dbconfig/20230112-060627-marostegui.json
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T321391)', diff saved to https://phabricator.wikimedia.org/P43080 and previous config saved to /var/cache/conftool/dbconfig/20230112-060404-marostegui.json
  • 06:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 06:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 (T321391)', diff saved to https://phabricator.wikimedia.org/P43079 and previous config saved to /var/cache/conftool/dbconfig/20230112-060343-marostegui.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P43078 and previous config saved to /var/cache/conftool/dbconfig/20230112-054837-marostegui.json
  • 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P43077 and previous config saved to /var/cache/conftool/dbconfig/20230112-053330-marostegui.json
  • 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 (T321391)', diff saved to https://phabricator.wikimedia.org/P43076 and previous config saved to /var/cache/conftool/dbconfig/20230112-051823-marostegui.json
  • 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1107 (T321391)', diff saved to https://phabricator.wikimedia.org/P43075 and previous config saved to /var/cache/conftool/dbconfig/20230112-051601-marostegui.json
  • 05:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1107.eqiad.wmnet with reason: Maintenance
  • 05:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1107.eqiad.wmnet with reason: Maintenance
  • 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T321391)', diff saved to https://phabricator.wikimedia.org/P43074 and previous config saved to /var/cache/conftool/dbconfig/20230112-051539-marostegui.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P43073 and previous config saved to /var/cache/conftool/dbconfig/20230112-050033-marostegui.json
  • 04:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P43072 and previous config saved to /var/cache/conftool/dbconfig/20230112-044526-marostegui.json
  • 04:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T321391)', diff saved to https://phabricator.wikimedia.org/P43071 and previous config saved to /var/cache/conftool/dbconfig/20230112-043020-marostegui.json
  • 04:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T321391)', diff saved to https://phabricator.wikimedia.org/P43070 and previous config saved to /var/cache/conftool/dbconfig/20230112-042757-marostegui.json
  • 04:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 04:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 04:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 04:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 04:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43069 and previous config saved to /var/cache/conftool/dbconfig/20230112-042741-marostegui.json
  • 04:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P43068 and previous config saved to /var/cache/conftool/dbconfig/20230112-041234-marostegui.json
  • 03:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P43067 and previous config saved to /var/cache/conftool/dbconfig/20230112-035727-marostegui.json
  • 03:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43066 and previous config saved to /var/cache/conftool/dbconfig/20230112-034221-marostegui.json
  • 03:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43065 and previous config saved to /var/cache/conftool/dbconfig/20230112-033958-marostegui.json
  • 03:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 03:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 03:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43064 and previous config saved to /var/cache/conftool/dbconfig/20230112-033937-marostegui.json
  • 03:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P43063 and previous config saved to /var/cache/conftool/dbconfig/20230112-032430-marostegui.json
  • 03:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P43062 and previous config saved to /var/cache/conftool/dbconfig/20230112-030924-marostegui.json
  • 02:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43061 and previous config saved to /var/cache/conftool/dbconfig/20230112-025417-marostegui.json
  • 02:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43060 and previous config saved to /var/cache/conftool/dbconfig/20230112-025153-marostegui.json
  • 02:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 02:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 02:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 02:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 02:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T321391)', diff saved to https://phabricator.wikimedia.org/P43059 and previous config saved to /var/cache/conftool/dbconfig/20230112-020046-marostegui.json
  • 01:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P43058 and previous config saved to /var/cache/conftool/dbconfig/20230112-014539-marostegui.json
  • 01:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P43057 and previous config saved to /var/cache/conftool/dbconfig/20230112-013033-marostegui.json
  • 01:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T321391)', diff saved to https://phabricator.wikimedia.org/P43056 and previous config saved to /var/cache/conftool/dbconfig/20230112-011526-marostegui.json
  • 01:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T321391)', diff saved to https://phabricator.wikimedia.org/P43055 and previous config saved to /var/cache/conftool/dbconfig/20230112-011302-marostegui.json
  • 01:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 01:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 01:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T321391)', diff saved to https://phabricator.wikimedia.org/P43054 and previous config saved to /var/cache/conftool/dbconfig/20230112-011241-marostegui.json
  • 00:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P43053 and previous config saved to /var/cache/conftool/dbconfig/20230112-005734-marostegui.json
  • 00:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P43052 and previous config saved to /var/cache/conftool/dbconfig/20230112-004228-marostegui.json
  • 00:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T321391)', diff saved to https://phabricator.wikimedia.org/P43051 and previous config saved to /var/cache/conftool/dbconfig/20230112-002721-marostegui.json
  • 00:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T321391)', diff saved to https://phabricator.wikimedia.org/P43050 and previous config saved to /var/cache/conftool/dbconfig/20230112-002457-marostegui.json
  • 00:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 00:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 00:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T321391)', diff saved to https://phabricator.wikimedia.org/P43049 and previous config saved to /var/cache/conftool/dbconfig/20230112-002436-marostegui.json
  • 00:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P43048 and previous config saved to /var/cache/conftool/dbconfig/20230112-000929-marostegui.json

2023-01-11

  • 23:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P43047 and previous config saved to /var/cache/conftool/dbconfig/20230111-235423-marostegui.json
  • 23:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T321391)', diff saved to https://phabricator.wikimedia.org/P43045 and previous config saved to /var/cache/conftool/dbconfig/20230111-233916-marostegui.json
  • 23:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T321391)', diff saved to https://phabricator.wikimedia.org/P43044 and previous config saved to /var/cache/conftool/dbconfig/20230111-233652-marostegui.json
  • 23:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 23:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 23:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 23:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 23:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43043 and previous config saved to /var/cache/conftool/dbconfig/20230111-233616-marostegui.json
  • 23:22 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.18 refs T325581 (duration: 06m 57s)
  • 23:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P43042 and previous config saved to /var/cache/conftool/dbconfig/20230111-232109-marostegui.json
  • 23:15 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.18 refs T325581
  • 23:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P43041 and previous config saved to /var/cache/conftool/dbconfig/20230111-230603-marostegui.json
  • 22:51 zabe@deploy1002: Finished scap: Backport for Start reading from cuc_actor on group0 and group1 wikis (T233004), Start writing to rev_comment_id on group0 wikis (T299954), Stop writing to cul_user and cul_user_text on testwiki (T233004) (duration: 09m 28s)
  • 22:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43040 and previous config saved to /var/cache/conftool/dbconfig/20230111-225056-marostegui.json
  • 22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43039 and previous config saved to /var/cache/conftool/dbconfig/20230111-224832-marostegui.json
  • 22:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 22:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43038 and previous config saved to /var/cache/conftool/dbconfig/20230111-224810-marostegui.json
  • 22:44 zabe@deploy1002: zabe and zabe: Backport for Start reading from cuc_actor on group0 and group1 wikis (T233004), Start writing to rev_comment_id on group0 wikis (T299954), Stop writing to cul_user and cul_user_text on testwiki (T233004) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 22:42 zabe@deploy1002: Started scap: Backport for Start reading from cuc_actor on group0 and group1 wikis (T233004), Start writing to rev_comment_id on group0 wikis (T299954), Stop writing to cul_user and cul_user_text on testwiki (T233004)
  • 22:40 effie: upload memkeys_20181031-2~bullseye0_ on bullseye-wikimedia
  • 22:39 kindrobot: close UTC late backport window
  • 22:38 kindrobot@deploy1002: Finished scap: Backport for Fix exception in `<gallery mode="slideshow">` with missing images, Fix phan error when Excimer is enabled, Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view" (T301063 T326399), [[gerrit:879099|Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view" (T30106
  • 22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P43037 and previous config saved to /var/cache/conftool/dbconfig/20230111-223304-marostegui.json
  • 22:21 kindrobot@deploy1002: kindrobot and matmarex: Backport for Fix exception in `<gallery mode="slideshow">` with missing images, Fix phan error when Excimer is enabled, Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view" (T301063 T326399), [[gerrit:879099|Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view
  • 22:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P43036 and previous config saved to /var/cache/conftool/dbconfig/20230111-221757-marostegui.json
  • 22:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43035 and previous config saved to /var/cache/conftool/dbconfig/20230111-220251-marostegui.json
  • 22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T321391)', diff saved to https://phabricator.wikimedia.org/P43034 and previous config saved to /var/cache/conftool/dbconfig/20230111-220026-marostegui.json
  • 22:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 22:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T321391)', diff saved to https://phabricator.wikimedia.org/P43033 and previous config saved to /var/cache/conftool/dbconfig/20230111-220005-marostegui.json
  • 21:58 kindrobot@deploy1002: Started scap: Backport for Fix exception in `<gallery mode="slideshow">` with missing images, Fix phan error when Excimer is enabled, Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view" (T301063 T326399), [[gerrit:879099|Revert "ChangeTags: When showing a tag, also link to a filtered RecentChanges view" (T301063
  • 21:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P43031 and previous config saved to /var/cache/conftool/dbconfig/20230111-214458-marostegui.json
  • 21:34 kindrobot@deploy1002: Finished scap: Backport for Fix mustache template rendering when TOC is rerendered after an edit (T326682), Enable page tools on beta cluster (duration: 10m 17s)
  • 21:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P43030 and previous config saved to /var/cache/conftool/dbconfig/20230111-212952-marostegui.json
  • 21:25 kindrobot@deploy1002: kindrobot and jdrewniak and jdlrobson: Backport for Fix mustache template rendering when TOC is rerendered after an edit (T326682), Enable page tools on beta cluster synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 21:23 kindrobot@deploy1002: Started scap: Backport for Fix mustache template rendering when TOC is rerendered after an edit (T326682), Enable page tools on beta cluster
  • 21:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T321391)', diff saved to https://phabricator.wikimedia.org/P43029 and previous config saved to /var/cache/conftool/dbconfig/20230111-211445-marostegui.json
  • 21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T321391)', diff saved to https://phabricator.wikimedia.org/P43028 and previous config saved to /var/cache/conftool/dbconfig/20230111-211222-marostegui.json
  • 21:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 21:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T321391)', diff saved to https://phabricator.wikimedia.org/P43027 and previous config saved to /var/cache/conftool/dbconfig/20230111-211200-marostegui.json
  • 21:06 kindrobot: start UTC late backport window
  • 20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P43025 and previous config saved to /var/cache/conftool/dbconfig/20230111-205654-marostegui.json
  • 20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P43024 and previous config saved to /var/cache/conftool/dbconfig/20230111-204147-marostegui.json
  • 20:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43023 and previous config saved to /var/cache/conftool/dbconfig/20230111-203141-root.json
  • 20:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T321391)', diff saved to https://phabricator.wikimedia.org/P43022 and previous config saved to /var/cache/conftool/dbconfig/20230111-202641-marostegui.json
  • 20:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T321391)', diff saved to https://phabricator.wikimedia.org/P43021 and previous config saved to /var/cache/conftool/dbconfig/20230111-202417-marostegui.json
  • 20:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 20:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 20:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T321391)', diff saved to https://phabricator.wikimedia.org/P43020 and previous config saved to /var/cache/conftool/dbconfig/20230111-202345-marostegui.json
  • 20:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43019 and previous config saved to /var/cache/conftool/dbconfig/20230111-201636-root.json
  • 20:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P43018 and previous config saved to /var/cache/conftool/dbconfig/20230111-200838-marostegui.json
  • 20:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43017 and previous config saved to /var/cache/conftool/dbconfig/20230111-200131-root.json
  • 19:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P43016 and previous config saved to /var/cache/conftool/dbconfig/20230111-195332-marostegui.json
  • 19:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43015 and previous config saved to /var/cache/conftool/dbconfig/20230111-194626-root.json
  • 19:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T321391)', diff saved to https://phabricator.wikimedia.org/P43014 and previous config saved to /var/cache/conftool/dbconfig/20230111-193825-marostegui.json
  • 19:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T321391)', diff saved to https://phabricator.wikimedia.org/P43013 and previous config saved to /var/cache/conftool/dbconfig/20230111-193601-marostegui.json
  • 19:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 19:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 19:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 19:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 19:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T321391)', diff saved to https://phabricator.wikimedia.org/P43012 and previous config saved to /var/cache/conftool/dbconfig/20230111-193506-marostegui.json
  • 19:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43011 and previous config saved to /var/cache/conftool/dbconfig/20230111-193121-root.json
  • 19:20 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/flink-app-example: apply
  • 19:20 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/services/flink-app-example: apply
  • 19:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P43010 and previous config saved to /var/cache/conftool/dbconfig/20230111-192000-marostegui.json
  • 19:19 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 19:19 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 19:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43009 and previous config saved to /var/cache/conftool/dbconfig/20230111-191616-root.json
  • 19:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P43008 and previous config saved to /var/cache/conftool/dbconfig/20230111-190453-marostegui.json
  • 19:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1206 (re)pooling @ 1%: After being recloned', diff saved to https://phabricator.wikimedia.org/P43007 and previous config saved to /var/cache/conftool/dbconfig/20230111-190111-root.json
  • 18:57 marostegui: dbmaint deploy schema change with replication on s3 eqiad T321391
  • 18:52 brett: Removing legacy vips from dns servers - T239993
  • 18:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T321391)', diff saved to https://phabricator.wikimedia.org/P43006 and previous config saved to /var/cache/conftool/dbconfig/20230111-184946-marostegui.json
  • 18:47 marostegui: dbmaint deploy schema change with replication on s2 eqiad T321391
  • 18:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T321391)', diff saved to https://phabricator.wikimedia.org/P43005 and previous config saved to /var/cache/conftool/dbconfig/20230111-184723-marostegui.json
  • 18:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 18:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 18:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T321391)', diff saved to https://phabricator.wikimedia.org/P43004 and previous config saved to /var/cache/conftool/dbconfig/20230111-184701-marostegui.json
  • 18:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P43003 and previous config saved to /var/cache/conftool/dbconfig/20230111-184051-root.json
  • 18:36 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@5a19b9d]: drop-snapshots: Accept snapshot= partition from any level (duration: 02m 33s)
  • 18:33 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@5a19b9d]: drop-snapshots: Accept snapshot= partition from any level
  • 18:33 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 18:32 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 18:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P43002 and previous config saved to /var/cache/conftool/dbconfig/20230111-183155-marostegui.json
  • 18:30 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 18:30 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 18:28 bblack: repool eqsin edge DC
  • 18:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P43001 and previous config saved to /var/cache/conftool/dbconfig/20230111-182546-root.json
  • 18:22 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 18:22 btullis@cumin1001: Added views for new wiki: blkwiki T310872
  • 18:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P43000 and previous config saved to /var/cache/conftool/dbconfig/20230111-181648-marostegui.json
  • 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42999 and previous config saved to /var/cache/conftool/dbconfig/20230111-181041-root.json
  • 18:09 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 18:09 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 18:08 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 18:08 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 18:07 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 18:02 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 18:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T321391)', diff saved to https://phabricator.wikimedia.org/P42998 and previous config saved to /var/cache/conftool/dbconfig/20230111-180142-marostegui.json
  • 18:01 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 17:59 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 17:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T321391)', diff saved to https://phabricator.wikimedia.org/P42997 and previous config saved to /var/cache/conftool/dbconfig/20230111-175919-marostegui.json
  • 17:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 17:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 17:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112 (T321391)', diff saved to https://phabricator.wikimedia.org/P42996 and previous config saved to /var/cache/conftool/dbconfig/20230111-175857-marostegui.json
  • 17:58 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 17:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker[1080,1084].eqiad.wmnet with reason: Shutting down to enable RAID battery replacement
  • 17:55 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-worker[1080,1084].eqiad.wmnet with reason: Shutting down to enable RAID battery replacement
  • 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42995 and previous config saved to /var/cache/conftool/dbconfig/20230111-175536-root.json
  • 17:50 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 17:50 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 17:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112', diff saved to https://phabricator.wikimedia.org/P42994 and previous config saved to /var/cache/conftool/dbconfig/20230111-174351-marostegui.json
  • 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 10%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42993 and previous config saved to /var/cache/conftool/dbconfig/20230111-174031-root.json
  • 17:40 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 17:39 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 17:29 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 17:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112', diff saved to https://phabricator.wikimedia.org/P42992 and previous config saved to /var/cache/conftool/dbconfig/20230111-172844-marostegui.json
  • 17:28 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 17:28 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 5%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42991 and previous config saved to /var/cache/conftool/dbconfig/20230111-172526-root.json
  • 17:21 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 17:21 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 17:21 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 17:20 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 17:18 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:18 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112 (T321391)', diff saved to https://phabricator.wikimedia.org/P42989 and previous config saved to /var/cache/conftool/dbconfig/20230111-171338-marostegui.json
  • 17:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2112 (T321391)', diff saved to https://phabricator.wikimedia.org/P42988 and previous config saved to /var/cache/conftool/dbconfig/20230111-171114-marostegui.json
  • 17:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 17:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 17:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 17:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 17:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 1%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42987 and previous config saved to /var/cache/conftool/dbconfig/20230111-171021-root.json
  • 17:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 17:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 17:04 marostegui: dbmaint deploy schema change with replication on s7 eqiad T321391
  • 17:03 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:03 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 16:38 marostegui: dbmaint deploy schema change with replication on s5 eqiad T321391
  • 16:31 marostegui: dbmaint deploy schema change with replication on s4 eqiad T321391
  • 16:25 marostegui: dbmaint deploy schema change with replication on s8 eqiad T321391
  • 16:22 marostegui: dbmaint deploy schema change with replication on s6 eqiad T321391
  • 16:06 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:06 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force update after eqsin outage is over - volans@cumin1001"
  • 16:05 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force update after eqsin outage is over - volans@cumin1001"
  • 16:03 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 16:01 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host mc1038.eqiad.wmnet with OS bullseye
  • 16:00 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:53 zabe@deploy1002: Finished scap: T233004 (duration: 07m 54s)
  • 15:45 zabe@deploy1002: Started scap: T233004
  • 15:38 zabe@deploy1002: backport aborted: (duration: 04m 25s)
  • 15:38 zabe@deploy1002: sync-world aborted: Backport for Start reading from cul_actor everywhere (T233004) (duration: 04m 00s)
  • 15:36 zabe@deploy1002: zabe and zabe: Backport for Start reading from cul_actor everywhere (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 15:34 zabe@deploy1002: Started scap: Backport for Start reading from cul_actor everywhere (T233004)
  • 15:31 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:21 marostegui: Stop mariadb on db1106 to reclone db1206 (there will be lag on s1 on wikireplicas) T326669
  • 15:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106', diff saved to https://phabricator.wikimedia.org/P42982 and previous config saved to /var/cache/conftool/dbconfig/20230111-151712-marostegui.json
  • 14:56 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 14:47 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1005.eqiad.wmnet with OS bullseye
  • 14:46 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 14:46 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.18/extensions/Wikibase/repo/tests/jest/wikibase.vector.searchClient.spec.js: Backport: Add missing parentheses to vector search match text (T326633) (2/2) (duration: 06m 46s)
  • 14:42 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 14:39 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.18/extensions/Wikibase/repo/resources/wikibase.vector.searchClient.js: Backport: Add missing parentheses to vector search match text (T326633) (1/2) (duration: 07m 09s)
  • 14:28 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Fix test constructing HTMLFormField without parent (T326621) (duration: 08m 38s)
  • 14:25 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1005.eqiad.wmnet with reason: host reimage
  • 14:22 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1005.eqiad.wmnet with reason: host reimage
  • 14:21 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and lucaswerkmeister-wmde: Backport for Fix test constructing HTMLFormField without parent (T326621) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 14:19 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Fix test constructing HTMLFormField without parent (T326621)
  • 14:14 jelto@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99)
  • 14:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1001.eqiad.wmnet
  • 14:10 moritzm: installing postgresql 11 security updates on maps/eqiad
  • 14:06 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bullseye
  • 14:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1004.eqiad.wmnet with OS bullseye
  • 14:02 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 14:01 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host cephosd1001.eqiad.wmnet
  • 13:55 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 13:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 37002
  • 13:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 37002
  • 13:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3302
  • 13:45 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade
  • 13:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3302
  • 13:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9584
  • 13:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9584
  • 13:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 35753
  • 13:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 35753
  • 13:38 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1004.eqiad.wmnet with reason: host reimage
  • 13:35 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1004.eqiad.wmnet with reason: host reimage
  • 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast6002.wikimedia.org
  • 13:21 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
  • 13:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
  • 13:12 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) bast6002.wikimedia.org on all recursors
  • 13:11 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast6002.wikimedia.org on all recursors
  • 13:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast6002.wikimedia.org - jmm@cumin2002"
  • 13:11 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast6002.wikimedia.org - jmm@cumin2002"
  • 13:07 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1038.eqiad.wmnet with OS bullseye
  • 13:03 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc1038.eqiad.wmnet with OS bullseye
  • 13:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast6002.wikimedia.org
  • 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast4004.wikimedia.org
  • 12:42 moritzm: installing postgresql 11 security updates on maps/codfw
  • 12:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8849
  • 12:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8849
  • 12:35 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) bast4004.wikimedia.org on all recursors
  • 12:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast4004.wikimedia.org on all recursors
  • 12:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast4004.wikimedia.org - jmm@cumin2002"
  • 12:33 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast4004.wikimedia.org - jmm@cumin2002"
  • 12:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 56630
  • 12:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 56630
  • 12:24 btullis@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.add-wiki (exit_code=99)
  • 12:24 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 12:18 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 12:18 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast4004.wikimedia.org
  • 12:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 12:13 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 12:10 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1004.eqiad.wmnet with OS bullseye
  • 12:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1003.eqiad.wmnet with OS bullseye
  • 12:10 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 12:08 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 11:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1003.eqiad.wmnet with reason: host reimage
  • 11:51 claime: repooled mw1486 in api_appserver eqiad after hardware investigation - T326425
  • 11:50 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1003.eqiad.wmnet with reason: host reimage
  • 11:50 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1486.eqiad.wmnet
  • 11:50 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw1486.eqiad.wmnet
  • 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast3006.wikimedia.org
  • 11:47 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,name=mw1486.eqiad.wmnet
  • 11:38 cgoubert@cumin1001: conftool action : set/pooled=yes:weight=10; selector: cluster=aux-k8s,service=kubesvc
  • 11:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
  • 11:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
  • 11:30 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) bast3006.wikimedia.org on all recursors
  • 11:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast3006.wikimedia.org on all recursors
  • 11:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:29 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast3006.wikimedia.org - jmm@cumin2002"
  • 11:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast3006.wikimedia.org - jmm@cumin2002"
  • 11:22 btullis@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.add-wiki (exit_code=99)
  • 11:22 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 11:21 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1038.eqiad.wmnet with OS bullseye
  • 11:19 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:19 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast3006.wikimedia.org
  • 11:16 btullis@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.add-wiki (exit_code=99)
  • 11:15 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 11:15 btullis@cumin1001: END (FAIL) - Cookbook sre.druid.reboot-workers (exit_code=99) for Druid test cluster: Reboot Druid nodes
  • 11:12 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1003.eqiad.wmnet with OS bullseye
  • 10:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1001.eqiad.wmnet with OS bullseye
  • 10:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
  • 10:34 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
  • 10:31 zabe@deploy1002: Finished scap: Backport for Simplify expensive check (T326690), Start reading from cuc_actor on test wikis (T233004) (duration: 09m 34s)
  • 10:25 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw1486.eqiad.wmnet with reason: hardware troubleshooting
  • 10:24 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw1486.eqiad.wmnet with reason: hardware troubleshooting
  • 10:23 btullis@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid test cluster: Reboot Druid nodes
  • 10:23 zabe@deploy1002: zabe and zabe: Backport for Simplify expensive check (T326690), Start reading from cuc_actor on test wikis (T233004) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 10:21 zabe@deploy1002: Started scap: Backport for Simplify expensive check (T326690), Start reading from cuc_actor on test wikis (T233004)
  • 10:18 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1001.eqiad.wmnet with OS bullseye
  • 10:16 moritzm: installing postgresql-11 security updates
  • 10:02 XioNoX: asw1-eqsin> request system reboot all-members - T316532
  • 09:49 moritzm: installing python3.7 security updates
  • 08:31 kartik@deploy1002: Finished scap: Backport for CX: Fix transformation of TranslationUnitDTO to custom array (T326278) (duration: 11m 45s)
  • 08:21 kartik@deploy1002: kartik and kartik: Backport for CX: Fix transformation of TranslationUnitDTO to custom array (T326278) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 08:20 kartik@deploy1002: Started scap: Backport for CX: Fix transformation of TranslationUnitDTO to custom array (T326278)
  • 05:55 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1003.eqiad.wmnet
  • 05:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2003.codfw.wmnet
  • 05:48 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2003.codfw.wmnet
  • 05:48 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1003.eqiad.wmnet

2023-01-10

  • 23:58 krinkle@deploy1002: Finished deploy [integration/docroot@b7c82a3]: (no justification provided) (duration: 00m 15s)
  • 23:58 krinkle@deploy1002: Started deploy [integration/docroot@b7c82a3]: (no justification provided)
  • 23:46 mutante: cumin2002 - sudo systemctl status httpbb_hourly_appserver
  • 23:30 zabe@deploy1002: Finished scap: Backport for Start writing to rev_comment_id on test wikis (T299954) (duration: 09m 39s)
  • 23:22 zabe@deploy1002: zabe and zabe: Backport for Start writing to rev_comment_id on test wikis (T299954) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 23:21 zabe@deploy1002: Started scap: Backport for Start writing to rev_comment_id on test wikis (T299954)
  • 22:42 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.18 refs T325581
  • 22:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: plugin upgrade - ryankemper@cumin1001 - T324247
  • 22:28 jhuneidi@deploy1002: Pruned MediaWiki: 1.40.0-wmf.14, 1.40.0-wmf.13 (duration: 02m 35s)
  • 22:21 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.18 refs T325581 (duration: 45m 04s)
  • 22:10 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 22:09 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 22:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1206 T325046', diff saved to https://phabricator.wikimedia.org/P42980 and previous config saved to /var/cache/conftool/dbconfig/20230110-220942-marostegui.json
  • 22:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2002.codfw.wmnet
  • 22:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1002.eqiad.wmnet
  • 21:54 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2002.codfw.wmnet
  • 21:54 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1002.eqiad.wmnet
  • 21:54 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 21:52 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 21:36 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.18 refs T325581
  • 21:27 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1001.eqiad.wmnet
  • 21:27 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2001.codfw.wmnet
  • 21:20 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1001.eqiad.wmnet
  • 21:19 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2001.codfw.wmnet
  • 21:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P42979 and previous config saved to /var/cache/conftool/dbconfig/20230110-211826-root.json
  • 21:18 zabe@deploy1002: Finished scap: Backport for Use new DiscussionTools heading markup on group2 wikis except enwiki (T314714), Start reading from cul_actor on group1 wikis (T233004) (duration: 10m 08s)
  • 21:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1040.eqiad.wmnet
  • 21:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2055.codfw.wmnet
  • 21:11 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2055.codfw.wmnet
  • 21:11 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1040.eqiad.wmnet
  • 21:09 zabe@deploy1002: zabe and zabe and matmarex: Backport for Use new DiscussionTools heading markup on group2 wikis except enwiki (T314714), Start reading from cul_actor on group1 wikis (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:08 zabe@deploy1002: Started scap: Backport for Use new DiscussionTools heading markup on group2 wikis except enwiki (T314714), Start reading from cul_actor on group1 wikis (T233004)
  • 21:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P42978 and previous config saved to /var/cache/conftool/dbconfig/20230110-210321-root.json
  • 20:55 mutante: repooling eqsin
  • 20:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P42977 and previous config saved to /var/cache/conftool/dbconfig/20230110-204816-root.json
  • 20:37 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 20:37 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 20:33 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 20:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 25%: After maintenance', diff saved to https://phabricator.wikimedia.org/P42976 and previous config saved to /var/cache/conftool/dbconfig/20230110-203311-root.json
  • 20:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 20:31 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 20:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 20:29 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 20:28 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 20:28 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 20:28 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 20:26 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 20:26 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 20:18 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: plugin upgrade - ryankemper@cumin1001 - T324247
  • 20:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P42975 and previous config saved to /var/cache/conftool/dbconfig/20230110-201807-ladsgroup.json
  • 20:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P42974 and previous config saved to /var/cache/conftool/dbconfig/20230110-201806-root.json
  • 20:08 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 20:08 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 20:08 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 20:08 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 20:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1038.eqiad.wmnet
  • 20:07 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 20:06 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 20:05 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2054.codfw.wmnet
  • 20:04 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 20:04 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 20:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P42972 and previous config saved to /var/cache/conftool/dbconfig/20230110-200302-ladsgroup.json
  • 20:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P42971 and previous config saved to /var/cache/conftool/dbconfig/20230110-200301-root.json
  • 20:02 ayounsi@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 01m 42s)
  • 20:01 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 20:01 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 20:00 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
  • 20:00 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1038.eqiad.wmnet
  • 19:58 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2054.codfw.wmnet
  • 19:52 ayounsi@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 01m 06s)
  • 19:51 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
  • 19:49 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 19:49 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 19:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P42970 and previous config saved to /var/cache/conftool/dbconfig/20230110-194757-ladsgroup.json
  • 19:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P42969 and previous config saved to /var/cache/conftool/dbconfig/20230110-194756-root.json
  • 19:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P42968 and previous config saved to /var/cache/conftool/dbconfig/20230110-194750-ladsgroup.json
  • 19:43 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 19:42 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 19:39 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 19:38 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 19:38 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 19:38 dancy@deploy1002: Installation of scap version "4.32.0" completed for 1 hosts
  • 19:37 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 19:37 dancy@deploy1002: Installing scap version "4.32.0" for 1 hosts
  • 19:35 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 19:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P42965 and previous config saved to /var/cache/conftool/dbconfig/20230110-193253-ladsgroup.json
  • 19:32 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 19:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P42964 and previous config saved to /var/cache/conftool/dbconfig/20230110-193245-ladsgroup.json
  • 19:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 19:31 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 19:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 19:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 19:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 19:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 19:31 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 19:30 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 19:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1158 maint', diff saved to https://phabricator.wikimedia.org/P42963 and previous config saved to /var/cache/conftool/dbconfig/20230110-192929-ladsgroup.json
  • 19:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P42962 and previous config saved to /var/cache/conftool/dbconfig/20230110-191740-ladsgroup.json
  • 19:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2053.codfw.wmnet
  • 19:08 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2053.codfw.wmnet
  • 19:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P42958 and previous config saved to /var/cache/conftool/dbconfig/20230110-190235-ladsgroup.json
  • 19:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 19:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 18:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2052.codfw.wmnet
  • 18:44 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2052.codfw.wmnet
  • 18:38 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2002.codfw.wmnet with OS bullseye
  • 18:35 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2001.codfw.wmnet with OS bullseye
  • 18:29 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes-staging,service=kubesvc
  • 18:23 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagemaster2001.codfw.wmnet with OS bullseye
  • 18:23 jayme@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=kubernetes-staging,service=kubesvc
  • 18:21 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
  • 18:20 jayme@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=kubernetes-staging,service=kubesvc
  • 18:19 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
  • 18:16 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
  • 18:16 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
  • 18:09 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2001.codfw.wmnet with reason: host reimage
  • 18:06 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2001.codfw.wmnet with reason: host reimage
  • 18:01 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubestage2002.codfw.wmnet with OS bullseye
  • 18:01 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host kubestage2001.codfw.wmnet with OS bullseye
  • 17:55 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagemaster2001.codfw.wmnet with OS bullseye
  • 17:51 zabe: run populateCulActor on all wikis # T325484
  • 17:48 claime: Finished rolling reboots of eqiad appservers
  • 17:48 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 17:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 17:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1130 maint', diff saved to https://phabricator.wikimedia.org/P42956 and previous config saved to /var/cache/conftool/dbconfig/20230110-173807-ladsgroup.json
  • 17:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 T325652', diff saved to https://phabricator.wikimedia.org/P42955 and previous config saved to /var/cache/conftool/dbconfig/20230110-173027-marostegui.json
  • 17:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P42954 and previous config saved to /var/cache/conftool/dbconfig/20230110-173002-ladsgroup.json
  • 17:29 ayounsi@deploy1002: Finished deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9 (duration: 00m 11s)
  • 17:28 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
  • 17:28 ayounsi@deploy1002: deploy aborted: help (duration: 00m 01s)
  • 17:28 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: help
  • 17:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P42953 and previous config saved to /var/cache/conftool/dbconfig/20230110-171457-ladsgroup.json
  • 17:14 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:10 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:03 ayounsi@deploy1002: deploy aborted: netbox-next to 3.2.9 (duration: 00m 07s)
  • 17:03 ayounsi@deploy1002: Started deploy [netbox/deploy@ef7451d]: netbox-next to 3.2.9
  • 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P42952 and previous config saved to /var/cache/conftool/dbconfig/20230110-165952-ladsgroup.json
  • 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: After the incident', diff saved to https://phabricator.wikimedia.org/P42951 and previous config saved to /var/cache/conftool/dbconfig/20230110-165406-root.json
  • 16:48 bblack: depooling eqsin from DNS
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P42950 and previous config saved to /var/cache/conftool/dbconfig/20230110-164447-ladsgroup.json
  • 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: After the incident', diff saved to https://phabricator.wikimedia.org/P42949 and previous config saved to /var/cache/conftool/dbconfig/20230110-163901-root.json
  • 16:36 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagetcd2003.codfw.wmnet with OS bullseye
  • 16:24 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagetcd2001.codfw.wmnet with OS bullseye
  • 16:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: After the incident', diff saved to https://phabricator.wikimedia.org/P42948 and previous config saved to /var/cache/conftool/dbconfig/20230110-162356-root.json
  • 16:23 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd2003.codfw.wmnet with reason: host reimage
  • 16:21 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd2003.codfw.wmnet with reason: host reimage
  • 16:14 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagetcd2002.codfw.wmnet with OS bullseye
  • 16:10 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: After the incident', diff saved to https://phabricator.wikimedia.org/P42947 and previous config saved to /var/cache/conftool/dbconfig/20230110-160851-root.json
  • 16:08 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:08 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd2003.codfw.wmnet with OS bullseye
  • 16:04 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd2002.codfw.wmnet with reason: host reimage
  • 16:04 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
  • 16:03 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
  • 16:01 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd2002.codfw.wmnet with reason: host reimage
  • 15:59 SandraEbele: reran failed pageview-druid-hourly-coord oozie job for 2023-1-10-10.
  • 15:59 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:58 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw[1373,1384-1385,1387].eqiad.wmnet
  • 15:55 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw[1373,1384-1385,1387].eqiad.wmnet
  • 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 10%: After the incident', diff saved to https://phabricator.wikimedia.org/P42946 and previous config saved to /var/cache/conftool/dbconfig/20230110-155346-root.json
  • 15:52 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd2002.codfw.wmnet with OS bullseye
  • 15:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 5%: After the incident', diff saved to https://phabricator.wikimedia.org/P42945 and previous config saved to /var/cache/conftool/dbconfig/20230110-153841-root.json
  • 15:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2051.codfw.wmnet
  • 15:30 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 15:29 claime: Restarting rolling reboots of eqiad appservers
  • 15:28 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2051.codfw.wmnet
  • 15:25 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:25 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 1%: After the incident', diff saved to https://phabricator.wikimedia.org/P42944 and previous config saved to /var/cache/conftool/dbconfig/20230110-152336-root.json
  • 15:21 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host search-loader2001.codfw.wmnet
  • 15:17 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host search-loader2001.codfw.wmnet
  • 15:14 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagetcd2001.codfw.wmnet with reason: host reimage
  • 15:11 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagetcd2001.codfw.wmnet with reason: host reimage
  • 15:09 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2050.codfw.wmnet
  • 15:02 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd2001.codfw.wmnet with OS bullseye
  • 15:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2037.codfw.wmnet
  • 15:01 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:01 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2037.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 14:56 XioNoX: start VC link maintenance in eqiad - T325803
  • 14:55 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagetcd2001.codfw.wmnet with OS bullseye
  • 14:55 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd2001.codfw.wmnet with OS bullseye
  • 14:53 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host search-loader1001.eqiad.wmnet
  • 14:49 zabe: UTC afternoon deploys done
  • 14:49 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host search-loader1001.eqiad.wmnet
  • 14:48 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2037.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 14:47 zabe@deploy1002: Finished scap: Backport for Start reading from cul_actor on remaining test wikis and group0 wikis (T233004) (duration: 08m 59s)
  • 14:46 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 14:40 zabe@deploy1002: zabe and zabe: Backport for Start reading from cul_actor on remaining test wikis and group0 wikis (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:38 zabe@deploy1002: Started scap: Backport for Start reading from cul_actor on remaining test wikis and group0 wikis (T233004)
  • 14:36 zabe: run populateCulActor on group0 wikis # T325484
  • 14:35 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2050.codfw.wmnet
  • 14:35 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2037.codfw.wmnet
  • 14:34 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host apifeatureusage2001.codfw.wmnet
  • 14:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2036.codfw.wmnet
  • 14:33 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:33 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2036.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 14:28 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2036.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 14:28 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubestagetcd2001.codfw.wmnet with OS bullseye
  • 14:28 jayme@cumin1001: START - Cookbook sre.ganeti.reimage for host kubestagetcd2001.codfw.wmnet with OS bullseye
  • 14:26 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 14:25 zabe@deploy1002: Finished scap: Backport for [config]: GDI Safety Survey Wave 4 (T325136) (duration: 17m 42s)
  • 14:21 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host apifeatureusage2001.codfw.wmnet
  • 14:19 claime: Pausing reboots of eqiad appservers for deployments
  • 14:18 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw[1369-1372].eqiad.wmnet
  • 14:18 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw[1369-1372].eqiad.wmnet
  • 14:14 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apifeatureusage1001.eqiad.wmnet
  • 14:11 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2036.codfw.wmnet
  • 14:10 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
  • 14:09 zabe@deploy1002: zabe and essexigyan: Backport for [config]: GDI Safety Survey Wave 4 (T325136) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 14:07 zabe@deploy1002: Started scap: Backport for [config]: GDI Safety Survey Wave 4 (T325136)
  • 14:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: Reinitialize staging-codfw with k8s 1.23
  • 14:06 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host apifeatureusage1001.eqiad.wmnet
  • 14:06 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: Reinitialize staging-codfw with k8s 1.23
  • 14:03 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2035.codfw.wmnet
  • 14:03 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:03 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2035.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 13:49 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2035.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 13:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1002.eqiad.wmnet with OS bullseye
  • 13:46 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 13:46 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 13:44 godog: delete grafana dashboards from "sre dashboards for deletion" folder - T178690
  • 13:43 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2049.codfw.wmnet
  • 13:37 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2035.codfw.wmnet
  • 13:36 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2049.codfw.wmnet
  • 13:34 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 13:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2001.wikimedia.org
  • 13:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2001.wikimedia.org
  • 13:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1002.eqiad.wmnet with reason: host reimage
  • 13:16 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1002.eqiad.wmnet with reason: host reimage
  • 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetdb-test2001.codfw.wmnet
  • 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetdb-test2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 12:59 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1002.eqiad.wmnet with OS bullseye
  • 12:59 btullis@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cephosd1002.eqiad.wmnet with OS bullseye
  • 12:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetdb-test2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 12:53 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 12:50 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 12:50 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
  • 12:50 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 12:50 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts puppetdb-test2001.codfw.wmnet
  • 12:49 claime: Starting rolling reboot of eqiad appservers
  • 12:47 btullis@cumin1001: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid analytics cluster: Reboot Druid nodes
  • 12:36 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1002.eqiad.wmnet with OS bullseye
  • 12:34 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd1002.eqiad.wmnet with OS bullseye
  • 12:31 oblivian@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 12:31 oblivian@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 12:31 oblivian@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 12:31 oblivian@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 12:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2048.codfw.wmnet
  • 12:19 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2048.codfw.wmnet
  • 12:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2034.codfw.wmnet
  • 12:18 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:18 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2034.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 12:12 claime: Finished rolling reboot of eqiad jobrunners
  • 12:07 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 12:06 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 12:06 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 12:05 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 12:02 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2034.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 11:59 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 11:58 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:57 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:57 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:56 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:53 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:52 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:48 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 11:35 btullis@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid analytics cluster: Reboot Druid nodes
  • 11:33 btullis@cumin1001: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid public cluster: Reboot Druid nodes
  • 11:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2047.codfw.wmnet
  • 11:00 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2047.codfw.wmnet
  • 11:00 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2034.codfw.wmnet
  • 10:31 godog: upgrade thanos to 0.30.1 on thanos-fe2* - T303154
  • 10:24 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1002.eqiad.wmnet with OS bullseye
  • 10:23 btullis@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid public cluster: Reboot Druid nodes
  • 10:21 claime: Starting rolling reboot of eqiad jobrunners
  • 10:21 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 10:18 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephosd1002.eqiad.wmnet with OS bullseye
  • 10:14 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2046.codfw.wmnet
  • 10:14 claime: repooled parse1002.eqiad.wmnet - T326119
  • 10:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1002.eqiad.wmnet
  • 10:13 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1002.eqiad.wmnet
  • 10:07 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2046.codfw.wmnet
  • 10:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2033.codfw.wmnet
  • 10:06 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:06 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2033.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 10:02 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2033.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 09:59 cgoubert@cumin1001: conftool action : set/pooled=no:weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1002.eqiad.wmnet
  • 09:55 godog: upgrade thanos to 0.30.1 on prometheus hosts - T303154
  • 09:53 moritzm: installing systemd bugfix updates from Bullseye point release
  • 09:45 aqu@deploy1002: Finished deploy [airflow-dags/analytics@9568478]: Fix bug fix in HDFS usage pipeline [airflow-dags@9568478] (duration: 00m 13s)
  • 09:45 aqu@deploy1002: Started deploy [airflow-dags/analytics@9568478]: Fix bug fix in HDFS usage pipeline [airflow-dags@9568478]
  • 09:43 godog: upgrade thanos to 0.30.1 on thanos-fe100[2-3] - T303154
  • 09:34 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@9568478]: Fix bug fix in HDFS usage pipeline TEST [airflow-dags@9568478] (duration: 00m 11s)
  • 09:34 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 09:34 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@9568478]: Fix bug fix in HDFS usage pipeline TEST [airflow-dags@9568478]
  • 09:25 XioNoX: repool ulsfo (maintenance cancelled) - T316532
  • 09:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2045.codfw.wmnet
  • 09:22 taavi: added zabe to wmf-deployment gerrit group T326327
  • 09:19 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2033.codfw.wmnet
  • 09:18 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2045.codfw.wmnet
  • 09:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2032.codfw.wmnet
  • 09:17 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:17 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2032.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 09:15 kart_: Done: UTC morning backport window
  • 09:14 kartik@deploy1002: Finished scap: Backport for CX: Fix transformation of TranslationUnitDTO to custom array (T326278) (duration: 09m 20s)
  • 09:07 kartik@deploy1002: kartik and kartik: Backport for CX: Fix transformation of TranslationUnitDTO to custom array (T326278) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 09:05 kartik@deploy1002: Started scap: Backport for CX: Fix transformation of TranslationUnitDTO to custom array (T326278)
  • 08:58 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2032.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 08:56 godog: upgrade thanos to 0.30.1 on thanos-fe1001 - T303154
  • 08:54 godog: upgrade thanos to 0.30.1 on prometheus2006 - T303154
  • 08:49 kartik@deploy1002: Finished scap: Backport for CX: Fix usage of categories translation unit as array (T326278) (duration: 12m 08s)
  • 08:38 kartik@deploy1002: kartik and kartik: Backport for CX: Fix usage of categories translation unit as array (T326278) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 08:37 kartik@deploy1002: Started scap: Backport for CX: Fix usage of categories translation unit as array (T326278)
  • 08:20 kartik@deploy1002: Finished scap: Backport for ContentTranslation: Increase MT threshold for publishing in cswiki by 20% (T324721) (duration: 17m 21s)
  • 08:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 08:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 08:08 kartik@deploy1002: kartik and kartik: Backport for ContentTranslation: Increase MT threshold for publishing in cswiki by 20% (T324721) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 08:03 kartik@deploy1002: Started scap: Backport for ContentTranslation: Increase MT threshold for publishing in cswiki by 20% (T324721)
  • 08:02 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 07:45 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2032.codfw.wmnet
  • 07:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2031.codfw.wmnet
  • 07:37 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:37 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2031.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 07:36 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2031.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 07:33 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host mc2044.codfw.wmnet
  • 07:28 XioNoX: depool ulsfo for network maintenance - T316532
  • 07:27 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 07:22 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2031.codfw.wmnet
  • 07:22 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2044.codfw.wmnet
  • 07:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: check if dns update is needed after change of rec-dns-lb IPs status - ayounsi@cumin1001"
  • 07:14 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: check if dns update is needed after change of rec-dns-lb IPs status - ayounsi@cumin1001"
  • 07:11 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 07:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 07:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 07:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1130 T326133', diff saved to https://phabricator.wikimedia.org/P42941 and previous config saved to /var/cache/conftool/dbconfig/20230110-070628-ladsgroup.json
  • 07:03 XioNoX: remove static routes for legacy dns-rec-lb IPs - T239993
  • 07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1100 to s5 primary and set section read-write T326133', diff saved to https://phabricator.wikimedia.org/P42940 and previous config saved to /var/cache/conftool/dbconfig/20230110-070223-ladsgroup.json
  • 07:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s5 eqiad as read-only for maintenance - T326133', diff saved to https://phabricator.wikimedia.org/P42939 and previous config saved to /var/cache/conftool/dbconfig/20230110-070152-ladsgroup.json
  • 07:01 Amir1: Starting s5 eqiad failover from db1130 to db1100 - T326133
  • 06:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1100 with weight 0 T326133', diff saved to https://phabricator.wikimedia.org/P42938 and previous config saved to /var/cache/conftool/dbconfig/20230110-062309-ladsgroup.json
  • 06:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T326133
  • 06:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T326133
  • 05:39 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Sync idm-test1001 - slyngshede@cumin1001"
  • 05:38 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Sync idm-test1001 - slyngshede@cumin1001"
  • 03:14 eileen: civicrm upgraded from 391e8482 to 9afd2789
  • 03:12 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: plugin upgrade - ryankemper@cumin1001 - T324247
  • 02:46 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: plugin upgrade - ryankemper@cumin1001 - T324247
  • 02:41 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: plugin upgrade - ryankemper@cumin1001 - T324247
  • 02:08 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: plugin upgrade - ryankemper@cumin1001 - T324247
  • 01:50 krinkle@deploy1002: Finished deploy [integration/docroot@f59119c]: (no justification provided) (duration: 00m 14s)
  • 01:50 krinkle@deploy1002: Started deploy [integration/docroot@f59119c]: (no justification provided)
  • 01:28 eileen: civicrm upgraded from e3405a4e to 391e8482
  • 00:48 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: plugin upgrade - bking@cumin1001 - T324247

2023-01-09

  • 22:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2043.codfw.wmnet
  • 22:33 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: plugin upgrade - bking@cumin1001 - T324247
  • 22:32 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: plugin upgrade - bking@cumin1001 - T324247
  • 22:28 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2043.codfw.wmnet
  • 22:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2030.codfw.wmnet
  • 22:25 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:25 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2030.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 22:15 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2030.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 22:11 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 22:05 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2030.codfw.wmnet
  • 22:03 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2029.codfw.wmnet
  • 22:03 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:03 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2029.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 22:00 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2029.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 21:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2042.codfw.wmnet
  • 21:52 kindrobot: close UTC late backport window
  • 21:50 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 21:47 kindrobot@deploy1002: Sync cancelled.
  • 21:47 kindrobot@deploy1002: kindrobot and trainbranchbot: Backport for Revert "[config]: Deploy GDI Safety Survey Wave 4" synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 21:47 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2042.codfw.wmnet
  • 21:46 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: plugin upgrade - bking@cumin1001 - T324247
  • 21:45 kindrobot@deploy1002: Started scap: Backport for Revert "[config]: Deploy GDI Safety Survey Wave 4"
  • 21:39 kindrobot@deploy1002: Sync cancelled.
  • 21:38 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2029.codfw.wmnet
  • 21:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2027.codfw.wmnet
  • 21:37 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:37 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2027.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 21:34 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade - bking@cumin1001 - T324247
  • 21:29 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2027.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 21:27 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 21:26 kindrobot@deploy1002: kindrobot and essexigyan: Backport for [config]: Deploy GDI Safety Survey Wave 4 (T325136) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:24 kindrobot@deploy1002: Started scap: Backport for [config]: Deploy GDI Safety Survey Wave 4 (T325136)
  • 21:21 kindrobot: starting UTC late backport window
  • 21:21 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2027.codfw.wmnet
  • 21:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2026.codfw.wmnet
  • 21:18 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:18 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2026.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 21:09 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2026.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 21:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143', diff saved to https://phabricator.wikimedia.org/P42936 and previous config saved to /var/cache/conftool/dbconfig/20230109-210940-marostegui.json
  • 21:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2041.codfw.wmnet
  • 21:03 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 20:57 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2041.codfw.wmnet
  • 20:57 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2026.codfw.wmnet
  • 20:52 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade - bking@cumin1001 - T324247
  • 20:52 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
  • 20:44 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
  • 20:44 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
  • 20:44 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
  • 20:36 Amir1: deleting global usage coming from commons in commons (T322588)
  • 20:36 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
  • 20:35 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
  • 20:34 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 20:33 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 20:25 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
  • 20:24 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge plugin upgrade - bking@cumin1001 - T324247
  • 20:21 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 20:20 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 20:20 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 20:20 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 19:37 bblack: cp5032: set param transit_buffer=1M via varnishadm
  • 19:33 bblack: cp5032: set param transit_buffer=4M via varnishadm
  • 19:26 bblack: cp5032: set param transit_buffer=1M via varnishadm
  • 19:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2025.codfw.wmnet
  • 19:22 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:22 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2025.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 19:15 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2025.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 19:11 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 19:05 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2025.codfw.wmnet
  • 19:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2024.codfw.wmnet
  • 19:04 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:04 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2024.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 19:00 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2024.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 18:57 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 18:48 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2024.codfw.wmnet
  • 18:43 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2023.codfw.wmnet
  • 18:43 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:43 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2023.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 18:41 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2023.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 18:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2040.codfw.wmnet
  • 18:36 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 18:30 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2040.codfw.wmnet
  • 18:30 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2023.codfw.wmnet
  • 18:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2022.codfw.wmnet
  • 18:07 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:07 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2022.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 18:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 18:02 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2022.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 18:00 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 17:56 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2022.codfw.wmnet
  • 17:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2039.codfw.wmnet
  • 17:47 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2039.codfw.wmnet
  • 17:46 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc2021.codfw.wmnet
  • 17:46 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:46 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2021.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 17:42 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 17:41 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 17:41 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 17:41 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 17:36 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2021.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 17:35 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 17:35 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 17:34 claime: Finished codfw jobrunner rolling reboot
  • 17:32 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 17:31 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 16:59 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2021.codfw.wmnet
  • 16:49 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 16:48 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
  • 16:46 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc2020.codfw.wmnet
  • 16:46 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:46 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2020.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 16:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2038.codfw.wmnet
  • 16:40 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2038.codfw.wmnet
  • 16:40 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2020.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 16:32 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 16:11 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2020.codfw.wmnet
  • 16:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc2019.codfw.wmnet
  • 16:11 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:11 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2019.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 16:08 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mc2019.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1001"
  • 16:04 XioNoX: start VC link maintenance in eqiad - T325803
  • 16:03 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 15:58 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc2019.codfw.wmnet
  • 15:37 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 15:37 claime: Starting codfw jobrunner rolling reboot
  • 15:35 Lucas_WMDE: UTC afternoon backport+config window done
  • 15:34 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for CX: Allow composer/installers plugin (duration: 10m 03s)
  • 15:29 claime: Not starting codfw jobrunner rolling reboot, deploy in progress
  • 15:28 claime: Starting codfw jobrunner rolling reboot
  • 15:26 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and kartik: Backport for CX: Allow composer/installers plugin synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 15:24 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for CX: Allow composer/installers plugin
  • 15:17 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on maps2009.codfw.wmnet,maps1009.eqiad.wmnet with reason: Removing redis service
  • 15:17 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on maps2009.codfw.wmnet,maps1009.eqiad.wmnet with reason: Removing redis service
  • 15:11 effie: disable puppet on all 'P:mediawiki::mcrouter_wancache' hosts to merge 875894
  • 15:09 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for extwiki: Install SandboxLink extension (T326450) (duration: 08m 37s)
  • 15:09 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry2004.codfw.wmnet
  • 15:04 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry2004.codfw.wmnet
  • 15:02 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for extwiki: Install SandboxLink extension (T326450) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 15:00 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for extwiki: Install SandboxLink extension (T326450)
  • 15:00 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry2003.codfw.wmnet
  • 14:59 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ echo 'https://en.wikipedia.org/static/images/project-logos/jawikisource.png' | mwscript purgeList.php # T326488
  • 14:56 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for jawikisource: Update project logo and wordmark (T326488) (duration: 09m 24s)
  • 14:55 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry2003.codfw.wmnet
  • 14:52 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry1004.eqiad.wmnet
  • 14:48 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for jawikisource: Update project logo and wordmark (T326488) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 14:47 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry1004.eqiad.wmnet
  • 14:47 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for jawikisource: Update project logo and wordmark (T326488)
  • 14:45 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for arwiki: Create extendedmover group (T326434) (duration: 08m 56s)
  • 14:38 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for arwiki: Create extendedmover group (T326434) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 14:36 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for arwiki: Create extendedmover group (T326434)
  • 14:31 godog: upgrade thanos to 0.30.1 on prometheus2005 - T303154
  • 14:27 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for mediawikiwiki: Disable Flow on new pages by default (T325907) (duration: 18m 19s)
  • 14:19 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for mediawikiwiki: Disable Flow on new pages by default (T325907) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:09 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for mediawikiwiki: Disable Flow on new pages by default (T325907)
  • 13:55 moritzm: installing systemd bugfix updates from Bullseye point release
  • 13:41 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry1003.eqiad.wmnet
  • 13:36 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry1003.eqiad.wmnet
  • 13:35 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 13:35 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=eqiad
  • 12:53 hnowlan@deploy1002: Finished deploy [restbase/deploy@bcb0a69]: New wikis T321284 T321290 T321296 T326140 (duration: 18m 56s)
  • 12:34 hnowlan@deploy1002: Started deploy [restbase/deploy@bcb0a69]: New wikis T321284 T321290 T321296 T326140
  • 12:18 vgutierrez: repool cp5025
  • 11:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15954
  • 11:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15954
  • 11:29 vgutierrez: restart purged on cp5025
  • 11:28 vgutierrez: depool cp5025 due to purging issues
  • 11:23 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host chartmuseum1001.eqiad.wmnet
  • 11:19 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host chartmuseum1001.eqiad.wmnet
  • 11:06 XioNoX: repool ulsfo - T316532
  • 11:01 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
  • 10:55 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 10:55 jiji@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 10:54 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=eqiad
  • 10:54 claime: Starting codfw appserver rolling reboot
  • 10:54 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=codfw
  • 10:54 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
  • 10:54 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host chartmuseum2001.codfw.wmnet
  • 10:51 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dragonfly-supernode1001.eqiad.wmnet
  • 10:49 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host chartmuseum2001.codfw.wmnet
  • 10:49 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode1001.eqiad.wmnet
  • 10:48 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dragonfly-supernode2001.codfw.wmnet
  • 10:46 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 10:46 effie: switching maps to eqiad
  • 10:45 moritzm: installing avahi security updates
  • 10:44 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode2001.codfw.wmnet
  • 10:41 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=codfw
  • 09:35 dcausse: restarting blazegraph on wdqs1006 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 09:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
  • 09:04 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
  • 08:58 moritzm: installing glibc security updates
  • 08:56 XioNoX: depool ulsfo for network maintenance - T316532
  • 08:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 327700
  • 08:26 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 327700
  • 08:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 48237
  • 08:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 48237
  • 08:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32035
  • 08:21 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idm-test1001.wikimedia.org
  • 08:21 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32035
  • 08:12 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idm-test1001.wikimedia.org on all recursors
  • 08:12 slyngshede@cumin1001: START - Cookbook sre.dns.wipe-cache idm-test1001.wikimedia.org on all recursors
  • 08:12 slyngshede@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:12 slyngshede@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idm-test1001.wikimedia.org - slyngshede@cumin1001"
  • 08:08 slyngshede@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idm-test1001.wikimedia.org - slyngshede@cumin1001"
  • 08:06 slyngshede@cumin1001: START - Cookbook sre.dns.netbox
  • 08:06 slyngshede@cumin1001: START - Cookbook sre.ganeti.makevm for new host idm-test1001.wikimedia.org

2023-01-06

  • 18:57 mutante: systemctl start docker-gc on all gitlab-runners via cumin T310593
  • 18:56 mutante: gitlab-runner1002 - systemctl start docker-gc; run puppet on all gitlab-runners T310593
  • 18:49 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: debugging
  • 18:49 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on 6 hosts with reason: debugging
  • 18:36 sukhe: pool cp5032 [bullseye upgrade completed]: T325797
  • 18:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5032.eqsin.wmnet,service=ats-be
  • 18:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5032.eqsin.wmnet,service=cdn
  • 18:20 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on mw1486.eqiad.wmnet with reason: downtimed, hw failure: T326425
  • 18:20 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on mw1486.eqiad.wmnet with reason: downtimed, hw failure: T326425
  • 18:13 Krinkle: krinkle@cloudweb1003$ Run `UPDATE actor SET actor_user=31136 WHERE actor_id=14640;` to partially fix T326431
  • 17:58 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5032.eqsin.wmnet with OS bullseye
  • 17:29 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5032.eqsin.wmnet with reason: host reimage
  • 17:26 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5032.eqsin.wmnet with reason: host reimage
  • 16:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5032.eqsin.wmnet with OS bullseye
  • 16:53 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5032.eqsin.wmnet with OS bullseye
  • 16:26 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5032.eqsin.wmnet with OS bullseye
  • 16:18 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5032.eqsin.wmnet with OS bullseye
  • 16:05 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on parse1002.eqiad.wmnet with reason: CPU1 machine check error
  • 16:05 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on parse1002.eqiad.wmnet with reason: CPU1 machine check error
  • 15:54 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1486.eqiad.wmnet
  • 15:53 claime: depooling mw1486.eqiad.wmnet for hardware troubleshooting
  • 15:31 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5032.eqsin.wmnet with OS bullseye
  • 15:30 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5032.eqsin.wmnet with OS bullseye
  • 15:10 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5032.eqsin.wmnet with OS bullseye
  • 15:08 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=False) upgrade firmware for hosts cp5032.eqsin.wmnet
  • 15:08 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp5032.eqsin.wmnet
  • 15:07 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5032.eqsin.wmnet,service=ats-be
  • 15:07 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5032.eqsin.wmnet,service=cdn
  • 15:07 sukhe: depool cp5032 for bullseye upgrade (starting with NIC firmware upgrade): T325797
  • 14:42 jbond: remove bgpalerter from apt
  • 14:06 reedy@deploy1002: Synchronized php-1.40.0-wmf.17/extensions/SecurePoll/cli/wm-scripts/ucoc2023/populateEditCount.php: T326408 (duration: 07m 09s)
  • 12:42 stevemunene@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 12:36 tzatziki: running extensions/SecurePoll/cli/wm-scripts/ucoc2023/ucoc2023_tables.sql on each wiki
  • 12:29 stevemunene@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 11:38 jbond: upload bgpalerter to bullseye-wikimedia
  • 11:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 11:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 11:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 11:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 10:10 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 21245
  • 10:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 21245
  • 09:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36994
  • 09:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 36994
  • 09:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 266925
  • 09:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 266925
  • 09:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9038
  • 09:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9038
  • 09:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 5713
  • 09:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 5713
  • 09:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 37473
  • 09:03 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 37473
  • 09:03 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 4788
  • 09:02 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4788
  • 09:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 32035
  • 09:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 32035
  • 09:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15954
  • 09:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15954
  • 09:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 60427
  • 09:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 60427
  • 09:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 58717
  • 09:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 58717
  • 09:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45489
  • 08:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 45489
  • 08:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 24482
  • 08:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 24482
  • 08:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9119
  • 08:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9119
  • 08:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049
  • 08:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 64049
  • 08:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 263237
  • 08:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 263237
  • 08:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 51185
  • 08:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 51185
  • 08:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 201746
  • 08:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 201746
  • 08:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 62597
  • 08:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 62597
  • 08:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 327700
  • 08:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 327700
  • 08:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 56630
  • 08:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 56630
  • 08:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 21245
  • 08:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 21245
  • 08:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 37282
  • 08:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 37282
  • 08:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 37558
  • 08:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 37558
  • 08:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13113
  • 08:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 13113
  • 08:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 41095
  • 08:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 41095
  • 08:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61573
  • 08:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 61573
  • 08:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 21320
  • 08:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 21320
  • 08:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 39405
  • 08:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 39405
  • 08:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 48237
  • 08:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 48237
  • 08:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 47794
  • 08:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 47794
  • 08:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 22822
  • 08:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 22822
  • 08:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 58715
  • 08:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 58715
  • 08:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 51254
  • 08:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 51254
  • 08:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 35432
  • 08:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 35432
  • 08:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 132602
  • 08:41 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 132602
  • 08:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 42473
  • 08:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 42473
  • 08:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16347
  • 08:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16347
  • 08:05 XioNoX: drmrs offload Vodafone from Tata - T324955
  • 01:08 urbanecm@deploy1002: Finished scap: Backport for Revert "GlobalRename: Convert DB selects to use SelectQueryBuilder" (T326377 T312394) (duration: 08m 48s)
  • 01:01 urbanecm@deploy1002: urbanecm and urbanecm: Backport for Revert "GlobalRename: Convert DB selects to use SelectQueryBuilder" (T326377 T312394) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 00:59 urbanecm@deploy1002: Started scap: Backport for Revert "GlobalRename: Convert DB selects to use SelectQueryBuilder" (T326377 T312394)
  • 00:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T326156)', diff saved to https://phabricator.wikimedia.org/P42928 and previous config saved to /var/cache/conftool/dbconfig/20230106-004102-ladsgroup.json
  • 00:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42927 and previous config saved to /var/cache/conftool/dbconfig/20230106-002556-ladsgroup.json
  • 00:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42926 and previous config saved to /var/cache/conftool/dbconfig/20230106-001049-ladsgroup.json

2023-01-05

  • 23:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T326156)', diff saved to https://phabricator.wikimedia.org/P42925 and previous config saved to /var/cache/conftool/dbconfig/20230105-235543-ladsgroup.json
  • 23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T326156)', diff saved to https://phabricator.wikimedia.org/P42924 and previous config saved to /var/cache/conftool/dbconfig/20230105-235325-ladsgroup.json
  • 23:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 23:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42923 and previous config saved to /var/cache/conftool/dbconfig/20230105-235304-ladsgroup.json
  • 23:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P42922 and previous config saved to /var/cache/conftool/dbconfig/20230105-233758-ladsgroup.json
  • 23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P42921 and previous config saved to /var/cache/conftool/dbconfig/20230105-232251-ladsgroup.json
  • 23:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42920 and previous config saved to /var/cache/conftool/dbconfig/20230105-230745-ladsgroup.json
  • 23:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42919 and previous config saved to /var/cache/conftool/dbconfig/20230105-230629-ladsgroup.json
  • 23:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 23:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 23:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T326156)', diff saved to https://phabricator.wikimedia.org/P42918 and previous config saved to /var/cache/conftool/dbconfig/20230105-230607-ladsgroup.json
  • 22:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P42917 and previous config saved to /var/cache/conftool/dbconfig/20230105-225101-ladsgroup.json
  • 22:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P42916 and previous config saved to /var/cache/conftool/dbconfig/20230105-223554-ladsgroup.json
  • 22:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T326156)', diff saved to https://phabricator.wikimedia.org/P42915 and previous config saved to /var/cache/conftool/dbconfig/20230105-222048-ladsgroup.json
  • 22:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T326156)', diff saved to https://phabricator.wikimedia.org/P42914 and previous config saved to /var/cache/conftool/dbconfig/20230105-221932-ladsgroup.json
  • 22:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 22:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 22:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42913 and previous config saved to /var/cache/conftool/dbconfig/20230105-221911-ladsgroup.json
  • 22:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P42912 and previous config saved to /var/cache/conftool/dbconfig/20230105-220404-ladsgroup.json
  • 21:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P42911 and previous config saved to /var/cache/conftool/dbconfig/20230105-214858-ladsgroup.json
  • 21:43 TheresNoTime: closing UTC late backport window
  • 21:42 samtar@deploy1002: Finished scap: Backport for Turn off wgNavigationTimingOversampleFactor campaigns (T286703) (duration: 08m 45s)
  • 21:35 samtar@deploy1002: samtar and krinkle: Backport for Turn off wgNavigationTimingOversampleFactor campaigns (T286703) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 21:33 samtar@deploy1002: Started scap: Backport for Turn off wgNavigationTimingOversampleFactor campaigns (T286703)
  • 21:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42910 and previous config saved to /var/cache/conftool/dbconfig/20230105-213351-ladsgroup.json
  • 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42909 and previous config saved to /var/cache/conftool/dbconfig/20230105-213235-ladsgroup.json
  • 21:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 21:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T326156)', diff saved to https://phabricator.wikimedia.org/P42908 and previous config saved to /var/cache/conftool/dbconfig/20230105-213214-ladsgroup.json
  • 21:31 samtar@deploy1002: Finished scap: Backport for actions: Actually store CommentFormatter in McrUndoAction (T326336) (duration: 10m 31s)
  • 21:23 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 21:23 samtar@deploy1002: samtar and zabe: Backport for actions: Actually store CommentFormatter in McrUndoAction (T326336) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:21 samtar@deploy1002: Started scap: Backport for actions: Actually store CommentFormatter in McrUndoAction (T326336)
  • 21:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P42907 and previous config saved to /var/cache/conftool/dbconfig/20230105-211707-ladsgroup.json
  • 21:16 samtar@deploy1002: Finished scap: Backport for Start writing to cuc_comment_id everywhere (T233004) (duration: 10m 07s)
  • 21:08 samtar@deploy1002: samtar and zabe: Backport for Start writing to cuc_comment_id everywhere (T233004) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 21:06 samtar@deploy1002: Started scap: Backport for Start writing to cuc_comment_id everywhere (T233004)
  • 21:04 samtar@deploy1002: backport aborted: (duration: 01m 22s)
  • 21:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P42906 and previous config saved to /var/cache/conftool/dbconfig/20230105-210201-ladsgroup.json
  • 20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T326156)', diff saved to https://phabricator.wikimedia.org/P42905 and previous config saved to /var/cache/conftool/dbconfig/20230105-204654-ladsgroup.json
  • 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T326156)', diff saved to https://phabricator.wikimedia.org/P42904 and previous config saved to /var/cache/conftool/dbconfig/20230105-204438-ladsgroup.json
  • 20:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 20:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 20:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 20:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T326156)', diff saved to https://phabricator.wikimedia.org/P42903 and previous config saved to /var/cache/conftool/dbconfig/20230105-204403-ladsgroup.json
  • 20:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P42902 and previous config saved to /var/cache/conftool/dbconfig/20230105-202856-ladsgroup.json
  • 20:17 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@9568478]: Bumping platform_eng airflow instance to latest (duration: 00m 09s)
  • 20:17 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@9568478]: Bumping platform_eng airflow instance to latest
  • 20:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P42901 and previous config saved to /var/cache/conftool/dbconfig/20230105-201350-ladsgroup.json
  • 19:59 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.17 refs T325580
  • 19:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T326156)', diff saved to https://phabricator.wikimedia.org/P42900 and previous config saved to /var/cache/conftool/dbconfig/20230105-195843-ladsgroup.json
  • 19:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T326156)', diff saved to https://phabricator.wikimedia.org/P42899 and previous config saved to /var/cache/conftool/dbconfig/20230105-195627-ladsgroup.json
  • 19:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 19:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 19:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T326156)', diff saved to https://phabricator.wikimedia.org/P42898 and previous config saved to /var/cache/conftool/dbconfig/20230105-195606-ladsgroup.json
  • 19:48 taavi@deploy1002: Finished scap: Backport for actions: Pass CommentFormatter to McrRestoreAction (T326275) (duration: 10m 11s)
  • 19:41 taavi@deploy1002: taavi and zabe: Backport for actions: Pass CommentFormatter to McrRestoreAction (T326275) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 19:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P42897 and previous config saved to /var/cache/conftool/dbconfig/20230105-194059-ladsgroup.json
  • 19:38 sukhe: reprepro -C main include bullseye-wikimedia varnish_6.0.10-1wm3_amd64.changes: T325797
  • 19:37 taavi@deploy1002: Started scap: Backport for actions: Pass CommentFormatter to McrRestoreAction (T326275)
  • 19:31 Amir1: creating new cu tables
  • 19:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P42896 and previous config saved to /var/cache/conftool/dbconfig/20230105-192553-ladsgroup.json
  • 19:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T326156)', diff saved to https://phabricator.wikimedia.org/P42895 and previous config saved to /var/cache/conftool/dbconfig/20230105-191046-ladsgroup.json
  • 19:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T326156)', diff saved to https://phabricator.wikimedia.org/P42894 and previous config saved to /var/cache/conftool/dbconfig/20230105-190830-ladsgroup.json
  • 19:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 19:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 19:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 19:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 19:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T326156)', diff saved to https://phabricator.wikimedia.org/P42893 and previous config saved to /var/cache/conftool/dbconfig/20230105-190724-ladsgroup.json
  • 18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P42892 and previous config saved to /var/cache/conftool/dbconfig/20230105-185217-ladsgroup.json
  • 18:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P42891 and previous config saved to /var/cache/conftool/dbconfig/20230105-183711-ladsgroup.json
  • 18:22 taavi: delete some nostalgiawiki pages using maintenance/deleteBatch.php for T326334
  • 18:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T326156)', diff saved to https://phabricator.wikimedia.org/P42890 and previous config saved to /var/cache/conftool/dbconfig/20230105-182204-ladsgroup.json
  • 18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T326156)', diff saved to https://phabricator.wikimedia.org/P42889 and previous config saved to /var/cache/conftool/dbconfig/20230105-181949-ladsgroup.json
  • 18:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 18:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T326156)', diff saved to https://phabricator.wikimedia.org/P42888 and previous config saved to /var/cache/conftool/dbconfig/20230105-181928-ladsgroup.json
  • 18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P42887 and previous config saved to /var/cache/conftool/dbconfig/20230105-180421-ladsgroup.json
  • 17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P42886 and previous config saved to /var/cache/conftool/dbconfig/20230105-174915-ladsgroup.json
  • 17:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T326156)', diff saved to https://phabricator.wikimedia.org/P42885 and previous config saved to /var/cache/conftool/dbconfig/20230105-173408-ladsgroup.json
  • 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T326156)', diff saved to https://phabricator.wikimedia.org/P42884 and previous config saved to /var/cache/conftool/dbconfig/20230105-173154-ladsgroup.json
  • 17:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 17:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T326156)', diff saved to https://phabricator.wikimedia.org/P42883 and previous config saved to /var/cache/conftool/dbconfig/20230105-173133-ladsgroup.json
  • 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P42882 and previous config saved to /var/cache/conftool/dbconfig/20230105-171626-ladsgroup.json
  • 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P42880 and previous config saved to /var/cache/conftool/dbconfig/20230105-170119-ladsgroup.json
  • 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T326156)', diff saved to https://phabricator.wikimedia.org/P42878 and previous config saved to /var/cache/conftool/dbconfig/20230105-164612-ladsgroup.json
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T326156)', diff saved to https://phabricator.wikimedia.org/P42877 and previous config saved to /var/cache/conftool/dbconfig/20230105-164358-ladsgroup.json
  • 16:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 16:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 16:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42876 and previous config saved to /var/cache/conftool/dbconfig/20230105-164258-ladsgroup.json
  • 16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P42875 and previous config saved to /var/cache/conftool/dbconfig/20230105-162751-ladsgroup.json
  • 16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P42874 and previous config saved to /var/cache/conftool/dbconfig/20230105-161245-ladsgroup.json
  • 16:05 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 16:04 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 16:04 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 16:03 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 15:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42873 and previous config saved to /var/cache/conftool/dbconfig/20230105-155738-ladsgroup.json
  • 15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42872 and previous config saved to /var/cache/conftool/dbconfig/20230105-155524-ladsgroup.json
  • 15:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 15:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42871 and previous config saved to /var/cache/conftool/dbconfig/20230105-155503-ladsgroup.json
  • 15:52 matthiasmullie: UTC afternoon backports done
  • 15:51 mlitn@deploy1002: Finished scap: Backport for Fix URL construction (duration: 12m 21s)
  • 15:41 mlitn@deploy1002: mlitn and mlitn: Backport for Fix URL construction synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P42870 and previous config saved to /var/cache/conftool/dbconfig/20230105-153956-ladsgroup.json
  • 15:39 mlitn@deploy1002: Started scap: Backport for Fix URL construction
  • 15:37 mlitn@deploy1002: Finished scap: Backport for Fix URL construction (duration: 08m 04s)
  • 15:31 mlitn@deploy1002: mlitn and mlitn: Backport for Fix URL construction synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 15:29 mlitn@deploy1002: Started scap: Backport for Fix URL construction
  • 15:26 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 15:26 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P42869 and previous config saved to /var/cache/conftool/dbconfig/20230105-152447-ladsgroup.json
  • 15:22 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 15:14 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 15:14 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
  • 15:10 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 15:10 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 15:10 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 15:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42868 and previous config saved to /var/cache/conftool/dbconfig/20230105-150939-ladsgroup.json
  • 15:09 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42867 and previous config saved to /var/cache/conftool/dbconfig/20230105-150825-ladsgroup.json
  • 15:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 15:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T326156)', diff saved to https://phabricator.wikimedia.org/P42866 and previous config saved to /var/cache/conftool/dbconfig/20230105-150804-ladsgroup.json
  • 14:58 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 14:58 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
  • 14:56 claime: hard resetting mw1486
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P42865 and previous config saved to /var/cache/conftool/dbconfig/20230105-145257-ladsgroup.json
  • 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P42864 and previous config saved to /var/cache/conftool/dbconfig/20230105-143751-ladsgroup.json
  • 14:30 mlitn@deploy1002: Finished scap: Backport for Also get central description (T325831) (duration: 08m 32s)
  • 14:23 mlitn@deploy1002: mlitn and mlitn: Backport for Also get central description (T325831) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T326156)', diff saved to https://phabricator.wikimedia.org/P42862 and previous config saved to /var/cache/conftool/dbconfig/20230105-142244-ladsgroup.json
  • 14:21 mlitn@deploy1002: Started scap: Backport for Also get central description (T325831)
  • 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T326156)', diff saved to https://phabricator.wikimedia.org/P42861 and previous config saved to /var/cache/conftool/dbconfig/20230105-142029-ladsgroup.json
  • 14:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 14:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T326156)', diff saved to https://phabricator.wikimedia.org/P42860 and previous config saved to /var/cache/conftool/dbconfig/20230105-142008-ladsgroup.json
  • 14:17 mlitn@deploy1002: Finished scap: Backport for Also get central description (T325831) (duration: 07m 57s)
  • 14:11 mlitn@deploy1002: mlitn and mlitn: Backport for Also get central description (T325831) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:09 mlitn@deploy1002: Started scap: Backport for Also get central description (T325831)
  • 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P42859 and previous config saved to /var/cache/conftool/dbconfig/20230105-140501-ladsgroup.json
  • 13:58 Amir1: start of externallinks migration in elwiki (and rest of large wikis in s3) (T326314)
  • 13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P42858 and previous config saved to /var/cache/conftool/dbconfig/20230105-134955-ladsgroup.json
  • 13:46 ladsgroup@deploy1002: Finished scap: Backport for Enable write both for externallinks in ten largest s3 wikis (T321662) (duration: 08m 54s)
  • 13:42 urbanecm: aswikiquote: Run importDump.php to import an XML dump (per new wiki importers request, running into issues with a largish page)
  • 13:39 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Enable write both for externallinks in ten largest s3 wikis (T321662) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:38 XioNoX: start [eqiad] faulty VC optics maintenance - T325803
  • 13:37 ladsgroup@deploy1002: Started scap: Backport for Enable write both for externallinks in ten largest s3 wikis (T321662)
  • 13:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T326156)', diff saved to https://phabricator.wikimedia.org/P42857 and previous config saved to /var/cache/conftool/dbconfig/20230105-133448-ladsgroup.json
  • 13:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T326156)', diff saved to https://phabricator.wikimedia.org/P42856 and previous config saved to /var/cache/conftool/dbconfig/20230105-133234-ladsgroup.json
  • 13:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 13:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 13:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42855 and previous config saved to /var/cache/conftool/dbconfig/20230105-133211-ladsgroup.json
  • 13:30 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 13:29 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 13:21 effie: enable puppet on all mw servers
  • 13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P42854 and previous config saved to /var/cache/conftool/dbconfig/20230105-131705-ladsgroup.json
  • 13:03 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 13:03 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 13:03 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 13:03 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 13:03 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 13:02 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 13:02 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 13:02 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 13:02 oblivian@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 13:02 oblivian@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 13:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P42853 and previous config saved to /var/cache/conftool/dbconfig/20230105-130158-ladsgroup.json
  • 13:02 oblivian@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 13:01 oblivian@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 13:01 oblivian@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 13:01 oblivian@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 13:01 oblivian@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 13:01 oblivian@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 13:01 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 13:01 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 13:01 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 13:00 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 13:00 hashar: Restarted Gerrit for a plugin update
  • 12:58 hashar@deploy1002: Finished deploy [gerrit/gerrit@b1ae5b4]: wm-checks-api: fix PCC handling of empty messages (duration: 00m 08s)
  • 12:58 hashar@deploy1002: Started deploy [gerrit/gerrit@b1ae5b4]: wm-checks-api: fix PCC handling of empty messages
  • 12:52 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 12:49 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 12:49 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 12:48 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 12:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42852 and previous config saved to /var/cache/conftool/dbconfig/20230105-124651-ladsgroup.json
  • 12:45 hashar@deploy1002: Finished deploy [gerrit/gerrit@b1ae5b4]: wm-checks-api: fix PCC handling of empty messages (duration: 00m 10s)
  • 12:45 hashar@deploy1002: Started deploy [gerrit/gerrit@b1ae5b4]: wm-checks-api: fix PCC handling of empty messages
  • 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T326156)', diff saved to https://phabricator.wikimedia.org/P42851 and previous config saved to /var/cache/conftool/dbconfig/20230105-124437-ladsgroup.json
  • 12:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 12:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 12:44 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 12:42 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 12:42 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 12:31 ladsgroup: Deployed security patch for T233004 T326293
  • 12:02 hashar: gerrit: running `copy-approvals` script to prepare for Gerrit 3.6 upgrade (T309870): `ssh -p 29418 gerrit.wikimedia.org gerrit copy-approvals --verbose`
  • 11:59 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 11:58 hashar: Restarting Gerrit
  • 11:57 hashar@deploy1002: Finished deploy [gerrit/gerrit@32f984a]: wm-checks-api: add support for Puppet Catalogue Compiler (duration: 00m 09s)
  • 11:57 hashar@deploy1002: Started deploy [gerrit/gerrit@32f984a]: wm-checks-api: add support for Puppet Catalogue Compiler
  • 11:57 hashar: Stopping Gerrit for plugin deployment
  • 11:45 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
  • 11:40 effie: disabling puppet on all hosts running mcrouter to merge 860102
  • 11:24 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=mwdebug,name=eqiad
  • 11:23 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:23 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=mwdebug,name=eqiad
  • 11:23 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:22 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=mwdebug,name=codfw
  • 11:20 hashar@deploy1002: Finished deploy [gerrit/gerrit@32f984a]: wm-checks-api: add support for Puppet Catalogue Compiler (duration: 00m 10s)
  • 11:20 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:20 hashar@deploy1002: Started deploy [gerrit/gerrit@32f984a]: wm-checks-api: add support for Puppet Catalogue Compiler
  • 11:19 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:19 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=mwdebug,name=codfw
  • 11:14 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 11:13 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 11:13 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:13 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:13 cgoubert@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:13 cgoubert@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 11:12 cgoubert@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:12 cgoubert@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: After cloning db1176', diff saved to https://phabricator.wikimedia.org/P42850 and previous config saved to /var/cache/conftool/dbconfig/20230105-105808-root.json
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: After cloning db1176', diff saved to https://phabricator.wikimedia.org/P42849 and previous config saved to /var/cache/conftool/dbconfig/20230105-104303-root.json
  • 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: After cloning db1176', diff saved to https://phabricator.wikimedia.org/P42848 and previous config saved to /var/cache/conftool/dbconfig/20230105-102758-root.json
  • 10:26 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 10:26 claime: Rolling reboot of api_appserver hosts in eqiad
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 100%: Pooling in s6', diff saved to https://phabricator.wikimedia.org/P42847 and previous config saved to /var/cache/conftool/dbconfig/20230105-102357-root.json
  • 10:22 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: After cloning db1176', diff saved to https://phabricator.wikimedia.org/P42846 and previous config saved to /var/cache/conftool/dbconfig/20230105-101253-root.json
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 75%: Pooling in s6', diff saved to https://phabricator.wikimedia.org/P42845 and previous config saved to /var/cache/conftool/dbconfig/20230105-100852-root.json
  • 10:07 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 10:06 claime: Restarting rolling reboot of api_appserver hosts in codfw
  • 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: After cloning db1176', diff saved to https://phabricator.wikimedia.org/P42844 and previous config saved to /var/cache/conftool/dbconfig/20230105-095748-root.json
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 60%: Pooling in s6', diff saved to https://phabricator.wikimedia.org/P42843 and previous config saved to /var/cache/conftool/dbconfig/20230105-095347-root.json
  • 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: After cloning db1176', diff saved to https://phabricator.wikimedia.org/P42841 and previous config saved to /var/cache/conftool/dbconfig/20230105-094243-root.json
  • 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 50%: Pooling in s6', diff saved to https://phabricator.wikimedia.org/P42840 and previous config saved to /var/cache/conftool/dbconfig/20230105-093842-root.json
  • 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 1%: After cloning db1176', diff saved to https://phabricator.wikimedia.org/P42839 and previous config saved to /var/cache/conftool/dbconfig/20230105-092738-root.json
  • 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 40%: Pooling in s6', diff saved to https://phabricator.wikimedia.org/P42838 and previous config saved to /var/cache/conftool/dbconfig/20230105-092336-root.json
  • 09:14 XioNoX: turn up BGP to NTT in drmrs - T314929
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2151 (re)pooling @ 25%: Pooling in s6', diff saved to https://phabricator.wikimedia.org/P42837 and previous config saved to /var/cache/conftool/dbconfig/20230105-090831-root.json
  • 08:56 hashar@deploy1002: Finished scap: Backport for [SearchVue] Enable extension on ptwiki, ruwiki & idwiki (T310367) (duration: 11m 38s)
  • 08:46 hashar@deploy1002: hashar and mlitn: Backport for [SearchVue] Enable extension on ptwiki, ruwiki & idwiki (T310367) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 08:44 hashar@deploy1002: Started scap: Backport for [SearchVue] Enable extension on ptwiki, ruwiki & idwiki (T310367)
  • 07:58 moritzm: installing glibc security updates on bullseye
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db2151 in s6 T326206', diff saved to https://phabricator.wikimedia.org/P42836 and previous config saved to /var/cache/conftool/dbconfig/20230105-075046-marostegui.json
  • 07:28 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 07:27 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 07:26 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 07:25 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 to clone db1176 T326211', diff saved to https://phabricator.wikimedia.org/P42833 and previous config saved to /var/cache/conftool/dbconfig/20230105-064153-marostegui.json
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2151 for the first time in s6 T326206', diff saved to https://phabricator.wikimedia.org/P42832 and previous config saved to /var/cache/conftool/dbconfig/20230105-063937-marostegui.json
  • 06:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 06:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1157.eqiad.wmnet with reason: Maintenance

2023-01-04

  • 23:01 mutante: deploy2002 - re-arming keyholder T324014
  • 23:00 mutante: deploy1002 - re-arming keyholder T324014
  • 22:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T326011)', diff saved to https://phabricator.wikimedia.org/P42831 and previous config saved to /var/cache/conftool/dbconfig/20230104-223545-marostegui.json
  • 22:27 kindrobot: finished UTC late backport window
  • 22:27 kindrobot@deploy1002: Finished scap: Backport for Fix underlinkedness rescore logic (T301096), Fix underlinkedness rescore logic (T301096) (duration: 15m 20s)
  • 22:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P42828 and previous config saved to /var/cache/conftool/dbconfig/20230104-222038-marostegui.json
  • 22:13 kindrobot@deploy1002: kindrobot and tgr: Backport for Fix underlinkedness rescore logic (T301096), Fix underlinkedness rescore logic (T301096) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 22:11 kindrobot@deploy1002: Started scap: Backport for Fix underlinkedness rescore logic (T301096), Fix underlinkedness rescore logic (T301096)
  • 22:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P42827 and previous config saved to /var/cache/conftool/dbconfig/20230104-220532-marostegui.json
  • 21:51 kindrobot@deploy1002: backport aborted: (duration: 02m 12s)
  • 21:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T326011)', diff saved to https://phabricator.wikimedia.org/P42826 and previous config saved to /var/cache/conftool/dbconfig/20230104-215025-marostegui.json
  • 21:48 taavi: mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki mediawikiwiki "African Wikimedia Technical Community/Project Scope" "Africa Wikimedia Technical Community/Project Scope" "Taavi" --reason "per request phab:T318292" # T318292
  • 21:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T326011)', diff saved to https://phabricator.wikimedia.org/P42825 and previous config saved to /var/cache/conftool/dbconfig/20230104-214616-marostegui.json
  • 21:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 21:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 21:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T326011)', diff saved to https://phabricator.wikimedia.org/P42824 and previous config saved to /var/cache/conftool/dbconfig/20230104-214555-marostegui.json
  • 21:44 kindrobot@deploy1002: Finished scap: Backport for Add namespace to gorwiktionary (T326253) (duration: 11m 26s)
  • 21:35 kindrobot@deploy1002: kindrobot and jhsoby: Backport for Add namespace to gorwiktionary (T326253) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:33 kindrobot@deploy1002: Started scap: Backport for Add namespace to gorwiktionary (T326253)
  • 21:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P42823 and previous config saved to /var/cache/conftool/dbconfig/20230104-213049-marostegui.json
  • 21:28 kindrobot@deploy1002: Finished scap: Backport for Start writing to cuc_comment_id on group0 and group1 wikis (T233004) (duration: 17m 28s)
  • 21:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P42820 and previous config saved to /var/cache/conftool/dbconfig/20230104-211542-marostegui.json
  • 21:12 kindrobot@deploy1002: kindrobot and zabe: Backport for Start writing to cuc_comment_id on group0 and group1 wikis (T233004) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 21:10 kindrobot@deploy1002: Started scap: Backport for Start writing to cuc_comment_id on group0 and group1 wikis (T233004)
  • 21:05 kindrobot: starting UTC late backport window
  • 21:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T326011)', diff saved to https://phabricator.wikimedia.org/P42819 and previous config saved to /var/cache/conftool/dbconfig/20230104-210036-marostegui.json
  • 20:58 Amir1: running refreshGlobalimagelinks.php on all wikis (T322588)
  • 20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T326011)', diff saved to https://phabricator.wikimedia.org/P42818 and previous config saved to /var/cache/conftool/dbconfig/20230104-205628-marostegui.json
  • 20:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 20:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T326011)', diff saved to https://phabricator.wikimedia.org/P42817 and previous config saved to /var/cache/conftool/dbconfig/20230104-205607-marostegui.json
  • 20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P42816 and previous config saved to /var/cache/conftool/dbconfig/20230104-204100-marostegui.json
  • 20:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P42815 and previous config saved to /var/cache/conftool/dbconfig/20230104-202554-marostegui.json
  • 20:14 cstone: payments-wiki upgraded from ede93d62 to f075991f
  • 20:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T326011)', diff saved to https://phabricator.wikimedia.org/P42814 and previous config saved to /var/cache/conftool/dbconfig/20230104-201047-marostegui.json
  • 20:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T326011)', diff saved to https://phabricator.wikimedia.org/P42813 and previous config saved to /var/cache/conftool/dbconfig/20230104-200638-marostegui.json
  • 20:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 20:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 20:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T326011)', diff saved to https://phabricator.wikimedia.org/P42812 and previous config saved to /var/cache/conftool/dbconfig/20230104-200617-marostegui.json
  • 19:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P42811 and previous config saved to /var/cache/conftool/dbconfig/20230104-195110-marostegui.json
  • 19:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P42810 and previous config saved to /var/cache/conftool/dbconfig/20230104-193604-marostegui.json
  • 19:32 dduvall@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.17 refs T325580 (duration: 06m 58s)
  • 19:25 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.17 refs T325580
  • 19:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T326011)', diff saved to https://phabricator.wikimedia.org/P42809 and previous config saved to /var/cache/conftool/dbconfig/20230104-192057-marostegui.json
  • 19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T326011)', diff saved to https://phabricator.wikimedia.org/P42808 and previous config saved to /var/cache/conftool/dbconfig/20230104-191648-marostegui.json
  • 19:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 19:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T326011)', diff saved to https://phabricator.wikimedia.org/P42807 and previous config saved to /var/cache/conftool/dbconfig/20230104-191627-marostegui.json
  • 19:07 dancy@deploy1002: Installing scap version "4.32.0" for 560 hosts
  • 19:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P42806 and previous config saved to /var/cache/conftool/dbconfig/20230104-190121-marostegui.json
  • 18:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P42805 and previous config saved to /var/cache/conftool/dbconfig/20230104-184614-marostegui.json
  • 18:40 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@84f5f50]: (no justification provided) (duration: 00m 05s)
  • 18:40 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@84f5f50]: (no justification provided)
  • 18:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T326011)', diff saved to https://phabricator.wikimedia.org/P42804 and previous config saved to /var/cache/conftool/dbconfig/20230104-183108-marostegui.json
  • 18:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T326011)', diff saved to https://phabricator.wikimedia.org/P42803 and previous config saved to /var/cache/conftool/dbconfig/20230104-182700-marostegui.json
  • 18:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 18:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 18:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 18:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 18:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T326011)', diff saved to https://phabricator.wikimedia.org/P42802 and previous config saved to /var/cache/conftool/dbconfig/20230104-182425-marostegui.json
  • 18:15 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: pushing wmf-puppet-dashboard updates for enc git handling (after remembering to update the submodules) (duration: 00m 54s)
  • 18:14 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: pushing wmf-puppet-dashboard updates for enc git handling (after remembering to update the submodules)
  • 18:13 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: pushing wmf-puppet-dashboard updates for enc git handling (duration: 03m 54s)
  • 18:09 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: pushing wmf-puppet-dashboard updates for enc git handling
  • 18:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P42801 and previous config saved to /var/cache/conftool/dbconfig/20230104-180918-marostegui.json
  • 18:00 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1002.eqiad.wmnet with OS bullseye
  • 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P42800 and previous config saved to /var/cache/conftool/dbconfig/20230104-175412-marostegui.json
  • 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T326011)', diff saved to https://phabricator.wikimedia.org/P42799 and previous config saved to /var/cache/conftool/dbconfig/20230104-173905-marostegui.json
  • 17:37 dancy@deploy1002: Installing scap version "4.31.1" for 560 hosts
  • 17:36 dancy@deploy1002: Finished scap: testing (duration: 07m 50s)
  • 17:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T326011)', diff saved to https://phabricator.wikimedia.org/P42798 and previous config saved to /var/cache/conftool/dbconfig/20230104-173455-marostegui.json
  • 17:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 17:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 17:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T326011)', diff saved to https://phabricator.wikimedia.org/P42797 and previous config saved to /var/cache/conftool/dbconfig/20230104-173434-marostegui.json
  • 17:28 dancy@deploy1002: Started scap: testing
  • 17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P42796 and previous config saved to /var/cache/conftool/dbconfig/20230104-171928-marostegui.json
  • 17:10 mutante: new Wikipedia (and other projects) language added: guc - https://en.wikipedia.org/wiki/Wayuu_language - https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Wayuu T321880
  • 17:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P42795 and previous config saved to /var/cache/conftool/dbconfig/20230104-170421-marostegui.json
  • 17:02 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 17:00 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 16:55 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@84f5f50]: Bumping platform_eng airflow instance to latest (duration: 00m 17s)
  • 16:54 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@84f5f50]: Bumping platform_eng airflow instance to latest
  • 16:49 dancy@deploy1002: Installing scap version "4.30.3-1" for 560 hosts
  • 16:49 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 16:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T326011)', diff saved to https://phabricator.wikimedia.org/P42794 and previous config saved to /var/cache/conftool/dbconfig/20230104-164915-marostegui.json
  • 16:48 dancy@deploy1002: Finished scap: testing (duration: 13m 16s)
  • 16:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T326011)', diff saved to https://phabricator.wikimedia.org/P42793 and previous config saved to /var/cache/conftool/dbconfig/20230104-164504-marostegui.json
  • 16:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 16:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 16:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 16:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 16:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 16:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 16:41 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 16:41 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 16:41 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 16:37 dancy@deploy1002: Started scap: testing
  • 16:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 16:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 16:33 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 4:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 16:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 16:30 dancy@deploy1002: Installing scap version "4.31.0" for 560 hosts
  • 16:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T326011)', diff saved to https://phabricator.wikimedia.org/P42792 and previous config saved to /var/cache/conftool/dbconfig/20230104-162828-marostegui.json
  • 16:29 dancy@deploy1002: sync-world aborted: (no justification provided) (duration: 00m 13s)
  • 16:27 dancy@deploy1002: Started scap: (no justification provided)
  • 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P42791 and previous config saved to /var/cache/conftool/dbconfig/20230104-161321-marostegui.json
  • 15:59 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,name=mw2402.*
  • 15:59 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,name=mw2401.*
  • 15:59 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,name=mw2400.*
  • 15:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P42790 and previous config saved to /var/cache/conftool/dbconfig/20230104-155815-marostegui.json
  • 15:51 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
  • 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T326011)', diff saved to https://phabricator.wikimedia.org/P42789 and previous config saved to /var/cache/conftool/dbconfig/20230104-154308-marostegui.json
  • 15:34 moritzm: installing glibc security updates on bullseye
  • 15:34 moritzm: installing glibc security updates
  • 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T326011)', diff saved to https://phabricator.wikimedia.org/P42788 and previous config saved to /var/cache/conftool/dbconfig/20230104-153435-marostegui.json
  • 15:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 15:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T326011)', diff saved to https://phabricator.wikimedia.org/P42787 and previous config saved to /var/cache/conftool/dbconfig/20230104-153413-marostegui.json
  • 15:33 ladsgroup@deploy1002: Finished scap: Backport for Disable LoadMonitor in CLI (T322156) (duration: 09m 48s)
  • 15:32 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 15:32 claime: Restarting rolling reboot of api_appserver hosts in codfw
  • 15:25 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Disable LoadMonitor in CLI (T322156) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 15:23 ladsgroup@deploy1002: Started scap: Backport for Disable LoadMonitor in CLI (T322156)
  • 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P42786 and previous config saved to /var/cache/conftool/dbconfig/20230104-151907-marostegui.json
  • 15:06 marostegui: dbmaint deploy schema change on s5 eqiad T326224
  • 15:05 marostegui: dbmaint deploy schema change on s3 eqiad T326224
  • 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P42785 and previous config saved to /var/cache/conftool/dbconfig/20230104-150400-marostegui.json
  • 15:00 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1001.eqiad.wmnet with OS bullseye
  • 15:00 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T326011)', diff saved to https://phabricator.wikimedia.org/P42784 and previous config saved to /var/cache/conftool/dbconfig/20230104-144853-marostegui.json
  • 14:46 marostegui: dbmaint deploy schema change on s3 eqiad T326222
  • 14:44 marostegui: dbmaint deploy schema change on s5 eqiad T326222
  • 14:42 XioNoX: fix inconsistent mtu between cr1-eqiad<->lsw1-f1 - T315838
  • 14:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T326011)', diff saved to https://phabricator.wikimedia.org/P42783 and previous config saved to /var/cache/conftool/dbconfig/20230104-144025-marostegui.json
  • 14:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 14:40 urbanecm: UTC afternoon B&C window done
  • 14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 14:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T326011)', diff saved to https://phabricator.wikimedia.org/P42782 and previous config saved to /var/cache/conftool/dbconfig/20230104-143949-marostegui.json
  • 14:38 marostegui: dbmaint deploy schema change on s3 eqiad T326223
  • 14:38 urbanecm@deploy1002: Finished scap: Backport for Start reading from cul_actor on testwiki (T233004), aswikiquote: Set timezone to Asia/Kolkata (T321246) (duration: 09m 50s)
  • 14:37 marostegui: dbmaint deploy schema change on s5 eqiad T326223
  • 14:32 XioNoX: fix inconsistent mtu on mr1-eqiad - T315838
  • 14:30 urbanecm@deploy1002: urbanecm and urbanecm and zabe: Backport for Start reading from cul_actor on testwiki (T233004), aswikiquote: Set timezone to Asia/Kolkata (T321246) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 14:28 urbanecm@deploy1002: Started scap: Backport for Start reading from cul_actor on testwiki (T233004), aswikiquote: Set timezone to Asia/Kolkata (T321246)
  • 14:27 urbanecm@deploy1002: Finished scap: Backport for plwiki: Add editcontentmodel to interface-admin (T325819), Mark active sections even when their headings are in wrapper elements (T318044 T324869) (duration: 09m 32s)
  • 14:27 XioNoX: fix inconsistent mtu on mr1-codfw - T315838
  • 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P42781 and previous config saved to /var/cache/conftool/dbconfig/20230104-142442-marostegui.json
  • 14:24 marostegui: dbmaint deploy schema change on s7 eqiad T326227
  • 14:22 XioNoX: fix inconsistent mtu on mr1-eqsin - T315838
  • 14:19 urbanecm@deploy1002: urbanecm and stang and matmarex: Backport for plwiki: Add editcontentmodel to interface-admin (T325819), Mark active sections even when their headings are in wrapper elements (T318044 T324869) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:18 urbanecm@deploy1002: Started scap: Backport for plwiki: Add editcontentmodel to interface-admin (T325819), Mark active sections even when their headings are in wrapper elements (T318044 T324869)
  • 14:16 urbanecm@deploy1002: backport aborted: (duration: 00m 07s)
  • 14:16 urbanecm@deploy1002: Finished scap: Backport for Revert "trwiki: Add 20 years celebration logos" (T325823), kuwiki: Install SandboxLink (T325469) (duration: 09m 37s)
  • 14:16 marostegui: Sanitize new wikis T326138 T321294 T321288 T321256
  • 14:15 XioNoX: fix inconsistent mtu on mr1-esams - T315838
  • 14:14 marostegui: dbmaint deploy schema change on s7 eqiad T326228
  • 14:13 marostegui: dbmaint deploy schema change on s7 eqiad T326226
  • 14:11 marostegui: dbmaint deploy schema change on s8 eqiad T326221
  • 14:11 marostegui: dbmaint deploy schema change on s7 eqiad T326221
  • 14:11 marostegui: dbmaint deploy schema change on s6 eqiad T326221
  • 14:11 marostegui: dbmaint deploy schema change on s5 eqiad T326221
  • 14:11 marostegui: dbmaint deploy schema change on s4 eqiad T326221
  • 14:11 marostegui: dbmaint deploy schema change on s3 eqiad T326221
  • 14:11 marostegui: dbmaint deploy schema change on s2 eqiad T326221
  • 14:11 marostegui: dbmaint deploy schema change on s1 eqiad T326221
  • 14:10 marostegui: dbmaint deploy schema change on s7 eqiad T326225
  • 14:10 marostegui: dbmaint deploy schema change on s7 T326225
  • 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on puppetdb2002.codfw.wmnet with reason: maintenance
  • 14:09 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on puppetdb2002.codfw.wmnet with reason: maintenance
  • 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P42780 and previous config saved to /var/cache/conftool/dbconfig/20230104-140936-marostegui.json
  • 14:08 urbanecm@deploy1002: urbanecm and stang: Backport for Revert "trwiki: Add 20 years celebration logos" (T325823), kuwiki: Install SandboxLink (T325469) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:06 urbanecm@deploy1002: Started scap: Backport for Revert "trwiki: Add 20 years celebration logos" (T325823), kuwiki: Install SandboxLink (T325469)
  • 14:04 XioNoX: fix inconsistent mtu on mr1-ulsfo - T315838
  • 14:02 marostegui: dbmaint deploy schema change on s3 T326221
  • 14:02 moritzm: updating buster nodes running 5.10 to 5.10.158-2~deb10u1 (only rollout of the new kernel, no reboots)
  • 14:02 urbanecm@deploy1002: Finished scap: Backport for Update interwiki cache (duration: 08m 00s)
  • 13:58 marostegui: dbmaint deploy schema change on s7 T326221
  • 13:57 marostegui: dbmaint deploy schema change on s8 T326221
  • 13:57 marostegui: dbmaint deploy schema change on s6 T326221
  • 13:56 marostegui: dbmaint deploy schema change on s5 T326221
  • 13:55 marostegui: dbmaint deploy schema change on s4 T326221
  • 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T326011)', diff saved to https://phabricator.wikimedia.org/P42779 and previous config saved to /var/cache/conftool/dbconfig/20230104-135429-marostegui.json
  • 13:54 urbanecm@deploy1002: Started scap: Backport for Update interwiki cache
  • 13:54 marostegui: dbmaint deploy schema change on s2 T326221
  • 13:53 marostegui: dbmaint deploy schema change on s1 T326221
  • 13:52 urbanecm@deploy1002: Finished scap: Creating gorwiktionary (T326137), fixing aswikiquote logo (T321246) (duration: 07m 52s)
  • 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T326011)', diff saved to https://phabricator.wikimedia.org/P42778 and previous config saved to /var/cache/conftool/dbconfig/20230104-134544-marostegui.json
  • 13:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 13:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 13:45 XioNoX: repool esams-eqiad link for mtu change - T315838
  • 13:44 urbanecm@deploy1002: Started scap: Creating gorwiktionary (T326137), fixing aswikiquote logo (T321246)
  • 13:41 XioNoX: drain esams-eqiad link for mtu change - T315838
  • 13:39 urbanecm@deploy1002: Finished scap: Backport for Add messages for Gorontalo Wiktionary (gorwiktionary) (T326137), Add messages for Gorontalo Wiktionary (gorwiktionary) (T326137) (duration: 38m 23s)
  • 13:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 13:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T326011)', diff saved to https://phabricator.wikimedia.org/P42777 and previous config saved to /var/cache/conftool/dbconfig/20230104-133830-marostegui.json
  • 13:33 XioNoX: fix MTU mismatch on pfw3-codfw - T315838
  • 13:31 urbanecm: New wiki creation will run over by a couple of minutes
  • 13:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P42776 and previous config saved to /var/cache/conftool/dbconfig/20230104-132323-marostegui.json
  • 13:15 XioNoX: fix MTU mismatch on cloudsw switches - T315838
  • 13:11 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1001"
  • 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P42775 and previous config saved to /var/cache/conftool/dbconfig/20230104-130816-marostegui.json
  • 13:00 urbanecm@deploy1002: Started scap: Backport for Add messages for Gorontalo Wiktionary (gorwiktionary) (T326137), Add messages for Gorontalo Wiktionary (gorwiktionary) (T326137)
  • 12:58 urbanecm@deploy1002: Finished scap: Creating shnwikibooks (T321248) (duration: 07m 38s)
  • 12:56 moritzm: installing emacs security updates
  • 12:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
  • 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 100%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42774 and previous config saved to /var/cache/conftool/dbconfig/20230104-125330-root.json
  • 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T326011)', diff saved to https://phabricator.wikimedia.org/P42773 and previous config saved to /var/cache/conftool/dbconfig/20230104-125310-marostegui.json
  • 12:51 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
  • 12:50 urbanecm@deploy1002: Started scap: Creating shnwikibooks (T321248)
  • 12:48 urbanecm@deploy1002: Finished scap: Creating guwwikiquote (T321247) (duration: 07m 44s)
  • 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T326011)', diff saved to https://phabricator.wikimedia.org/P42772 and previous config saved to /var/cache/conftool/dbconfig/20230104-124424-marostegui.json
  • 12:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 12:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T326011)', diff saved to https://phabricator.wikimedia.org/P42771 and previous config saved to /var/cache/conftool/dbconfig/20230104-124403-marostegui.json
  • 12:41 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 12:41 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 12:41 urbanecm@deploy1002: Started scap: Creating guwwikiquote (T321247)
  • 12:40 claime: Rolling reboot of api_appserver hosts in codfw paused for https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20230104T1200
  • 12:38 urbanecm@deploy1002: Finished scap: Creating aswikiquote (T321246) (duration: 07m 49s)
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 75%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42770 and previous config saved to /var/cache/conftool/dbconfig/20230104-123825-root.json
  • 12:35 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host cephosd1001.eqiad.wmnet with OS bullseye
  • 12:31 urbanecm@deploy1002: Started scap: Creating aswikiquote (T321246)
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P42769 and previous config saved to /var/cache/conftool/dbconfig/20230104-122857-marostegui.json
  • 12:27 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
  • 12:26 urbanecm@deploy1002: Finished scap: Backport for Add namespace translations in Wayuu (T321881), Add namespace translations in Wayuu (T321881) (duration: 10m 36s)
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 50%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42768 and previous config saved to /var/cache/conftool/dbconfig/20230104-122320-root.json
  • 12:18 urbanecm@deploy1002: urbanecm and urbanecm: Backport for Add namespace translations in Wayuu (T321881), Add namespace translations in Wayuu (T321881) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 12:16 urbanecm@deploy1002: Started scap: Backport for Add namespace translations in Wayuu (T321881), Add namespace translations in Wayuu (T321881)
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P42767 and previous config saved to /var/cache/conftool/dbconfig/20230104-121350-marostegui.json
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 25%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42766 and previous config saved to /var/cache/conftool/dbconfig/20230104-120815-root.json
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T326011)', diff saved to https://phabricator.wikimedia.org/P42765 and previous config saved to /var/cache/conftool/dbconfig/20230104-115844-marostegui.json
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 10%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42764 and previous config saved to /var/cache/conftool/dbconfig/20230104-115310-root.json
  • 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T326011)', diff saved to https://phabricator.wikimedia.org/P42763 and previous config saved to /var/cache/conftool/dbconfig/20230104-115011-marostegui.json
  • 11:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 11:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 5%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42761 and previous config saved to /var/cache/conftool/dbconfig/20230104-113805-root.json
  • 11:33 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host puppetdb2003.codfw.wmnet
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2151 to dbctl depooled T326206', diff saved to https://phabricator.wikimedia.org/P42759 and previous config saved to /var/cache/conftool/dbconfig/20230104-112801-marostegui.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 1%: After cloning db2151', diff saved to https://phabricator.wikimedia.org/P42758 and previous config saved to /var/cache/conftool/dbconfig/20230104-112300-root.json
  • 11:02 vgutierrez: testing HAProxy 2.4.20 in cp4037 and cp4045
  • 10:56 vgutierrez: (apt1001) import HAproxy 2.4.20 from third-party repo for buster and bullseye
  • 10:49 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AKhatun out of all services on: 1098 hosts
  • 10:48 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AKhatun out of all services on: 1098 hosts
  • 10:48 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AKhatun out of all services on: 894 hosts
  • 10:47 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AKhatun out of all services on: 894 hosts
  • 10:37 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 10:37 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2124 T326206', diff saved to https://phabricator.wikimedia.org/P42756 and previous config saved to /var/cache/conftool/dbconfig/20230104-103109-marostegui.json
  • 10:29 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 10:29 claime: Rolling reboot of api_appserver hosts in codfw
  • 10:24 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 10:14 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 10:14 claime: Rolling reboot of mwdebug hosts in eqiad
  • 10:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 10:04 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 10:04 marostegui: dbmaint eqiad deploy schema change on s5 T326011
  • 10:04 claime: Rolling reboot of mwdebug hosts in codfw
  • 10:04 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 10:04 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 10:04 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 10:03 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 10:03 filippo@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 10:03 filippo@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 10:03 filippo@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 10:03 filippo@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 10:03 filippo@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 10:03 filippo@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 10:03 filippo@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 10:03 filippo@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 10:03 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 10:03 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 10:03 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:03 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:03 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 10:03 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 10:03 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 10:02 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 10:02 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 10:01 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 10:01 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 10:00 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 09:53 effie: Upload imposm3_0.11.1-1 to buster-wikimedia - T325293
  • 09:48 XioNoX: drmrs: offload traffic from Tata - T324955
  • 09:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 56286
  • 09:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 56286
  • 09:37 marostegui: dbmaint codfw deploy schema change on s5 T326011
  • 09:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetdb2003.codfw.wmnet
  • 09:29 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
  • 09:08 matthiasmullie: UTC morning backports done
  • 09:07 mlitn@deploy1002: Finished scap: Backport for Squashed diff to catch up to wmf/1.40.0-wmf.17 (duration: 08m 13s)
  • 09:01 mlitn@deploy1002: mlitn and mlitn: Backport for Squashed diff to catch up to wmf/1.40.0-wmf.17 synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 09:00 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host puppetdb1003.eqiad.wmnet
  • 08:59 mlitn@deploy1002: Started scap: Backport for Squashed diff to catch up to wmf/1.40.0-wmf.17
  • 08:57 mlitn@deploy1002: Finished scap: Backport for Change IW breakpoint to be enabled on smaller screen (T321377) (duration: 08m 56s)
  • 08:50 mlitn@deploy1002: mlitn and mlitn: Backport for Change IW breakpoint to be enabled on smaller screen (T321377) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 08:48 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
  • 08:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetdb1003.eqiad.wmnet
  • 08:48 mlitn@deploy1002: Started scap: Backport for Change IW breakpoint to be enabled on smaller screen (T321377)
  • 08:32 mlitn@deploy1002: Finished scap: Backport for Always show search results at full width (T321377) (duration: 08m 22s)
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 100%: After testing', diff saved to https://phabricator.wikimedia.org/P42755 and previous config saved to /var/cache/conftool/dbconfig/20230104-082942-root.json
  • 08:26 marostegui: dbmaint codfw deploy schema change on s8 T326011
  • 08:26 marostegui: dbmaint eqiad deploy schema change on s8 T326011
  • 08:26 marostegui: dbmaint eqiad deploy schema change on s4 T326011
  • 08:26 marostegui: dbmaint codfw deploy schema change on s4 T326011
  • 08:26 marostegui: dbmaint codfw deploy schema change on s4 T255174
  • 08:26 marostegui: dbmaint eqiad deploy schema change on s4 T255174
  • 08:25 mlitn@deploy1002: mlitn and mlitn: Backport for Always show search results at full width (T321377) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 08:23 mlitn@deploy1002: Started scap: Backport for Always show search results at full width (T321377)
  • 08:22 marostegui: dbmaint eqiad deploy schema change on s8 T255174
  • 08:20 marostegui: dbmaint codfw deploy schema change on s8 T255174
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 75%: After testing', diff saved to https://phabricator.wikimedia.org/P42754 and previous config saved to /var/cache/conftool/dbconfig/20230104-081437-root.json
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 50%: After testing', diff saved to https://phabricator.wikimedia.org/P42753 and previous config saved to /var/cache/conftool/dbconfig/20230104-075932-root.json
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 25%: After testing', diff saved to https://phabricator.wikimedia.org/P42752 and previous config saved to /var/cache/conftool/dbconfig/20230104-074427-root.json
  • 07:38 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 07:38 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 07:38 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 07:38 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 07:38 marostegui: Switch x1 back to RBR T255174
  • 07:35 marostegui: dbmaint codfw deploy schema change on x1 T255174
  • 07:35 marostegui: dbmaint eqiad deploy schema change on x1 T255174
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 10%: After testing', diff saved to https://phabricator.wikimedia.org/P42751 and previous config saved to /var/cache/conftool/dbconfig/20230104-072922-root.json
  • 07:20 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 07:20 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 07:19 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 07:19 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 5%: After testing', diff saved to https://phabricator.wikimedia.org/P42750 and previous config saved to /var/cache/conftool/dbconfig/20230104-071417-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2131 (re)pooling @ 1%: After testing', diff saved to https://phabricator.wikimedia.org/P42749 and previous config saved to /var/cache/conftool/dbconfig/20230104-065912-root.json

2023-01-03

2023-01-02

  • 10:04 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host otrs1001.eqiad.wmnet
  • 10:00 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host otrs1001.eqiad.wmnet
