Server Admin Log/Archive 72

2023-10-31

  • 23:59 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1111.eqiad.wmnet with OS bullseye
  • 23:51 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1110.eqiad.wmnet with OS bullseye
  • 23:43 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1112.eqiad.wmnet with reason: host reimage
  • 23:41 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1111.eqiad.wmnet with reason: host reimage
  • 23:38 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1112.eqiad.wmnet with reason: host reimage
  • 23:38 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1111.eqiad.wmnet with reason: host reimage
  • 23:33 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1110.eqiad.wmnet with reason: host reimage
  • 23:30 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1110.eqiad.wmnet with reason: host reimage
  • 23:23 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1112.eqiad.wmnet with OS bullseye
  • 23:23 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1111.eqiad.wmnet with OS bullseye
  • 23:23 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1112.eqiad.wmnet with OS bullseye
  • 23:22 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1111.eqiad.wmnet with OS bullseye
  • 23:15 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1107.eqiad.wmnet with OS bullseye
  • 23:15 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1112.eqiad.wmnet with OS bullseye
  • 23:15 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1111.eqiad.wmnet with OS bullseye
  • 23:15 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1110.eqiad.wmnet with OS bullseye
  • 23:15 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1111.eqiad.wmnet with OS bullseye
  • 23:14 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1110.eqiad.wmnet with OS bullseye
  • 23:14 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1112.eqiad.wmnet with OS bullseye
  • 23:12 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1109.eqiad.wmnet with OS bullseye
  • 23:09 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1108.eqiad.wmnet with OS bullseye
  • 23:08 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1112.eqiad.wmnet with OS bullseye
  • 23:08 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1111.eqiad.wmnet with OS bullseye
  • 23:08 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1110.eqiad.wmnet with OS bullseye
  • 23:08 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1112.eqiad.wmnet with OS bullseye
  • 23:08 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1111.eqiad.wmnet with OS bullseye
  • 23:08 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1110.eqiad.wmnet with OS bullseye
  • 23:01 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1112.eqiad.wmnet with OS bullseye
  • 23:01 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1111.eqiad.wmnet with OS bullseye
  • 23:01 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1110.eqiad.wmnet with OS bullseye
  • 22:57 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1107.eqiad.wmnet with reason: host reimage
  • 22:54 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1109.eqiad.wmnet with reason: host reimage
  • 22:53 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1107.eqiad.wmnet with reason: host reimage
  • 22:52 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1108.eqiad.wmnet with reason: host reimage
  • 22:49 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1109.eqiad.wmnet with reason: host reimage
  • 22:48 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1108.eqiad.wmnet with reason: host reimage
  • 22:38 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
  • 22:38 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1107.eqiad.wmnet with OS bullseye
  • 22:34 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1109.eqiad.wmnet with OS bullseye
  • 22:33 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1109.eqiad.wmnet with OS bullseye
  • 22:33 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1108.eqiad.wmnet with OS bullseye
  • 22:33 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1108.eqiad.wmnet with OS bullseye
  • 22:25 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
  • 22:24 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1109.eqiad.wmnet with OS bullseye
  • 22:24 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1107.eqiad.wmnet with OS bullseye
  • 22:24 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1109.eqiad.wmnet with OS bullseye
  • 22:24 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1108.eqiad.wmnet with OS bullseye
  • 22:24 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1108.eqiad.wmnet with OS bullseye
  • 22:19 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1109.eqiad.wmnet with OS bullseye
  • 22:18 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1106.eqiad.wmnet with OS bullseye
  • 22:17 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1108.eqiad.wmnet with OS bullseye
  • 22:17 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1108.eqiad.wmnet with OS bullseye
  • 22:17 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
  • 22:16 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1107.eqiad.wmnet with OS bullseye
  • 22:05 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1108.eqiad.wmnet with OS bullseye
  • 22:02 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
  • 22:02 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1107.eqiad.wmnet with OS bullseye
  • 21:57 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1106.eqiad.wmnet with reason: host reimage
  • 21:54 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1106.eqiad.wmnet with reason: host reimage
  • 21:53 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
  • 21:46 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1103.eqiad.wmnet
  • 21:39 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1106.eqiad.wmnet with OS bullseye
  • 21:38 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1106.eqiad.wmnet with OS bullseye
  • 21:38 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1103.eqiad.wmnet
  • 21:37 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cp1103.eqiad.wmnet
  • 21:37 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1103.eqiad.wmnet
  • 21:34 eileen: civicrm upgraded from 86a08564 to 31d53b57
  • 21:28 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1106.eqiad.wmnet with OS bullseye
  • 21:28 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1106.eqiad.wmnet with OS bullseye
  • 21:21 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1106.eqiad.wmnet with OS bullseye
  • 21:17 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1105.eqiad.wmnet with OS bullseye
  • 21:16 eileen: civicrm upgraded from a458c2bb to 86a08564
  • 20:58 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1105.eqiad.wmnet with reason: host reimage
  • 20:55 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1105.eqiad.wmnet with reason: host reimage
  • 20:40 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1105.eqiad.wmnet with OS bullseye
  • 20:32 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1105.eqiad.wmnet with OS bullseye
  • 20:16 TheresNoTime: close UTC late backport window
  • 20:14 samtar@deploy2002: Finished scap: Backport for Deploy vector 2022 to non-English Wikibooks, etc (T349544) (duration: 10m 51s)
  • 20:08 samtar@deploy2002: samtar and ksarabia: Continuing with sync
  • 20:05 ryankemper@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:05 ryankemper@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:04 samtar@deploy2002: samtar and ksarabia: Backport for Deploy vector 2022 to non-English Wikibooks, etc (T349544) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:03 samtar@deploy2002: Started scap: Backport for Deploy vector 2022 to non-English Wikibooks, etc (T349544)
  • 19:56 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 19:55 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 19:12 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1105.eqiad.wmnet with OS bullseye
  • 19:12 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1105.eqiad.wmnet with OS bullseye
  • 19:01 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1105.eqiad.wmnet with OS bullseye
  • 18:59 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1104.eqiad.wmnet with OS bullseye
  • 18:50 ejegg: restarted fundraising scheduled jobs
  • 18:40 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1104.eqiad.wmnet with reason: host reimage
  • 18:37 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1104.eqiad.wmnet with reason: host reimage
  • 18:24 ejegg: disabled fundraising scheduled jobs for table alter
  • 18:24 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.3 refs T348356
  • 18:22 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1104.eqiad.wmnet with OS bullseye
  • 18:10 ejegg: fundraising civicrm upgraded from 5862a3fc to a458c2bb
  • 18:04 sukhe: reprepro -C component/dnsdist include bookworm-wikimedia dnsdist_1.8.2-1+wmf12u1_amd64.changes
  • 17:59 taavi@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt-wdqs1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:56 taavi@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt-wdqs1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:52 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1104.eqiad.wmnet with OS bullseye
  • 17:51 taavi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt-wdqs1002
  • 17:51 taavi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt-wdqs1002
  • 17:43 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 17:43 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 17:42 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 17:42 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 17:42 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 17:42 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 17:27 Krinkle: krinkle@deploy2002:/srv/mediawiki/private: fix untracked warning for readme.FatalErrorSettings.php
  • 16:49 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 16:49 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 16:44 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1103.eqiad.wmnet with OS bullseye
  • 16:35 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 16:34 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 16:31 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1104.eqiad.wmnet with OS bullseye
  • 16:30 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1104.eqiad.wmnet with OS bullseye
  • 16:27 taavi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm
  • 16:25 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1103.eqiad.wmnet with reason: host reimage
  • 16:23 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 16:23 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 16:22 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1103.eqiad.wmnet with reason: host reimage
  • 16:20 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1104.eqiad.wmnet with OS bullseye
  • 16:15 taavi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm
  • 16:15 taavi@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt-wdqs1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:12 taavi@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt-wdqs1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:12 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:11 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudvirt-wdqs1002 - taavi@cumin1001"
  • 16:10 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudvirt-wdqs1002 - taavi@cumin1001"
  • 16:08 taavi@cumin1001: START - Cookbook sre.dns.netbox
  • 16:06 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
  • 16:04 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
  • 15:52 arnaudb@cumin1001: dbctl commit (dc=all): 'discard db1131', diff saved to https://phabricator.wikimedia.org/P53120 and previous config saved to /var/cache/conftool/dbconfig/20231031-155253-arnaudb.json
  • 15:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 15:42 arnaudb@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts db1131.eqiad.wmnet
  • 15:42 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:42 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1131.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 15:41 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1131.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
  • 15:38 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
  • 15:33 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1131.eqiad.wmnet
  • 15:29 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 15:28 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 15:26 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 15:25 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 15:25 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 15:24 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
  • 15:23 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 15:22 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:22 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T343198)', diff saved to https://phabricator.wikimedia.org/P53119 and previous config saved to /var/cache/conftool/dbconfig/20231031-152105-arnaudb.json
  • 15:11 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 15:11 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 15:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P53118 and previous config saved to /var/cache/conftool/dbconfig/20231031-150558-arnaudb.json
  • 15:06 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 15:06 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 15:05 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 15:05 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 15:04 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 15:04 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
  • 14:57 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 14:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P53117 and previous config saved to /var/cache/conftool/dbconfig/20231031-145052-arnaudb.json
  • 14:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T343198)', diff saved to https://phabricator.wikimedia.org/P53116 and previous config saved to /var/cache/conftool/dbconfig/20231031-143545-arnaudb.json
  • 14:13 sukhe: install4002:/etc/dhcp/automation/ttyS1-115200 rm cp4052.conf
  • 14:06 sbassett: Deployed updated security mitigation for T348828
  • 13:59 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 13:58 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 13:49 ejegg: fundraising civicrm upgraded from 71d26d3b to 5862a3fc
  • 13:45 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 13:36 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 13:36 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 13:30 TheresNoTime: close UTC afternoon backport window
  • 13:27 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 13:27 samtar@deploy2002: Finished scap: Backport for Roll-out Parsoid Kartographer support for all English language wikis (T342871) (duration: 10m 49s)
  • 13:22 samtar@deploy2002: ihurbain and samtar: Continuing with sync
  • 13:18 samtar@deploy2002: ihurbain and samtar: Backport for Roll-out Parsoid Kartographer support for all English language wikis (T342871) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:17 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 13:16 samtar@deploy2002: Started scap: Backport for Roll-out Parsoid Kartographer support for all English language wikis (T342871)
  • 12:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 100%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53113 and previous config saved to /var/cache/conftool/dbconfig/20231031-125348-arnaudb.json
  • 12:49 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 100%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53112 and previous config saved to /var/cache/conftool/dbconfig/20231031-124918-arnaudb.json
  • 12:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 80%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53108 and previous config saved to /var/cache/conftool/dbconfig/20231031-122338-arnaudb.json
  • 12:19 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 80%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53107 and previous config saved to /var/cache/conftool/dbconfig/20231031-121908-arnaudb.json
  • 12:08 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 70%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53106 and previous config saved to /var/cache/conftool/dbconfig/20231031-120833-arnaudb.json
  • 12:04 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 70%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53105 and previous config saved to /var/cache/conftool/dbconfig/20231031-120403-arnaudb.json
  • 11:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 60%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53104 and previous config saved to /var/cache/conftool/dbconfig/20231031-115328-arnaudb.json
  • 11:48 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 60%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53103 and previous config saved to /var/cache/conftool/dbconfig/20231031-114858-arnaudb.json
  • 11:38 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 50%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53102 and previous config saved to /var/cache/conftool/dbconfig/20231031-113823-arnaudb.json
  • 11:33 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 50%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53101 and previous config saved to /var/cache/conftool/dbconfig/20231031-113353-arnaudb.json
  • 11:24 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1007.eqiad.wmnet with OS bookworm
  • 11:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 40%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53099 and previous config saved to /var/cache/conftool/dbconfig/20231031-112318-arnaudb.json
  • 11:18 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 40%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53098 and previous config saved to /var/cache/conftool/dbconfig/20231031-111849-arnaudb.json
  • 11:08 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 30%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53097 and previous config saved to /var/cache/conftool/dbconfig/20231031-110813-arnaudb.json
  • 11:03 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 30%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53096 and previous config saved to /var/cache/conftool/dbconfig/20231031-110344-arnaudb.json
  • 10:53 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1007.eqiad.wmnet with reason: host reimage
  • 10:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 20%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53095 and previous config saved to /var/cache/conftool/dbconfig/20231031-105308-arnaudb.json
  • 10:50 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1007.eqiad.wmnet with reason: host reimage
  • 10:48 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 20%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53094 and previous config saved to /var/cache/conftool/dbconfig/20231031-104839-arnaudb.json
  • 10:38 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 10%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53093 and previous config saved to /var/cache/conftool/dbconfig/20231031-103804-arnaudb.json
  • 10:37 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1007.eqiad.wmnet with OS bookworm
  • 10:33 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 10%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53092 and previous config saved to /var/cache/conftool/dbconfig/20231031-103334-arnaudb.json
  • 10:22 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 5%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53091 and previous config saved to /var/cache/conftool/dbconfig/20231031-102259-arnaudb.json
  • 10:18 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 5%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53090 and previous config saved to /var/cache/conftool/dbconfig/20231031-101829-arnaudb.json
  • 10:17 arnaudb@cumin1001: dbctl commit (dc=all): 'set db1230 as a depooled host', diff saved to https://phabricator.wikimedia.org/P53089 and previous config saved to /var/cache/conftool/dbconfig/20231031-101750-arnaudb.json
  • 09:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T343198)', diff saved to https://phabricator.wikimedia.org/P53088 and previous config saved to /var/cache/conftool/dbconfig/20231031-095054-arnaudb.json
  • 09:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 09:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 09:47 arnaudb@cumin1001: dbctl commit (dc=all): 'set db1230 as a depooled host', diff saved to https://phabricator.wikimedia.org/P53087 and previous config saved to /var/cache/conftool/dbconfig/20231031-094737-arnaudb.json
  • 09:39 arnaudb@cumin1001: dbctl commit (dc=all): 'set db1230 as a depooled host', diff saved to https://phabricator.wikimedia.org/P53086 and previous config saved to /var/cache/conftool/dbconfig/20231031-093919-arnaudb.json
  • 09:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 100%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53085 and previous config saved to /var/cache/conftool/dbconfig/20231031-093457-arnaudb.json
  • 09:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Set ', diff saved to https://phabricator.wikimedia.org/P53084 and previous config saved to /var/cache/conftool/dbconfig/20231031-093448-arnaudb.json
  • 09:01 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 09:00 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 08:57 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 5%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53083 and previous config saved to /var/cache/conftool/dbconfig/20231031-085740-arnaudb.json
  • 08:56 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 config append', diff saved to https://phabricator.wikimedia.org/P53082 and previous config saved to /var/cache/conftool/dbconfig/20231031-085615-arnaudb.json
  • 08:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 90%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53081 and previous config saved to /var/cache/conftool/dbconfig/20231031-085346-arnaudb.json
  • 08:38 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 75%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53080 and previous config saved to /var/cache/conftool/dbconfig/20231031-083841-arnaudb.json
  • 08:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 60%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53079 and previous config saved to /var/cache/conftool/dbconfig/20231031-082336-arnaudb.json
  • 08:08 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 45%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53078 and previous config saved to /var/cache/conftool/dbconfig/20231031-080832-arnaudb.json
  • 07:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 30%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53077 and previous config saved to /var/cache/conftool/dbconfig/20231031-075327-arnaudb.json
  • 07:38 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 15%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53076 and previous config saved to /var/cache/conftool/dbconfig/20231031-073822-arnaudb.json
  • 07:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 weight rebalancing - depooled', diff saved to https://phabricator.wikimedia.org/P53075 and previous config saved to /var/cache/conftool/dbconfig/20231031-073652-arnaudb.json
  • 07:33 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 weight rebalancing', diff saved to https://phabricator.wikimedia.org/P53074 and previous config saved to /var/cache/conftool/dbconfig/20231031-073312-arnaudb.json
  • 07:30 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 depooling from API and pooling in db2140', diff saved to https://phabricator.wikimedia.org/P53073 and previous config saved to /var/cache/conftool/dbconfig/20231031-073023-arnaudb.json
  • 07:19 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 weight mimic old db2140', diff saved to https://phabricator.wikimedia.org/P53072 and previous config saved to /var/cache/conftool/dbconfig/20231031-071938-arnaudb.json
  • 07:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Promote db2140 to s4 primary and set section read-write T349820', diff saved to https://phabricator.wikimedia.org/P53071 and previous config saved to /var/cache/conftool/dbconfig/20231031-070549-arnaudb.json
  • 07:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Set s4 codfw as read-only for maintenance - T349820', diff saved to https://phabricator.wikimedia.org/P53070 and previous config saved to /var/cache/conftool/dbconfig/20231031-070405-arnaudb.json
  • 07:02 arnaudb: Starting s4 codfw failover from db2179 to db2140 - T349820
  • 06:49 marostegui@deploy2002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc2014 to pc1 master" (duration: 07m 12s)
  • 06:44 marostegui@deploy2002: marostegui: Continuing with sync
  • 06:43 marostegui@deploy2002: marostegui: Backport for Revert "ProductionServices.php: Promote pc2014 to pc1 master" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 06:42 marostegui@deploy2002: Started scap: Backport for Revert "ProductionServices.php: Promote pc2014 to pc1 master"
  • 06:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Set db2140 with weight 0 T349820', diff saved to https://phabricator.wikimedia.org/P53068 and previous config saved to /var/cache/conftool/dbconfig/20231031-063647-arnaudb.json
  • 06:33 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 34 hosts with reason: Primary switchover s4 T349820
  • 06:33 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 34 hosts with reason: Primary switchover s4 T349820
  • 06:31 marostegui@deploy2002: Finished scap: Backport for ProductionServices.php: Promote pc2014 to pc1 master (duration: 06m 50s)
  • 06:26 marostegui@deploy2002: marostegui: Continuing with sync
  • 06:25 marostegui@deploy2002: marostegui: Backport for ProductionServices.php: Promote pc2014 to pc1 master synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 06:24 marostegui@deploy2002: Started scap: Backport for ProductionServices.php: Promote pc2014 to pc1 master
  • 03:55 mwpresync@deploy2002: Pruned MediaWiki: 1.42.0-wmf.1 (duration: 02m 14s)
  • 03:53 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.3 refs T348356 (duration: 50m 44s)
  • 03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.3 refs T348356
  • 00:46 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye
  • 00:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
  • 00:19 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye

2023-10-30

  • 23:56 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
  • 23:56 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye
  • 23:50 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
  • 21:22 sbassett: Deployed updated security mitigation for T348828
  • 21:19 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for search-loader[2001-2002].codfw.wmnet,search-loader[1001-1002].eqiad.wmnet
  • 21:19 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for search-loader[2001-2002].codfw.wmnet,search-loader[1001-1002].eqiad.wmnet
  • 20:58 ejegg: re-enabled fundraising scheduled jobs after deployment
  • 20:45 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 20:45 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 20:44 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 20:44 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 20:43 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 20:43 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 20:41 ejegg: fundraising civicrm upgraded from 2c79475e to 71d26d3b
  • 20:40 ejegg: disable fundraising scheduled jobs for deployment
  • 20:29 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 20:29 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 20:28 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 20:21 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 20:20 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns3004.wikimedia.org with OS bookworm
  • 20:17 dancy@deploy2002: Finished scap: Backport for namespaces:mediawiki: add Extensions/Skins as alias of Extension/Skin (+ talk) (T349970) (duration: 10m 09s)
  • 20:11 dancy@deploy2002: dancy and rhinosf1: Continuing with sync
  • 20:10 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 20:08 dancy@deploy2002: dancy and rhinosf1: Backport for namespaces:mediawiki: add Extensions/Skins as alias of Extension/Skin (+ talk) (T349970) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:07 dancy@deploy2002: Started scap: Backport for namespaces:mediawiki: add Extensions/Skins as alias of Extension/Skin (+ talk) (T349970)
  • 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns3004.wikimedia.org with reason: host reimage
  • 19:47 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns3004.wikimedia.org with reason: host reimage
  • 19:21 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns3004.wikimedia.org with OS bookworm
  • 18:59 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
  • 18:54 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 18:53 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye
  • 18:52 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
  • 18:38 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns3003.wikimedia.org with OS bookworm
  • 18:36 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 18:35 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
  • 18:34 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
  • 18:34 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye
  • 18:33 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: ping_offload
  • 18:27 jbond: migrate ping_offload to puppet7
  • 18:27 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: ping_offload
  • 18:26 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
  • 18:24 sukhe: racadm racreset cp1103.eqiad.wmnet
  • 18:22 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye
  • 18:20 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on search-loader[2001-2002].codfw.wmnet with reason: T346039
  • 18:19 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on search-loader[2001-2002].codfw.wmnet with reason: T346039
  • 18:16 bking@deploy2002: Finished deploy [search/mjolnir/deploy@daf8c32]: T346039 (duration: 00m 06s)
  • 18:16 bking@deploy2002: Started deploy [search/mjolnir/deploy@daf8c32]: T346039
  • 18:11 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 18:10 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
  • 17:56 jbond: migrate bastionhost to puppet7
  • 17:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS bookworm
  • 17:42 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns3003.wikimedia.org with reason: host reimage
  • 17:40 jbond: migrate pki::multirootca to puppet7
  • 17:39 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns3003.wikimedia.org with reason: host reimage
  • 17:27 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host pki2002.codfw.wmnet
  • 17:23 jbond@cumin1001: START - Cookbook sre.puppet.migrate-host for host pki2002.codfw.wmnet
  • 17:22 jbond: migrate pki2002 to puppet7
  • 17:16 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye
  • 17:14 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 17:12 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns3003.wikimedia.org with OS bookworm
  • 17:10 jbond: migrate pki::root to puppet7
  • 17:04 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 16:51 sukhe: running authdns-update for CR 969816
  • 16:39 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp4052.ulsfo.wmnet with reason: depooled, reimaging
  • 16:39 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp4052.ulsfo.wmnet with reason: depooled, reimaging
  • 16:26 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1103.eqiad.wmnet with reason: host reimage
  • 16:23 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1103.eqiad.wmnet with reason: host reimage
  • 16:22 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1003.eqiad.wmnet with OS bookworm
  • 16:22 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1001"
  • 16:21 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1001"
  • 16:16 jbond: migrate O:ganeti_test to puppet7
  • 16:14 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ganeti-test1002.eqiad.wmnet
  • 16:07 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
  • 16:07 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye
  • 16:04 jbond: migrate ganeti-test1002.eqiad.wmnet to puppet7
  • 16:03 jbond@cumin1001: START - Cookbook sre.puppet.migrate-host for host ganeti-test1002.eqiad.wmnet
  • 16:02 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1003.eqiad.wmnet with reason: host reimage
  • 15:58 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1003.eqiad.wmnet with reason: host reimage
  • 15:57 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudvirt-wdqs1003 - taavi@cumin1001"
  • 15:56 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudvirt-wdqs1003 - taavi@cumin1001"
  • 15:55 jbond: migrate failoid to puppet7
  • 15:51 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
  • 15:51 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye
  • 15:49 jbond: move builder to puppet7
  • 15:49 jbond: move cluster::unprivmanagement to puppet7
  • 15:49 jbond: move config_master to puppet7
  • 15:43 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
  • 15:42 taavi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1003.eqiad.wmnet with OS bookworm
  • 15:33 taavi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt-wdqs1003
  • 15:33 taavi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt-wdqs1003
  • 15:30 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:30 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudvirt-wdqs1003 - taavi@cumin1001"
  • 15:29 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudvirt-wdqs1003 - taavi@cumin1001"
  • 15:29 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt-wdqs1003
  • 15:27 taavi@cumin1001: START - Cookbook sre.dns.netbox
  • 15:21 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt-wdqs1003
  • 14:41 bking@deploy2002: Finished deploy [search/mjolnir/deploy@daf8c32]: T346039 (duration: 00m 05s)
  • 14:41 bking@deploy2002: Started deploy [search/mjolnir/deploy@daf8c32]: T346039
  • 14:38 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on search-loader2001.codfw.wmnet with reason: T346039
  • 14:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
  • 14:37 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on search-loader2001.codfw.wmnet with reason: T346039
  • 14:36 inflatador: bking@search-loader2001 disabling services as part of bullseye migration T346039
  • 14:34 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
  • 14:32 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 14:31 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 14:06 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 12:55 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1130.eqiad.wmnet onto db1230.eqiad.wmnet
  • 12:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1217.eqiad.wmnet with OS bookworm
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'New host', diff saved to https://phabricator.wikimedia.org/P53065 and previous config saved to /var/cache/conftool/dbconfig/20231030-122855-marostegui.json
  • 12:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1217.eqiad.wmnet with reason: host reimage
  • 12:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1217.eqiad.wmnet with reason: host reimage
  • 12:11 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1217.eqiad.wmnet with OS bookworm
  • 11:52 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1130.eqiad.wmnet onto db1230.eqiad.wmnet
  • 11:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Adding db1230 depooled, depooling db1130', diff saved to https://phabricator.wikimedia.org/P53064 and previous config saved to /var/cache/conftool/dbconfig/20231030-113401-arnaudb.json
  • 11:28 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: provisioning db1230.eqiad.wmnet - T344036
  • 11:28 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: provisioning db1230.eqiad.wmnet - T344036
  • 11:28 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: provisioning db1230.eqiad.wmnet - T344036
  • 11:28 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: provisioning db1230.eqiad.wmnet - T344036
  • 09:42 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@af33784] (releasing): (no justification provided) (duration: 00m 40s)
  • 09:42 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@af33784] (releasing): (no justification provided)
  • 08:29 vgutierrez: switched to digicert-2023 in esams, eqsin and drmrs - T341119
  • 08:17 wmde-fisch@deploy2002: Finished scap: Backport for Cleanup Kartographer Nearby flags (T332785) (duration: 07m 35s)
  • 08:12 wmde-fisch@deploy2002: wmde-fisch: Continuing with sync
  • 08:11 wmde-fisch@deploy2002: wmde-fisch: Backport for Cleanup Kartographer Nearby flags (T332785) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:10 wmde-fisch@deploy2002: Started scap: Backport for Cleanup Kartographer Nearby flags (T332785)
  • 08:10 vgutierrez: triggering a puppet run on cp hosts in esams, eqsin and drmrs to switch to the new unified digicert certificates - T341119
  • 08:06 vgutierrez: repool cp5025 - T341119
  • 08:06 marostegui@deploy2002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master" (duration: 06m 41s)
  • 08:01 marostegui@deploy2002: marostegui: Continuing with sync
  • 08:00 marostegui@deploy2002: marostegui: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:59 marostegui@deploy2002: Started scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master"
  • 07:52 vgutierrez: depool cp5025 to perform some digicert-2023 related sanity checks - T341119
  • 07:49 marostegui@deploy2002: Finished scap: Backport for ProductionServices.php: Promote pc1014 to pc1 master (duration: 06m 36s)
  • 07:48 marostegui@deploy2002: marostegui: Continuing with sync
  • 07:44 marostegui@deploy2002: marostegui: Backport for ProductionServices.php: Promote pc1014 to pc1 master synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:43 marostegui@deploy2002: Started scap: Backport for ProductionServices.php: Promote pc1014 to pc1 master
  • 07:35 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we set up the new WMDE Airflow instance
  • 07:34 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we set up the new WMDE Airflow instance
  • 07:29 marostegui@deploy2002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master" (duration: 06m 33s)
  • 07:24 marostegui@deploy2002: marostegui: Continuing with sync
  • 07:24 marostegui@deploy2002: marostegui: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:22 marostegui@deploy2002: Started scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master"
  • 07:22 marostegui@deploy2002: Finished scap: Backport for ProductionServices.php: Promote pc1014 to pc1 master (duration: 14m 04s)
  • 07:18 elukey: arm keyholder on acmechief2002 and deploy1002
  • 07:16 marostegui@deploy2002: marostegui: Continuing with sync
  • 07:16 marostegui@deploy2002: marostegui: Backport for ProductionServices.php: Promote pc1014 to pc1 master synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:08 marostegui@deploy2002: Started scap: Backport for ProductionServices.php: Promote pc1014 to pc1 master

2023-10-28

  • 21:25 fabfur: re-pooled cp1089 and cp3069
  • 21:05 fabfur: depooled cp1089 and cp3069 to restart varnish|haproxy and let purged process incoming messages
  • 20:20 fabfur: restarted purged on cp1089, cp6005, cp3069
  • 19:46 fabfur: restarted purged on cp1078

2023-10-27

  • 22:47 rzl: reprepro -C main include bullseye-wikimedia k8s-controller-sidecars_1.0.2-1_source.changes
  • 22:05 ejegg: fundraising civicrm upgraded from 74781efd to 2c79475e
  • 15:38 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2004.codfw.wmnet with OS bullseye
  • 15:38 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
  • 15:21 herron: power cycled titan1001
  • 14:59 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
  • 14:42 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2004.codfw.wmnet with reason: host reimage
  • 14:39 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2004.codfw.wmnet with reason: host reimage
  • 14:19 topranks: announcing internal core routes to esams asw's to test policy T344547
  • 14:19 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:18 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:12 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:12 jclark@cumin1001: START - Cookbook sre.hosts.provision for host sretest1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:04 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 14:04 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 14:04 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 14:03 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 14:03 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 14:02 jayme@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 13:38 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host acmechief2002.codfw.wmnet
  • 13:38 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest2004.codfw.wmnet with OS bullseye
  • 13:37 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2004.codfw.wmnet with OS bullseye
  • 13:36 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:36 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: change sretest2004 DNS - cmooney@cumin1001"
  • 13:35 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: change sretest2004 DNS - cmooney@cumin1001"
  • 13:33 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:31 jbond@cumin1001: START - Cookbook sre.puppet.migrate-host for host acmechief2002.codfw.wmnet
  • 13:27 jbond@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host acmechief2002.codfw.wmnet
  • 13:27 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host acmechief2002.codfw.wmnet with OS bookworm
  • 13:00 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest2004.codfw.wmnet with OS bullseye
  • 12:55 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:54 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:41 jayme: updated mwdebug1001 to icu67 - T345561
  • 12:17 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on acmechief2002.codfw.wmnet with reason: host reimage
  • 12:14 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on acmechief2002.codfw.wmnet with reason: host reimage
  • 11:52 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1102.eqiad.wmnet with OS bullseye
  • 11:34 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1102.eqiad.wmnet with reason: host reimage
  • 11:31 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1102.eqiad.wmnet with reason: host reimage
  • 11:31 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host acmechief2002.codfw.wmnet with OS bookworm
  • 11:30 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM acmechief2002.codfw.wmnet - jbond@cumin1001"
  • 11:29 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM acmechief2002.codfw.wmnet - jbond@cumin1001"
  • 11:29 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) acmechief2002.codfw.wmnet on all recursors
  • 11:29 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache acmechief2002.codfw.wmnet on all recursors
  • 11:29 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:29 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM acmechief2002.codfw.wmnet - jbond@cumin1001"
  • 11:28 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM acmechief2002.codfw.wmnet - jbond@cumin1001"
  • 11:26 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 11:26 jbond@cumin1001: START - Cookbook sre.ganeti.makevm for new host acmechief2002.codfw.wmnet
  • 11:18 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1102.eqiad.wmnet with OS bullseye
  • 11:17 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1102.eqiad.wmnet with OS bullseye
  • 11:08 volans@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 11:08 volans@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 11:01 jbond@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 11:01 jbond@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 11:00 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 10:48 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 10:48 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 10:48 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 10:45 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 10:45 jiji@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 10:44 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 10:40 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1102.eqiad.wmnet with OS bullseye
  • 10:36 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1101.eqiad.wmnet with OS bullseye
  • 10:20 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt-wdqs1001.eqiad.wmnet
  • 10:20 taavi@cumin1001: START - Cookbook sre.hosts.remove-downtime for cloudvirt-wdqs1001.eqiad.wmnet
  • 10:17 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1101.eqiad.wmnet with reason: host reimage
  • 10:17 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 10:14 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1101.eqiad.wmnet with reason: host reimage
  • 10:14 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 10:14 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 10:13 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
  • 09:59 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1101.eqiad.wmnet with OS bullseye
  • 09:59 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1101.eqiad.wmnet with OS bullseye
  • 09:34 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1101.eqiad.wmnet with OS bullseye
  • 09:19 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 09:19 btullis@cumin1001: Added views for new wiki: tlywiki T345169
  • 09:02 moritzm: deployment-prep app servers are now using ICU67/Unicode 13
  • 08:49 moritzm: uploaded libxml2 2.9.4+dfsg1-7+deb10u6+icu67+wmf1 to component/icu67 for buster-wikimedia (rebase of the ICU compat patches on top of the latest buster security update for libxml2) T345561
  • 08:48 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 08:41 moritzm: downgrading dh-python on build2001 to the version which is in Bullseye. Before, 5.20230130~bpo11+1 was installed from bullseye-backports, but that version has dropped the python2 sequence we still need for some Buster builds
  • 08:25 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudmetrics1004.eqiad.wmnet with OS bookworm
  • 08:10 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudmetrics1004.eqiad.wmnet with reason: host reimage
  • 08:07 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudmetrics1004.eqiad.wmnet with reason: host reimage
  • 07:55 taavi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudmetrics1004.eqiad.wmnet with OS bookworm
  • 07:54 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudmetrics1003.eqiad.wmnet with OS bookworm
  • 07:54 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 07:48 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudmetrics1004.eqiad.wmnet with reason: cloudmetrics1003 reimage
  • 07:48 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudmetrics1004.eqiad.wmnet with reason: cloudmetrics1003 reimage
  • 07:39 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudmetrics1003.eqiad.wmnet with reason: host reimage
  • 07:36 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudmetrics1003.eqiad.wmnet with reason: host reimage
  • 07:32 ayounsi@cumin1001: START - Cookbook sre.hosts.provision for host sretest1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 07:24 taavi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudmetrics1003.eqiad.wmnet with OS bookworm
  • 06:12 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2003.codfw.wmnet with OS bullseye
  • 01:49 cstone: civicrm upgraded from 70e0b88d to 74781efd

2023-10-26

  • 22:49 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2006.wikimedia.org with OS bookworm
  • 22:10 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2006.wikimedia.org with reason: host reimage
  • 22:07 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2006.wikimedia.org with reason: host reimage
  • 21:47 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns2006.wikimedia.org with OS bookworm
  • 21:45 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:45 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:32 cstone: payments-wiki upgraded from f7407053 to 04428d6e
  • 21:16 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: still trying to get nova to schedule hosts there
  • 21:16 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: still trying to get nova to schedule hosts there
  • 21:12 taavi@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1001.eqiad.wmnet
  • 21:00 taavi@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1001.eqiad.wmnet
  • 20:45 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 20:45 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1001"
  • 20:44 cstone: payments-wiki upgraded from f7407053 to 99b330be
  • 20:44 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1001"
  • 20:42 brennen: end of utc late backport & config window
  • 20:42 brennen@deploy2002: Finished scap: Backport for OIDC: Return instead of null for email in profile (T283456) (duration: 07m 25s)
  • 20:41 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2005.wikimedia.org with OS bookworm
  • 20:37 brennen@deploy2002: brennen and tgr: Continuing with sync
  • 20:36 brennen@deploy2002: brennen and tgr: Backport for OIDC: Return instead of null for email in profile (T283456) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:35 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudvirt-wdqs1001 - taavi@cumin1001"
  • 20:34 brennen@deploy2002: Started scap: Backport for OIDC: Return instead of null for email in profile (T283456)
  • 20:34 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudvirt-wdqs1001 - taavi@cumin1001"
  • 20:34 brennen@deploy2002: Finished scap: Backport for Deploy pilot survey on metawiki (T349854) (duration: 08m 56s)
  • 20:31 bvibber: brion running video transcode backfill via mwmaint2002 (requeueTranscodes.php) + job queue
  • 20:29 brennen@deploy2002: dani and brennen: Continuing with sync
  • 20:26 brennen@deploy2002: dani and brennen: Backport for Deploy pilot survey on metawiki (T349854) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:25 brennen@deploy2002: Started scap: Backport for Deploy pilot survey on metawiki (T349854)
  • 20:23 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2005.wikimedia.org with reason: host reimage
  • 20:20 brennen@deploy2002: Finished scap: Backport for "Soft-launch" iOS-compatible HLS video transcodes (T68722) (duration: 08m 29s)
  • 20:19 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2005.wikimedia.org with reason: host reimage
  • 20:15 brennen@deploy2002: brennen and brion: Continuing with sync
  • 20:14 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: host reimage
  • 20:13 brennen@deploy2002: brennen and brion: Backport for "Soft-launch" iOS-compatible HLS video transcodes (T68722) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:12 brennen@deploy2002: Started scap: Backport for "Soft-launch" iOS-compatible HLS video transcodes (T68722)
  • 20:11 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: host reimage
  • 20:03 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns2005.wikimedia.org with OS bookworm
  • 19:59 taavi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 19:59 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt-wdqs1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:43 taavi@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt-wdqs1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:41 taavi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 19:38 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2004.wikimedia.org with OS bookworm
  • 19:30 taavi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 19:29 taavi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt-wdqs1001
  • 19:29 taavi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt-wdqs1001
  • 19:28 taavi@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt-wdqs1001
  • 19:28 taavi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt-wdqs1001
  • 19:08 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2004.wikimedia.org with reason: host reimage
  • 19:05 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2004.wikimedia.org with reason: host reimage
  • 18:44 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns2004.wikimedia.org with OS bookworm
  • 18:07 dancy@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.2 refs T348355
  • 17:53 sukhe: sudo cumin -b1 -s300 'A:dns-rec and not A:codfw' 'systemctl restart pdns-recursor.service'
  • 17:36 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:36 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudvirt-wdqs1001 - taavi@cumin1001"
  • 17:35 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudvirt-wdqs1001 - taavi@cumin1001"
  • 17:32 taavi@cumin1001: START - Cookbook sre.dns.netbox
  • 17:19 stevemunene@cumin1001: END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99) for Druid public cluster: Roll restart of Druid jvm daemons.
  • 17:17 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 17:01 sukhe: sudo cumin -b1 -s30 'A:dns-rec and not A:codfw' 'systemctl restart haproxy.service'
  • 16:18 stevemunene@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
  • 16:05 hnowlan@deploy2002: Finished deploy [restbase/deploy@c461bad]: Adding fonwiki T347940 (duration: 16m 53s)
  • 16:04 sukhe: sudo cumin -b1 -s300 'A:dns-rec and A:edges' 'systemctl restart ntp.service'
  • 15:48 hnowlan@deploy2002: Started deploy [restbase/deploy@c461bad]: Adding fonwiki T347940
  • 15:42 sukhe: sudo cumin -b1 -s600 'A:dns-rec and (A:eqiad or A:codfw)' 'systemctl restart ntp.service'
  • 15:42 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough and A:wikidough
  • 15:35 jgiannelos@deploy2002: Finished deploy [restbase/deploy@4c14785]: (no justification provided) (duration: 13m 21s)
  • 15:30 XioNoX: test add BGP session between ssw1-e1-eqiad and lsw1-e8-eqiad
  • 15:22 jgiannelos@deploy2002: Started deploy [restbase/deploy@4c14785]: (no justification provided)
  • 15:15 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:12 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2003.codfw.wmnet with reason: host reimage
  • 15:09 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2003.codfw.wmnet with reason: host reimage
  • 14:53 Lucas_WMDE: UTC afternoon backport+config window (belatedly) done
  • 14:52 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for CX3 Build 0.2.0+20231026 (T348563 T308836) (duration: 14m 01s)
  • 14:49 sukhe@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough and A:wikidough
  • 14:46 lucaswerkmeister-wmde@deploy2002: kartik and lucaswerkmeister-wmde: Continuing with sync
  • 14:42 jgiannelos@deploy2002: Finished deploy [restbase/deploy@ff46322]: (no justification provided) (duration: 01m 38s)
  • 14:40 jgiannelos@deploy2002: Started deploy [restbase/deploy@ff46322]: (no justification provided)
  • 14:39 lucaswerkmeister-wmde@deploy2002: kartik and lucaswerkmeister-wmde: Backport for CX3 Build 0.2.0+20231026 (T348563 T308836) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:38 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for CX3 Build 0.2.0+20231026 (T348563 T308836)
  • 14:36 filippo@deploy2002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
  • 14:36 filippo@deploy2002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 14:36 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
  • 14:36 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
  • 14:36 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
  • 14:35 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
  • 14:33 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest2003.codfw.wmnet with OS bullseye
  • 14:23 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Remove broken QUnit test (T349485) (duration: 06m 53s)
  • 14:20 ejegg: donorwiki upgraded from 894eacce to f7407053
  • 14:17 lucaswerkmeister-wmde@deploy2002: abi and lucaswerkmeister-wmde: Continuing with sync
  • 14:17 lucaswerkmeister-wmde@deploy2002: abi and lucaswerkmeister-wmde: Backport for Remove broken QUnit test (T349485) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:16 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Remove broken QUnit test (T349485)
  • 14:14 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/similar-users: apply
  • 14:14 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/similar-users: apply
  • 14:14 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/similar-users: apply
  • 14:09 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/similar-users: apply
  • 14:09 jayme@deploy2002: helmfile [staging] START helmfile.d/services/similar-users: apply
  • 13:56 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for cirrus: disable canary events for update & error streams (duration: 07m 19s)
  • 13:51 lucaswerkmeister-wmde@deploy2002: dcausse and lucaswerkmeister-wmde: Continuing with sync
  • 13:50 lucaswerkmeister-wmde@deploy2002: dcausse and lucaswerkmeister-wmde: Backport for cirrus: disable canary events for update & error streams synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:49 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for cirrus: disable canary events for update & error streams
  • 13:46 moritzm: installing cpio security updates
  • 13:46 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for CX3 Build 0.2.0+20231026 (T348563 T308836) (duration: 14m 48s)
  • 13:40 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and kartik: Continuing with sync
  • 13:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and kartik: Backport for CX3 Build 0.2.0+20231026 (T348563 T308836) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:32 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
  • 13:31 moritzm: installing curl security updates on buster
  • 13:31 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for CX3 Build 0.2.0+20231026 (T348563 T308836)
  • 13:30 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Add throttle rule for Edit-a-Thon on 2023-11-03 (T349234) (duration: 06m 43s)
  • 13:27 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-eqiad
  • 13:26 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:maps-replica-codfw
  • 13:25 lucaswerkmeister-wmde@deploy2002: zoranzoki21 and lucaswerkmeister-wmde: Continuing with sync
  • 13:24 lucaswerkmeister-wmde@deploy2002: zoranzoki21 and lucaswerkmeister-wmde: Backport for Add throttle rule for Edit-a-Thon on 2023-11-03 (T349234) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:23 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Add throttle rule for Edit-a-Thon on 2023-11-03 (T349234)
  • 13:21 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-codfw
  • 13:21 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Enable block feature for AbuseFilter on srwiki (T349727) (duration: 10m 23s)
  • 13:20 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 13:20 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 13:15 lucaswerkmeister-wmde@deploy2002: zoranzoki21 and lucaswerkmeister-wmde: Continuing with sync
  • 13:15 moritzm: installing poppler security updates
  • 13:11 lucaswerkmeister-wmde@deploy2002: zoranzoki21 and lucaswerkmeister-wmde: Backport for Enable block feature for AbuseFilter on srwiki (T349727) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:10 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Enable block feature for AbuseFilter on srwiki (T349727)
  • 13:04 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 12:27 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance
  • 12:26 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance
  • 11:04 kevinbazira@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 11:03 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 10:58 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 10:51 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:51 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:51 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:40 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 10:30 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 10:30 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 10:25 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 10:25 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 10:20 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 10:20 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 10:10 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 10:10 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 09:29 dcausse: erratum (replace wdqs1009 with wdqs2009 in the above msg): depooling and restarting blazegraph on wdqs2009 (stuck since 2023-10-12)
  • 09:28 dcausse: depooling and restarting blazegraph on wdqs1009 (stuck since 2023-10-12)
  • 09:23 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1009.eqiad.wmnet with OS bullseye
  • 09:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 09:14 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 09:06 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1009.eqiad.wmnet with reason: host reimage
  • 09:03 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1009.eqiad.wmnet with reason: host reimage
  • 08:50 brouberol@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-jumbo1009.eqiad.wmnet with OS bullseye
  • 08:49 urbanecm: mwmaint2002: `foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/refreshUserImpactData.php --registeredWithin=1year --editedWithin=2week --hasEditsAtLeast=3 --ignoreIfUpdatedWithin=1second --verbose --use-job-queue` (testing T344428; after enabling backend on all Wikipedias)
  • 08:48 urbanecm@deploy2002: Finished scap: Backport for Growth: Enable new Impact backend everywhere (T344143) (duration: 09m 29s)
  • 08:43 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 08:40 urbanecm@deploy2002: urbanecm: Backport for Growth: Enable new Impact backend everywhere (T344143) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:40 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 08:40 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1008.eqiad.wmnet with OS bullseye
  • 08:39 urbanecm@deploy2002: Started scap: Backport for Growth: Enable new Impact backend everywhere (T344143)
  • 08:32 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 08:32 urbanecm@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 08:31 urbanecm@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 08:29 urbanecm@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 08:28 urbanecm@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 08:28 urbanecm@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 08:27 urbanecm@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 08:24 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1008.eqiad.wmnet with reason: host reimage
  • 08:21 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1008.eqiad.wmnet with reason: host reimage
  • 08:07 brouberol@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-jumbo1008.eqiad.wmnet with OS bullseye
  • 08:02 godog: restart prometheus k8s k8s-aux - T343529
  • 07:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15133
  • 07:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15133
  • 07:36 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 07:32 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 07:31 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 07:23 jelto@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 07:21 apergos: UTC morning backport and config window closed
  • 07:19 kartik@deploy2002: Finished scap: Backport for testwiki: Enable Section translation on some Wikipedias with potential to be supported with MinT (T345267) (duration: 13m 11s)
  • 07:13 kartik@deploy2002: kartik: Continuing with sync
  • 07:08 kartik@deploy2002: kartik: Backport for testwiki: Enable Section translation on some Wikipedias with potential to be supported with MinT (T345267) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:06 kartik@deploy2002: Started scap: Backport for testwiki: Enable Section translation on some Wikipedias with potential to be supported with MinT (T345267)
  • 06:52 moritzm: installing openssl security updates
  • 06:40 _joe_: rebuilding the base httpd image for mediawiki to pick up glogger changes
  • 04:31 cstone: civicrm upgraded from 16175067 to 70e0b88d
  • 01:35 cstone: payments-wiki upgraded from 382a5a70 to f7407053

2023-10-25

  • 22:28 jforrester@deploy2002: Finished scap: Backport for diff: Fix LinkRenderer method call (T349726) (duration: 07m 21s)
  • 22:22 jforrester@deploy2002: jforrester and umherirrender: Continuing with sync
  • 22:22 jforrester@deploy2002: jforrester and umherirrender: Backport for diff: Fix LinkRenderer method call (T349726) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:20 jforrester@deploy2002: Started scap: Backport for diff: Fix LinkRenderer method call (T349726)
  • 21:01 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 21:00 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 21:00 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 20:59 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 20:58 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 20:57 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 20:25 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:23 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:20 ejegg: payments-wiki upgraded from 7575f0e6 to 382a5a70
  • 20:11 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:10 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:04 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns1006.wikimedia.org with OS bookworm
  • 19:57 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase1018.eqiad.wmnet
  • 19:57 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:57 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase1018.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 19:56 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase1018.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 19:50 eevans@cumin1001: START - Cookbook sre.dns.netbox
  • 19:44 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:44 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:44 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns1006.wikimedia.org with reason: host reimage
  • 19:41 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns1006.wikimedia.org with reason: host reimage
  • 19:40 eevans@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase1018.eqiad.wmnet
  • 19:36 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase1017.eqiad.wmnet
  • 19:36 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:36 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase1017.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 19:35 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase1017.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 19:33 eevans@cumin1001: START - Cookbook sre.dns.netbox
  • 19:27 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns1006.wikimedia.org with OS bookworm
  • 19:25 eevans@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase1017.eqiad.wmnet
  • 19:20 sukhe: sukhe@cumin2002:~$ sudo cumin 'A:dns-rec' "enable-puppet 'wait before enabling'"
  • 19:19 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase1016.eqiad.wmnet
  • 19:19 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:19 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase1016.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 19:18 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase1016.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 19:17 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:16 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 19:16 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 19:16 cmooney@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: check if it makes vlan1054 records - cmooney@cumin1001"
  • 19:14 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: check if it makes vlan1054 records - cmooney@cumin1001"
  • 19:12 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 18:33 dancy@deploy2002: Synchronized php: group1 wikis to 1.42.0-wmf.2 refs T348355 (duration: 05m 52s)
  • 18:32 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:32 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:28 dancy@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.2 refs T348355
  • 18:17 eevans@cumin1001: START - Cookbook sre.dns.netbox
  • 18:11 eevans@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase1016.eqiad.wmnet
  • 18:04 ejegg: fundraising civicrm upgraded from 6cfae26a to 16175067
  • 17:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns1005.wikimedia.org with OS bookworm
  • 17:21 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 17:20 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 17:15 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 17:15 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 17:10 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 17:09 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 17:04 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 17:04 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns1005.wikimedia.org with reason: host reimage
  • 17:04 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 17:02 ottomata: temporarily increasing log level to trace for eventgate-logging-external in eqiad canary release only - T347477
  • 16:59 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns1005.wikimedia.org with reason: host reimage
  • 16:47 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:46 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:46 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:45 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns1005.wikimedia.org with OS bookworm
  • 16:45 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:44 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:44 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:07 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 15:08 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns1004.wikimedia.org with OS bookworm
  • 14:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns1004.wikimedia.org with reason: host reimage
  • 14:39 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns1004.wikimedia.org with reason: host reimage
  • 14:30 jforrester@deploy2002: sync-world aborted: Backport for Allow logged out users to run FunctionEvaluator widget (T301670 T349055 T349057) (duration: 55m 10s)
  • 14:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1004.wikimedia.org with OS bookworm
  • 14:27 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns1004.wikimedia.org with OS bookworm
  • 14:22 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1100.eqiad.wmnet with OS bullseye
  • 14:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1007.eqiad.wmnet with OS bullseye
  • 14:09 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1004.wikimedia.org with OS bookworm
  • 14:02 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1100.eqiad.wmnet with reason: host reimage
  • 14:02 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host deploy1002
  • 14:02 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host deploy1002
  • 13:59 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1100.eqiad.wmnet with reason: host reimage
  • 13:55 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1007.eqiad.wmnet with reason: host reimage
  • 13:54 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-tool1010
  • 13:54 jforrester@deploy2002: jforrester: Continuing with sync
  • 13:53 jforrester@deploy2002: jforrester: Backport for Allow logged out users to run FunctionEvaluator widget (T301670 T349055 T349057) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:52 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host an-tool1010
  • 13:52 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 15 hosts with reason: not pooled, reimaging in progress
  • 13:51 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1007.eqiad.wmnet with reason: host reimage
  • 13:51 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 15 hosts with reason: not pooled, reimaging in progress
  • 13:42 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1100.eqiad.wmnet with OS bullseye
  • 13:36 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 13:35 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 13:35 jforrester@deploy2002: Started scap: Backport for Allow logged out users to run FunctionEvaluator widget (T301670 T349055 T349057)
  • 13:29 jforrester@deploy2002: Finished scap: Backport for Remove no-op $wgHiddenPrefs[] = 'prefershttps' (duration: 06m 54s)
  • 13:25 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on an-tool1010.eqiad.wmnet with reason: Moving an-tool1010
  • 13:25 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on an-tool1010.eqiad.wmnet with reason: Moving an-tool1010
  • 13:24 jforrester@deploy2002: matmarex and jforrester: Continuing with sync
  • 13:24 jforrester@deploy2002: matmarex and jforrester: Backport for Remove no-op $wgHiddenPrefs[] = 'prefershttps' synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:22 jforrester@deploy2002: Started scap: Backport for Remove no-op $wgHiddenPrefs[] = 'prefershttps'
  • 13:21 jforrester@deploy2002: Finished scap: Backport for [wikifunctions] Allow logged-out users to run approved functions (T349055) (duration: 07m 59s)
  • 13:16 jforrester@deploy2002: jforrester: Continuing with sync
  • 13:14 jforrester@deploy2002: jforrester: Backport for [wikifunctions] Allow logged-out users to run approved functions (T349055) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:13 jforrester@deploy2002: Started scap: Backport for [wikifunctions] Allow logged-out users to run approved functions (T349055)
  • 13:11 jforrester@deploy2002: Finished scap: Backport for ExtensionDistributor: Add REL1_41 as the development snapshot (T346929) (duration: 07m 01s)
  • 13:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1017.eqiad.wmnet
  • 13:06 jforrester@deploy2002: jforrester: Continuing with sync
  • 13:05 jforrester@deploy2002: jforrester: Backport for ExtensionDistributor: Add REL1_41 as the development snapshot (T346929) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:04 jforrester@deploy2002: Started scap: Backport for ExtensionDistributor: Add REL1_41 as the development snapshot (T346929)
  • 13:01 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 12:59 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1017.eqiad.wmnet
  • 10:56 urbanecm: mwmaint2002: foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/refreshUserImpactData.php --registeredWithin=1year --editedWithin=2week --hasEditsAtLeast=3 --ignoreIfUpdatedWithin=1second --verbose --use-job-queue (T344428; all wikis, higher file limit)
  • 10:24 urbanecm: mwmaint2002: foreachwikiindblist /srv/mediawiki/dblists/growth-biggest.dblist extensions/GrowthExperiments/maintenance/refreshUserImpactData.php --registeredWithin=1year --editedWithin=2week --hasEditsAtLeast=3 --ignoreIfUpdatedWithin=1second --verbose --use-job-queue (T344428; with higher file limit)
  • 10:02 taavi: import kubernetes 1.23 packages for debian bookworm T284656
  • 09:53 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-jumbo1007.eqiad.wmnet with OS bullseye
  • 09:50 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 09:48 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 09:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P53053 and previous config saved to /var/cache/conftool/dbconfig/20231025-090648-arnaudb.json
  • 08:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 90%: Maint over', diff saved to https://phabricator.wikimedia.org/P53052 and previous config saved to /var/cache/conftool/dbconfig/20231025-085143-arnaudb.json
  • 08:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 80%: Maint over', diff saved to https://phabricator.wikimedia.org/P53051 and previous config saved to /var/cache/conftool/dbconfig/20231025-083638-arnaudb.json
  • 08:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 70%: Maint over', diff saved to https://phabricator.wikimedia.org/P53050 and previous config saved to /var/cache/conftool/dbconfig/20231025-082133-arnaudb.json
  • 08:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 60%: Maint over', diff saved to https://phabricator.wikimedia.org/P53049 and previous config saved to /var/cache/conftool/dbconfig/20231025-080628-arnaudb.json
  • 07:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 50%: Maint over', diff saved to https://phabricator.wikimedia.org/P53048 and previous config saved to /var/cache/conftool/dbconfig/20231025-075123-arnaudb.json
  • 07:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 40%: Maint over', diff saved to https://phabricator.wikimedia.org/P53047 and previous config saved to /var/cache/conftool/dbconfig/20231025-073618-arnaudb.json
  • 07:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 30%: Maint over', diff saved to https://phabricator.wikimedia.org/P53046 and previous config saved to /var/cache/conftool/dbconfig/20231025-072113-arnaudb.json
  • 07:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 20%: Maint over', diff saved to https://phabricator.wikimedia.org/P53045 and previous config saved to /var/cache/conftool/dbconfig/20231025-070608-arnaudb.json
  • 06:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P53044 and previous config saved to /var/cache/conftool/dbconfig/20231025-065103-arnaudb.json
  • 06:50 arnaudb: repooling db1231

2023-10-24

  • 21:58 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:58 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:09 sukhe: running authdns-update for CR 968354
  • 21:08 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5004.wikimedia.org with OS bookworm
  • 21:06 jdrewniak@deploy2002: Finished scap: Backport for Disable Parsoid internal REST API everywhere except on Parsoid cluster (T334980) (duration: 12m 39s)
  • 21:00 jdrewniak@deploy2002: jdrewniak and cscott: Continuing with sync
  • 20:54 jdrewniak@deploy2002: jdrewniak and cscott: Backport for Disable Parsoid internal REST API everywhere except on Parsoid cluster (T334980) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:53 jdrewniak@deploy2002: Started scap: Backport for Disable Parsoid internal REST API everywhere except on Parsoid cluster (T334980)
  • 20:49 jdrewniak@deploy2002: Finished scap: Backport for Enable Vector readability survey on select wikis (T349232), Follow-up to 74b5834: Add language prefix to Readability survey (T349232), Follow-up to 74b5834: Add language prefix to Readability survey (T349232) (duration: 06m 57s)
  • 20:44 jdrewniak@deploy2002: jdrewniak: Continuing with sync
  • 20:44 jdrewniak@deploy2002: jdrewniak: Backport for Enable Vector readability survey on select wikis (T349232), Follow-up to 74b5834: Add language prefix to Readability survey (T349232), Follow-up to 74b5834: Add language prefix to Readability survey (T349232) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:42 jdrewniak@deploy2002: Started scap: Backport for Enable Vector readability survey on select wikis (T349232), Follow-up to 74b5834: Add language prefix to Readability survey (T349232), Follow-up to 74b5834: Add language prefix to Readability survey (T349232)
  • 20:24 jdrewniak@deploy2002: Finished scap: Backport for Update comment about EditAttemptStep instruments, CentralAuth: Clarify why we don't use second-level domain for some wikis (T257852), Remove unused VisualEditor config settings (T344757 T344759), [noop] Explain more thoroughly how the '-' prefix works (duration: 07m 21s)
  • 20:18 jdrewniak@deploy2002: tgr and matmarex and jdrewniak: Continuing with sync
  • 20:18 jdrewniak@deploy2002: tgr and matmarex and jdrewniak: Backport for Update comment about EditAttemptStep instruments, CentralAuth: Clarify why we don't use second-level domain for some wikis (T257852), Remove unused VisualEditor config settings (T344757 T344759), [noop] Explain more thoroughly how the '-' prefix works synced to the testservers (htt
  • 20:16 jdrewniak@deploy2002: Started scap: Backport for Update comment about EditAttemptStep instruments, CentralAuth: Clarify why we don't use second-level domain for some wikis (T257852), Remove unused VisualEditor config settings (T344757 T344759), [noop] Explain more thoroughly how the '-' prefix works
  • 20:14 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
  • 20:10 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
  • 19:57 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@c585842]: T346373: Update mjolnir to use python 3.10 (duration: 00m 28s)
  • 19:56 ebernhardson@deploy2002: Started deploy [airflow-dags/search@c585842]: T346373: Update mjolnir to use python 3.10
  • 19:49 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 19:47 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 19:47 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 19:45 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 19:45 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 19:43 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 19:43 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 19:23 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns5004.wikimedia.org with OS bookworm
  • 19:00 andrew@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 19:00 andrew@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 18:59 andrew@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 18:59 andrew@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 18:55 andrew@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 18:55 andrew@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 18:54 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 18:54 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5003.wikimedia.org with OS bookworm
  • 18:50 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 18:50 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 18:50 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 18:48 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 18:48 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 18:48 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 18:47 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 18:47 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 18:47 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 18:42 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 18:42 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 18:42 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 18:42 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 18:41 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 18:41 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase2012.codfw.wmnet
  • 18:41 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:41 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase2012.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 18:41 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 18:39 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase2012.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
  • 18:39 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 18:38 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 18:37 eevans@cumin1001: START - Cookbook sre.dns.netbox
  • 18:31 eevans@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase2012.codfw.wmnet
  • 18:24 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:23 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 18:18 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:18 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 18:16 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:15 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 18:13 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:13 dancy@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.2 refs T348355
  • 18:13 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 18:03 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 18:00 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 17:50 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 17:49 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5003.wikimedia.org with reason: host reimage
  • 17:46 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5003.wikimedia.org with reason: host reimage
  • 17:41 ejegg: fundraising civicrm upgraded from 8e8ffec0 to 6cfae26a
  • 16:59 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns5003.wikimedia.org with OS bookworm
  • 16:46 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@cc56357]: Deploying latest DAGs to analytics Airflow instance (duration: 01m 55s)
  • 16:44 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@cc56357]: Deploying latest DAGs to analytics Airflow instance
  • 15:48 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1100.eqiad.wmnet with OS bullseye
  • 15:32 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1100.eqiad.wmnet with reason: host reimage
  • 15:26 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1100.eqiad.wmnet with reason: host reimage
  • 15:22 godog: clean up overlapping blocks from thanos for instance 'cloud'
  • 15:11 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1100.eqiad.wmnet with OS bullseye
  • 15:10 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1100.eqiad.wmnet with OS bullseye
  • 14:59 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1100.eqiad.wmnet with OS bullseye
  • 14:58 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1100.eqiad.wmnet with OS bullseye
  • 14:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1016.eqiad.wmnet
  • 14:50 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1016.eqiad.wmnet
  • 14:48 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1100.eqiad.wmnet with OS bullseye
  • 14:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Adding db1227 depooled', diff saved to https://phabricator.wikimedia.org/P53041 and previous config saved to /var/cache/conftool/dbconfig/20231024-143204-arnaudb.json
  • 14:01 TheresNoTime: close backport window
  • 14:00 samtar@deploy2002: Finished scap: Backport for Fix typo (undefined event) (T349271) (duration: 09m 26s)
  • 13:55 samtar@deploy2002: samtar and cparle: Continuing with sync
  • 13:52 samtar@deploy2002: samtar and cparle: Backport for Fix typo (undefined event) (T349271) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:51 samtar@deploy2002: Started scap: Backport for Fix typo (undefined event) (T349271)
  • 13:43 samtar@deploy2002: Finished scap: Backport for Add stream config for iOS schema (T347122) (duration: 07m 52s)
  • 13:38 samtar@deploy2002: samtar and tsev: Continuing with sync
  • 13:37 samtar@deploy2002: samtar and tsev: Backport for Add stream config for iOS schema (T347122) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:36 samtar@deploy2002: Started scap: Backport for Add stream config for iOS schema (T347122)
  • 13:34 samtar@deploy2002: Finished scap: Backport for cirrus: add wgCirrusSearchUseEventBusBridge and enable it on testwiki (T325565) (duration: 06m 55s)
  • 13:31 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 13:30 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 13:30 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 13:30 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 13:30 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 13:29 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 13:28 samtar@deploy2002: samtar and dcausse: Continuing with sync
  • 13:28 samtar@deploy2002: samtar and dcausse: Backport for cirrus: add wgCirrusSearchUseEventBusBridge and enable it on testwiki (T325565) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:27 samtar@deploy2002: Started scap: Backport for cirrus: add wgCirrusSearchUseEventBusBridge and enable it on testwiki (T325565)
  • 13:25 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 13:25 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 13:24 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 13:24 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
  • 13:24 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 13:23 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 13:23 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
  • 13:22 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
  • 13:22 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 13:22 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 13:22 samtar@deploy2002: Finished scap: Backport for cirrus: add the mediawiki.cirrussearch.page_rerender.v1 stream (T325565) (duration: 07m 45s)
  • 13:17 samtar@deploy2002: samtar and dcausse: Continuing with sync
  • 13:15 samtar@deploy2002: samtar and dcausse: Backport for cirrus: add the mediawiki.cirrussearch.page_rerender.v1 stream (T325565) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:14 samtar@deploy2002: Started scap: Backport for cirrus: add the mediawiki.cirrussearch.page_rerender.v1 stream (T325565)
  • 13:10 samtar@deploy2002: Finished scap: Backport for Increase Lua memory limit to 100MB on Wiktionary only (T165935) (duration: 07m 51s)
  • 13:05 samtar@deploy2002: samtar and tstarling: Continuing with sync
  • 13:04 samtar@deploy2002: samtar and tstarling: Backport for Increase Lua memory limit to 100MB on Wiktionary only (T165935) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:03 samtar@deploy2002: Started scap: Backport for Increase Lua memory limit to 100MB on Wiktionary only (T165935)
  • 12:41 jbond: migrate idp_test to puppet7
  • 11:17 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:17 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:16 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:15 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:13 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 11:12 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 11:12 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 11:12 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 11:11 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 11:11 jiji@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 11:09 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:08 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:08 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:08 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:08 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:07 jiji@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:05 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 11:05 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 11:04 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 11:04 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 10:59 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 10:59 jiji@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 10:58 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 10:57 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 10:57 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 10:57 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 10:57 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 10:56 jiji@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 10:54 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 10:53 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 10:47 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 10:46 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 10:44 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 10:43 jiji@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 10:43 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on an-test-client1002.eqiad.wmnet with reason: Cold booting with ganeti to increase RAM
  • 10:42 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on an-test-client1002.eqiad.wmnet with reason: Cold booting with ganeti to increase RAM
  • 10:42 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:41 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:40 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 10:39 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 10:27 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: apply
  • 10:27 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 10:26 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/recommendation-api: apply
  • 10:26 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 10:15 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: apply
  • 10:14 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: apply
  • 10:10 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: apply
  • 10:10 jiji@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: apply
  • 10:08 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 10:07 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
  • 10:04 jnuche@deploy2002: Pruned MediaWiki: 1.41.0-wmf.30 (duration: 02m 08s)
  • 10:02 jnuche@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.2 refs T348355 (duration: 25m 27s)
  • 09:49 kevinbazira@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 09:48 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 09:45 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 09:43 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P53039 and previous config saved to /var/cache/conftool/dbconfig/20231024-094329-arnaudb.json
  • 09:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 09:39 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 09:36 jnuche@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.2 refs T348355
  • 09:28 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 90%: Maint over', diff saved to https://phabricator.wikimedia.org/P53038 and previous config saved to /var/cache/conftool/dbconfig/20231024-092824-arnaudb.json
  • 09:16 vgutierrez: upload golang-github-florianl-go-tc to apt.wm.o (bookworm) - T348837
  • 09:13 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 80%: Maint over', diff saved to https://phabricator.wikimedia.org/P53037 and previous config saved to /var/cache/conftool/dbconfig/20231024-091319-arnaudb.json
  • 09:11 taavi: restart ferm on deploy1002 T349587
  • 09:04 taavi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host deploy1002
  • 09:03 taavi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host deploy1002
  • 08:58 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 70%: Maint over', diff saved to https://phabricator.wikimedia.org/P53036 and previous config saved to /var/cache/conftool/dbconfig/20231024-085815-arnaudb.json
  • 08:43 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 60%: Maint over', diff saved to https://phabricator.wikimedia.org/P53035 and previous config saved to /var/cache/conftool/dbconfig/20231024-084310-arnaudb.json
  • 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1127.eqiad.wmnet onto db1227.eqiad.wmnet
  • 08:28 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 50%: Maint over', diff saved to https://phabricator.wikimedia.org/P53034 and previous config saved to /var/cache/conftool/dbconfig/20231024-082805-arnaudb.json
  • 08:13 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 40%: Maint over', diff saved to https://phabricator.wikimedia.org/P53033 and previous config saved to /var/cache/conftool/dbconfig/20231024-081300-arnaudb.json
  • 07:57 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 30%: Maint over', diff saved to https://phabricator.wikimedia.org/P53032 and previous config saved to /var/cache/conftool/dbconfig/20231024-075755-arnaudb.json
  • 07:42 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 20%: Maint over', diff saved to https://phabricator.wikimedia.org/P53031 and previous config saved to /var/cache/conftool/dbconfig/20231024-074250-arnaudb.json
  • 07:27 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P53030 and previous config saved to /var/cache/conftool/dbconfig/20231024-072745-arnaudb.json
  • 07:27 arnaudb: repool db2109
  • 07:08 marostegui@cumin1001: START - Cookbook sre.mysql.clone of db1127.eqiad.wmnet onto db1227.eqiad.wmnet
  • 06:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: provisionning db1227 - T344036
  • 06:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: provisionning db1227 - T344036
  • 06:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: provisionning db1227 - T344036
  • 06:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: provisionning db1227 - T344036
  • 06:54 godog: +50G to prometheus/analytics in eqiad
  • 06:45 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:45 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ipv6 from pc1015 - marostegui@cumin1001"
  • 06:44 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ipv6 from pc1015 - marostegui@cumin1001"
  • 06:42 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 06:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e8-eqiad
  • 06:33 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-e8-eqiad
  • 06:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15435
  • 06:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15435
  • 05:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 39180
  • 05:20 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 39180
  • 03:51 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.2 refs T348355 (duration: 47m 53s)
  • 03:03 mwpresync@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.2 refs T348355

2023-10-23

  • 23:05 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 23:05 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 22:58 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 22:58 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 22:55 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 22:54 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 21:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm
  • 21:05 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 21:04 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 20:44 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 20:41 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 20:18 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm
  • 19:50 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 19:49 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:45 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns6002.wikimedia.org with OS bookworm
  • 18:33 bking@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:32 bking@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:31 herron: sretest1001:~/tmp/backfill$ promtool tsdb create-blocks-from rules --start 1672531200 --end 1698080718 --url http://prometheus.svc.eqiad.wmnet/ops/ logstash-requests.yaml T349521
  • 18:19 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:18 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:14 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:13 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns6002.wikimedia.org with reason: host reimage
  • 18:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns6002.wikimedia.org with reason: host reimage
  • 18:00 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:00 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 17:59 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 17:59 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 17:57 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 17:56 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 17:44 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns6002.wikimedia.org with OS bookworm
  • 17:41 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 17:40 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 17:26 ejegg: fundraising python tools upgraded from e56ae8ae to 9e84c689
  • 17:25 ejegg: standalone (IPN listener) SmashPig upgraded from e27dfbce to c5b12dc3
  • 16:57 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:56 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:47 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:46 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:09 marostegui@cumin1001: dbctl commit (dc=all): 'New host being setup', diff saved to https://phabricator.wikimedia.org/P53029 and previous config saved to /var/cache/conftool/dbconfig/20231023-160926-marostegui.json
  • 16:08 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:08 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:52 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 15:51 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 15:05 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:05 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:56 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:55 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:55 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:55 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:55 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:55 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:55 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:54 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:53 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1021']
  • 14:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Depool db1127', diff saved to https://phabricator.wikimedia.org/P53028 and previous config saved to /var/cache/conftool/dbconfig/20231023-145101-arnaudb.json
  • 14:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Provision db1227 depooled as a candidate master for s7', diff saved to https://phabricator.wikimedia.org/P53027 and previous config saved to /var/cache/conftool/dbconfig/20231023-145011-arnaudb.json
  • 14:48 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 14:48 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: provisioning db1227 - T344036
  • 14:48 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: provisioning db1227 - T344036
  • 14:47 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: provisioning db1227 - T344036
  • 14:47 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: provisioning db1227 - T344036
  • 14:46 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1021']
  • 14:42 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1021']
  • 14:41 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1021']
  • 14:30 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 14:26 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 14:26 jayme: switched mw-api-int (mw-on-k8s) to certmanager certificates - T300033
  • 14:26 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 14:25 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 14:24 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 14:14 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 14:14 jayme: switched mw-api-ext (mw-on-k8s) to certmanager certificates - T300033
  • 14:13 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 14:13 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 14:12 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 14:06 jayme@deploy2002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:06 jayme@deploy2002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:06 jayme@deploy2002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 14:06 jayme@deploy2002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 14:06 jayme: switched mw-jobrunner (mw-on-k8s) to certmanager certificates - T300033
  • 14:05 jayme@deploy2002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:05 jayme@deploy2002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:05 jayme@deploy2002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 14:05 jayme@deploy2002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 13:53 urbanecm@deploy2002: Finished scap: Backport for Stop writing to $wgCentralAuthCookieDomain in 'EnterMobileMode' hook (duration: 15m 50s)
  • 13:52 moritzm: installing batik security updates
  • 13:48 urbanecm@deploy2002: urbanecm and matmarex: Continuing with sync
  • 13:38 urbanecm@deploy2002: urbanecm and matmarex: Backport for Stop writing to $wgCentralAuthCookieDomain in 'EnterMobileMode' hook synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:37 urbanecm@deploy2002: Started scap: Backport for Stop writing to $wgCentralAuthCookieDomain in 'EnterMobileMode' hook
  • 13:37 urbanecm@deploy2002: Finished scap: Backport for New stream for Android Patroller tasks feature (T348816) (duration: 06m 54s)
  • 13:31 urbanecm@deploy2002: urbanecm and sharvaniharan: Continuing with sync
  • 13:31 urbanecm@deploy2002: urbanecm and sharvaniharan: Backport for New stream for Android Patroller tasks feature (T348816) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:30 urbanecm@deploy2002: Started scap: Backport for New stream for Android Patroller tasks feature (T348816)
  • 13:29 urbanecm@deploy2002: Finished scap: Backport for Remove 'currentProto'/'finalProto'/'proto' business (T348852), Remove unused $wgIncludeLegacyJavaScript, Remove $wgApiFrameOptions override for enwiki and zhwiki (T131183) (duration: 11m 56s)
  • 13:23 urbanecm@deploy2002: matmarex and urbanecm: Continuing with sync
  • 13:18 urbanecm@deploy2002: matmarex and urbanecm: Backport for Remove 'currentProto'/'finalProto'/'proto' business (T348852), Remove unused $wgIncludeLegacyJavaScript, Remove $wgApiFrameOptions override for enwiki and zhwiki (T131183) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:17 urbanecm@deploy2002: Started scap: Backport for Remove 'currentProto'/'finalProto'/'proto' business (T348852), Remove unused $wgIncludeLegacyJavaScript, Remove $wgApiFrameOptions override for enwiki and zhwiki (T131183)
  • 13:16 urbanecm@deploy2002: Finished scap: Backport for wikidatawiki: Switch property for determining Lexeme language code (T348923) (duration: 12m 50s)
  • 13:11 urbanecm@deploy2002: migr and urbanecm: Continuing with sync
  • 13:05 urbanecm@deploy2002: migr and urbanecm: Backport for wikidatawiki: Switch property for determining Lexeme language code (T348923) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:04 moritzm: installing libxpm security updates on buster
  • 13:04 urbanecm@deploy2002: Started scap: Backport for wikidatawiki: Switch property for determining Lexeme language code (T348923)
  • 12:41 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 12:40 jayme: switched mw-web (mw-on-k8s) to certmanager certificates - T300033
  • 12:40 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 12:40 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 12:39 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 12:33 moritzm: installing libx11 security updates
  • 12:16 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1131.eqiad.wmnet onto db1231.eqiad.wmnet
  • 11:49 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@054e07d] (releasing): (no justification provided) (duration: 00m 42s)
  • 11:49 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@054e07d] (releasing): (no justification provided)
  • 11:49 moritzm: added Balthazar to pwstore
  • 11:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Server not yet in production use
  • 11:33 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Server not yet in production use
  • 10:51 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafka-jumbo1001.eqiad.wmnet
  • 10:51 brouberol@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:51 brouberol@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001"
  • 10:51 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1131.eqiad.wmnet onto db1231.eqiad.wmnet
  • 10:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Depool db1131 T344036', diff saved to https://phabricator.wikimedia.org/P53025 and previous config saved to /var/cache/conftool/dbconfig/20231023-105036-arnaudb.json
  • 10:50 brouberol@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001"
  • 10:41 jayme: switched mw-debug (mw-on-k8s) to certmanager certificates - T300033
  • 10:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-client1002.eqiad.wmnet
  • 10:40 brouberol@cumin1001: START - Cookbook sre.dns.netbox
  • 10:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: provisioning - T344036
  • 10:37 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: provisioning - T344036
  • 10:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: provisioning - T344036
  • 10:37 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: provisioning - T344036
  • 10:36 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-client1002.eqiad.wmnet
  • 10:35 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:34 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:34 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 10:34 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 10:32 brouberol@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafka-jumbo1001.eqiad.wmnet
  • 10:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Provision db1231 depooled as a candidate master for s6', diff saved to https://phabricator.wikimedia.org/P53024 and previous config saved to /var/cache/conftool/dbconfig/20231023-103202-arnaudb.json
  • 10:31 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 10:29 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:29 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: test - jbond@cumin1001"
  • 10:28 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: test - jbond@cumin1001"
  • 10:26 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 10:26 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:26 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: test - jbond@cumin1001"
  • 10:25 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: test - jbond@cumin1001"
  • 10:23 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 10:20 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:19 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 10:13 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafka-jumbo1002.eqiad.wmnet
  • 10:13 brouberol@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:13 brouberol@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001"
  • 10:12 brouberol@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001"
  • 10:11 taavi: reprepro: drop thirdparty/kubeadm-k8s-1-22 component
  • 10:10 brouberol@cumin1001: START - Cookbook sre.dns.netbox
  • 10:04 brouberol@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafka-jumbo1002.eqiad.wmnet
  • 10:02 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 10:02 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 09:57 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafka-jumbo1003.eqiad.wmnet
  • 09:57 brouberol@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:55 brouberol@cumin1001: START - Cookbook sre.dns.netbox
  • 09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add A/PTR for lsw1-f8/ssw links - ayounsi@cumin1001"
  • 09:54 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add A/PTR for lsw1-f8/ssw links - ayounsi@cumin1001"
  • 09:51 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 09:50 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
  • 09:50 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 09:49 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
  • 09:49 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
  • 09:48 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
  • 09:48 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
  • 09:47 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
  • 09:47 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 09:37 brouberol@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: kafka-jumbo1004.eqiad.wmnet
  • 09:37 brouberol@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: kafka-jumbo1004.eqiad.wmnet
  • 09:36 brouberol@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001 - brouberol@cumin1001 - T336044"
  • 09:35 brouberol@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001 - brouberol@cumin1001 - T336044"
  • 09:32 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 09:32 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 09:31 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 09:31 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 09:28 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:21 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 09:21 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 09:19 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 09:18 elukey@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:18 elukey@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:17 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:13 brouberol@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001"
  • 09:00 brouberol@cumin1001: START - Cookbook sre.dns.netbox
  • 08:55 brouberol@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafka-jumbo1004.eqiad.wmnet
  • 08:52 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafka-jumbo1005.eqiad.wmnet
  • 08:52 brouberol@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:52 brouberol@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001"
  • 08:51 brouberol@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001"
  • 08:38 brouberol@cumin1001: START - Cookbook sre.dns.netbox
  • 08:33 brouberol@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafka-jumbo1005.eqiad.wmnet
  • 08:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudmetrics1003.eqiad.wmnet
  • 08:24 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafka-jumbo1006.eqiad.wmnet
  • 08:24 brouberol@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:24 brouberol@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1006.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001"
  • 08:21 brouberol@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1006.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001"
  • 08:19 brouberol@cumin1001: START - Cookbook sre.dns.netbox
  • 08:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudmetrics1003.eqiad.wmnet
  • 08:14 brouberol@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafka-jumbo1006.eqiad.wmnet
  • 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudmetrics1004.eqiad.wmnet
  • 08:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudmetrics1004.eqiad.wmnet
  • 08:01 moritzm: installing Linux kernel updates for Buster 5.10 backport
  • 07:42 taavi: mwscript purgeList.php enwiki <<< "https://en.wikipedia.org/static/images/project-logos/knwiktionary.png" (and for 1.5x and 2x variants)
  • 07:36 hashar: Upgrading CI Jenkins # T349282
  • 07:26 taavi@deploy2002: Finished scap: Backport for knwiktionary: update logo (T349036), dewiktionary: add tagline (T348978), hiwikisource: Adjust width-height ratio of logo to fix display issue (T310961) (duration: 16m 59s)
  • 07:22 elukey@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 07:22 elukey@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 07:21 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 07:20 taavi@deploy2002: taavi and anzx: Continuing with sync
  • 07:17 taavi@deploy2002: taavi and anzx: Backport for knwiktionary: update logo (T349036), dewiktionary: add tagline (T348978), hiwikisource: Adjust width-height ratio of logo to fix display issue (T310961) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:09 taavi@deploy2002: Started scap: Backport for knwiktionary: update logo (T349036), dewiktionary: add tagline (T348978), hiwikisource: Adjust width-height ratio of logo to fix display issue (T310961)

2023-10-21

  • 00:10 krinkle@deploy2002: Synchronized wmf-config/logging.php: (no justification provided) (duration: 06m 03s)

2023-10-20

  • 22:47 cstone: civicrm upgraded from ca081c11 to 8e8ffec0
  • 21:39 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:38 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:33 ejegg: fundraising civicrm upgraded from 1263a91b to ca081c11
  • 21:06 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 21:06 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 20:21 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 20:20 ejegg: fundraising civicrm upgraded from e57425a9 to 1263a91b
  • 20:20 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 20:19 bvibber: brion running requeueTranscodes.php on mwmaint2002 for audio and video transcode backfill, will use some jobqueue cpu but should be nicely throttled
  • 20:05 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:05 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:46 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:44 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:35 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:35 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:08 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 19:07 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:07 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:06 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 19:06 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 19:05 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:05 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 19:05 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:57 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:56 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:43 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:42 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:42 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:41 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:36 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:36 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:59 topranks: Disabling BGP from asw1-by27-esams to cr1-esams to move BGP peers to new group T349125
  • 15:55 topranks: Disabling BGP from asw1-by27-esams to cr2-esams to move BGP peers to new group T349125
  • 15:47 topranks: Disabling BGP from asw1-bw27-esams to cr2-esams to move BGP peers to new group T349125
  • 15:39 topranks: Disabling BGP from asw1-bw27-esams to cr1-esams to move BGP peers to new group T349125
  • 15:37 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@fd88cfa]: Update kafka hosts mjolnir communicates with (duration: 00m 27s)
  • 15:36 ebernhardson@deploy2002: Started deploy [airflow-dags/search@fd88cfa]: Update kafka hosts mjolnir communicates with
  • 15:26 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 12 hosts with reason: changing bgp config on esams switches
  • 15:25 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 12 hosts with reason: changing bgp config on esams switches
  • 15:18 topranks: Disabling BGP from asw1-b13-drmrs to cr1-drmrs to move BGP peers to new group T349125
  • 15:16 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:15 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:13 topranks: Disabling BGP from asw1-b13-drmrs to cr2-drmrs to move BGP peers to new group T349125
  • 15:09 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 12 hosts with reason: changing bgp config on drmrs switches
  • 15:09 topranks: Disabling BGP from asw1-b12-drmrs to cr2-drmrs to move BGP peers to new group T349125
  • 15:08 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 12 hosts with reason: changing bgp config on drmrs switches
  • 14:57 topranks: Disabling BGP from asw1-b12-drmrs to cr1-drmrs to move BGP peers to new group T349125
  • 14:49 ejegg: payments-wiki upgraded from 87cda414 to 7575f0e6
  • 14:33 topranks: Disabling BGP from ssw1-f1-eqiad to cr2-eqiad to move BGP peers to new group T349125
  • away: fundraising civicrm upgraded from f11ad380 to e57425a9
  • 13:19 topranks: Disabling BGP from ssw1-e1-eqiad to cr1-eqiad to move BGP peers to new group T349125
  • 12:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1003.eqiad.wmnet with OS bullseye
  • 11:55 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
  • 11:52 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
  • 11:42 jynus: refactoring tables @ db1164[bbackups] T349360
  • 11:37 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bullseye
  • 11:36 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host moss-be1003.eqiad.wmnet with OS bullseye
  • 10:46 kevinbazira@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 10:46 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 10:39 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 10:19 godog: powercycle titan1001
  • 10:13 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:13 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add A/PTR for lsw1-f8/ssw links - ayounsi@cumin1001"
  • 10:12 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add A/PTR for lsw1-f8/ssw links - ayounsi@cumin1001"
  • 10:04 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 09:58 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 08:45 brouberol@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts kafka-jumbo1006.eqiad.wmnet
  • 08:43 brouberol@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafka-jumbo1006.eqiad.wmnet
  • 07:43 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 07:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add A/PTR for lsw1-f8/ssw links - ayounsi@cumin1001"
  • 07:26 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add A/PTR for lsw1-f8/ssw links - ayounsi@cumin1001"
  • 07:25 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on etherpad1003.eqiad.wmnet with reason: Reboot to use new CPU and memory config
  • 07:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on etherpad1003.eqiad.wmnet with reason: Reboot to use new CPU and memory config
  • 07:22 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 07:21 jelto: increase etherpad1003 CPU and memory (1CPU,1GB -> 2CPU,2GB) - T348386
  • 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1119 from dbctl T349272', diff saved to https://phabricator.wikimedia.org/P53021 and previous config saved to /var/cache/conftool/dbconfig/20231020-061822-marostegui.json
  • 03:15 tstarling@deploy2002: Synchronized wmf-config/InitialiseSettings.php: Enable source maps everywhere T47514 (duration: 06m 26s)
  • 03:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bullseye

2023-10-19

  • 22:37 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host sretest2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:32 hmonroy@deploy2002: Finished scap: Backport for PhonosButton: use text() instead of append() (T349312) (duration: 06m 48s)
  • 21:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host moss-be1003.eqiad.wmnet with OS bullseye
  • 21:27 hmonroy@deploy2002: hmonroy: Continuing with sync
  • 21:27 hmonroy@deploy2002: hmonroy: Backport for PhonosButton: use text() instead of append() (T349312) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:25 hmonroy@deploy2002: Started scap: Backport for PhonosButton: use text() instead of append() (T349312)
  • 21:12 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bullseye
  • 20:55 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host moss-be1003.eqiad.wmnet with OS bullseye
  • 20:39 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bullseye
  • 20:02 brennen: utc late backport window: no patches
  • 18:22 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:22 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:09 brennen@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.1 refs T348354
  • 17:33 urandom: Decommissioning Cassandra, restbase1018-{a,b,c} — T328490
  • 16:50 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:49 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:17 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:16 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:16 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:15 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:15 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:14 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:42 jgiannelos@deploy2002: Finished deploy [restbase/deploy@a311c5d]: (no justification provided) (duration: 00m 54s)
  • 15:41 jgiannelos@deploy2002: Started deploy [restbase/deploy@a311c5d]: (no justification provided)
  • 15:30 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:30 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:25 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host sretest2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:15 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on kafka-jumbo1006.eqiad.wmnet with reason: host is being decommissioned
  • 15:15 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on kafka-jumbo1006.eqiad.wmnet with reason: host is being decommissioned
  • 15:15 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on kafka-jumbo1005.eqiad.wmnet with reason: host is being decommissioned
  • 15:14 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on kafka-jumbo1005.eqiad.wmnet with reason: host is being decommissioned
  • 15:14 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on kafka-jumbo1004.eqiad.wmnet with reason: host is being decommissioned
  • 15:14 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on kafka-jumbo1004.eqiad.wmnet with reason: host is being decommissioned
  • 15:14 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on kafka-jumbo1003.eqiad.wmnet with reason: host is being decommissioned
  • 15:13 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on kafka-jumbo1003.eqiad.wmnet with reason: host is being decommissioned
  • 15:13 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on kafka-jumbo1002.eqiad.wmnet with reason: host is being decommissioned
  • 15:13 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on kafka-jumbo1002.eqiad.wmnet with reason: host is being decommissioned
  • 15:13 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on kafka-jumbo1001.eqiad.wmnet with reason: host is being decommissioned
  • 15:13 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on kafka-jumbo1001.eqiad.wmnet with reason: host is being decommissioned
  • 15:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudnet1008-dev']
  • 15:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudnet1007-dev']
  • 15:09 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet1007-dev']
  • 15:09 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet1008-dev']
  • 15:08 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudnet1008-dev.eqiad.wmnet']
  • 15:08 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudnet1007-dev.eqiad.wmnet']
  • 15:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
  • 15:08 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
  • 15:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
  • 15:08 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
  • 15:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol1009-dev.eqiad.wmnet']
  • 15:06 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
  • 15:05 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
  • 15:04 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
  • 15:04 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
  • 14:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet1007-dev.eqiad.wmnet']
  • 14:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet1008-dev.eqiad.wmnet']
  • 14:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
  • 14:58 elukey: powercycle titan1001
  • 14:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1009-dev.eqiad.wmnet']
  • 14:57 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
  • 14:56 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1010-dev']
  • 14:56 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1010-dev']
  • 14:55 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1009-dev']
  • 14:55 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1009-dev']
  • 14:55 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1010-dev']
  • 14:55 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1009-dev']
  • 14:55 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1009-dev']
  • 14:55 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1010-dev']
  • 14:54 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudnet1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:51 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol1009-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:51 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol1010-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:51 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:44 elukey: powercycle titan1001
  • 14:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:38 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:35 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 14:34 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudnet1007-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:32 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudnet1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:31 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1009-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:31 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1010-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:31 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:29 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:28 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:21 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:17 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1010-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:17 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1009-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1010-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:12 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1010-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:12 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1009-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:05 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudnet1007-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:04 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1010-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:03 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1009-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:03 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:01 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:01 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudcontrol100[8-10]-dev cloudnet100[7-8]-dev - jclark@cumin1001"
  • 14:00 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudcontrol100[8-10]-dev cloudnet100[7-8]-dev - jclark@cumin1001"
  • 13:58 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 13:48 wmde-fisch@deploy2002: Finished scap: Backport for Revert "Revert "Workaround to center search terms label"" (T252346) (duration: 07m 50s)
  • 13:43 wmde-fisch@deploy2002: wmde-fisch: Continuing with sync
  • 13:42 wmde-fisch@deploy2002: wmde-fisch: Backport for Revert "Revert "Workaround to center search terms label"" (T252346) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:41 wmde-fisch@deploy2002: Started scap: Backport for Revert "Revert "Workaround to center search terms label"" (T252346)
  • 13:00 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:00 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: noop - volans@cumin1001"
  • 12:59 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: noop - volans@cumin1001"
  • 12:52 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 12:50 volans@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 12:50 volans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 12:50 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 12:50 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 11:47 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 11:46 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@6f09297] (releasing): (no justification provided) (duration: 01m 08s)
  • 11:44 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@6f09297] (releasing): (no justification provided)
  • 11:30 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 08:36 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 07:33 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
  • 07:20 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on db2109.codfw.wmnet with reason: db2109 downtime while repooling
  • 07:20 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on db2109.codfw.wmnet with reason: db2109 downtime while repooling
  • 07:17 tgr: UTC morning deploys done
  • 07:16 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 07:13 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 06:57 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 06:34 volans: enabled distributed locking support in spicerack/cookbooks T341973
  • 06:32 volans@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1001.eqiad.wmnet
  • 06:32 volans@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest1001.eqiad.wmnet
  • 06:31 volans@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1001.eqiad.wmnet
  • 06:31 volans@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest1001.eqiad.wmnet
  • 05:14 tchin@deploy2002: Finished deploy [airflow-dags/analytics@60950f6]: Deploying airflow [data-engineering/airflow-dags@60950f6b] (duration: 01m 12s)
  • 05:12 tchin@deploy2002: Started deploy [airflow-dags/analytics@60950f6]: Deploying airflow [data-engineering/airflow-dags@60950f6b]

2023-10-18

  • 23:58 eileen: civicrm upgraded from 4a5634ed to f11ad380
  • 22:12 eileen: civicrm upgraded from 52202980 to 4a5634ed
  • 21:58 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:54 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host sretest2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:44 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:35 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host sretest2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:23 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1229.eqiad.wmnet with OS bullseye
  • 21:23 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 21:16 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 21:08 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:08 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1229.eqiad.wmnet with reason: host reimage
  • 20:59 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1229.eqiad.wmnet with reason: host reimage
  • 20:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
  • 20:46 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
  • 20:44 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
  • 20:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
  • 20:43 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
  • 20:42 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
  • 19:40 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
  • 19:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
  • 19:33 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
  • 19:30 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
  • 19:25 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:25 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:16 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns6001.wikimedia.org with OS bookworm
  • 19:02 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
  • 19:00 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1110.eqiad.wmnet with OS bullseye
  • 19:00 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 19:00 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1104.eqiad.wmnet with OS bullseye
  • 19:00 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 18:45 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns6001.wikimedia.org with reason: host reimage
  • 18:41 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns6001.wikimedia.org with reason: host reimage
  • 18:36 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:36 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:35 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:35 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:33 brennen@deploy2002: Synchronized php: group1 wikis to 1.42.0-wmf.1 refs T348354 (duration: 05m 40s)
  • 18:28 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.1 refs T348354
  • 18:20 brennen: train 1.42.0-wmf.1 (T348354): logs clean and no blockers, rolling to group1
  • 18:17 brennen@deploy2002: Finished scap: Backport for Fix Typo in OS Dark Mode field (T346106) (duration: 13m 46s)
  • 18:17 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns6001.wikimedia.org with OS bookworm
  • 18:12 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 18:12 brennen@deploy2002: brennen and jdlrobson: Continuing with sync
  • 18:05 brennen@deploy2002: brennen and jdlrobson: Backport for Fix Typo in OS Dark Mode field (T346106) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:03 brennen@deploy2002: Started scap: Backport for Fix Typo in OS Dark Mode field (T346106)
  • 17:55 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1110.eqiad.wmnet with reason: host reimage
  • 17:52 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:52 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:51 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1110.eqiad.wmnet with reason: host reimage
  • 17:47 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:46 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:44 sukhe: running authdns-update for CR 966573
  • 17:43 tchin@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:43 tchin@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:42 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 17:34 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1110.eqiad.wmnet with OS bullseye
  • 17:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1110.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1110.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:29 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1110
  • 17:28 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1110
  • 17:27 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:26 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 17:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1104.eqiad.wmnet with reason: host reimage
  • 17:24 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1110']
  • 17:23 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1110']
  • 17:22 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1104.eqiad.wmnet with reason: host reimage
  • 17:13 XioNoX: restart turnilo to pickup UI change
  • 17:12 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1110.eqiad.wmnet with OS bullseye
  • 17:07 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance
  • 17:07 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance
  • 17:05 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1104.eqiad.wmnet with OS bullseye
  • 17:04 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1104']
  • 17:04 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1104']
  • 17:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1100.eqiad.wmnet with OS bullseye
  • 17:04 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 17:03 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 17:01 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1101.eqiad.wmnet with OS bullseye
  • 17:01 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 17:00 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:56 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1102.eqiad.wmnet with OS bullseye
  • 16:56 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:54 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:46 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1100.eqiad.wmnet with reason: host reimage
  • 16:43 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1101.eqiad.wmnet with reason: host reimage
  • 16:40 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1102.eqiad.wmnet with reason: host reimage
  • 16:40 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1101.eqiad.wmnet with reason: host reimage
  • 16:39 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1100.eqiad.wmnet with reason: host reimage
  • 16:37 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1102.eqiad.wmnet with reason: host reimage
  • 16:33 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1010.eqiad.wmnet with OS bullseye
  • 16:30 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:30 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:29 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1103.eqiad.wmnet with OS bullseye
  • 16:28 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:28 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:28 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:26 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:25 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1102.eqiad.wmnet with OS bullseye
  • 16:24 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:24 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1102']
  • 16:23 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1101.eqiad.wmnet with OS bullseye
  • 16:22 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1100.eqiad.wmnet with OS bullseye
  • 16:22 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1100']
  • 16:20 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100']
  • 16:20 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1101']
  • 16:20 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1101']
  • 16:19 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1110.eqiad.wmnet with OS bullseye
  • 16:19 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1110']
  • 16:18 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1110']
  • 16:18 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:18 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
  • 16:18 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:18 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cp1102 - jclark@cumin1001"
  • 16:17 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:17 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:17 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cp1102 - jclark@cumin1001"
  • 16:17 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:16 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:15 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:15 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 16:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
  • 16:14 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cp1110']
  • 16:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1110']
  • 16:14 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1102
  • 16:13 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1102
  • 16:11 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1010.eqiad.wmnet with reason: host reimage
  • 16:10 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1110']
  • 16:08 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1110']
  • 16:08 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1103.eqiad.wmnet with reason: host reimage
  • 16:08 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1010.eqiad.wmnet with reason: host reimage
  • 16:08 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1102']
  • 16:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
  • 16:07 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1102
  • 16:07 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1105.eqiad.wmnet with OS bullseye
  • 16:07 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:06 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:06 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1102
  • 16:05 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1103.eqiad.wmnet with reason: host reimage
  • 16:05 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1102']
  • 16:04 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
  • 16:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1106.eqiad.wmnet with OS bullseye
  • 16:02 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:00 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:57 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1110.eqiad.wmnet with OS bullseye
  • 15:53 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:52 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:51 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1102']
  • 15:51 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
  • 15:50 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1107.eqiad.wmnet with OS bullseye
  • 15:50 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:50 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
  • 15:50 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1010.eqiad.wmnet with OS bullseye
  • 15:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1104']
  • 15:49 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:49 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1105.eqiad.wmnet with reason: host reimage
  • 15:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1104']
  • 15:47 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1010.eqiad.wmnet with OS bullseye
  • 15:46 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1105.eqiad.wmnet with reason: host reimage
  • 15:46 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1111.eqiad.wmnet with OS bullseye
  • 15:46 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1106.eqiad.wmnet with reason: host reimage
  • 15:43 inflatador: bking@deploy2002 destroy dse-k8s-services instance of rdf-streaming-updater T349095
  • 15:40 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1106.eqiad.wmnet with reason: host reimage
  • 15:40 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:32 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1107.eqiad.wmnet with reason: host reimage
  • 15:29 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1105.eqiad.wmnet with OS bullseye
  • 15:29 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1105']
  • 15:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1105']
  • 15:28 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:28 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:28 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1107.eqiad.wmnet with reason: host reimage
  • 15:26 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1106.eqiad.wmnet with OS bullseye
  • 15:25 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1106']
  • 15:23 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1111.eqiad.wmnet with reason: host reimage
  • 15:20 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1111.eqiad.wmnet with reason: host reimage
  • 15:19 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1106']
  • 15:13 dancy@deploy2002: Finished deploy [releng/jenkins-deploy@2cf7af2] (releasing): (no justification provided) (duration: 00m 44s)
  • 15:12 dancy@deploy2002: Started deploy [releng/jenkins-deploy@2cf7af2] (releasing): (no justification provided)
  • 15:10 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
  • 15:09 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1107']
  • 15:09 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1107']
  • 15:07 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1114.eqiad.wmnet with OS bullseye
  • 15:07 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:06 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:04 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1110.eqiad.wmnet with OS bullseye
  • 15:03 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1110']
  • 15:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1111.eqiad.wmnet with OS bullseye
  • 15:02 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1111']
  • 15:02 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1010.eqiad.wmnet with OS bullseye
  • 15:02 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
  • 15:01 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1111']
  • 15:01 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
  • 15:01 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:00 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1111']
  • 14:59 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs1010.eqiad.wmnet with OS bullseye
  • 14:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
  • 14:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1111']
  • 14:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
  • 14:59 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1111
  • 14:58 elukey: powercycle titan1001 (no mgmt console / tty available, no host metrics, no ssh)
  • 14:57 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1111
  • 14:57 volans@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1001.eqiad.wmnet
  • 14:57 volans@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest1001.eqiad.wmnet
  • 14:57 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1111']
  • 14:57 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
  • 14:56 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1111']
  • 14:56 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
  • 14:56 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1110']
  • 14:51 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1108.eqiad.wmnet with OS bullseye
  • 14:51 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 14:49 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1114.eqiad.wmnet with reason: host reimage
  • 14:46 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1114.eqiad.wmnet with reason: host reimage
  • 14:44 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1010.eqiad.wmnet with OS bullseye
  • 14:40 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 14:31 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1114.eqiad.wmnet with OS bullseye
  • 14:25 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1114']
  • 14:24 ejegg: fundraising civicrm upgraded from d8fe92e3 to 52202980
  • 14:23 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1108.eqiad.wmnet with reason: host reimage
  • 14:20 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1108.eqiad.wmnet with reason: host reimage
  • 14:18 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1114']
  • 14:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1108.eqiad.wmnet with OS bullseye
  • 13:58 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
  • 13:23 volans@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1001.eqiad.wmnet
  • 13:23 volans@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest1001.eqiad.wmnet
  • 13:14 volans: uploaded spicerack_8.0.2 to apt.wikimedia.org bullseye-wikimedia
  • 13:10 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 13:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 13:06 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 13:06 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 13:05 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 13:05 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 13:04 sukhe: running authdns-update for CR 966243
  • 13:04 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 13:04 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 13:03 arnaudb@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P53008 and previous config saved to /var/cache/conftool/dbconfig/20231018-130343-arnaudb.json
  • 13:03 arnaudb@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P53007 and previous config saved to /var/cache/conftool/dbconfig/20231018-130325-arnaudb.json
  • 12:59 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 12:59 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 12:52 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 12:51 jbond: upload puppet_7.23.0-1~debu11u1 (bullseye backport)
  • 12:48 arnaudb@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P53006 and previous config saved to /var/cache/conftool/dbconfig/20231018-124838-arnaudb.json
  • 12:48 arnaudb@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P53005 and previous config saved to /var/cache/conftool/dbconfig/20231018-124820-arnaudb.json
  • 12:44 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 12:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 12:44 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 12:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
  • 12:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2109.codfw.wmnet with reason: db2109 downtime while repooling
  • 12:39 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2109.codfw.wmnet with reason: db2109 downtime while repooling
  • 12:38 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 12:37 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 12:33 arnaudb@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P53004 and previous config saved to /var/cache/conftool/dbconfig/20231018-123333-arnaudb.json
  • 12:33 arnaudb@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P53003 and previous config saved to /var/cache/conftool/dbconfig/20231018-123315-arnaudb.json
  • 12:18 arnaudb@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P53002 and previous config saved to /var/cache/conftool/dbconfig/20231018-121828-arnaudb.json
  • 12:18 arnaudb@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P53001 and previous config saved to /var/cache/conftool/dbconfig/20231018-121811-arnaudb.json
  • 12:17 arnaudb: repool db2161 and db1126
  • 11:51 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1009.eqiad.wmnet
  • 11:44 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1009.eqiad.wmnet
  • 11:43 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm
  • 11:34 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 11:31 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 11:29 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
  • 11:29 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
  • 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 11:23 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
  • 11:21 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 11:20 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 11:16 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 11:16 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 11:14 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudbackup1002-dev.eqiad.wmnet with reason: host reimage
  • 11:12 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudbackup1002-dev.eqiad.wmnet with reason: host reimage
  • 11:11 ladsgroup@deploy2002: Finished scap: Backport for Set s6 and s8 to write both for pagelinks migration (T345732) (duration: 10m 10s)
  • 11:08 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 11:05 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 11:02 ladsgroup@deploy2002: ladsgroup: Backport for Set s6 and s8 to write both for pagelinks migration (T345732) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:01 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm
  • 11:01 ladsgroup@deploy2002: Started scap: Backport for Set s6 and s8 to write both for pagelinks migration (T345732)
  • 10:40 volans: re-enabled puppet on the cumin hosts. installed spicerack 8.0.1 on the cumin hosts
  • 10:37 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS bullseye
  • 10:35 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1007.eqiad.wmnet
  • 10:32 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm
  • 10:28 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 10:19 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudbackup1002-dev.eqiad.wmnet with reason: host reimage
  • 10:16 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudbackup1002-dev.eqiad.wmnet with reason: host reimage
  • 10:09 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 10:07 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm
  • 10:03 volans@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 09:54 volans@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 09:52 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on stat1009.eqiad.wmnet with reason: Extending downtime for stat1009
  • 09:52 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on stat1009.eqiad.wmnet with reason: Extending downtime for stat1009
  • 09:48 volans@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1001.eqiad.wmnet
  • 09:47 volans@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest1001.eqiad.wmnet
  • 09:25 volans: uploaded spicerack_8.0.1 to apt.wikimedia.org bullseye-wikimedia
  • 09:23 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 09:23 jynus: aborting backup of es1022, es1025 (there was already another backup running)
  • 09:23 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm
  • 09:22 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 09:21 jynus: starting new backup of es1022, es1025 (new clusters only)
  • 09:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1006.eqiad.wmnet
  • 09:20 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 09:19 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 09:17 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 09:17 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 09:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on stat1009.eqiad.wmnet with reason: Moving /home to /srv/home on stat1009 and rebooting
  • 09:16 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on stat1009.eqiad.wmnet with reason: Moving /home to /srv/home on stat1009 and rebooting
  • 09:14 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1007.eqiad.wmnet
  • 09:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1006.eqiad.wmnet
  • 09:13 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1004.eqiad.wmnet
  • 09:10 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudbackup1002-dev.eqiad.wmnet with reason: host reimage
  • 09:06 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudbackup1002-dev.eqiad.wmnet with reason: host reimage
  • 09:05 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1004.eqiad.wmnet
  • 09:02 aqu@deploy2002: Finished deploy [airflow-dags/analytics@c17c91c]: Fix following yesterday weekly train deploy - Second try [airflow-dags@c17c91ce] (duration: 00m 06s)
  • 09:02 aqu@deploy2002: Started deploy [airflow-dags/analytics@c17c91c]: Fix following yesterday weekly train deploy - Second try [airflow-dags@c17c91ce]
  • 09:01 aqu@deploy2002: deploy aborted: Fix following yesterday weekly train deploy [airflow-dags@c17c91ce] (duration: 01m 10s)
  • 09:00 aqu@deploy2002: Started deploy [airflow-dags/analytics@c17c91c]: Fix following yesterday weekly train deploy [airflow-dags@c17c91ce]
  • 08:54 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm
  • 08:51 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 08:40 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 08:18 volans@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1001.eqiad.wmnet
  • 08:14 volans@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest1001.eqiad.wmnet
  • 08:08 volans@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1001.eqiad.wmnet
  • 08:06 volans@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest1001.eqiad.wmnet
  • 08:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add A/PTR for lsw1-e8/ssw links - ayounsi@cumin1001"
  • 08:02 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add A/PTR for lsw1-e8/ssw links - ayounsi@cumin1001"
  • 07:54 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 07:47 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 07:46 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2132.codfw.wmnet with OS bookworm
  • 07:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2132.codfw.wmnet with reason: host reimage
  • 07:37 volans: temporarily disabled puppet on the A:cumin hosts to deploy and test spicerack v8.0.0
  • 07:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2132.codfw.wmnet with reason: host reimage
  • 07:28 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
  • 07:28 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
  • 07:28 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
  • 07:28 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
  • 07:27 filippo@deploy2002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
  • 07:27 filippo@deploy2002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 07:20 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2132.codfw.wmnet with OS bookworm
  • 07:06 aqu@deploy2002: Finished deploy [airflow-dags/analytics@5dcce3b]: Add missing MR in yesterday weekly train (run 2) [airflow-dags@5dcce3bd] (duration: 00m 07s)
  • 07:05 aqu@deploy2002: Started deploy [airflow-dags/analytics@5dcce3b]: Add missing MR in yesterday weekly train (run 2) [airflow-dags@5dcce3bd]
  • 07:05 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@be05071]: (no justification provided) (duration: 00m 06s)
  • 07:05 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@be05071]: (no justification provided)
  • 07:04 aqu@deploy2002: deploy aborted: Add missing MR in yesterday weekly train [airflow-dags@5dcce3bd] (duration: 03m 52s)
  • 07:00 aqu@deploy2002: Started deploy [airflow-dags/analytics@5dcce3b]: Add missing MR in yesterday weekly train [airflow-dags@5dcce3bd]
  • 07:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2160.codfw.wmnet with OS bookworm
  • 06:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2160.codfw.wmnet with reason: host reimage
  • 06:38 XioNoX: push pfw policies - T349101
  • 06:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2160.codfw.wmnet with reason: host reimage
  • 06:16 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2160.codfw.wmnet with OS bookworm
  • 06:08 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2160.codfw.wmnet with OS bookworm
  • 05:57 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2160.codfw.wmnet with OS bookworm
  • 01:22 eileen: civicrm upgraded from da11d010 to d8fe92e3

2023-10-17

  • 22:03 herron: pyrra.wm.o upgraded to 0.7.1 T302995
  • 21:32 catrope@deploy2002: backport Cancelled
  • 21:10 inflatador: bking@cumin1001 repool wdqs eqiad after rdf-streaming-updater fix
  • 21:05 catrope@deploy2002: Finished scap: Backport for Add language prefix to Readability survey (T347208) (duration: 13m 03s)
  • 21:00 catrope@deploy2002: catrope and jdrewniak: Continuing with sync
  • 20:53 catrope@deploy2002: catrope and jdrewniak: Backport for Add language prefix to Readability survey (T347208) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:52 inflatador: bking@cumin1001 depool wdqs eqiad due to rdf-streaming-updater failure
  • 20:52 catrope@deploy2002: Started scap: Backport for Add language prefix to Readability survey (T347208)
  • 20:36 volans: uploaded spicerack_8.0.0 to apt.wikimedia.org bullseye-wikimedia
  • 20:36 eevans@deploy2002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
  • 20:36 eevans@deploy2002: helmfile [codfw] START helmfile.d/services/sessionstore: apply
  • 20:35 eevans@deploy2002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
  • 20:34 eevans@deploy2002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
  • 20:31 eevans@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 20:31 eevans@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 20:29 catrope@deploy2002: Finished scap: Backport for Fixes incorrect Hebrew logo and applies gotwiki (T341253 T341251) (duration: 09m 59s)
  • 20:27 eevans@deploy2002: helmfile [codfw] DONE helmfile.d/services/echostore: apply
  • 20:26 eevans@deploy2002: helmfile [codfw] START helmfile.d/services/echostore: apply
  • 20:24 eevans@deploy2002: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
  • 20:24 eevans@deploy2002: helmfile [eqiad] START helmfile.d/services/echostore: apply
  • 20:24 catrope@deploy2002: jdlrobson and catrope: Continuing with sync
  • 20:21 eevans@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 20:21 eevans@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply
  • 20:20 catrope@deploy2002: jdlrobson and catrope: Backport for Fixes incorrect Hebrew logo and applies gotwiki (T341253 T341251) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:19 catrope@deploy2002: Started scap: Backport for Fixes incorrect Hebrew logo and applies gotwiki (T341253 T341251)
  • 20:16 catrope@deploy2002: Finished scap: Backport for Wordmark for blk wiktionary and got wikipedia (T341253 T341257) (duration: 11m 17s)
  • 20:11 catrope@deploy2002: catrope and jdlrobson: Continuing with sync
  • 20:06 catrope@deploy2002: catrope and jdlrobson: Backport for Wordmark for blk wiktionary and got wikipedia (T341253 T341257) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:05 catrope@deploy2002: Started scap: Backport for Wordmark for blk wiktionary and got wikipedia (T341253 T341257)
  • 18:46 hashar@deploy2002: Finished scap: Backport for logging: reorder wmgMonologProcessors entries (T349086) (duration: 08m 14s)
  • 18:43 hashar@deploy2002: hashar: Continuing with sync
  • 18:39 hashar@deploy2002: hashar: Backport for logging: reorder wmgMonologProcessors entries (T349086) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:38 hashar@deploy2002: Started scap: Backport for logging: reorder wmgMonologProcessors entries (T349086)
  • 18:25 brennen@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.1 refs T348354
  • 18:18 brennen: train 1.42.0-wmf.1 (T348354): blockers resolved, rolling to group0
  • 18:16 brennen@deploy2002: Finished scap: Backport for Pass full content to Parsoid for redirect pages (T349087) (duration: 07m 42s)
  • 18:11 brennen@deploy2002: brennen: Continuing with sync
  • 18:09 brennen@deploy2002: brennen: Backport for Pass full content to Parsoid for redirect pages (T349087) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:08 brennen@deploy2002: Started scap: Backport for Pass full content to Parsoid for redirect pages (T349087)
  • 17:05 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we set up the new WMDE Airflow instance
  • 17:05 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we set up the new WMDE Airflow instance
  • 16:22 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:22 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:50 sukhe: running authdns-update for CR 966564
  • 15:09 brennen@deploy2002: Finished deploy [phabricator/deployment@745d703]: deploy to phab1004 for T349038 (duration: 00m 57s)
  • 15:08 brennen@deploy2002: Started deploy [phabricator/deployment@745d703]: deploy to phab1004 for T349038
  • 15:07 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 15:07 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 15:07 brennen@deploy2002: Finished deploy [phabricator/deployment@745d703]: test deploy to phab2002 for T349038 (duration: 00m 33s)
  • 15:06 brennen@deploy2002: Started deploy [phabricator/deployment@745d703]: test deploy to phab2002 for T349038
  • 15:04 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator maintenance
  • 15:03 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator maintenance
  • 15:03 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator maintenance
  • 15:03 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator maintenance
  • 15:03 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 15:02 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 14:59 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:58 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 14:28 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:28 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 14:24 denisse@deploy2002: Finished deploy [performance/navtiming@2e17c67]: (no justification provided) (duration: 00m 05s)
  • 14:24 denisse@deploy2002: Started deploy [performance/navtiming@2e17c67]: (no justification provided)
  • 14:11 jdrewniak@deploy2002: Finished scap: Backport for ParserOutputAccess: Fix local cache when page is edited within the process (T349033) (duration: 15m 56s)
  • 14:05 jdrewniak@deploy2002: jdrewniak: Continuing with sync
  • 14:03 tchin@deploy2002: Finished deploy [airflow-dags/analytics_test@be05071]: Regular analytics weekly train (duration: 00m 06s)
  • 14:03 tchin@deploy2002: Started deploy [airflow-dags/analytics_test@be05071]: Regular analytics weekly train
  • 14:01 tchin@deploy2002: Finished deploy [airflow-dags/analytics@fae5764]: (no justification provided) (duration: 01m 22s)
  • 13:59 tchin@deploy2002: Started deploy [airflow-dags/analytics@fae5764]: (no justification provided)
  • 13:56 jdrewniak@deploy2002: jdrewniak: Backport for ParserOutputAccess: Fix local cache when page is edited within the process (T349033) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:55 jdrewniak@deploy2002: Started scap: Backport for ParserOutputAccess: Fix local cache when page is edited within the process (T349033)
  • 13:52 tchin@deploy2002: Finished deploy [analytics/refinery@0d09fbd] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0d09fbdc] (duration: 02m 59s)
  • 13:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1225.eqiad.wmnet with reason: db1225 downtime for restoration
  • 13:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1225.eqiad.wmnet with reason: db1225 downtime for restoration
  • 13:49 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 13:49 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 13:49 tchin@deploy2002: Started deploy [analytics/refinery@0d09fbd] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0d09fbdc]
  • 13:49 tchin@deploy2002: Finished deploy [analytics/refinery@0d09fbd] (thin): Regular analytics weekly train THIN [analytics/refinery@0d09fbdc] (duration: 00m 07s)
  • 13:49 tchin@deploy2002: Started deploy [analytics/refinery@0d09fbd] (thin): Regular analytics weekly train THIN [analytics/refinery@0d09fbdc]
  • 13:48 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 13:48 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 13:48 tchin@deploy2002: Finished deploy [analytics/refinery@0d09fbd]: Regular analytics weekly train [analytics/refinery@0d09fbdc] (duration: 07m 24s)
  • 13:47 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2160.codfw.wmnet with OS bookworm
  • 13:46 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 13:46 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 13:40 tchin@deploy2002: Started deploy [analytics/refinery@0d09fbd]: Regular analytics weekly train [analytics/refinery@0d09fbdc]
  • 13:40 jdrewniak@deploy2002: Finished scap: Backport for Enable Vector readability survey on select wikis (T347208) (duration: 09m 50s)
  • 13:34 jdrewniak@deploy2002: jdrewniak: Continuing with sync
  • 13:32 jdrewniak@deploy2002: jdrewniak: Backport for Enable Vector readability survey on select wikis (T347208) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:30 jdrewniak@deploy2002: Started scap: Backport for Enable Vector readability survey on select wikis (T347208)
  • 13:26 jdrewniak@deploy2002: Backport cancelled.
  • 13:15 jdrewniak@deploy2002: Backport cancelled.
  • 12:59 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2160.codfw.wmnet with OS bookworm
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1119 T339185', diff saved to https://phabricator.wikimedia.org/P52995 and previous config saved to /var/cache/conftool/dbconfig/20231017-124916-root.json
  • 12:28 urandom: Starting Cassandra decommission(s) of restbase1017 —
  • 11:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T343198)', diff saved to https://phabricator.wikimedia.org/P52994 and previous config saved to /var/cache/conftool/dbconfig/20231017-115217-arnaudb.json
  • 11:39 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Depool db1126 T349077', diff saved to https://phabricator.wikimedia.org/P52993 and previous config saved to /var/cache/conftool/dbconfig/20231017-113809-arnaudb.json
  • 11:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P52992 and previous config saved to /var/cache/conftool/dbconfig/20231017-113711-arnaudb.json
  • 11:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Set db1126 with weight 275 T349077', diff saved to https://phabricator.wikimedia.org/P52991 and previous config saved to /var/cache/conftool/dbconfig/20231017-113432-arnaudb.json
  • 11:29 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:27 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P52990 and previous config saved to /var/cache/conftool/dbconfig/20231017-112204-arnaudb.json
  • 11:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Promote db1209 to s8 primary T349077', diff saved to https://phabricator.wikimedia.org/P52989 and previous config saved to /var/cache/conftool/dbconfig/20231017-111720-arnaudb.json
  • 11:12 arnaudb: Starting s8 eqiad failover from db1126 to db1209 - T349077
  • 11:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T343198)', diff saved to https://phabricator.wikimedia.org/P52988 and previous config saved to /var/cache/conftool/dbconfig/20231017-110658-arnaudb.json
  • 11:00 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:59 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Set db1209 with weight 0 T349077', diff saved to https://phabricator.wikimedia.org/P52987 and previous config saved to /var/cache/conftool/dbconfig/20231017-104839-arnaudb.json
  • 10:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s8 T349077
  • 10:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: Primary switchover s8 T349077
  • 10:28 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:28 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 09:59 hashar: Deleted operations-puppet-catalog-compiler Jenkins job to replace it with a new job letting one pick the Puppet version(s) to compile against | T236373
  • 09:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 09:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 09:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for an-airflow[1002,1004-1006].eqiad.wmnet,an-launcher1002.eqiad.wmnet
  • 09:58 btullis@cumin1001: START - Cookbook sre.hosts.remove-downtime for an-airflow[1002,1004-1006].eqiad.wmnet,an-launcher1002.eqiad.wmnet
  • 09:48 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-db1001.eqiad.wmnet
  • 09:48 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on an-airflow[1002,1004-1006].eqiad.wmnet,an-launcher1002.eqiad.wmnet with reason: Rebooting Airflow instances for T344671
  • 09:47 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on an-airflow[1002,1004-1006].eqiad.wmnet,an-launcher1002.eqiad.wmnet with reason: Rebooting Airflow instances for T344671
  • 09:42 btullis@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host an-airflow1007.eqiad.wmnet
  • 09:42 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-db1001.eqiad.wmnet
  • 09:36 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@b010dae]: (no justification provided) (duration: 00m 46s)
  • 09:35 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@b010dae]: (no justification provided)
  • 09:33 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-airflow1007.eqiad.wmnet
  • 09:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1002.eqiad.wmnet
  • 09:28 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-airflow1002.eqiad.wmnet
  • 09:28 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1005.eqiad.wmnet
  • 09:26 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 09:26 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 09:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-airflow1005.eqiad.wmnet
  • 09:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1004.eqiad.wmnet
  • 09:21 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 09:20 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 09:20 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-airflow1004.eqiad.wmnet
  • 09:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1006.eqiad.wmnet
  • 09:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 09:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 09:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-airflow1006.eqiad.wmnet
  • 09:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on an-airflow[1002,1004-1006].eqiad.wmnet,an-launcher1002.eqiad.wmnet with reason: Rebooting Airflow instances for T344671
  • 09:12 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on an-airflow[1002,1004-1006].eqiad.wmnet,an-launcher1002.eqiad.wmnet with reason: Rebooting Airflow instances for T344671
  • 08:38 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 08:35 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 08:32 XioNoX: push pfw policies - T348576
  • 07:26 hashar@deploy2002: Finished deploy [gerrit/gerrit@578be93]: wm-checks-api: filter out Zuul start messages | T348920 (duration: 00m 07s)
  • 07:26 hashar@deploy2002: Started deploy [gerrit/gerrit@578be93]: wm-checks-api: filter out Zuul start messages | T348920
  • 07:23 hashar@deploy2002: Finished deploy [gerrit/gerrit@1153a16]: wm-checks-api: filter out Zuul start messages | T348920 (duration: 00m 05s)
  • 07:22 hashar@deploy2002: Started deploy [gerrit/gerrit@1153a16]: wm-checks-api: filter out Zuul start messages | T348920
  • 06:06 isaranto@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2161 T349053', diff saved to https://phabricator.wikimedia.org/P52986 and previous config saved to /var/cache/conftool/dbconfig/20231017-060214-root.json
  • 06:06 isaranto@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 06:02 isaranto@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2165 to s8 primary and set section read-write T349053', diff saved to https://phabricator.wikimedia.org/P52985 and previous config saved to /var/cache/conftool/dbconfig/20231017-060047-root.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s8 codfw as read-only for maintenance - T349053', diff saved to https://phabricator.wikimedia.org/P52984 and previous config saved to /var/cache/conftool/dbconfig/20231017-060021-root.json
  • 06:00 marostegui: Starting s8 codfw failover from db2161 to db2165 - T349053
  • 05:59 kart_: Update MinT to 2023-10-16-101614-production (T333969, T336683, T348097)
  • 05:36 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 05:31 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 05:29 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 05:19 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 05:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s8 T349053
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2165 with weight 0 T349053', diff saved to https://phabricator.wikimedia.org/P52983 and previous config saved to /var/cache/conftool/dbconfig/20231017-051723-root.json
  • 05:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: Primary switchover s8 T349053
  • 03:55 mwpresync@deploy2002: Pruned MediaWiki: 1.41.0-wmf.29 (duration: 02m 15s)
  • 03:53 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.1 refs T348354 (duration: 50m 15s)
  • 03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.1 refs T348354
  • 02:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T343198)', diff saved to https://phabricator.wikimedia.org/P52982 and previous config saved to /var/cache/conftool/dbconfig/20231017-021040-arnaudb.json
  • 02:10 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 02:10 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 02:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T343198)', diff saved to https://phabricator.wikimedia.org/P52981 and previous config saved to /var/cache/conftool/dbconfig/20231017-021018-arnaudb.json
  • 01:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P52980 and previous config saved to /var/cache/conftool/dbconfig/20231017-015511-arnaudb.json
  • 01:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P52979 and previous config saved to /var/cache/conftool/dbconfig/20231017-014005-arnaudb.json
  • 01:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T343198)', diff saved to https://phabricator.wikimedia.org/P52978 and previous config saved to /var/cache/conftool/dbconfig/20231017-012459-arnaudb.json

2023-10-16

  • 22:04 maryum: deployed security patch for T347742
  • 21:53 maryum: deployed security patch for T347708
  • 21:40 maryum: deployed security patch for T348343
  • 21:04 sbassett: deployed security mitigation for T348828
  • 20:55 cjming: end of UTC late backport window
  • 20:53 cjming@deploy2002: Finished scap: Backport for wordmarks/taglines for Wiktionary projects (T341257) (duration: 07m 17s)
  • 20:47 cjming@deploy2002: jdlrobson and cjming: Continuing with sync
  • 20:46 cjming@deploy2002: jdlrobson and cjming: Backport for wordmarks/taglines for Wiktionary projects (T341257) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:45 cjming@deploy2002: Started scap: Backport for wordmarks/taglines for Wiktionary projects (T341257)
  • 20:44 cjming@deploy2002: Finished scap: Backport for Update logos for remaining Wikisource projects (T343753) (duration: 07m 50s)
  • 20:39 cjming@deploy2002: jdlrobson and cjming: Continuing with sync
  • 20:37 cjming@deploy2002: jdlrobson and cjming: Backport for Update logos for remaining Wikisource projects (T343753) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:36 cjming@deploy2002: Started scap: Backport for Update logos for remaining Wikisource projects (T343753)
  • 20:35 cjming@deploy2002: Finished scap: Backport for Fixes Thai Wikinews wordmark and sewikimedia (T348757 T347534) (duration: 07m 08s)
  • 20:30 cjming@deploy2002: cjming and jdlrobson: Continuing with sync
  • 20:29 cjming@deploy2002: cjming and jdlrobson: Backport for Fixes Thai Wikinews wordmark and sewikimedia (T348757 T347534) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:28 cjming@deploy2002: Started scap: Backport for Fixes Thai Wikinews wordmark and sewikimedia (T348757 T347534)
  • 20:26 cjming@deploy2002: Finished scap: Backport for Merge ReplyWidget[Plain/Visual] modules (T348834) (duration: 07m 23s)
  • 20:21 cjming@deploy2002: kemayo and cjming: Continuing with sync
  • 20:20 cjming@deploy2002: kemayo and cjming: Backport for Merge ReplyWidget[Plain/Visual] modules (T348834) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:19 cjming@deploy2002: Started scap: Backport for Merge ReplyWidget[Plain/Visual] modules (T348834)
  • 20:18 cjming@deploy2002: Finished scap: Backport for Enable display of Client Hints data on all wikis (T341110 T337942) (duration: 08m 17s)
  • 20:13 cjming@deploy2002: dreamyjazz and cjming: Continuing with sync
  • 20:11 cjming@deploy2002: dreamyjazz and cjming: Backport for Enable display of Client Hints data on all wikis (T341110 T337942) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:10 cjming@deploy2002: Started scap: Backport for Enable display of Client Hints data on all wikis (T341110 T337942)
  • 19:55 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:55 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:42 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:42 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:30 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:30 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:27 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:27 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:23 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts aqs1010.eqiad.wmnet
  • 19:23 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts aqs1010.eqiad.wmnet
  • 19:20 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:20 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:17 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts aqs1010.eqiad.wmnet
  • 19:13 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:12 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:09 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts aqs1010.eqiad.wmnet
  • 18:51 sukhe: exiqgrep -i -r <redacted> | xargs exim -Mrm
  • 18:41 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 18:27 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 18:27 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 18:27 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 18:20 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:19 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:06 ejegg: fundraising python tools upgraded from 7c6a28e0 to e56ae8ae
  • 17:59 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we set up the new WMDE Airflow instance
  • 17:59 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we set up the new WMDE Airflow instance
  • 17:55 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:55 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:41 denisse: Upgrading navtiming on the webperf hosts in the beta cluster
  • 17:14 ejegg: fundraising python tools upgraded from 0c17296c to 7c6a28e0
  • 16:48 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 16:46 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 16:43 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:42 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:25 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:23 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T343198)', diff saved to https://phabricator.wikimedia.org/P52975 and previous config saved to /var/cache/conftool/dbconfig/20231016-161829-arnaudb.json
  • 16:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 16:18 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 16:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T343198)', diff saved to https://phabricator.wikimedia.org/P52974 and previous config saved to /var/cache/conftool/dbconfig/20231016-161806-arnaudb.json
  • 16:10 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 16:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P52973 and previous config saved to /var/cache/conftool/dbconfig/20231016-160300-arnaudb.json
  • 15:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P52972 and previous config saved to /var/cache/conftool/dbconfig/20231016-154754-arnaudb.json
  • 15:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T343198)', diff saved to https://phabricator.wikimedia.org/P52971 and previous config saved to /var/cache/conftool/dbconfig/20231016-153247-arnaudb.json
  • 15:10 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for sessionstore2001.codfw.wmnet
  • 15:10 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for sessionstore2001.codfw.wmnet
  • 15:08 sukhe: running authdns-update
  • 15:03 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 14:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4003.wikimedia.org with OS bookworm
  • 14:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on sessionstore2001.codfw.wmnet with reason: Moving host — T348142
  • 14:54 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on sessionstore2001.codfw.wmnet with reason: Moving host — T348142
  • 14:42 ejegg: Standalone (IPN listener) SmashPig upgraded from 211284b9 to e27dfbce
  • 14:35 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:34 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:34 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:33 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:33 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:33 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:30 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:30 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:28 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
  • 14:26 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 14:25 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
  • 14:23 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:22 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:22 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:21 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:20 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:20 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:18 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:17 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:17 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:16 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:16 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:15 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:10 ladsgroup@deploy2002: Finished scap: Backport for Disable DoubleWiki extension everywhere (T344544) (duration: 08m 09s)
  • 14:05 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 14:03 ladsgroup@deploy2002: ladsgroup: Backport for Disable DoubleWiki extension everywhere (T344544) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:02 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bookworm
  • 14:02 ladsgroup@deploy2002: Started scap: Backport for Disable DoubleWiki extension everywhere (T344544)
  • 13:53 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 13:52 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 13:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 13:52 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 13:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 13:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 13:42 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:41 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:41 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:40 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:39 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:39 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 13:38 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:37 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 13:36 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
  • 13:36 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
  • 13:36 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
  • 13:36 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
  • 13:36 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
  • 13:35 jayme@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 13:34 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:34 TheresNoTime: close UTC afternoon backport window
  • 13:34 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:34 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:34 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:34 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:33 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 13:33 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 13:33 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 13:33 jayme@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:33 jayme@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 13:33 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:33 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 13:33 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:32 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:30 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:30 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 13:30 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:30 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 13:14 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:14 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:12 samtar@deploy2002: Finished scap: Backport for fix incubatorwiki wordmark (T348577), update throttle rule for UIUC Wikipedia edit-a-thon November 13, 2023 and remove old throttle rules (T346043) (duration: 08m 08s)
  • 13:07 samtar@deploy2002: samtar and anzx: Continuing with sync
  • 13:05 samtar@deploy2002: samtar and anzx: Backport for fix incubatorwiki wordmark (T348577), update throttle rule for UIUC Wikipedia edit-a-thon November 13, 2023 and remove old throttle rules (T346043) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:05 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:04 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:04 samtar@deploy2002: Started scap: Backport for fix incubatorwiki wordmark (T348577), update throttle rule for UIUC Wikipedia edit-a-thon November 13, 2023 and remove old throttle rules (T346043)
  • 12:35 ladsgroup@deploy2002: Finished scap: Backport for Switch ES cluster to cluster28 and cluster29 (T342685) (duration: 18m 52s)
  • 12:29 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 12:17 ladsgroup@deploy2002: ladsgroup: Backport for Switch ES cluster to cluster28 and cluster29 (T342685) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:16 ladsgroup@deploy2002: Started scap: Backport for Switch ES cluster to cluster28 and cluster29 (T342685)
  • 11:15 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 11:12 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
  • 11:10 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 11:07 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 11:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-coord1001.eqiad.wmnet
  • 10:56 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-coord1001.eqiad.wmnet
  • 10:18 ladsgroup@deploy2002: Finished scap: Backport for Change default of pagelinks to write both (T345732) (duration: 07m 44s)
  • 10:12 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 10:11 ladsgroup@deploy2002: ladsgroup: Backport for Change default of pagelinks to write both (T345732) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:10 ladsgroup@deploy2002: Started scap: Backport for Change default of pagelinks to write both (T345732)
  • 10:06 ladsgroup@deploy2002: Finished scap: Backport for Enable pagelinks migration WRITE BOTH on some more wikis (T345732) (duration: 09m 19s)
  • 10:01 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 09:58 ladsgroup@deploy2002: ladsgroup: Backport for Enable pagelinks migration WRITE BOTH on some more wikis (T345732) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:57 ladsgroup@deploy2002: Started scap: Backport for Enable pagelinks migration WRITE BOTH on some more wikis (T345732)
  • 09:52 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:52 phuedx@deploy2002: Finished scap: Backport for Revert "Introduce Web Accessibility Features and Submodule" (duration: 10m 04s)
  • 09:52 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:51 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:47 phuedx@deploy2002: phuedx: Continuing with sync
  • 09:43 phuedx@deploy2002: phuedx: Backport for Revert "Introduce Web Accessibility Features and Submodule" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:42 phuedx@deploy2002: Started scap: Backport for Revert "Introduce Web Accessibility Features and Submodule"
  • 09:38 phuedx@deploy2002: backport Cancelled
  • 09:00 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host an-test-master1001.eqiad.wmnet
  • 08:56 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 08:52 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 08:51 brouberol@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 08:48 brouberol@deploy2002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 08:46 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-master1001.eqiad.wmnet
  • 08:44 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 08:44 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 08:44 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 08:43 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 08:43 brouberol@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 08:42 brouberol@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 08:41 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 08:40 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 08:40 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 08:39 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 08:38 brouberol@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 08:38 brouberol@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 08:36 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 08:35 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 08:35 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 08:35 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 08:35 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:34 brouberol@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 08:34 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:34 brouberol@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 08:34 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:10 hashar@deploy2002: Finished scap: Backport for Don't try to lock to serialize m3u8 file writes (T348689 T348667 T348375 T348753) (duration: 27m 04s)
  • 07:58 hashar@deploy2002: jforrester and hashar: Continuing with sync
  • 07:57 hashar@deploy2002: jforrester and hashar: Backport for Don't try to lock to serialize m3u8 file writes (T348689 T348667 T348375 T348753) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:55 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 07:54 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 07:54 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 07:53 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 07:43 hashar@deploy2002: Started scap: Backport for Don't try to lock to serialize m3u8 file writes (T348689 T348667 T348375 T348753)
  • 07:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T343198)', diff saved to https://phabricator.wikimedia.org/P52968 and previous config saved to /var/cache/conftool/dbconfig/20231016-073731-arnaudb.json
  • 07:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 07:37 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 07:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 07:36 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 07:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T343198)', diff saved to https://phabricator.wikimedia.org/P52967 and previous config saved to /var/cache/conftool/dbconfig/20231016-073653-arnaudb.json
  • 07:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P52966 and previous config saved to /var/cache/conftool/dbconfig/20231016-072147-arnaudb.json
  • 07:17 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 07:17 elukey@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 07:15 aqu@deploy2002: Finished deploy [analytics/refinery@1baf3be] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@1baf3be2] (duration: 02m 51s)
  • 07:12 aqu@deploy2002: Started deploy [analytics/refinery@1baf3be] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@1baf3be2]
  • 07:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P52965 and previous config saved to /var/cache/conftool/dbconfig/20231016-070640-arnaudb.json
  • 06:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T343198)', diff saved to https://phabricator.wikimedia.org/P52964 and previous config saved to /var/cache/conftool/dbconfig/20231016-065134-arnaudb.json
  • 05:41 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 05:41 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 05:40 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 05:40 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 05:39 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 05:38 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 05:36 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 05:35 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 05:34 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 05:33 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 05:33 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 05:33 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 05:32 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 05:32 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 05:32 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 05:32 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 05:31 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 05:31 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .

2023-10-15

  • 22:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T343198)', diff saved to https://phabricator.wikimedia.org/P52963 and previous config saved to /var/cache/conftool/dbconfig/20231015-222435-arnaudb.json
  • 22:24 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 22:24 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 22:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T343198)', diff saved to https://phabricator.wikimedia.org/P52962 and previous config saved to /var/cache/conftool/dbconfig/20231015-222414-arnaudb.json
  • 22:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P52961 and previous config saved to /var/cache/conftool/dbconfig/20231015-220907-arnaudb.json
  • 21:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P52960 and previous config saved to /var/cache/conftool/dbconfig/20231015-215401-arnaudb.json
  • 21:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T343198)', diff saved to https://phabricator.wikimedia.org/P52959 and previous config saved to /var/cache/conftool/dbconfig/20231015-213855-arnaudb.json
  • 19:10 urandom: starting Cassandra decommission of restbase1016-b — T328490
  • 14:35 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 14:32 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 14:31 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 14:31 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 14:31 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 14:30 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 14:30 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 13:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T343198)', diff saved to https://phabricator.wikimedia.org/P52958 and previous config saved to /var/cache/conftool/dbconfig/20231015-130027-arnaudb.json
  • 13:00 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 13:00 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 13:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T343198)', diff saved to https://phabricator.wikimedia.org/P52957 and previous config saved to /var/cache/conftool/dbconfig/20231015-130005-arnaudb.json
  • 12:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P52956 and previous config saved to /var/cache/conftool/dbconfig/20231015-124459-arnaudb.json
  • 12:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P52955 and previous config saved to /var/cache/conftool/dbconfig/20231015-122953-arnaudb.json
  • 12:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T343198)', diff saved to https://phabricator.wikimedia.org/P52954 and previous config saved to /var/cache/conftool/dbconfig/20231015-121446-arnaudb.json
  • 11:03 hashar@deploy2002: Finished deploy [integration/docroot@096f637]: (no justification provided) (duration: 00m 05s)
  • 11:03 hashar@deploy2002: Started deploy [integration/docroot@096f637]: (no justification provided)
  • 03:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T343198)', diff saved to https://phabricator.wikimedia.org/P52953 and previous config saved to /var/cache/conftool/dbconfig/20231015-035420-arnaudb.json
  • 03:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 03:53 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 03:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T343198)', diff saved to https://phabricator.wikimedia.org/P52952 and previous config saved to /var/cache/conftool/dbconfig/20231015-035347-arnaudb.json
  • 03:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P52951 and previous config saved to /var/cache/conftool/dbconfig/20231015-033841-arnaudb.json
  • 03:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P52950 and previous config saved to /var/cache/conftool/dbconfig/20231015-032335-arnaudb.json
  • 03:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T343198)', diff saved to https://phabricator.wikimedia.org/P52949 and previous config saved to /var/cache/conftool/dbconfig/20231015-030828-arnaudb.json

2023-10-14

  • 18:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T343198)', diff saved to https://phabricator.wikimedia.org/P52948 and previous config saved to /var/cache/conftool/dbconfig/20231014-184517-arnaudb.json
  • 18:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 18:45 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 18:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T343198)', diff saved to https://phabricator.wikimedia.org/P52947 and previous config saved to /var/cache/conftool/dbconfig/20231014-184455-arnaudb.json
  • 18:30 urandom: starting Cassandra decommission of restbase1016-a — T328490
  • 18:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P52946 and previous config saved to /var/cache/conftool/dbconfig/20231014-182949-arnaudb.json
  • 18:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P52945 and previous config saved to /var/cache/conftool/dbconfig/20231014-181442-arnaudb.json
  • 17:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T343198)', diff saved to https://phabricator.wikimedia.org/P52944 and previous config saved to /var/cache/conftool/dbconfig/20231014-175936-arnaudb.json
  • 17:34 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 17:34 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 17:33 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 17:33 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 17:32 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 17:32 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 09:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T343198)', diff saved to https://phabricator.wikimedia.org/P52943 and previous config saved to /var/cache/conftool/dbconfig/20231014-091542-arnaudb.json
  • 09:15 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 09:15 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 02:29 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 02:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 02:22 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 02:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T343198)', diff saved to https://phabricator.wikimedia.org/P52942 and previous config saved to /var/cache/conftool/dbconfig/20231014-022208-arnaudb.json
  • 02:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P52941 and previous config saved to /var/cache/conftool/dbconfig/20231014-020701-arnaudb.json
  • 01:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P52940 and previous config saved to /var/cache/conftool/dbconfig/20231014-015154-arnaudb.json
  • 01:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T343198)', diff saved to https://phabricator.wikimedia.org/P52939 and previous config saved to /var/cache/conftool/dbconfig/20231014-013648-arnaudb.json
  • 00:04 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)

2023-10-13

  • 23:56 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 22:21 ejegg: fundraising civicrm upgraded from c5f54d97 to e71ccffb
  • 21:32 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1107.eqiad.wmnet with OS bullseye
  • 21:32 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1106.eqiad.wmnet with OS bullseye
  • 21:29 hashar@deploy2002: Finished deploy [integration/docroot@096f637]: Expand Purtle doc card (duration: 00m 05s)
  • 21:29 hashar@deploy2002: Started deploy [integration/docroot@096f637]: Expand Purtle doc card
  • 21:29 hashar@deploy2002: Finished deploy [integration/docroot@504d455]: Fix php-session-serializer tagline (duration: 00m 06s)
  • 21:28 hashar@deploy2002: Started deploy [integration/docroot@504d455]: Fix php-session-serializer tagline
  • 20:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1111.eqiad.wmnet with OS bullseye
  • 20:41 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1114.eqiad.wmnet with OS bullseye
  • 20:29 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1113.eqiad.wmnet with OS bullseye
  • 20:29 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:26 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:24 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1108']
  • 20:23 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1108']
  • 20:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1108.eqiad.wmnet with OS bullseye
  • 20:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1107.eqiad.wmnet with OS bullseye
  • 20:12 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1107.eqiad.wmnet with reason: host reimage
  • 20:12 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
  • 20:12 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1106.eqiad.wmnet with OS bullseye
  • 20:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1112.eqiad.wmnet with OS bullseye
  • 20:11 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:10 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1113.eqiad.wmnet with reason: host reimage
  • 20:07 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1107.eqiad.wmnet with reason: host reimage
  • 20:07 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:06 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1113.eqiad.wmnet with reason: host reimage
  • 20:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1109.eqiad.wmnet with OS bullseye
  • 20:04 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:03 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 19:57 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1111.eqiad.wmnet with OS bullseye
  • 19:56 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1102']
  • 19:56 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
  • 19:55 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1102
  • 19:54 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1102
  • 19:53 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1102']
  • 19:53 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
  • 19:52 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
  • 19:49 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1112.eqiad.wmnet with reason: host reimage
  • 19:48 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1113.eqiad.wmnet with OS bullseye
  • 19:47 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1109.eqiad.wmnet with reason: host reimage
  • 19:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
  • 19:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1102']
  • 19:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
  • 19:39 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cp1113']
  • 19:38 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1102
  • 19:37 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1102
  • 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1102
  • 19:35 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1102
  • 19:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1108.eqiad.wmnet with OS bullseye
  • 19:28 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1109.eqiad.wmnet with OS bullseye
  • 19:27 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1112']
  • 19:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1112']
  • 19:25 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1102']
  • 19:24 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
  • 19:24 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1102']
  • 19:24 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
  • 19:24 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1112.eqiad.wmnet with OS bullseye
  • 19:24 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp1114']
  • 19:23 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1112']
  • 19:23 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1112']
  • 19:23 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1112']
  • 19:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1112']
  • 19:22 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1113']
  • 19:21 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1114.eqiad.wmnet with OS bullseye
  • 19:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1110']
  • 19:20 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1110']
  • 19:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1111']
  • 19:19 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
  • 19:18 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1112']
  • 19:18 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1112']
  • 19:17 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1113']
  • 19:17 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1114']
  • 19:17 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1114']
  • 19:14 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@520fa55]: (no justification provided) (duration: 00m 23s)
  • 19:14 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1113']
  • 19:14 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@520fa55]: (no justification provided)
  • 19:14 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1112']
  • 19:08 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1112']
  • 19:08 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1112']
  • 19:07 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1112']
  • 19:07 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1111']
  • 19:04 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
  • 19:03 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1110']
  • 19:00 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1110']
  • 18:58 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1109']
  • 18:52 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1109']
  • 18:06 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1108']
  • 18:06 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1104']
  • 18:03 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1115.eqiad.wmnet with OS bullseye
  • 18:03 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 18:02 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1107']
  • 18:00 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1108']
  • 18:00 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@520fa55]: (no justification provided) (duration: 00m 59s)
  • 17:59 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@520fa55]: (no justification provided)
  • 17:56 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1107']
  • 17:55 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1106']
  • 17:55 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1106']
  • 17:54 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1105']
  • 17:53 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1105']
  • 17:53 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1104']
  • 17:52 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1103']
  • 17:50 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1101']
  • 17:48 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1100']
  • 17:46 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1103']
  • 17:46 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 17:46 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1102']
  • 17:45 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
  • 17:44 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1101']
  • 17:42 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100']
  • 17:29 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1115.eqiad.wmnet with reason: host reimage
  • 17:26 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1105']
  • 17:26 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@520fa55]: (no justification provided) (duration: 01m 01s)
  • 17:25 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1115.eqiad.wmnet with reason: host reimage
  • 17:25 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@520fa55]: (no justification provided)
  • 17:16 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1105']
  • 17:16 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1105']
  • 17:15 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1105']
  • 17:14 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1105']
  • 17:10 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS bullseye
  • 17:08 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1115']
  • 17:01 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1115']
  • 16:58 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1105']
  • 16:57 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1105.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:49 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1105.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:43 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1105.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:43 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1105.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:42 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1105']
  • 16:42 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1105']
  • 16:41 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1105']
  • 16:41 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1105']
  • 16:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1105.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:29 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1008.wikimedia.org with OS bullseye
  • 16:29 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T343198)', diff saved to https://phabricator.wikimedia.org/P52936 and previous config saved to /var/cache/conftool/dbconfig/20231013-162902-arnaudb.json
  • 16:28 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 16:28 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 16:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T343198)', diff saved to https://phabricator.wikimedia.org/P52935 and previous config saved to /var/cache/conftool/dbconfig/20231013-162840-arnaudb.json
  • 16:24 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1114']
  • 16:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P52934 and previous config saved to /var/cache/conftool/dbconfig/20231013-161333-arnaudb.json
  • 16:12 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1114']
  • 16:11 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1113']
  • 16:10 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1115']
  • 16:06 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1114.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:00 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1115']
  • 15:59 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1113']
  • 15:59 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1115.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P52933 and previous config saved to /var/cache/conftool/dbconfig/20231013-155827-arnaudb.json
  • 15:55 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1111']
  • 15:55 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1113.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:55 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
  • 15:54 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1112']
  • 15:45 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1111']
  • 15:45 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1110']
  • 15:44 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1112']
  • 15:44 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1112']
  • 15:44 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1112']
  • 15:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T343198)', diff saved to https://phabricator.wikimedia.org/P52932 and previous config saved to /var/cache/conftool/dbconfig/20231013-154321-arnaudb.json
  • 15:43 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1112.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:41 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1115.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:41 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1114.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:40 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1115
  • 15:40 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1114
  • 15:39 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1115
  • 15:39 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1114
  • 15:37 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1113.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:35 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
  • 15:35 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1111.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:35 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1110']
  • 15:33 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1109']
  • 15:32 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1113
  • 15:32 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1110.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:32 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1108']
  • 15:32 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1107']
  • 15:31 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1113
  • 15:25 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1112.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:23 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1109']
  • 15:23 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1106']
  • 15:22 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1108']
  • 15:21 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1107']
  • 15:20 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1109.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:19 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1107.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:19 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1108.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:18 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1107.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:16 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1112
  • 15:16 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1107.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:15 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1112
  • 15:15 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1111.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:12 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1106']
  • 15:12 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1106.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:10 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 15:10 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1111
  • 15:08 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1110.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:07 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1111
  • 15:07 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1110
  • 15:06 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1110
  • 15:02 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1109.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:01 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1109
  • 14:59 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1109
  • 14:58 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1108.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:56 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1108
  • 14:55 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1107.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:55 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1107
  • 14:54 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1108
  • 14:53 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1107
  • 14:51 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1106.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:51 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1106
  • 14:51 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1106
  • 14:43 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 14:39 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be2003.codfw.wmnet with OS bookworm
  • 14:30 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1106
  • 14:30 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1106
  • 14:29 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1008.wikimedia.org with reason: host reimage
  • 14:28 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1106.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:26 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1008.wikimedia.org with reason: host reimage
  • 14:21 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be2003.codfw.wmnet with reason: host reimage
  • 14:19 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1105.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:18 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be2003.codfw.wmnet with reason: host reimage
  • 14:17 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1105.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:17 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1105
  • 14:17 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1105
  • 14:12 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1008.wikimedia.org with OS bullseye
  • 14:06 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 14:03 sukhe: remove redundant 208.80.154.238/32 dev from /e/n/i on A:dns-rec and A:eqiad (superseded by label lo:anycast): T348041
  • 13:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 13:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 13:20 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bookworm
  • 13:07 mvernon@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host moss-be2003.codfw.wmnet with OS bookworm
  • 13:04 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bookworm
  • 13:04 mvernon@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host moss-be2003.codfw.wmnet with OS bookworm
  • 13:04 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 12:53 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@9a8cfd2]: (no justification provided) (duration: 00m 39s)
  • 12:52 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@9a8cfd2]: (no justification provided)
  • 12:52 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@9a8cfd2]: (no justification provided) (duration: 01m 26s)
  • 12:50 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@9a8cfd2]: (no justification provided)
  • 12:47 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bookworm
  • 11:53 urandom: starting decommission of restbase2012-c — T328490
  • 11:07 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 10:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 10:29 ladsgroup@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 10:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 09:10 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 08:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15133
  • 07:54 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 06:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15133
  • 06:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 150552
  • 06:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 150552
  • 06:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T343198)', diff saved to https://phabricator.wikimedia.org/P52925 and previous config saved to /var/cache/conftool/dbconfig/20231013-064400-arnaudb.json
  • 06:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 06:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 06:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T343198)', diff saved to https://phabricator.wikimedia.org/P52924 and previous config saved to /var/cache/conftool/dbconfig/20231013-064328-arnaudb.json
  • 06:43 moritzm: installing Linux 5.10.197 updates from Bullseye point release (no reboots, just installing the new kernels)
  • 06:39 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync
  • 06:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: setup in progress
  • 06:38 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: setup in progress
  • 06:38 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: sync
  • 06:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on debmonitor2002.codfw.wmnet with reason: setup in progress
  • 06:38 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on debmonitor2002.codfw.wmnet with reason: setup in progress
  • 06:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P52923 and previous config saved to /var/cache/conftool/dbconfig/20231013-062821-arnaudb.json
  • 06:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P52922 and previous config saved to /var/cache/conftool/dbconfig/20231013-061315-arnaudb.json
  • 05:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T343198)', diff saved to https://phabricator.wikimedia.org/P52921 and previous config saved to /var/cache/conftool/dbconfig/20231013-055809-arnaudb.json
  • 03:20 TimStarling: on non-CentralAuth wikis, created the loginnotify_seen_net table T346989
  • 03:08 TimStarling: on x1 wikishared, created loginnotify_seen_net table T346989
  • 01:11 cstone: payments-wiki upgraded from aa5cd24d to 7f4da789

2023-10-12

  • 21:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1008.wikimedia.org with OS bullseye
  • 21:27 thcipriani@deploy2002: Finished scap: Backport for Set UseParserMigration true in wmf-config (T333179) (duration: 15m 20s)
  • 21:22 thcipriani@deploy2002: sbailey and thcipriani: Continuing with sync
  • 21:13 thcipriani@deploy2002: sbailey and thcipriani: Backport for Set UseParserMigration true in wmf-config (T333179) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:12 thcipriani@deploy2002: Started scap: Backport for Set UseParserMigration true in wmf-config (T333179)
  • 21:10 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 21:10 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 21:10 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 21:10 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 21:09 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 21:09 thcipriani: mwmaint2002:foreachwikiindblist 'group2 & s6' extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --current --all --touched-after=20230613000000 | tee /tmp/persistentRevisionThreadItems-s6.log
  • 21:09 thcipriani: mwmaint2002:foreachwikiindblist 'group2 & s7' extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --current --all --touched-after=20230613000000 | tee /tmp/persistentRevisionThreadItems-s7.log
  • 21:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2103 (T343198)', diff saved to https://phabricator.wikimedia.org/P52920 and previous config saved to /var/cache/conftool/dbconfig/20231012-210646-arnaudb.json
  • 21:06 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 21:06 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 21:06 thcipriani@deploy2002: Finished scap: Backport for Enable wgDiscussionToolsEnablePermalinksBackend on s7 group2 (T315353) (duration: 07m 55s)
  • 21:00 thcipriani@deploy2002: thcipriani and matmarex: Continuing with sync
  • 20:59 thcipriani@deploy2002: thcipriani and matmarex: Backport for Enable wgDiscussionToolsEnablePermalinksBackend on s7 group2 (T315353) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:58 thcipriani@deploy2002: Started scap: Backport for Enable wgDiscussionToolsEnablePermalinksBackend on s7 group2 (T315353)
  • 20:50 dr0ptp4kt@deploy2002: Finished scap: Backport for Revert "Growth: Enable Welcome survey user research for enwiki" (T342353) (duration: 08m 32s)
  • 20:45 dr0ptp4kt@deploy2002: dr0ptp4kt and urbanecm: Continuing with sync
  • 20:43 dr0ptp4kt@deploy2002: dr0ptp4kt and urbanecm: Backport for Revert "Growth: Enable Welcome survey user research for enwiki" (T342353) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:41 dr0ptp4kt@deploy2002: Started scap: Backport for Revert "Growth: Enable Welcome survey user research for enwiki" (T342353)
  • 20:38 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1008.wikimedia.org with OS bullseye
  • 20:37 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1008
  • 20:37 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1008
  • 20:26 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
  • 20:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 20:26 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.wikimedia.org with OS bullseye
  • 20:26 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:26 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.wikimedia.org with OS bullseye
  • 20:25 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:22 dr0ptp4kt@deploy2002: Finished scap: Backport for Allow FundraiseUp scripts in Donatewiki CSP (T345379) (duration: 07m 40s)
  • 20:21 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:17 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:17 dr0ptp4kt@deploy2002: dr0ptp4kt and ejegg: Continuing with sync
  • 20:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
  • 20:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 20:16 dr0ptp4kt@deploy2002: dr0ptp4kt and ejegg: Backport for Allow FundraiseUp scripts in Donatewiki CSP (T345379) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
  • 20:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 20:15 dr0ptp4kt@deploy2002: Started scap: Backport for Allow FundraiseUp scripts in Donatewiki CSP (T345379)
  • 20:10 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
  • 20:10 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 20:10 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
  • 20:10 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 20:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
  • 20:09 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 20:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
  • 20:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 20:06 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
  • 20:06 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 20:06 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
  • 20:06 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 20:05 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
  • 20:05 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 20:05 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.wikimedia.org with reason: host reimage
  • 20:04 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
  • 20:04 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 20:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.wikimedia.org with reason: host reimage
  • 20:00 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.wikimedia.org with reason: host reimage
  • 19:59 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.wikimedia.org with reason: host reimage
  • 19:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
  • 19:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 19:58 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:57 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 19:56 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
  • 19:56 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 19:55 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
  • 19:55 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 19:55 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudelastic1008.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:47 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1008
  • 19:47 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1008
  • 19:46 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudelastic1008.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:46 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
  • 19:45 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 19:45 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1010.wikimedia.org with OS bullseye
  • 19:45 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1009.wikimedia.org with OS bullseye
  • 19:43 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
  • 19:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 19:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
  • 19:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 19:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
  • 19:43 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1009']
  • 19:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 19:38 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1010']
  • 19:38 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
  • 19:37 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 19:36 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
  • 19:36 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 19:35 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
  • 19:35 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 19:34 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
  • 19:34 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 19:33 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1009']
  • 19:31 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1010']
  • 19:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1008.wikimedia.org with OS bullseye
  • 19:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1009.wikimedia.org with OS bullseye
  • 19:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1010.wikimedia.org with OS bullseye
  • 19:02 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on wdqs[1022-1024].eqiad.wmnet with reason: new graph split hosts T347505
  • 19:01 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on wdqs[1022-1024].eqiad.wmnet with reason: new graph split hosts T347505
  • 17:57 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1106.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:53 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1102']
  • 17:43 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1008.wikimedia.org with OS bullseye
  • 17:43 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1009.wikimedia.org with OS bullseye
  • 17:43 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1010.wikimedia.org with OS bullseye
  • 17:37 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010
  • 17:36 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010
  • 17:35 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009
  • 17:35 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1008.wikimedia.org with OS bullseye
  • 17:35 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1010.wikimedia.org with OS bullseye
  • 17:35 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1009.wikimedia.org with OS bullseye
  • 17:34 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009
  • 17:33 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1008
  • 17:32 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:32 bking@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:31 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1008
  • 17:27 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1106
  • 17:26 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1106
  • 17:23 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1105.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:22 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1105
  • 17:21 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1105
  • 17:19 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
  • 17:13 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 17:13 pt1979@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1001"
  • 17:12 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1104']
  • 17:12 pt1979@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1001"
  • 16:57 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1067.eqiad.wmnet with reason: host reimage
  • 16:55 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 16:55 pt1979@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1001"
  • 16:53 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1067.eqiad.wmnet with reason: host reimage
  • 16:50 pt1979@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1001"
  • 16:41 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 16:35 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
  • 16:33 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1103']
  • 16:31 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
  • 16:28 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 16:28 pt1979@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1001"
  • 16:27 pt1979@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1001"
  • 16:27 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1104']
  • 16:26 sukhe: enable puppet on A:dns-rec and force agent run: T348041
  • 16:25 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1104.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:24 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1104.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:22 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1103']
  • 16:19 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 16:19 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1103.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:17 sukhe: disable puppet on A:dns-rec to roll out CR: 965187 T348041
  • 16:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1008.wikimedia.org with OS bullseye
  • 16:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1010.wikimedia.org with OS bullseye
  • 16:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1009.wikimedia.org with OS bullseye
  • 16:14 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:14 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cp1101 - jclark@cumin1001"
  • 16:13 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cp1101 - jclark@cumin1001"
  • 16:12 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage
  • 16:11 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 16:09 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage
  • 16:09 moritzm: installing batik security updates
  • 16:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on ldap-rw[1001,2001].wikimedia.org with reason: setup in progress
  • 16:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on ldap-rw[1001,2001].wikimedia.org with reason: setup in progress
  • 16:03 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2001.codfw.wmnet
  • 15:57 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 15:56 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 15:56 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2001.codfw.wmnet
  • 15:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2002.codfw.wmnet
  • 15:48 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1102.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:48 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2002.codfw.wmnet
  • 15:46 moritzm: restart FPM on mediawiki canaries to pick up new libxpm
  • 15:44 moritzm: installing libxpm security updates
  • 15:42 Lucas_WMDE: (mostly?) Finished scap: Backport for specials: Use correct title in NewPagesPager (T348665) (duration: 07m 13s) – scap failed in the purgeMessageBlobStore step (php-fpm-restarts finished)
  • 15:35 lucaswerkmeister-wmde@deploy2002: jforrester and lucaswerkmeister-wmde: Continuing with sync
  • 15:34 lucaswerkmeister-wmde@deploy2002: jforrester and lucaswerkmeister-wmde: Backport for specials: Use correct title in NewPagesPager (T348665) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:33 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for specials: Use correct title in NewPagesPager (T348665)
  • 15:31 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1007.wikimedia.org with OS bullseye
  • 15:31 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:30 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16591
  • 15:16 lucaswerkmeister-wmde@deploy2002: Backport cancelled.
  • 15:14 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1007.wikimedia.org with reason: host reimage
  • 15:11 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1007.wikimedia.org with reason: host reimage
  • 15:08 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 15:04 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-be2003.codfw.wmnet with OS bookworm
  • 15:00 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2003.codfw.wmnet
  • 15:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16591
  • 14:57 sukhe: stopping gdnsd on dns2006 to simulate bird prefix withdrawal
  • 14:57 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.wikimedia.org with OS bullseye
  • 14:56 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1007.wikimedia.org with OS bullseye
  • 14:56 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.wikimedia.org with OS bullseye
  • 14:53 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2003.codfw.wmnet
  • 14:52 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1003.eqiad.wmnet
  • 14:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 35008
  • 14:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 35008
  • 14:51 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudelastic1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudelastic1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12200
  • 14:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12200
  • 14:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28458
  • 14:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 28458
  • 14:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 400474
  • 14:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 400474
  • 14:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 398196
  • 14:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 398196
  • 14:47 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:47 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudelastic1007 - jclark@cumin1001"
  • 14:46 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudelastic1007 - jclark@cumin1001"
  • 14:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3267
  • 14:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3267
  • 14:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 30132
  • 14:45 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1003.eqiad.wmnet
  • 14:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 30132
  • 14:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15703
  • 14:44 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15703
  • 14:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 25542
  • 14:42 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudelastic1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 25542
  • 14:42 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1002.eqiad.wmnet
  • 14:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15435
  • 14:38 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudelastic1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15435
  • 14:37 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudelastic1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 46562
  • 14:35 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1002.eqiad.wmnet
  • 14:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 46562
  • 14:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6412
  • 14:34 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudelastic1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6412
  • 14:33 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams: sync
  • 14:33 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams: sync
  • 14:32 sukhe: completed restarts of pdns-recursor in doh* and dns*
  • 14:30 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
  • 14:23 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 14:17 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: sync
  • 14:16 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: sync
  • 14:16 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: sync
  • 14:15 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: sync
  • 14:12 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bookworm
  • 14:11 urbanecm: mwmaint2002: stop previous instance of `refreshLinkRecommendations` maintenance job (T348719)
  • 14:07 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 14:04 sukhe: sudo cumin -b1 -s120 'A:dns-rec and not P{dns6002*}' 'systemctl restart pdns-recursor.service'
  • 14:03 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.30 refs T347081
  • 14:00 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough and A:wikidough
  • 13:50 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:50 bking@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 13:50 bking@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:49 bking@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:43 sukhe: remove old ns2 IP 91.198.174.239/32 from /e/n/i on A:dns-rec: T329219
  • 13:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 54994
  • 13:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 54994
  • 13:35 sukhe: remove redundant 208.80.153.231/32 from /e/n/i on A:dns-rec and A:codfw (superseded by label lo:anycast): T348041
  • 13:34 kartik@deploy2002: Finished scap: Backport for Add Akan language (T333765) (duration: 09m 39s)
  • 13:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 139901
  • 13:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 139901
  • 13:28 kartik@deploy2002: kartik and srishakatux: Continuing with sync
  • 13:26 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host archiva1002.wikimedia.org
  • 13:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15133
  • 13:25 kartik@deploy2002: kartik and srishakatux: Backport for Add Akan language (T333765) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:25 kartik@deploy2002: Started scap: Backport for Add Akan language (T333765)
  • 13:24 sukhe@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough and A:wikidough
  • 13:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15133
  • 13:23 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host archiva1002.wikimedia.org
  • 13:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 40317
  • 13:19 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 40317
  • 13:18 hashar@deploy2002: Finished scap: Backport for LinkRecommendationUpdater: Update $linkRecommendationTaskType declaration (T348719) (duration: 06m 51s)
  • 13:13 hashar@deploy2002: phuedx and hashar: Continuing with sync
  • 13:13 hashar@deploy2002: phuedx and hashar: Backport for LinkRecommendationUpdater: Update $linkRecommendationTaskType declaration (T348719) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:11 hashar@deploy2002: Started scap: Backport for LinkRecommendationUpdater: Update $linkRecommendationTaskType declaration (T348719)
  • 12:26 jayme: re-enable puppet on A:cp - T347544
  • 12:18 jayme: disable puppet on A:cp - T347544
  • 12:16 jayme: disable puppet on A:cp-text - T347544
  • 11:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 11:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 11:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 11:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 11:37 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 11:36 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 11:34 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 11:33 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: testing
  • 11:30 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: testing
  • 11:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 11:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 11:21 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 11:20 jayme@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 10:52 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: sync
  • 10:51 elukey@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: sync
  • 10:50 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: sync
  • 10:49 elukey@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: sync
  • 10:26 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: sync
  • 10:26 elukey@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: sync
  • 10:26 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: sync
  • 10:15 elukey@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: sync
  • 10:13 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: sync
  • 10:03 elukey@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: sync
  • 09:40 fabfur: repooling cp4040 (depooled for T347837 and forgot)
  • 09:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1002.eqiad.wmnet
  • 09:31 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-coord1002.eqiad.wmnet
  • 09:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for an-master1002.eqiad.wmnet
  • 09:31 btullis@cumin1001: START - Cookbook sre.hosts.remove-downtime for an-master1002.eqiad.wmnet
  • 09:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on an-master1002.eqiad.wmnet with reason: Rebooting misbehaving an-master1002
  • 09:16 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on an-master1002.eqiad.wmnet with reason: Rebooting misbehaving an-master1002
  • 08:53 hashar@deploy2002: rebuilt and synchronized wikiversions files: Revert "group2 wikis to 1.41.0-wmf.30" # T347081
  • 08:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 56099
  • 08:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 56099
  • 08:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 38195
  • 08:41 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 38195
  • 08:40 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 38195
  • 08:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 38195
  • 08:38 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 08:38 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 08:38 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 08:38 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 08:35 godog: add 200G to prometheus/ops in eqiad
  • 08:28 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.30 refs T347081
  • 08:15 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 06:59 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Arturo Borrero Gonzalez out of all services on: 2156 hosts
  • 06:58 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Arturo Borrero Gonzalez out of all services on: 2156 hosts
  • 06:46 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 00:09 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1063.eqiad.wmnet with OS bullseye

2023-10-11

  • 23:23 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 23:22 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 23:09 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 23:05 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host moss-be2003.codfw.wmnet with OS bullseye
  • 22:47 eileen: civicrm upgraded from f2f1e23e to ceaeaa19
  • 22:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bullseye
  • 22:18 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:18 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:15 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:15 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:05 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir1001.eqiad.wmnet with OS bookworm
  • 21:49 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir1001.eqiad.wmnet with reason: host reimage
  • 21:47 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir1001.eqiad.wmnet with reason: host reimage
  • 21:30 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for apifeatureusage2001.codfw.wmnet,apifeatureusage1001.eqiad.wmnet
  • 21:30 ryankemper@cumin1001: START - Cookbook sre.hosts.remove-downtime for apifeatureusage2001.codfw.wmnet,apifeatureusage1001.eqiad.wmnet
  • 21:30 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir1001.eqiad.wmnet with OS bookworm
  • 21:26 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir1002.eqiad.wmnet with OS bookworm
  • 21:20 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on apifeatureusage2001.codfw.wmnet with reason: reboot T348418
  • 21:20 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on apifeatureusage2001.codfw.wmnet with reason: reboot T348418
  • 21:11 ryankemper: T348418 Rebooting `apifeatureusage1001.eqiad.wmnet`
  • 21:09 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir1002.eqiad.wmnet with reason: host reimage
  • 21:07 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir1002.eqiad.wmnet with reason: host reimage
  • 21:06 taavi@deploy2002: Finished scap: Backport for Set WRITE_NEW for CA wikis on OATHAuth multiple devices (T242031) (duration: 10m 33s)
  • 21:01 taavi@deploy2002: taavi: Continuing with sync
  • 20:57 taavi@deploy2002: taavi: Backport for Set WRITE_NEW for CA wikis on OATHAuth multiple devices (T242031) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:55 taavi@deploy2002: Started scap: Backport for Set WRITE_NEW for CA wikis on OATHAuth multiple devices (T242031)
  • 20:54 cstone: payments-wiki upgraded from d6ad0376 to aa5cd24d
  • 20:54 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir1002.eqiad.wmnet with OS bookworm
  • 20:45 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:45 bking@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:44 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:43 bking@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir2001.codfw.wmnet with OS bookworm
  • 20:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir2001.codfw.wmnet with reason: host reimage
  • 20:22 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir2001.codfw.wmnet with reason: host reimage
  • 20:19 samtar@deploy2002: Finished scap: Backport for Remove override to allow mobile edit notices to display on all wikis (T316178) (duration: 08m 18s)
  • 20:14 samtar@deploy2002: kemayo and samtar: Continuing with sync
  • 20:13 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bullseye
  • 20:13 samtar@deploy2002: kemayo and samtar: Backport for Remove override to allow mobile edit notices to display on all wikis (T316178) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:12 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-be2003.codfw.wmnet with OS bullseye
  • 20:11 samtar@deploy2002: Started scap: Backport for Remove override to allow mobile edit notices to display on all wikis (T316178)
  • 20:11 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bullseye
  • 20:09 samtar@deploy2002: Finished scap: Backport for Enable Edit Check on initial partner wikis (T347908) (duration: 07m 32s)
  • 20:07 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:07 bking@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:04 bking@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 20:04 bking@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 20:04 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir2001.codfw.wmnet with OS bookworm
  • 20:04 samtar@deploy2002: samtar and kemayo: Continuing with sync
  • 20:04 bking@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 20:04 bking@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 20:03 samtar@deploy2002: samtar and kemayo: Backport for Enable Edit Check on initial partner wikis (T347908) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:03 bking@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 20:03 bking@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 20:03 bking@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 20:02 bking@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 20:02 samtar@deploy2002: Started scap: Backport for Enable Edit Check on initial partner wikis (T347908)
  • 20:00 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1104']
  • 20:00 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1104']
  • 19:54 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir2002.codfw.wmnet with OS bookworm
  • 19:52 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1103.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:44 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1102.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir2002.codfw.wmnet with reason: host reimage
  • 19:37 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir2002.codfw.wmnet with reason: host reimage
  • 19:12 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir2002.codfw.wmnet with OS bookworm
  • 19:10 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir3003.esams.wmnet with OS bookworm
  • 19:08 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1101']
  • 19:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T343198)', diff saved to https://phabricator.wikimedia.org/P52914 and previous config saved to /var/cache/conftool/dbconfig/20231011-190408-arnaudb.json
  • 18:49 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host stat1011.eqiad.wmnet with OS bullseye
  • 18:49 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 18:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P52913 and previous config saved to /var/cache/conftool/dbconfig/20231011-184902-arnaudb.json
  • 18:48 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 18:46 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir3003.esams.wmnet with reason: host reimage
  • 18:43 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir3003.esams.wmnet with reason: host reimage
  • 18:36 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 18:36 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 18:36 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:36 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 18:35 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:35 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P52911 and previous config saved to /var/cache/conftool/dbconfig/20231011-183355-arnaudb.json
  • 18:33 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 18:33 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 18:33 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:32 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 18:31 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:31 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:25 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:24 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:24 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 18:23 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 18:23 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:23 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on stat1011.eqiad.wmnet with reason: host reimage
  • 18:22 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 18:21 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:21 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:19 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on stat1011.eqiad.wmnet with reason: host reimage
  • 18:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T343198)', diff saved to https://phabricator.wikimedia.org/P52910 and previous config saved to /var/cache/conftool/dbconfig/20231011-181849-arnaudb.json
  • 18:18 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir3003.esams.wmnet with OS bookworm
  • 18:08 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 18:07 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 18:07 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:07 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 18:05 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:04 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 17:56 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 17:55 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir3004.esams.wmnet with OS bookworm
  • 17:55 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 17:47 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 17:47 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 17:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir3004.esams.wmnet with reason: host reimage
  • 17:28 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir3004.esams.wmnet with reason: host reimage
  • 17:27 sukhe: repool cp2030 for service=cdn
  • 17:03 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir3004.esams.wmnet with OS bookworm
  • 16:57 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:57 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:53 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host stat1011.eqiad.wmnet with OS bullseye
  • 16:48 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host stat1011.eqiad.wmnet with OS bullseye
  • 16:48 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host stat1011.eqiad.wmnet with OS bullseye
  • 16:47 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['stat1011']
  • 16:46 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['stat1011']
  • 16:44 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host stat1011.eqiad.wmnet with OS bullseye
  • 16:44 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host stat1011.eqiad.wmnet with OS bullseye
  • 16:43 taavi@deploy2002: Finished scap: Backport for Don't double-escape link contents (T348669) (duration: 07m 35s)
  • 16:38 taavi@deploy2002: taavi: Continuing with sync
  • 16:37 taavi@deploy2002: taavi: Backport for Don't double-escape link contents (T348669) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:36 taavi@deploy2002: Started scap: Backport for Don't double-escape link contents (T348669)
  • 16:29 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir5001.eqsin.wmnet with OS bookworm
  • 15:57 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir5001.eqsin.wmnet with reason: host reimage
  • 15:54 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir5001.eqsin.wmnet with reason: host reimage
  • 15:53 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mw-wikifunctions.discovery.wmnet on codfw recursors
  • 15:53 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache mw-wikifunctions.discovery.wmnet on codfw recursors
  • 15:53 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mw-wikifunctions.discovery.wmnet on eqiad recursors
  • 15:53 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache mw-wikifunctions.discovery.wmnet on eqiad recursors
  • 15:52 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host stat1011.eqiad.wmnet with OS bullseye
  • 15:52 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host stat1011.eqiad.wmnet with OS bullseye
  • 15:25 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 15:25 vgutierrez: depool ncredir5001
  • 15:23 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:22 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 15:22 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 15:21 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 15:20 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:20 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 15:18 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir5001.eqsin.wmnet with OS bookworm
  • 15:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on apt1002.wikimedia.org with reason: setup in progress
  • 15:04 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on apt1002.wikimedia.org with reason: setup in progress
  • 14:55 jayme: restarting pybal on lvs1019 and lvs2013
  • 14:52 jayme: restarting pybal on lvs1020 and lvs2014
  • 14:49 jayme: running puppet on 'O:lvs::balancer'
  • 14:45 jayme: disabling puppet on 'P{O:lvs::balancer} and (A:codfw or A:eqiad)'
  • 14:28 claime: Running authdns-update - T348631
  • 14:25 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 14:25 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 14:25 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 14:25 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 14:24 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 14:23 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 14:22 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1101']
  • 14:21 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 14:21 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 14:21 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:18 moritzm: installing curl security updates on bullseye/bookworm
  • 14:17 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:15 jayme@deploy2002: Finished scap: (no justification provided) (duration: 02m 15s)
  • 14:13 jayme@deploy2002: Started scap: (no justification provided)
  • 14:07 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:06 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Edit check: Simplify "experience" config to "maximumEditcount" (duration: 07m 13s)
  • 14:05 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and kemayo: Continuing with sync
  • 14:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and kemayo: Backport for Edit check: Simplify "experience" config to "maximumEditcount" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:58 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Edit check: Simplify "experience" config to "maximumEditcount"
  • 13:58 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 13:58 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 13:50 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 13:45 elukey: restart kube-apiserver on ml-serve-ctrl1002
  • 13:42 elukey: restart kube-apiserver on ml-serve-ctrl1001 as attempt to clear a weird golang/protobuf issue while retrieving secrets
  • 13:40 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 13:40 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:39 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 13:39 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:38 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 13:38 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:37 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:37 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:37 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 150552
  • 13:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 150552
  • 13:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38628
  • 13:36 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 13:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38628
  • 13:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 40317
  • 13:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 40317
  • 13:34 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 13:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38195
  • 13:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38195
  • 13:28 sukhe: disable puppet on P:bird::anycast: T348041
  • 13:28 sukhe: disable puppet on P:bird::anycast
  • 13:27 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['sretest1003.eqiad.wmnet']
  • 13:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9031
  • 13:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9031
  • 13:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6368
  • 13:26 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6368
  • 13:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2497
  • 13:25 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2497
  • 13:24 urandom: starting decommission of restbase2012-a — T328490
  • 13:24 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 13:23 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 13:16 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003.eqiad.wmnet']
  • 13:16 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sretest1003.eqiad.wmnet']
  • 13:16 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003.eqiad.wmnet']
  • 13:15 jbond@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['sretest1003.eqiad.wmnet']
  • 13:14 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 13:14 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 13:02 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:59 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003.eqiad.wmnet']
  • 12:56 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 12:55 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:53 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:53 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:52 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 12:52 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:51 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 12:38 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:38 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Cleanup decommissioned services apple-search and graphoid - cgoubert@cumin1001"
  • 12:37 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Cleanup decommissioned services apple-search and graphoid - cgoubert@cumin1001"
  • 12:34 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 12:34 cgoubert@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 12:33 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 12:16 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:16 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ORES svc records - elukey@cumin1001"
  • 12:15 elukey@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ORES svc records - elukey@cumin1001"
  • 12:12 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 12:00 kart_: Updated cxserver to 2023-10-11-114410-production (T341478, T347939)
  • 12:00 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 11:59 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 11:58 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 11:57 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 11:55 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 11:54 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 11:28 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on druid1011.eqiad.wmnet with reason: Downtime as we set up the host to join the druid and zookeeper cluster
  • 11:27 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on druid1011.eqiad.wmnet with reason: Downtime as we set up the host to join the druid and zookeeper cluster
  • 11:12 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:12 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T343198)', diff saved to https://phabricator.wikimedia.org/P52901 and previous config saved to /var/cache/conftool/dbconfig/20231011-110127-arnaudb.json
  • 11:01 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 11:01 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 11:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T343198)', diff saved to https://phabricator.wikimedia.org/P52900 and previous config saved to /var/cache/conftool/dbconfig/20231011-110105-arnaudb.json
  • 10:52 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:52 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P52899 and previous config saved to /var/cache/conftool/dbconfig/20231011-104558-arnaudb.json
  • 10:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P52898 and previous config saved to /var/cache/conftool/dbconfig/20231011-103052-arnaudb.json
  • 10:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T343198)', diff saved to https://phabricator.wikimedia.org/P52897 and previous config saved to /var/cache/conftool/dbconfig/20231011-101545-arnaudb.json
  • 10:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
  • 09:52 moritzm: rebuilding RAID after disk replacement T348429
  • 09:52 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 09:49 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 09:34 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 09:31 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1001.eqiad.wmnet with OS bullseye
  • 09:23 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:23 jayme@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add VIPs for mw-wikifunction - jayme@cumin1001"
  • 09:23 jayme@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add VIPs for mw-wikifunction - jayme@cumin1001"
  • 09:19 jayme@cumin1001: START - Cookbook sre.dns.netbox
  • 09:15 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 08:53 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 08:44 hashar@deploy2002: Synchronized php: group1 wikis to 1.41.0-wmf.30 refs T347081 (duration: 06m 00s)
  • 08:38 hashar@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.30 refs T347081
  • 08:00 hashar@deploy2002: Synchronized php-1.41.0-wmf.30/skins/Vector: Backports for Vector styling issues T348572 T348530 (duration: 06m 16s)
  • 07:35 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 07:35 sgimeno@deploy2002: Finished scap: Backport for GrowthExperiments: enable AddLink backend 15th round of wikis (T308141) (duration: 07m 45s)
  • 07:29 sgimeno@deploy2002: sgimeno: Continuing with sync
  • 07:28 sgimeno@deploy2002: sgimeno: Backport for GrowthExperiments: enable AddLink backend 15th round of wikis (T308141) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:27 sgimeno@deploy2002: Started scap: Backport for GrowthExperiments: enable AddLink backend 15th round of wikis (T308141)
  • 07:24 sgimeno@deploy2002: Finished scap: Backport for GrowthExperiments: enable AddLink frontend 14th round of wikis (T308139) (duration: 09m 05s)
  • 07:19 sgimeno@deploy2002: sgimeno: Continuing with sync
  • 07:17 sgimeno@deploy2002: sgimeno: Backport for GrowthExperiments: enable AddLink frontend 14th round of wikis (T308139) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:15 sgimeno@deploy2002: Started scap: Backport for GrowthExperiments: enable AddLink frontend 14th round of wikis (T308139)
  • 05:46 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 05:45 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 05:45 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 05:45 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 05:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:44 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 05:24 kart_: Updated cxserver to 2023-10-11-045323-production (T341478, T344982, T338432, T347939)
  • 05:21 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 05:21 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 05:19 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 05:18 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 05:11 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:10 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 03:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T343198)', diff saved to https://phabricator.wikimedia.org/P52896 and previous config saved to /var/cache/conftool/dbconfig/20231011-030054-arnaudb.json
  • 03:00 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 03:00 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 03:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T343198)', diff saved to https://phabricator.wikimedia.org/P52895 and previous config saved to /var/cache/conftool/dbconfig/20231011-030032-arnaudb.json
  • 02:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P52894 and previous config saved to /var/cache/conftool/dbconfig/20231011-024526-arnaudb.json
  • 02:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P52893 and previous config saved to /var/cache/conftool/dbconfig/20231011-023019-arnaudb.json
  • 02:18 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1104.mgmt.eqiad.wmnet with reboot policy FORCED
  • 02:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T343198)', diff saved to https://phabricator.wikimedia.org/P52892 and previous config saved to /var/cache/conftool/dbconfig/20231011-021513-arnaudb.json
  • 02:03 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1104.mgmt.eqiad.wmnet with reboot policy FORCED
  • 02:02 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1104
  • 02:01 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1104

2023-10-10

  • 22:45 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ncredir5001.eqsin.wmnet with OS bookworm
  • 22:41 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 22:40 cstone: SmashPig upgraded from a78a91d9 to 211284b9
  • 22:13 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 21:45 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f6-eqiad
  • 21:43 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f6-eqiad
  • 21:34 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir5001.eqsin.wmnet with OS bookworm
  • 21:33 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ncredir5001.eqsin.wmnet with OS bookworm
  • 20:48 taavi@deploy2002: Finished scap: Backport for Set READ_NEW for CA wikis on OATHAuth multiple devices (T242031) (duration: 08m 24s)
  • 20:43 taavi@deploy2002: taavi: Continuing with sync
  • 20:41 taavi@deploy2002: taavi: Backport for Set READ_NEW for CA wikis on OATHAuth multiple devices (T242031) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:40 taavi@deploy2002: Started scap: Backport for Set READ_NEW for CA wikis on OATHAuth multiple devices (T242031)
  • 20:19 hmonroy@deploy2002: Finished scap: Backport for diffs: add line number headings to inline diffs (T346460) (duration: 30m 26s)
  • 20:17 eileen: civicrm upgraded from 4329014b to f2f1e23e
  • 20:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir5001.eqsin.wmnet with OS bookworm
  • 20:13 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host ncredir5001.eqsin.wmnet with OS bookworm
  • 20:07 hmonroy@deploy2002: musikanimal and hmonroy: Continuing with sync
  • 20:07 hmonroy@deploy2002: musikanimal and hmonroy: Backport for diffs: add line number headings to inline diffs (T346460) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 19:49 hmonroy@deploy2002: Started scap: Backport for diffs: add line number headings to inline diffs (T346460)
  • 19:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T343198)', diff saved to https://phabricator.wikimedia.org/P52890 and previous config saved to /var/cache/conftool/dbconfig/20231010-194311-arnaudb.json
  • 19:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 19:42 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 19:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T343198)', diff saved to https://phabricator.wikimedia.org/P52889 and previous config saved to /var/cache/conftool/dbconfig/20231010-194249-arnaudb.json
  • 19:33 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 19:33 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 19:33 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 19:32 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 19:32 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 19:31 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 19:29 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: changing bgp rr config
  • 19:29 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: changing bgp rr config
  • 19:29 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: changing bgp rr config
  • 19:29 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: changing bgp rr config
  • 19:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P52888 and previous config saved to /var/cache/conftool/dbconfig/20231010-192742-arnaudb.json
  • 19:26 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 19:26 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 19:26 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 19:25 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 19:24 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 19:23 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 19:22 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 19:22 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir5001.eqsin.wmnet with OS bookworm
  • 19:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P52887 and previous config saved to /var/cache/conftool/dbconfig/20231010-191236-arnaudb.json
  • 18:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T343198)', diff saved to https://phabricator.wikimedia.org/P52886 and previous config saved to /var/cache/conftool/dbconfig/20231010-185730-arnaudb.json
  • 18:15 bvibber: brion running TimedMediaHandler requeueTranscodes.php batch jobs on mwmaint2002. Expect many deletions & new file stores on swift
  • 18:11 ejegg: fundraising python tools upgraded from 2e19cd39 to 0c17296c
  • 18:10 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: changing bgp rr config
  • 18:09 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: changing bgp rr config
  • 18:07 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: changing bgp rr config
  • 18:06 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: changing bgp rr config
  • 18:01 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:59 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 17:56 topranks: disable BGP RR_CLIENT peerings on lsw1-e1-eqiad
  • 17:52 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f5-eqiad
  • 17:50 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f5-eqiad
  • 17:46 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e6-eqiad
  • 17:44 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-e6-eqiad
  • 17:41 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e5-eqiad
  • 17:39 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-e5-eqiad
  • 17:23 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f7-eqiad
  • 17:22 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f7-eqiad
  • 17:21 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e7-eqiad
  • 17:21 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-e7-eqiad
  • 17:15 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add eqiad new row switches - cmooney@cumin1001"
  • 17:14 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add eqiad new row switches - cmooney@cumin1001"
  • 17:14 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add eqiad new row switches - cmooney@cumin1001"
  • 17:13 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add eqiad new row switches - cmooney@cumin1001"
  • 16:32 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 16:21 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:21 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:18 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:18 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:18 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cp1101 - jclark@cumin1001"
  • 16:17 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cp1101 - jclark@cumin1001"
  • 16:14 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 16:11 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:09 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:06 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:05 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:05 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:03 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:03 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:02 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:02 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:00 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:00 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:58 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:54 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:52 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:52 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:46 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 15:34 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1100']
  • 15:23 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:23 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:06 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100']
  • 14:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
  • 14:10 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 14:06 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 14:06 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 14:05 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 14:05 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 14:02 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
  • 13:58 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 13:57 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 13:57 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 13:54 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 13:54 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 13:52 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 13:52 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 13:50 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 13:49 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:48 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:44 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 13:44 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 13:42 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:40 urbanecm@deploy2002: Finished scap: Backport for Growth: Enable Welcome survey user research for enwiki (T342353) (duration: 13m 19s)
  • 13:39 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:37 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:36 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:35 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 13:33 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 13:32 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 13:32 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 13:29 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 13:28 urbanecm@deploy2002: urbanecm: Backport for Growth: Enable Welcome survey user research for enwiki (T342353) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:27 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 13:26 urbanecm@deploy2002: Started scap: Backport for Growth: Enable Welcome survey user research for enwiki (T342353)
  • 13:26 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:25 urbanecm@deploy2002: Finished scap: Backport for cswiki: Remove engineer group (T348279) (duration: 07m 24s)
  • 13:24 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:24 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 13:23 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 13:22 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 13:22 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:20 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 13:19 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 13:19 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 13:19 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 13:19 urbanecm@deploy2002: urbanecm: Backport for cswiki: Remove engineer group (T348279) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:18 urbanecm@deploy2002: Started scap: Backport for cswiki: Remove engineer group (T348279)
  • 13:17 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:17 urbanecm@deploy2002: Finished scap: Backport for growth: Enable section-image recommendations on 10 new wikis (T345940) (duration: 09m 59s)
  • 13:16 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:15 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:11 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 13:08 urbanecm@deploy2002: urbanecm: Backport for growth: Enable section-image recommendations on 10 new wikis (T345940) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:07 urbanecm@deploy2002: Started scap: Backport for growth: Enable section-image recommendations on 10 new wikis (T345940)
  • 13:02 fnegri@cumin1001: START - Cookbook sre.dns.netbox
  • 12:19 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1100.eqiad.wmnet']
  • 12:18 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100.eqiad.wmnet']
  • 12:02 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 12:01 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 11:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T343198)', diff saved to https://phabricator.wikimedia.org/P52885 and previous config saved to /var/cache/conftool/dbconfig/20231010-114024-arnaudb.json
  • 11:40 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 11:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 11:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T343198)', diff saved to https://phabricator.wikimedia.org/P52884 and previous config saved to /var/cache/conftool/dbconfig/20231010-114002-arnaudb.json
  • 11:33 volans: installed spicerack 7.4.1 on the cumin hosts
  • 11:33 cgoubert@cumin1001: END (PASS) - Cookbook sre.mediawiki.restart-appservers (exit_code=0)
  • 11:32 cgoubert@cumin1001: START - Cookbook sre.mediawiki.restart-appservers
  • 11:30 cgoubert@cumin1001: END (PASS) - Cookbook sre.mediawiki.restart-appservers (exit_code=0)
  • 11:29 cgoubert@cumin1001: START - Cookbook sre.mediawiki.restart-appservers
  • 11:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P52883 and previous config saved to /var/cache/conftool/dbconfig/20231010-112456-arnaudb.json
  • 11:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P52882 and previous config saved to /var/cache/conftool/dbconfig/20231010-110950-arnaudb.json
  • 10:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T343198)', diff saved to https://phabricator.wikimedia.org/P52880 and previous config saved to /var/cache/conftool/dbconfig/20231010-105443-arnaudb.json
  • 10:52 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1100.eqiad.wmnet']
  • 10:52 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100.eqiad.wmnet']
  • 09:56 ladsgroup@deploy2002: Finished scap: Backport for Set pagelinks migration stage of cebwiki to write both (T345732) (duration: 09m 10s)
  • 09:50 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 09:48 ladsgroup@deploy2002: ladsgroup: Backport for Set pagelinks migration stage of cebwiki to write both (T345732) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:47 ladsgroup@deploy2002: Started scap: Backport for Set pagelinks migration stage of cebwiki to write both (T345732)
  • 09:33 volans: uploaded spicerack_7.4.1 to apt.wikimedia.org bullseye-wikimedia
  • 08:35 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.30 refs T347081
  • 08:24 taavi: wikitech-static: cleanup image archive directory: T348503
  • 08:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T343198)', diff saved to https://phabricator.wikimedia.org/P52879 and previous config saved to /var/cache/conftool/dbconfig/20231010-080924-arnaudb.json
  • 08:09 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 08:09 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 08:09 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 08:08 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 08:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T343198)', diff saved to https://phabricator.wikimedia.org/P52878 and previous config saved to /var/cache/conftool/dbconfig/20231010-080847-arnaudb.json
  • 08:00 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 07:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P52877 and previous config saved to /var/cache/conftool/dbconfig/20231010-075340-arnaudb.json
  • 07:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P52876 and previous config saved to /var/cache/conftool/dbconfig/20231010-073834-arnaudb.json
  • 07:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T343198)', diff saved to https://phabricator.wikimedia.org/P52875 and previous config saved to /var/cache/conftool/dbconfig/20231010-072327-arnaudb.json
  • 07:19 kostajh: UTC morning deploys done
  • 07:18 kharlan@deploy2002: Finished scap: Backport for ReportIncident: Set developer mode to false (duration: 10m 17s)
  • 07:12 kharlan@deploy2002: kharlan: Continuing with sync
  • 07:09 kharlan@deploy2002: kharlan: Backport for ReportIncident: Set developer mode to false synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:08 kharlan@deploy2002: Started scap: Backport for ReportIncident: Set developer mode to false
  • 06:42 moritzm: installing qemu security updates on bookworm
  • 03:54 mwpresync@deploy2002: Pruned MediaWiki: 1.41.0-wmf.28 (duration: 02m 08s)
  • 03:52 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.30 refs T347081 (duration: 49m 56s)
  • 03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.30 refs T347081

2023-10-09

  • 22:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T343198)', diff saved to https://phabricator.wikimedia.org/P52873 and previous config saved to /var/cache/conftool/dbconfig/20231009-225429-arnaudb.json
  • 22:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 22:54 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 22:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T343198)', diff saved to https://phabricator.wikimedia.org/P52872 and previous config saved to /var/cache/conftool/dbconfig/20231009-225407-arnaudb.json
  • 22:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P52871 and previous config saved to /var/cache/conftool/dbconfig/20231009-223900-arnaudb.json
  • 22:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P52870 and previous config saved to /var/cache/conftool/dbconfig/20231009-222354-arnaudb.json
  • 22:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T343198)', diff saved to https://phabricator.wikimedia.org/P52869 and previous config saved to /var/cache/conftool/dbconfig/20231009-220848-arnaudb.json
  • 20:42 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1156.eqiad.wmnet
  • 20:34 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1156.eqiad.wmnet
  • 20:34 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1155.eqiad.wmnet
  • 20:26 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1155.eqiad.wmnet
  • 20:26 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1154.eqiad.wmnet
  • 20:17 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1154.eqiad.wmnet
  • 20:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1153.eqiad.wmnet
  • 20:09 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1153.eqiad.wmnet
  • 20:09 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1152.eqiad.wmnet
  • 20:02 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1152.eqiad.wmnet
  • 20:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1151.eqiad.wmnet
  • 19:54 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1151.eqiad.wmnet
  • 19:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1150.eqiad.wmnet
  • 19:47 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1150.eqiad.wmnet
  • 19:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1149.eqiad.wmnet
  • 19:40 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1149.eqiad.wmnet
  • 19:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1148.eqiad.wmnet
  • 19:32 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1148.eqiad.wmnet
  • 19:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1147.eqiad.wmnet
  • 19:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T343198)', diff saved to https://phabricator.wikimedia.org/P52868 and previous config saved to /var/cache/conftool/dbconfig/20231009-193219-arnaudb.json
  • 19:32 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 19:31 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 19:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1147.eqiad.wmnet
  • 19:23 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1146.eqiad.wmnet
  • 19:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1146.eqiad.wmnet
  • 19:16 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1145.eqiad.wmnet
  • 19:08 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1145.eqiad.wmnet
  • 19:08 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1144.eqiad.wmnet
  • 19:01 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1144.eqiad.wmnet
  • 19:01 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1143.eqiad.wmnet
  • 18:55 ladsgroup@deploy2002: Finished scap: Backport for Update interwiki cache (duration: 100m 07s)
  • 18:54 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1143.eqiad.wmnet
  • 18:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1142.eqiad.wmnet
  • 18:49 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 18:46 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1142.eqiad.wmnet
  • 18:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1141.eqiad.wmnet
  • 18:39 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1141.eqiad.wmnet
  • 18:39 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1140.eqiad.wmnet
  • 18:36 mforns@deploy2002: Finished deploy [airflow-dags/analytics@c334eaf]: (no justification provided) (duration: 01m 12s)
  • 18:35 mforns@deploy2002: Started deploy [airflow-dags/analytics@c334eaf]: (no justification provided)
  • 18:33 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1140.eqiad.wmnet
  • 18:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1139.eqiad.wmnet
  • 18:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1139.eqiad.wmnet
  • 18:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1138.eqiad.wmnet
  • 18:15 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1138.eqiad.wmnet
  • 18:15 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1137.eqiad.wmnet
  • 18:08 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1137.eqiad.wmnet
  • 18:08 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1136.eqiad.wmnet
  • 17:58 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1136.eqiad.wmnet
  • 17:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1135.eqiad.wmnet
  • 17:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1135.eqiad.wmnet
  • 17:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1134.eqiad.wmnet
  • 17:42 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1134.eqiad.wmnet
  • 17:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1133.eqiad.wmnet
  • 17:35 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1133.eqiad.wmnet
  • 17:35 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1132.eqiad.wmnet
  • 17:27 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1132.eqiad.wmnet
  • 17:27 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1131.eqiad.wmnet
  • 17:24 ladsgroup@deploy2002: ladsgroup: Backport for Update interwiki cache synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:20 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1131.eqiad.wmnet
  • 17:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1130.eqiad.wmnet
  • 17:15 ladsgroup@deploy2002: Started scap: Backport for Update interwiki cache
  • 17:11 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1130.eqiad.wmnet
  • 17:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1129.eqiad.wmnet
  • 17:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1129.eqiad.wmnet
  • 17:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1128.eqiad.wmnet
  • 16:56 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1128.eqiad.wmnet
  • 16:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1127.eqiad.wmnet
  • 16:47 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1127.eqiad.wmnet
  • 16:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1126.eqiad.wmnet
  • 16:40 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1126.eqiad.wmnet
  • 16:39 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1125.eqiad.wmnet
  • 16:32 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1125.eqiad.wmnet
  • 16:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1124.eqiad.wmnet
  • 16:26 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1124.eqiad.wmnet
  • 16:26 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1123.eqiad.wmnet
  • 16:18 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1123.eqiad.wmnet
  • 16:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1122.eqiad.wmnet
  • 16:11 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1122.eqiad.wmnet
  • 16:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1121.eqiad.wmnet
  • 16:11 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:11 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:03 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1121.eqiad.wmnet
  • 16:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1120.eqiad.wmnet
  • 15:55 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1120.eqiad.wmnet
  • 15:55 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1119.eqiad.wmnet
  • 15:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1119.eqiad.wmnet
  • 15:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1118.eqiad.wmnet
  • 15:42 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1118.eqiad.wmnet
  • 15:42 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1117.eqiad.wmnet
  • 15:34 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1117.eqiad.wmnet
  • 15:34 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1116.eqiad.wmnet
  • 15:31 moritzm: installing qemu security updates on bookworm
  • 15:27 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1116.eqiad.wmnet
  • 15:27 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1115.eqiad.wmnet
  • 15:20 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1115.eqiad.wmnet
  • 15:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1114.eqiad.wmnet
  • 15:12 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1114.eqiad.wmnet
  • 15:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1113.eqiad.wmnet
  • 15:09 volans: installed spicerack 7.4.0 to cumin2002
  • 15:08 moritzm: installing nftables bugfix updates from Bookworm point release
  • 15:02 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1113.eqiad.wmnet
  • 15:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1112.eqiad.wmnet
  • 14:55 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1112.eqiad.wmnet
  • 14:55 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1111.eqiad.wmnet
  • 14:47 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1111.eqiad.wmnet
  • 14:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1110.eqiad.wmnet
  • 14:40 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1110.eqiad.wmnet
  • 14:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1109.eqiad.wmnet
  • 14:32 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1109.eqiad.wmnet
  • 14:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1108.eqiad.wmnet
  • 14:27 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1108.eqiad.wmnet
  • 14:25 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1107.eqiad.wmnet
  • 14:18 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1107.eqiad.wmnet
  • 14:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1106.eqiad.wmnet
  • 14:10 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1106.eqiad.wmnet
  • 14:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1105.eqiad.wmnet
  • 14:02 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1105.eqiad.wmnet
  • 14:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1104.eqiad.wmnet
  • 13:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 13:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 13:54 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1104.eqiad.wmnet
  • 13:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1103.eqiad.wmnet
  • 13:48 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 13:48 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 13:48 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 13:47 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 13:47 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 13:46 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 13:46 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1103.eqiad.wmnet
  • 13:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1102.eqiad.wmnet
  • 13:46 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 13:46 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 13:46 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 13:43 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 13:40 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1102.eqiad.wmnet
  • 13:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1101.eqiad.wmnet
  • 13:35 volans: uploaded spicerack_7.4.0 to apt.wikimedia.org bullseye-wikimedia
  • 13:32 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1101.eqiad.wmnet
  • 13:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1100.eqiad.wmnet
  • 13:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1100.eqiad.wmnet
  • 13:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1099.eqiad.wmnet
  • 13:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1099.eqiad.wmnet
  • 13:16 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1098.eqiad.wmnet
  • 13:06 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1098.eqiad.wmnet
  • 13:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1097.eqiad.wmnet
  • 12:58 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1097.eqiad.wmnet
  • 12:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1095.eqiad.wmnet
  • 12:52 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1095.eqiad.wmnet
  • 12:52 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1094.eqiad.wmnet
  • 12:46 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1094.eqiad.wmnet
  • 12:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1093.eqiad.wmnet
  • 12:40 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1093.eqiad.wmnet
  • 12:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1092.eqiad.wmnet
  • 12:35 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1092.eqiad.wmnet
  • 12:35 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1091.eqiad.wmnet
  • 12:28 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1091.eqiad.wmnet
  • 12:28 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1090.eqiad.wmnet
  • 12:23 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1090.eqiad.wmnet
  • 12:23 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1089.eqiad.wmnet
  • 12:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1089.eqiad.wmnet
  • 12:16 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1088.eqiad.wmnet
  • 12:10 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1088.eqiad.wmnet
  • 12:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1087.eqiad.wmnet
  • 12:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1087.eqiad.wmnet
  • 12:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1086.eqiad.wmnet
  • 11:51 godog: restart k8s-aux in eqiad to pick up new certs - T343529
  • 11:47 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1086.eqiad.wmnet
  • 11:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1085.eqiad.wmnet
  • 11:32 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1085.eqiad.wmnet
  • 11:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1084.eqiad.wmnet
  • 11:26 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1084.eqiad.wmnet
  • 11:26 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1083.eqiad.wmnet
  • 11:18 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1083.eqiad.wmnet
  • 11:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1082.eqiad.wmnet
  • 11:12 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1082.eqiad.wmnet
  • 11:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1081.eqiad.wmnet
  • 11:06 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1081.eqiad.wmnet
  • 11:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1080.eqiad.wmnet
  • 11:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 11:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 11:00 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1080.eqiad.wmnet
  • 11:00 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1079.eqiad.wmnet
  • 10:59 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 10:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 10:53 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1079.eqiad.wmnet
  • 10:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1078.eqiad.wmnet
  • 10:50 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 10:48 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1078.eqiad.wmnet
  • 10:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics1077.eqiad.wmnet
  • 10:34 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host analytics1077.eqiad.wmnet
  • 10:34 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics1076.eqiad.wmnet
  • 10:29 moritzm: installing Linux 6.1.55 on Bookworm hosts
  • 10:29 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host analytics1076.eqiad.wmnet
  • 10:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics1075.eqiad.wmnet
  • 10:22 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host analytics1075.eqiad.wmnet
  • 10:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics1074.eqiad.wmnet
  • 10:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2031.codfw.wmnet
  • 10:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host analytics1074.eqiad.wmnet
  • 10:13 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics1073.eqiad.wmnet
  • 10:10 ladsgroup@deploy2002: Finished scap: Backport for Set virtual domain mapping for url shortener (T330590) (duration: 15m 35s)
  • 10:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2031.codfw.wmnet
  • 10:05 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 10:04 Lucas_WMDE: lucaswerkmeister-wmde@mwdebug1002:~$ sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.40.0-wmf.17/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.40.0-wmf.17/cache/l10n/ /srv/mediawiki/php-1.40.0-wmf.17/cache/ /srv/mediawiki/php-1.40.0-wmf.17/ # clean up old l10n cache'
  • 10:03 ladsgroup@deploy2002: ladsgroup: Backport for Set virtual domain mapping for url shortener (T330590) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:55 ladsgroup@deploy2002: Started scap: Backport for Set virtual domain mapping for url shortener (T330590)
  • 09:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host analytics1073.eqiad.wmnet
  • 09:49 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host analytics1072.eqiad.wmnet
  • 09:07 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host analytics1072.eqiad.wmnet
  • 09:07 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics1071.eqiad.wmnet
  • 09:01 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host analytics1071.eqiad.wmnet
  • 09:01 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics1070.eqiad.wmnet
  • 08:55 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host analytics1070.eqiad.wmnet
  • 08:53 moritzm: rebuilt bookworm d-i image for the Bookworm 12.2 point release T348326
  • 08:23 moritzm: rebuilt bullseye d-i image for the Bullseye 11.8 point release T348327
  • 07:06 taavi: kill stuck updateSpecialPages.php process on mwmaint2002 which was trying to re-connect to an unreachable db host
  • 07:02 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 0:00:00 on db2109.codfw.wmnet with reason: investigating db2109
  • 07:01 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 8 days, 0:00:00 on db2109.codfw.wmnet with reason: investigating db2109

2023-10-08

  • 22:58 ryankemper: [WDQS] Depooled `wdqs1014` while it catches up on a day of lag
  • 22:57 ryankemper: [WDQS] Restarted `wdqs1014`; blazegraph has been deadlocked since `2023-10-07 12:30:00`

2023-10-07

  • 09:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T343198)', diff saved to https://phabricator.wikimedia.org/P52863 and previous config saved to /var/cache/conftool/dbconfig/20231007-092249-arnaudb.json
  • 09:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P52862 and previous config saved to /var/cache/conftool/dbconfig/20231007-090742-arnaudb.json
  • 08:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P52861 and previous config saved to /var/cache/conftool/dbconfig/20231007-085236-arnaudb.json
  • 08:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T343198)', diff saved to https://phabricator.wikimedia.org/P52860 and previous config saved to /var/cache/conftool/dbconfig/20231007-083729-arnaudb.json
  • 02:33 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1030.eqiad.wmnet
  • 02:33 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1030.eqiad.wmnet

2023-10-06

  • 23:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2054.codfw.wmnet with OS bullseye
  • 23:04 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:03 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2054.codfw.wmnet with reason: host reimage
  • 22:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2054.codfw.wmnet with reason: host reimage
  • 22:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T343198)', diff saved to https://phabricator.wikimedia.org/P52859 and previous config saved to /var/cache/conftool/dbconfig/20231006-224306-arnaudb.json
  • 22:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 22:42 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 22:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T343198)', diff saved to https://phabricator.wikimedia.org/P52858 and previous config saved to /var/cache/conftool/dbconfig/20231006-224245-arnaudb.json
  • 22:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P52857 and previous config saved to /var/cache/conftool/dbconfig/20231006-222738-arnaudb.json
  • 22:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2054.codfw.wmnet with OS bullseye
  • 22:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P52856 and previous config saved to /var/cache/conftool/dbconfig/20231006-221232-arnaudb.json
  • 21:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T343198)', diff saved to https://phabricator.wikimedia.org/P52855 and previous config saved to /var/cache/conftool/dbconfig/20231006-215725-arnaudb.json
  • 20:45 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:45 bking@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:35 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:34 bking@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:29 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:29 bking@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:11 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:10 bking@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:46 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 19:45 bking@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 19:44 bking@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 19:43 bking@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 19:43 bking@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 19:41 bking@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 19:40 bking@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 19:39 bking@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 18:43 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@3b7df78]: Update rdf-spark-tools to 0.3.135 to fix query mapping job failure (duration: 00m 29s)
  • 18:42 ebernhardson@deploy2002: Started deploy [airflow-dags/search@3b7df78]: Update rdf-spark-tools to 0.3.135 to fix query mapping job failure
  • 18:42 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:32 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:31 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:31 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:30 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1101
  • 18:30 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1101
  • 17:10 pt1979@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1100']
  • 17:10 pt1979@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100']
  • 17:08 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1100.eqiad.wmnet']
  • 17:08 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100.eqiad.wmnet']
  • 17:05 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1100']
  • 17:05 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100']
  • 17:03 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1100']
  • 17:03 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100']
  • 17:02 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1100']
  • 17:02 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100']
  • 16:54 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1100.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:41 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1100.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:37 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1100.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:28 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1100.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:27 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1100.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 16:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 16:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 16:13 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1100.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:31 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
  • 14:59 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 14:59 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 14:59 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 14:58 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1067.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:58 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1064.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:58 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1063.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:55 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1067.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:55 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1064.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:55 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1063.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2054.codfw.wmnet with OS bullseye
  • 14:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-master1003.eqiad.wmnet with OS bullseye
  • 14:44 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 14:42 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 14:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
  • 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-master1003.eqiad.wmnet with reason: host reimage
  • 14:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:22 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-master1003.eqiad.wmnet with reason: host reimage
  • 14:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:02 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:02 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 13:55 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-master1004.eqiad.wmnet with OS bullseye
  • 13:55 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 13:53 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 13:52 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2054.codfw.wmnet with OS bullseye
  • 13:38 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 13:35 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:34 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 13:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "ganeti-test2004 - ayounsi@cumin1001"
  • 13:26 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "ganeti-test2004 - ayounsi@cumin1001"
  • 13:21 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
  • 13:21 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-master1004.eqiad.wmnet with reason: host reimage
  • 13:18 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-master1004.eqiad.wmnet with reason: host reimage
  • 13:17 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 13:17 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 13:03 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti-test2004.codfw.wmnet with OS bullseye
  • 13:01 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
  • 12:29 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 12:29 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 12:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T343198)', diff saved to https://phabricator.wikimedia.org/P52852 and previous config saved to /var/cache/conftool/dbconfig/20231006-122022-arnaudb.json
  • 12:20 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 12:20 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 12:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T343198)', diff saved to https://phabricator.wikimedia.org/P52851 and previous config saved to /var/cache/conftool/dbconfig/20231006-122000-arnaudb.json
  • 12:17 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 12:16 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 12:15 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 12:15 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 12:15 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 12:14 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 12:13 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti-test2004.codfw.wmnet with OS bullseye
  • 12:13 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti-test2004.codfw.wmnet with OS bullseye
  • 12:11 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 12:10 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 12:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P52850 and previous config saved to /var/cache/conftool/dbconfig/20231006-120454-arnaudb.json
  • 12:02 moritzm: rebalancing ganeti row D/eqiad
  • 11:55 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti-test2004.codfw.wmnet with OS bullseye
  • 11:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P52848 and previous config saved to /var/cache/conftool/dbconfig/20231006-114947-arnaudb.json
  • 11:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T343198)', diff saved to https://phabricator.wikimedia.org/P52847 and previous config saved to /var/cache/conftool/dbconfig/20231006-113441-arnaudb.json
  • 10:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 10:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 10:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 10:21 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2023.codfw.wmnet to cluster codfw and group A
  • 10:20 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2023.codfw.wmnet to cluster codfw and group A
  • 10:13 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 10:13 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host apt1002.wikimedia.org
  • 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host apt1002.wikimedia.org with OS bookworm
  • 09:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on apt1002.wikimedia.org with reason: host reimage
  • 09:51 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on apt1002.wikimedia.org with reason: host reimage
  • 09:42 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host apt1002.wikimedia.org with OS bookworm
  • 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM apt1002.wikimedia.org - jmm@cumin2002"
  • 09:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM apt1002.wikimedia.org - jmm@cumin2002"
  • 09:26 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) apt1002.wikimedia.org on all recursors
  • 09:26 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache apt1002.wikimedia.org on all recursors
  • 09:26 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM apt1002.wikimedia.org - jmm@cumin2002"
  • 09:25 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM apt1002.wikimedia.org - jmm@cumin2002"
  • 09:22 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:22 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host apt1002.wikimedia.org
  • 09:19 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host apt2002.wikimedia.org
  • 09:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) apt2002.wikimedia.org on all recursors
  • 09:19 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache apt2002.wikimedia.org on all recursors
  • 09:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:19 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM apt2002.wikimedia.org - jmm@cumin2002"
  • 09:18 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM apt2002.wikimedia.org - jmm@cumin2002"
  • 09:11 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) apt2002.wikimedia.org on all recursors
  • 09:11 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache apt2002.wikimedia.org on all recursors
  • 09:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM apt2002.wikimedia.org - jmm@cumin2002"
  • 09:10 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM apt2002.wikimedia.org - jmm@cumin2002"
  • 09:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:05 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host apt2002.wikimedia.org
  • 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 09:03 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 09:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 09:03 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 08:43 moritzm: installing vim security updates
  • 08:26 elukey@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 08:24 elukey@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 08:22 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2023.codfw.wmnet to cluster codfw and group A
  • 08:22 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2023.codfw.wmnet to cluster codfw and group A
  • 08:18 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2023.codfw.wmnet
  • 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2023.codfw.wmnet
  • 07:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2023.codfw.wmnet with OS bullseye
  • 07:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2023.codfw.wmnet with reason: host reimage
  • 07:09 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2023.codfw.wmnet with reason: host reimage
  • 06:53 moritzm: installing bind9 security updates (client side libs/tools only)
  • 06:52 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2023.codfw.wmnet with OS bullseye
  • 02:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T343198)', diff saved to https://phabricator.wikimedia.org/P52843 and previous config saved to /var/cache/conftool/dbconfig/20231006-020509-arnaudb.json
  • 02:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 02:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 02:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T343198)', diff saved to https://phabricator.wikimedia.org/P52842 and previous config saved to /var/cache/conftool/dbconfig/20231006-020447-arnaudb.json
  • 01:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P52841 and previous config saved to /var/cache/conftool/dbconfig/20231006-014941-arnaudb.json
  • 01:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P52840 and previous config saved to /var/cache/conftool/dbconfig/20231006-013434-arnaudb.json
  • 01:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T343198)', diff saved to https://phabricator.wikimedia.org/P52839 and previous config saved to /var/cache/conftool/dbconfig/20231006-011928-arnaudb.json
  • 00:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
  • 00:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 00:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
  • 00:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:16 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti-test2004.codfw.wmnet with OS bullseye

2023-10-05

  • 23:22 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
  • 23:22 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-master1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:19 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-master1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:02 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
  • 23:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-master1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-master1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:59 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 22:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-master1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:58 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-master1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:37 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 21:34 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
  • 21:32 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
  • 21:17 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase2020.codfw.wmnet: Maybe cleanup leaked file descriptors(?) - eevans@cumin1001
  • 21:07 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2020.codfw.wmnet: Maybe cleanup leaked file descriptors(?) - eevans@cumin1001
  • 21:03 thcipriani@deploy2002: Finished scap: Backport for [foundationwiki] Add Endowment, Agenda, Committee, and Memory namespaces (T347762 T347822 T348268), [foundationwiki] Provide 'translationadmin' group with 'edit-legal' right (T346187) (duration: 09m 56s)
  • 20:58 thcipriani@deploy2002: thcipriani and varnent: Continuing with sync
  • 20:57 eileen: civicrm upgraded from 05545fbc to 4329014b
  • 20:55 thcipriani@deploy2002: thcipriani and varnent: Backport for [foundationwiki] Add Endowment, Agenda, Committee, and Memory namespaces (T347762 T347822 T348268), [foundationwiki] Provide 'translationadmin' group with 'edit-legal' right (T346187) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:54 thcipriani@deploy2002: Started scap: Backport for [foundationwiki] Add Endowment, Agenda, Committee, and Memory namespaces (T347762 T347822 T348268), [foundationwiki] Provide 'translationadmin' group with 'edit-legal' right (T346187)
  • 20:49 thcipriani@deploy2002: Finished scap: Backport for [Prototype] Add screen resolution to Typography prototype, [Prototype] Edit project link page on reading prototype (duration: 23m 57s)
  • 20:39 thcipriani@deploy2002: jdrewniak and thcipriani: Continuing with sync
  • 20:37 thcipriani@deploy2002: jdrewniak and thcipriani: Backport for [Prototype] Add screen resolution to Typography prototype, [Prototype] Edit project link page on reading prototype synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:25 thcipriani@deploy2002: Started scap: Backport for [Prototype] Add screen resolution to Typography prototype, [Prototype] Edit project link page on reading prototype
  • 20:22 thcipriani@deploy2002: Finished scap: Backport for Enable Minerva site notice for Nepali Wikipedia (newiki) (T347814) (duration: 08m 57s)
  • 20:16 thcipriani@deploy2002: ammarpad and thcipriani: Continuing with sync
  • 20:14 thcipriani@deploy2002: ammarpad and thcipriani: Backport for Enable Minerva site notice for Nepali Wikipedia (newiki) (T347814) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
  • 20:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
  • 20:13 thcipriani@deploy2002: Started scap: Backport for Enable Minerva site notice for Nepali Wikipedia (newiki) (T347814)
  • 18:51 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti-test2004.codfw.wmnet with OS bullseye
  • 18:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti-test2004']
  • 18:47 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti-test2004']
  • 18:45 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 18:34 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.29 refs T347080
  • 18:17 sukhe: running authdns-update: T347054
  • 18:15 jhuneidi@deploy2002: Synchronized php: group1 wikis to 1.41.0-wmf.29 refs T347080 (duration: 06m 12s)
  • 18:08 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.29 refs T347080
  • 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti-test2004']
  • 17:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti-test2004']
  • 17:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:25 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 16:57 bvibber: scaling back batch jobs for T312153 and T312152, will run these in further chunks as the new config rolls out
  • 16:47 bvibber: brion running requeueTranscodes.php on mwmaint2002 for VP9 transcode cleanup for T312153
  • 16:22 volans: installed 7.3.1 on cumin1001
  • 16:19 jbond@cumin2002: END (PASS) - Cookbook sre.puppetboard.restart-reboot (exit_code=0) rolling reboot on A:puppetboard
  • 16:15 jbond@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet. on all recursors
  • 16:15 jbond@cumin2002: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet. on all recursors
  • 16:12 dcausse: cleaning up rdf-streaming-updater-staging swift bucket
  • 16:11 jbond@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet. on all recursors
  • 16:10 jbond@cumin2002: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet. on all recursors
  • 16:10 jbond@cumin2002: START - Cookbook sre.puppetboard.restart-reboot rolling reboot on A:puppetboard
  • 16:10 jbond@cumin2002: END (ERROR) - Cookbook sre.puppet.renew-cert (exit_code=97) for sretest1001.eqiad.wmnet: Renew puppet certificate - jbond@cumin2002
  • 16:09 jbond@cumin2002: START - Cookbook sre.puppet.renew-cert for sretest1001.eqiad.wmnet: Renew puppet certificate - jbond@cumin2002
  • 16:07 cgoubert@deploy2002: Finished scap: Testing mw-on-k8s deployment for T348228 (duration: 02m 15s)
  • 16:06 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for sretest1001.eqiad.wmnet: Renew puppet certificate - jbond@cumin2002
  • 16:05 cgoubert@deploy2002: Started scap: Testing mw-on-k8s deployment for T348228
  • 16:05 jbond@cumin2002: END (PASS) - Cookbook sre.puppetboard.restart-reboot (exit_code=0) rolling reboot on A:puppetboard
  • 16:05 jbond@cumin2002: START - Cookbook sre.puppet.renew-cert for sretest1001.eqiad.wmnet: Renew puppet certificate - jbond@cumin2002
  • 16:01 jbond@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet. on all recursors
  • 16:01 jbond@cumin2002: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet. on all recursors
  • 16:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T343198)', diff saved to https://phabricator.wikimedia.org/P52837 and previous config saved to /var/cache/conftool/dbconfig/20231005-160030-arnaudb.json
  • 16:00 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 16:00 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 16:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T343198)', diff saved to https://phabricator.wikimedia.org/P52836 and previous config saved to /var/cache/conftool/dbconfig/20231005-160009-arnaudb.json
  • 15:54 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1002.eqiad.wmnet
  • 15:37 volans: installed 7.3.1 on cumin2002
  • 15:36 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1002.eqiad.wmnet
  • 15:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:31 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
  • 15:31 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti-test2004.codfw.wmnet with OS bullseye
  • 15:30 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubemaster2001.codfw.wmnet
  • 15:30 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubemaster2001.codfw.wmnet
  • 15:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P52834 and previous config saved to /var/cache/conftool/dbconfig/20231005-152956-arnaudb.json
  • 15:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 15:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti2023.codfw.wmnet with reason: reimage to bullseye
  • 15:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti2023.codfw.wmnet with reason: reimage to bullseye
  • 15:26 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubemaster2001.codfw.wmnet with reason: Pick up vcpu change
  • 15:25 claime: rebooting kubemaster2001.codfw.wmnet - T348228
  • 15:25 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubemaster2001.codfw.wmnet with reason: Pick up vcpu change
  • 15:25 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubemaster2002.codfw.wmnet
  • 15:24 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubemaster2002.codfw.wmnet
  • 15:20 claime: rebooting kubemaster2002.codfw.wmnet - T348228
  • 15:20 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubemaster2002.codfw.wmnet with reason: Pick up vcpu change
  • 15:19 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubemaster2002.codfw.wmnet with reason: Pick up vcpu change
  • 15:16 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubetcd2004.codfw.wmnet
  • 15:16 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubetcd2004.codfw.wmnet
  • 15:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T343198)', diff saved to https://phabricator.wikimedia.org/P52832 and previous config saved to /var/cache/conftool/dbconfig/20231005-151450-arnaudb.json
  • 15:13 claime: rebooting kubetcd2004.codfw.wmnet - T348228
  • 15:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubetcd2004.codfw.wmnet with reason: Pick up vcpu change
  • 15:12 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti2023.codfw.wmnet
  • 15:12 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubetcd2004.codfw.wmnet with reason: Pick up vcpu change
  • 15:11 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubetcd2005.codfw.wmnet
  • 15:10 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubetcd2005.codfw.wmnet
  • 15:10 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubetcd2005.codfw.wmnet with reason: Pick up vcpu change
  • 15:09 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubetcd2005.codfw.wmnet with reason: Pick up vcpu change
  • 15:09 claime: rebooting kubetcd2005.codfw.wmnet - T348228
  • 15:08 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubetcd2006.codfw.wmnet
  • 15:08 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubetcd2006.codfw.wmnet
  • 15:07 claime: rebooting kubetcd2006.codfw.wmnet - T348228
  • 15:07 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubetcd2006.codfw.wmnet with reason: Pick up vcpu change
  • 15:07 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubetcd2006.codfw.wmnet with reason: Pick up vcpu change
  • 15:06 claime: Bumping kubetcd200[4-6].codfw.wmnet vcpu to 2 - T348228
  • 15:04 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubemaster1001.eqiad.wmnet
  • 15:03 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubemaster1001.eqiad.wmnet
  • 15:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2023.codfw.wmnet
  • 15:03 claime: rebooting kubemaster1001.eqiad.wmnet - T348228
  • 15:03 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubemaster1001.eqiad.wmnet with reason: Pick up vcpu change
  • 14:59 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubemaster1001.eqiad.wmnet with reason: Pick up vcpu change
  • 14:57 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubemaster1002.eqiad.wmnet
  • 14:57 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubemaster1002.eqiad.wmnet
  • 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
  • 14:53 claime: rebooting kubemaster1002.eqiad.wmnet - T348228
  • 14:53 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubemaster1002.eqiad.wmnet with reason: Pick up vcpu change
  • 14:53 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubemaster1002.eqiad.wmnet with reason: Pick up vcpu change
  • 14:52 claime: Bumping kubemaster100[1-2].eqiad.wmnet vcpu to 2, ram to 4G - T348228
  • 14:50 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubetcd1004.eqiad.wmnet
  • 14:50 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubetcd1004.eqiad.wmnet
  • 14:47 claime: rebooting kubetcd1004.eqiad.wmnet - T348228
  • 14:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubetcd1004.eqiad.wmnet with reason: Pick up vcpu change
  • 14:47 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubetcd1004.eqiad.wmnet with reason: Pick up vcpu change
  • 14:46 claime: rebooted kubetcd1005.eqiad.wmnet - T348228
  • 14:46 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubetcd1005.eqiad.wmnet
  • 14:46 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubetcd1005.eqiad.wmnet
  • 14:44 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubetcd1005.eqiad.wmnet with reason: Pick up vcpu change
  • 14:44 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubetcd1005.eqiad.wmnet with reason: Pick up vcpu change
  • 14:44 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubetcd1006.eqiad.wmnet
  • 14:44 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubetcd1006.eqiad.wmnet
  • 14:41 claime: rebooting kubetcd1006.eqiad.wmnet - T348228
  • 14:41 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubetcd1006.eqiad.wmnet with reason: Pick up vcpu change
  • 14:41 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubetcd1006.eqiad.wmnet with reason: Pick up vcpu change
  • 14:38 claime: Bumping kubetcd100[4-6].eqiad.wmnet vcpu to 2 - T348228
  • 14:33 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:33 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:29 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:29 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:25 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
  • 14:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti-test2004.codfw.wmnet with OS bullseye
  • 14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti-test2004']
  • 14:22 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti-test2004']
  • 14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:18 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:17 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Revert "Use HookHandlers for core hooks" (T348181) (duration: 08m 50s)
  • 14:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:11 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:09 lucaswerkmeister-wmde@deploy2002: umherirrender and lucaswerkmeister-wmde: Continuing with sync
  • 14:09 lucaswerkmeister-wmde@deploy2002: umherirrender and lucaswerkmeister-wmde: Backport for Revert "Use HookHandlers for core hooks" (T348181) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 14:08 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 14:08 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Revert "Use HookHandlers for core hooks" (T348181)
  • 14:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 14:04 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 14:04 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 13:53 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:50 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 13:49 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Drop old VP8 video transcodes, enable HLS on testwiki (T312152 T309823) (duration: 12m 07s)
  • 13:47 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 13:44 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 13:42 lucaswerkmeister-wmde@deploy2002: brion and lucaswerkmeister-wmde: Continuing with sync
  • 13:41 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 13:38 lucaswerkmeister-wmde@deploy2002: brion and lucaswerkmeister-wmde: Backport for Drop old VP8 video transcodes, enable HLS on testwiki (T312152 T309823) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:36 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Drop old VP8 video transcodes, enable HLS on testwiki (T312152 T309823)
  • 13:36 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 13:36 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:36 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 13:35 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 13:35 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 13:35 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 13:35 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 13:34 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 13:32 urandom: starting Cassandra rebuild, restbase1030-c — T346803
  • 13:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 13:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 13:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-db1002.eqiad.wmnet
  • 13:21 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 13:15 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-db1002.eqiad.wmnet
  • 13:14 urbanecm@deploy2002: Finished scap: Backport for [Growth] enwiki: Enable mentorship for 50% of new users (T341399) (duration: 10m 08s)
  • 13:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1002.eqiad.wmnet
  • 13:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
  • 13:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
  • 13:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
  • 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host matomo1002.eqiad.wmnet
  • 13:08 claime: respawning two misbehaving thumbor pods in codfw
  • 13:08 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 13:05 urbanecm@deploy2002: urbanecm: Backport for [Growth] enwiki: Enable mentorship for 50% of new users (T341399) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:05 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-master1002.eqiad.wmnet
  • 13:04 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-druid1001.eqiad.wmnet
  • 13:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host matomo1002.eqiad.wmnet
  • 13:04 urbanecm@deploy2002: Started scap: Backport for [Growth] enwiki: Enable mentorship for 50% of new users (T341399)
  • 12:59 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-druid1001.eqiad.wmnet
  • 12:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-presto1001.eqiad.wmnet
  • 12:51 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-presto1001.eqiad.wmnet
  • 12:51 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 12:50 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 12:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lists1001.wikimedia.org
  • 12:42 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
  • 12:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host lists1001.wikimedia.org
  • 12:38 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
  • 12:27 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetdb2002.codfw.wmnet,puppetdb1002.eqiad.wmnet
  • 12:27 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:27 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetdb2002.codfw.wmnet,puppetdb1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
  • 12:27 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts puppetboard2002.codfw.wmnet,puppetboard1002.eqiad.wmnet
  • 12:27 jbond@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 12:26 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetdb2002.codfw.wmnet,puppetdb1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
  • 12:24 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 12:22 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 12:13 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetdb2002.codfw.wmnet,puppetdb1002.eqiad.wmnet
  • 12:10 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetboard2002.codfw.wmnet,puppetboard1002.eqiad.wmnet
  • 12:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudvirt1063']
  • 12:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1063']
  • 12:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1063']
  • 12:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1064']
  • 12:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1064']
  • 12:06 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1063']
  • 12:01 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 12:01 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 12:01 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 11:57 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudnet2005-dev
  • 11:57 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudnet2005-dev
  • 11:46 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.failover (exit_code=0) Failover of gitlab from gitlab1004.wikimedia.org to gitlab2002.wikimedia.org
  • 11:36 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 11:36 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 11:24 jelto@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) https://gitlab.wikimedia.org/ https://gitlab-replica.wikimedia.org/ on all recursors
  • 11:24 jelto@cumin1001: START - Cookbook sre.dns.wipe-cache https://gitlab.wikimedia.org/ https://gitlab-replica.wikimedia.org/ on all recursors
  • 11:23 jelto@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) https://gitlab.wikimedia.org/ https://gitlab-replica.wikimedia.org/ on all recursors
  • 11:23 jelto@cumin1001: START - Cookbook sre.dns.wipe-cache https://gitlab.wikimedia.org/ https://gitlab-replica.wikimedia.org/ on all recursors
  • 11:23 jelto@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) https://gitlab.wikimedia.org/ https://gitlab-replica.wikimedia.org/ on all recursors
  • 11:23 jelto@cumin1001: START - Cookbook sre.dns.wipe-cache https://gitlab.wikimedia.org/ https://gitlab-replica.wikimedia.org/ on all recursors
  • 10:23 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts orespoolcounter[2003-2004].codfw.wmnet,orespoolcounter[1003-1004].eqiad.wmnet
  • 10:23 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:23 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: orespoolcounter[2003-2004].codfw.wmnet,orespoolcounter[1003-1004].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
  • 10:21 klausman@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: orespoolcounter[2003-2004].codfw.wmnet,orespoolcounter[1003-1004].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
  • 10:16 klausman@cumin1001: START - Cookbook sre.dns.netbox
  • 10:09 klausman@cumin1001: START - Cookbook sre.hosts.decommission for hosts orespoolcounter[2003-2004].codfw.wmnet,orespoolcounter[1003-1004].eqiad.wmnet
  • 10:09 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ores1001.eqiad.wmnet
  • 10:09 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:09 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
  • 10:08 klausman@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
  • 10:00 klausman@cumin1001: START - Cookbook sre.dns.netbox
  • 09:59 moritzm: installing python2.7 security updates
  • 09:55 klausman@cumin1001: START - Cookbook sre.hosts.decommission for hosts ores1001.eqiad.wmnet
  • 09:01 jelto@cumin1001: START - Cookbook sre.gitlab.failover Failover of gitlab from gitlab1004.wikimedia.org to gitlab2002.wikimedia.org
  • 07:59 moritzm: installing jetty9 security updates
  • 07:51 godog: bounce vopsbot on alert1001
  • 05:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2165 (T343198)', diff saved to https://phabricator.wikimedia.org/P52831 and previous config saved to /var/cache/conftool/dbconfig/20231005-055637-arnaudb.json
  • 05:56 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 05:56 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 05:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T343198)', diff saved to https://phabricator.wikimedia.org/P52830 and previous config saved to /var/cache/conftool/dbconfig/20231005-055615-arnaudb.json
  • 05:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P52829 and previous config saved to /var/cache/conftool/dbconfig/20231005-054109-arnaudb.json
  • 05:29 denisse: Deleting old Jenkins builds on pcc-worker1002 to free disk space
  • 05:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P52828 and previous config saved to /var/cache/conftool/dbconfig/20231005-052602-arnaudb.json
  • 05:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T343198)', diff saved to https://phabricator.wikimedia.org/P52827 and previous config saved to /var/cache/conftool/dbconfig/20231005-051056-arnaudb.json
  • 02:50 eileen: civicrm upgraded from 44800fc0 to 05545fbc

2023-10-04

  • 23:44 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 23:38 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 23:32 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 22:40 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1065.eqiad.wmnet with OS bullseye
  • 22:40 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 22:39 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 22:24 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1065.eqiad.wmnet with reason: host reimage
  • 22:23 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 22:21 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1065.eqiad.wmnet with reason: host reimage
  • 22:18 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 22:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2054.codfw.wmnet with OS bullseye
  • 22:11 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 22:06 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 22:06 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 22:06 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 22:05 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1065.eqiad.wmnet with OS bullseye
  • 22:02 urandom: starting Cassandra rebuild, restbase1030-b — T346803
  • 22:02 brennen@deploy2002: Finished scap: Backport for Revert "Deprecate TOC mutation in OutputPageParserOutput hook" (T348134) (duration: 09m 13s)
  • 21:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1065.eqiad.wmnet with OS bullseye
  • 21:58 volans: uploaded spicerack_7.3.1 to apt.wikimedia.org bullseye-wikimedia
  • 21:56 brennen@deploy2002: brennen and ssastry: Continuing with sync
  • 21:54 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1066.eqiad.wmnet with OS bullseye
  • 21:54 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 21:54 brennen@deploy2002: brennen and ssastry: Backport for Revert "Deprecate TOC mutation in OutputPageParserOutput hook" (T348134) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:53 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 21:53 brennen@deploy2002: Started scap: Backport for Revert "Deprecate TOC mutation in OutputPageParserOutput hook" (T348134)
  • 21:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1065.eqiad.wmnet with OS bullseye
  • 21:44 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1065.eqiad.wmnet with OS bullseye
  • 21:40 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1062.eqiad.wmnet with OS bullseye
  • 21:40 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 21:38 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 21:38 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1066.eqiad.wmnet with reason: host reimage
  • 21:34 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1066.eqiad.wmnet with reason: host reimage
  • 21:23 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1062.eqiad.wmnet with reason: host reimage
  • 21:20 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1062.eqiad.wmnet with reason: host reimage
  • 21:02 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1062.eqiad.wmnet with OS bullseye
  • 20:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1062.eqiad.wmnet with OS bullseye
  • 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2054.codfw.wmnet with OS bullseye
  • 20:54 urbanecm@deploy2002: Finished scap: Backport for SpecialManageMentors: Skip OOUI initialization when transcluding (T346760), SpecialManageMentors: Skip OOUI initialization when transcluding (T346760), Fix phan for GrowthExperiments (T347571) (duration: 07m 49s)
  • 20:48 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 20:48 urbanecm@deploy2002: urbanecm: Backport for SpecialManageMentors: Skip OOUI initialization when transcluding (T346760), SpecialManageMentors: Skip OOUI initialization when transcluding (T346760), Fix phan for GrowthExperiments (T347571) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:46 urbanecm@deploy2002: Started scap: Backport for SpecialManageMentors: Skip OOUI initialization when transcluding (T346760), SpecialManageMentors: Skip OOUI initialization when transcluding (T346760), Fix phan for GrowthExperiments (T347571)
  • 20:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 20:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1066.eqiad.wmnet with OS bullseye
  • 20:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1065.eqiad.wmnet with OS bullseye
  • 20:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 20:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 20:45 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1062.eqiad.wmnet with OS bullseye
  • 20:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1065.eqiad.wmnet with OS bullseye
  • 20:21 eileen: re-enable process control (more better hopefully) config revision changed from 89231b1b to d66626f6
  • 19:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T343198)', diff saved to https://phabricator.wikimedia.org/P52826 and previous config saved to /var/cache/conftool/dbconfig/20231004-195023-arnaudb.json
  • 19:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 19:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 19:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 19:49 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 19:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T343198)', diff saved to https://phabricator.wikimedia.org/P52825 and previous config saved to /var/cache/conftool/dbconfig/20231004-194946-arnaudb.json
  • 19:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P52824 and previous config saved to /var/cache/conftool/dbconfig/20231004-193439-arnaudb.json
  • 19:33 eileen: config revision changed from 89231b1b to d66626f6
  • 19:30 eileen: civicrm upgraded from 169c3288 to 44800fc0
  • 19:29 eileen: config revision changed from 4ae7bd71 to 89231b1b
  • 19:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P52823 and previous config saved to /var/cache/conftool/dbconfig/20231004-191933-arnaudb.json
  • 19:19 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:12 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 19:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T343198)', diff saved to https://phabricator.wikimedia.org/P52822 and previous config saved to /var/cache/conftool/dbconfig/20231004-190427-arnaudb.json
  • 18:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 18:53 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 18:19 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.29 refs T347080
  • 18:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1066.eqiad.wmnet with OS bullseye
  • 18:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1062.eqiad.wmnet with OS bullseye
  • 18:09 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.29 refs T347080
  • 17:52 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 17:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testreduce1002.eqiad.wmnet
  • 17:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 17:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testreduce1002.eqiad.wmnet
  • 17:33 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1066.eqiad.wmnet with OS bullseye
  • 17:33 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1065.eqiad.wmnet with OS bullseye
  • 17:33 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 17:33 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 17:33 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1062.eqiad.wmnet with OS bullseye
  • 17:32 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
  • 17:32 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1065']
  • 17:32 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1064']
  • 17:32 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1063']
  • 17:31 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1062']
  • 17:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 17:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 17:22 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 17:22 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 17:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1066.eqiad.wmnet with OS bullseye
  • 17:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1065.eqiad.wmnet with OS bullseye
  • 17:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1062.eqiad.wmnet with OS bullseye
  • 16:59 fabfur: merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/963326 (T347837). `purged` daemon will be restarted by puppet in esams in the next 30m
  • 16:54 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 16:49 taavi: taavi@mwmaint2002 ~ $ mwscript extensions/OATHAuth/maintenance/UpdateForMultipleDevicesSupport.php metawiki | tee T242031-sul.log # T242031
  • 16:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
  • 16:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
  • 16:34 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1067']
  • 16:34 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1066']
  • 16:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
  • 16:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1067']
  • 16:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1067']
  • 16:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1066']
  • 16:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1067']
  • 16:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
  • 16:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1066']
  • 16:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1066']
  • 16:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
  • 16:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
  • 16:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1067']
  • 16:24 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1067']
  • 16:24 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
  • 16:24 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1067']
  • 16:24 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1066']
  • 16:24 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1067']
  • 16:24 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
  • 16:23 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1067']
  • 16:23 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1066']
  • 16:23 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1067']
  • 16:23 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
  • 16:23 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
  • 16:23 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1066.eqiad.wmnet with OS bullseye
  • 16:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1066']
  • 16:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
  • 16:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1066']
  • 16:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
  • 16:21 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1067']
  • 16:21 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1066']
  • 16:21 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1067']
  • 16:21 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
  • 16:21 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1063']
  • 16:21 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1065']
  • 16:21 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1064']
  • 16:21 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1062']
  • 16:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1065']
  • 16:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1064']
  • 16:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1063']
  • 16:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1062']
  • 16:07 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1065.eqiad.wmnet with OS bullseye
  • 16:07 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
  • 16:07 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bullseye
  • 16:07 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1062.eqiad.wmnet with OS bullseye
  • 15:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:55 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:47 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
  • 15:47 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
  • 15:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:45 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1062.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:44 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1065.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1066.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1067.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1064.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1063.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:39 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 15:37 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:32 hashar@deploy2002: Finished deploy [integration/docroot@b3b712f]: (no justification provided) (duration: 00m 06s)
  • 15:32 hashar@deploy2002: Started deploy [integration/docroot@b3b712f]: (no justification provided)
  • 15:21 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 15:17 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 15:13 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1067.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:12 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1066.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:12 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1065.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:12 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1064.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:12 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1063.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:12 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1062.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:08 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:08 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudvirt1062-67 - jclark@cumin1001"
  • 15:07 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudvirt1062-67 - jclark@cumin1001"
  • 15:05 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 15:00 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 14:59 taavi: revoke a bot password, https://phabricator.wikimedia.org/T348132
  • 14:56 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 14:55 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
  • 14:39 klausman@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ores[2001-2004].codfw.wmnet
  • 14:39 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:39 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores[2001-2004].codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
  • 14:38 klausman@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ores[1002-1009].eqiad.wmnet
  • 14:38 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:38 Lucas_WMDE: spontaneously extended UTC afternoon backport+config window done now
  • 14:37 klausman@cumin1001: START - Cookbook sre.dns.netbox
  • 14:36 klausman@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores[2001-2004].codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
  • 14:34 klausman@cumin1001: START - Cookbook sre.dns.netbox
  • 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for prod: Enable wgCampaignEventsEnableEmail in meta and officewiki (T347065) (duration: 18m 26s)
  • 14:25 lucaswerkmeister-wmde@deploy2002: daimona and lucaswerkmeister-wmde: Continuing with sync
  • 14:24 fabfur: merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/963321 (T347837). `purged` daemon will be restarted by puppet in drmrs in the next 30m
  • 14:22 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 14:22 klausman@cumin1001: START - Cookbook sre.hosts.decommission for hosts ores[2001-2004].codfw.wmnet
  • 14:21 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ores2009.codfw.wmnet
  • 14:21 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:21 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores2009.codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
  • 14:20 klausman@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores2009.codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
  • 14:18 klausman@cumin1001: START - Cookbook sre.hosts.decommission for hosts ores[1002-1009].eqiad.wmnet
  • 14:18 klausman@cumin1001: START - Cookbook sre.dns.netbox
  • 14:18 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ores2006.codfw.wmnet
  • 14:17 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:17 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores2006.codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
  • 14:17 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ores2005.codfw.wmnet
  • 14:17 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:17 klausman@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores2006.codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
  • 14:16 urandom: starting Cassandra rebuild, restbase1030-a — T346803
  • 14:16 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ores2007.codfw.wmnet
  • 14:16 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:16 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores2007.codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
  • 14:15 klausman@cumin1001: START - Cookbook sre.dns.netbox
  • 14:14 klausman@cumin1001: START - Cookbook sre.dns.netbox
  • 14:14 lucaswerkmeister-wmde@deploy2002: daimona and lucaswerkmeister-wmde: Backport for prod: Enable wgCampaignEventsEnableEmail in meta and officewiki (T347065) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:13 klausman@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores2007.codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
  • 14:12 klausman@cumin1001: START - Cookbook sre.hosts.decommission for hosts ores2009.codfw.wmnet
  • 14:12 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for prod: Enable wgCampaignEventsEnableEmail in meta and officewiki (T347065)
  • 14:10 klausman@cumin1001: START - Cookbook sre.dns.netbox
  • 14:10 klausman@cumin1001: START - Cookbook sre.hosts.decommission for hosts ores2005.codfw.wmnet
  • 14:09 klausman@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ores2008.codfw.wmnet
  • 14:09 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:08 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores2008.codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
  • 14:08 klausman@cumin1001: START - Cookbook sre.hosts.decommission for hosts ores2006.codfw.wmnet
  • 14:07 klausman@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores2008.codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
  • 14:05 klausman@cumin1001: START - Cookbook sre.hosts.decommission for hosts ores2007.codfw.wmnet
  • 14:04 klausman@cumin1001: START - Cookbook sre.dns.netbox
  • 14:00 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for beta: Explicitly assign campaignevents-email-participants to all users (T336939), metawiki: Restrict campaignevents-email-participants right (T336939) (duration: 10m 40s)
  • 13:57 klausman@cumin1001: START - Cookbook sre.hosts.decommission for hosts ores2008.codfw.wmnet
  • 13:54 lucaswerkmeister-wmde@deploy2002: daimona and lucaswerkmeister-wmde: Continuing with sync
  • 13:53 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
  • 13:51 lucaswerkmeister-wmde@deploy2002: daimona and lucaswerkmeister-wmde: Backport for beta: Explicitly assign campaignevents-email-participants to all users (T336939), metawiki: Restrict campaignevents-email-participants right (T336939) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:49 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for beta: Explicitly assign campaignevents-email-participants to all users (T336939), metawiki: Restrict campaignevents-email-participants right (T336939)
  • 13:47 Lucas_WMDE: mwscript namespaceDupes fonwiki --fix # T347939 – 0 pages to fix, 0 resolvable; 0 links to fix, 0 resolvable, 0 deleted
  • 13:46 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for fonwiki: add wgSiteName, wgMetaNamespace and timezone (T347939) (duration: 13m 46s)
  • 13:34 lucaswerkmeister-wmde@deploy2002: anzx and lucaswerkmeister-wmde: Backport for fonwiki: add wgSiteName, wgMetaNamespace and timezone (T347939) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:33 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for fonwiki: add wgSiteName, wgMetaNamespace and timezone (T347939)
  • 13:27 fabfur: merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/963147 (T347837). `purged` daemon will be restarted by puppet in eqiad in the next 30m
  • 13:25 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 13:25 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
  • 13:24 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 13:24 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 13:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 13:23 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
  • 13:20 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for fonwiki: add logos (T347939) (duration: 11m 43s)
  • 13:19 rook@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2006-dev.codfw.wmnet with OS bullseye
  • 13:14 urandom: Cassandra bootstrap, restbase1030-a (`auto_bootstrap: false`) — T346803
  • 13:14 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and anzx: Continuing with sync
  • 13:10 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and anzx: Backport for fonwiki: add logos (T347939) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:09 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for fonwiki: add logos (T347939)
  • 13:03 rook@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2006-dev.codfw.wmnet with reason: host reimage
  • 13:00 rook@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2006-dev.codfw.wmnet with reason: host reimage
  • 12:56 klausman: powering off orespoolcounter{1004,2003,2004}.{eqiad,codfw}.wmnet (1003 is kept powered-on in case we need access to files from the old install). The machines have a 90d downtime already put in.
  • 12:53 klausman: powering off ores200{2..9}.codfw.wmnet (2001 is kept powered-on in case we need access to files from the old install). The machines have a 90d downtime already put in.
  • 12:51 klausman: powering off ores100{2..9}.eqiad.wmnet (1001 is kept powered-on in case we need access to files from the old install). The machines have a 90d downtime already put in.
  • 12:46 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 90 days, 0:00:00 on 22 hosts with reason: Downtime for graceful shutdown and later decom
  • 12:46 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 90 days, 0:00:00 on 22 hosts with reason: Downtime for graceful shutdown and later decom
  • 12:43 rook@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2006-dev.codfw.wmnet with OS bullseye
  • 11:45 moritzm: installing exim4 security updates
  • 11:33 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe2004.codfw.wmnet with OS bullseye
  • 11:30 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 11:20 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 11:17 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2004.codfw.wmnet with reason: host reimage
  • 11:14 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2004.codfw.wmnet with reason: host reimage
  • 11:14 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Zsoo out of all services on: 2175 hosts
  • 11:02 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Zsoo out of all services on: 2175 hosts
  • 10:58 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe2004.codfw.wmnet with OS bullseye
  • 10:29 filippo@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['thanos-fe2004']
  • 10:29 filippo@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe2004']
  • 10:21 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 10:20 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-fe2004.codfw.wmnet with OS bullseye
  • 10:20 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
  • 10:20 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 10:20 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 10:20 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 10:20 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
  • 10:02 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe2004.codfw.wmnet with OS bullseye
  • 10:02 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe2004.codfw.wmnet with OS bullseye
  • 09:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T343198)', diff saved to https://phabricator.wikimedia.org/P52817 and previous config saved to /var/cache/conftool/dbconfig/20231004-094320-arnaudb.json
  • 09:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 09:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 09:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T343198)', diff saved to https://phabricator.wikimedia.org/P52816 and previous config saved to /var/cache/conftool/dbconfig/20231004-094258-arnaudb.json
  • 09:39 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 09:39 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 09:39 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 09:39 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 09:39 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 09:38 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 09:38 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 09:38 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 09:38 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 09:38 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 09:38 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 09:38 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 09:38 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 09:37 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 09:37 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 09:37 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 09:37 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 09:37 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 09:37 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 09:37 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 09:35 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe2004.codfw.wmnet with OS bullseye
  • 09:33 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe2004.codfw.wmnet with OS bullseye
  • 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging TsepoThoabala out of all services on: 2175 hosts
  • 09:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P52815 and previous config saved to /var/cache/conftool/dbconfig/20231004-092752-arnaudb.json
  • 09:27 jmm@cumin2002: START - Cookbook sre.idm.logout Logging TsepoThoabala out of all services on: 2175 hosts
  • 09:26 sg912@deploy2002: Finished deploy [airflow-dags/analytics@3b374a9]: (no justification provided) (duration: 00m 45s)
  • 09:25 sg912@deploy2002: Started deploy [airflow-dags/analytics@3b374a9]: (no justification provided)
  • 09:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P52814 and previous config saved to /var/cache/conftool/dbconfig/20231004-091245-arnaudb.json
  • 09:08 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging KMorgan out of all services on: 2175 hosts
  • 09:08 jmm@cumin2002: START - Cookbook sre.idm.logout Logging KMorgan out of all services on: 2175 hosts
  • 09:02 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 09:01 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 08:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T343198)', diff saved to https://phabricator.wikimedia.org/P52813 and previous config saved to /var/cache/conftool/dbconfig/20231004-085739-arnaudb.json
  • 08:44 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging EllenR out of all services on: 2175 hosts
  • 08:43 jmm@cumin2002: START - Cookbook sre.idm.logout Logging EllenR out of all services on: 2175 hosts
  • 08:19 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe2004.codfw.wmnet with OS bullseye
  • 08:14 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe2003.codfw.wmnet with OS bullseye
  • 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Eigyan out of all services on: 2176 hosts
  • 08:00 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Eigyan out of all services on: 2176 hosts
  • 07:56 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2003.codfw.wmnet with reason: host reimage
  • 07:53 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2003.codfw.wmnet with reason: host reimage
  • 07:34 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe2003.codfw.wmnet with OS bullseye
  • 07:19 XioNoX: Remove static routes for anycast prefixes - T347494
  • 06:30 moritzm: installing glibc security updates
  • 06:19 Surbhi_: Deployed refinery using scap, then deployed onto hdfs
  • 05:54 sg912@deploy2002: Finished deploy [analytics/refinery@e954b12] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@e954b12a] (duration: 03m 00s)
  • 05:51 sg912@deploy2002: Started deploy [analytics/refinery@e954b12] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@e954b12a]
  • 05:50 sg912@deploy2002: Finished deploy [analytics/refinery@e954b12] (thin): Regular analytics weekly train THIN [analytics/refinery@e954b12a] (duration: 00m 06s)
  • 05:50 sg912@deploy2002: Started deploy [analytics/refinery@e954b12] (thin): Regular analytics weekly train THIN [analytics/refinery@e954b12a]
  • 05:49 sg912@deploy2002: Finished deploy [analytics/refinery@e954b12]: Regular analytics weekly train [analytics/refinery@e954b12a] (duration: 06m 02s)
  • 05:43 sg912@deploy2002: Started deploy [analytics/refinery@e954b12]: Regular analytics weekly train [analytics/refinery@e954b12a]
  • 03:56 kart_: Updated cxserver to 2023-09-28-043003-production (T343450, T347389, T338689)
  • 03:56 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 03:55 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 03:51 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 03:51 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 03:48 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 03:48 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply

2023-10-03

  • 23:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T343198)', diff saved to https://phabricator.wikimedia.org/P52812 and previous config saved to /var/cache/conftool/dbconfig/20231003-234343-arnaudb.json
  • 23:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 23:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 23:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T343198)', diff saved to https://phabricator.wikimedia.org/P52811 and previous config saved to /var/cache/conftool/dbconfig/20231003-234322-arnaudb.json
  • 23:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P52810 and previous config saved to /var/cache/conftool/dbconfig/20231003-232815-arnaudb.json
  • 23:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P52809 and previous config saved to /var/cache/conftool/dbconfig/20231003-231309-arnaudb.json
  • 22:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T343198)', diff saved to https://phabricator.wikimedia.org/P52808 and previous config saved to /var/cache/conftool/dbconfig/20231003-225803-arnaudb.json
  • 22:22 jdrewniak@deploy2002: Finished scap: Backport for Web typography prototype survey (T347208), Correct a recently-added message, [Prototype] Change i18n message (T347208) (duration: 39m 08s)
  • 22:11 jdrewniak@deploy2002: jdrewniak: Continuing with sync
  • 22:01 jdrewniak@deploy2002: jdrewniak: Backport for Web typography prototype survey (T347208), Correct a recently-added message, [Prototype] Change i18n message (T347208) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:43 jdrewniak@deploy2002: Started scap: Backport for Web typography prototype survey (T347208), Correct a recently-added message, [Prototype] Change i18n message (T347208)
  • 21:32 jdrewniak@deploy2002: Finished scap: Backport for Promote several Wikipedias to Vector 2022 as default skin (T347321) (duration: 09m 26s)
  • 21:26 jdrewniak@deploy2002: jdlrobson and jdrewniak: Continuing with sync
  • 21:24 jdrewniak@deploy2002: jdlrobson and jdrewniak: Backport for Promote several Wikipedias to Vector 2022 as default skin (T347321) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:23 jdrewniak@deploy2002: Started scap: Backport for Promote several Wikipedias to Vector 2022 as default skin (T347321)
  • 20:56 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
  • 20:56 eileen: tools upgraded from 130ca87e to 2e19cd39
  • 20:50 jdrewniak@deploy2002: Finished scap: Backport for Re-enable Extension:ParserMigration on labs (T333179) (duration: 38m 52s)
  • 20:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
  • 20:35 jdrewniak@deploy2002: jdrewniak and sbailey: Continuing with sync
  • 20:34 jdrewniak@deploy2002: jdrewniak and sbailey: Backport for Re-enable Extension:ParserMigration on labs (T333179) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:16 fabfur: merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/963081 (T347837). `purged` daemon will be restarted by puppet in eqsin in the next 30m
  • 20:11 jdrewniak@deploy2002: Started scap: Backport for Re-enable Extension:ParserMigration on labs (T333179)
  • 19:41 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
  • 19:38 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
  • 19:16 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
  • 19:16 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
  • 19:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
  • 19:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
  • 19:15 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
  • 19:15 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
  • 18:48 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
  • 18:25 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.29 refs T347080
  • 18:13 jhuneidi@deploy2002: Pruned MediaWiki: 1.41.0-wmf.27 (duration: 02m 14s)
  • 18:11 jhuneidi@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.29 refs T347080 (duration: 43m 24s)
  • 17:34 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
  • 17:34 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
  • 17:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
  • 17:33 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
  • 17:33 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
  • 17:28 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1027.eqiad.wmnet
  • 17:28 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1027.eqiad.wmnet
  • 17:27 jhuneidi@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.29 refs T347080
  • 17:17 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1027.eqiad.wmnet with OS bullseye
  • 17:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
  • 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly racked kubernetes2054 hosts in codfw - jhancock@cumin2002"
  • 17:08 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly racked kubernetes2054 hosts in codfw - jhancock@cumin2002"
  • 17:04 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
  • 17:02 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 16:59 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 16:59 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 16:57 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly racked kubernetes2054 hosts in codfw - jhancock@cumin2002"
  • 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly racked kubernetes2054 hosts in codfw - jhancock@cumin2002"
  • 16:54 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 16:52 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1027.eqiad.wmnet with reason: host reimage
  • 16:50 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1027.eqiad.wmnet with reason: host reimage
  • 16:37 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1027.eqiad.wmnet with OS bullseye
  • 16:36 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1027.eqiad.wmnet
  • 16:36 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1027.eqiad.wmnet
  • 16:24 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1027.eqiad.wmnet
  • 16:23 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1027.eqiad.wmnet
  • 16:20 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
  • 16:20 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
  • 16:19 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 16:19 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 16:09 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
  • 16:08 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
  • 16:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1026.eqiad.wmnet with OS bullseye
  • 16:06 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 16:06 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 16:04 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 16:03 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 16:03 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 16:03 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 16:02 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 16:01 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 15:57 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 15:57 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 15:57 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 15:51 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 15:49 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
  • 15:49 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
  • 15:40 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1026.eqiad.wmnet with reason: host reimage
  • 15:37 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
  • 15:37 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1026.eqiad.wmnet with reason: host reimage
  • 15:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:32 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:26 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:26 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:24 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1026.eqiad.wmnet with OS bullseye
  • 15:24 ottomata: mw-page-content-change-enrich - backfill is done, set replicas to 2 in eqiad and codfw
  • 15:23 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:23 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1026.eqiad.wmnet
  • 15:23 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1026.eqiad.wmnet
  • 15:22 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
  • 15:11 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1026.eqiad.wmnet
  • 15:10 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1026.eqiad.wmnet
  • 15:10 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1033.eqiad.wmnet
  • 15:10 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1033.eqiad.wmnet
  • 15:07 brennen@deploy2002: Finished deploy [phabricator/deployment@6f19600]: deploy to phab1004 for T348007 (duration: 00m 44s)
  • 15:06 brennen@deploy2002: Started deploy [phabricator/deployment@6f19600]: deploy to phab1004 for T348007
  • 15:06 brennen@deploy2002: Finished deploy [phabricator/deployment@6f19600]: test deploy to phab2002 for T348007 (duration: 00m 32s)
  • 15:06 brennen@deploy2002: Started deploy [phabricator/deployment@6f19600]: test deploy to phab2002 for T348007
  • 15:05 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1004.eqiad.wmnet with reason: Phabricator deploys
  • 15:05 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1004.eqiad.wmnet with reason: Phabricator deploys
  • 14:55 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-master1004']
  • 14:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.4 - ayounsi@cumin1001
  • 14:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
  • 14:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
  • 14:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
  • 14:48 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.4 - ayounsi@cumin1001
  • 14:47 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
  • 14:47 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
  • 14:47 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
  • 14:46 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-master1003']
  • 14:46 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
  • 14:46 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
  • 14:46 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
  • 14:46 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
  • 14:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
  • 14:45 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
  • 14:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
  • 14:44 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
  • 14:44 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
  • 14:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
  • 14:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1003']
  • 14:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003']
  • 14:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
  • 14:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
  • 14:42 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['an-master1003']
  • 14:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003']
  • 14:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003']
  • 14:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1003']
  • 14:39 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003']
  • 14:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1003']
  • 14:39 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003']
  • 14:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
  • 14:39 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
  • 14:38 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
  • 14:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
  • 14:38 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1003']
  • 14:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003']
  • 14:38 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1003']
  • 14:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003']
  • 14:38 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 14:37 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1003']
  • 14:37 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 14:37 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 14:37 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 14:37 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 14:37 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003']
  • 14:37 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 14:37 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 14:37 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 14:37 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 14:36 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 14:36 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 14:36 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 14:36 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 14:36 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 14:36 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 14:36 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 14:36 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:35 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:35 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:35 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1033.eqiad.wmnet with OS bullseye
  • 14:07 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe2002.codfw.wmnet with OS bullseye
  • 14:01 fabfur: merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/963020 (T347837). `purged` daemon will be restarted by puppet in codfw in the next 30m
  • 14:01 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1033.eqiad.wmnet with reason: host reimage
  • 13:59 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
  • 13:58 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-master1003
  • 13:57 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host an-master1003
  • 13:57 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1033.eqiad.wmnet with reason: host reimage
  • 13:50 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-master1004
  • 13:50 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:50 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Revert allocation of LVS VIPs for recommendation-api-ng - klausman@cumin1001"
  • 13:49 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host an-master1004
  • 13:49 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt-wdqs1001.eqiad.wmnet
  • 13:49 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:49 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2002.codfw.wmnet with reason: host reimage
  • 13:48 klausman@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Revert allocation of LVS VIPs for recommendation-api-ng - klausman@cumin1001"
  • 13:48 taavi@cumin1001: START - Cookbook sre.dns.netbox
  • 13:46 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2002.codfw.wmnet with reason: host reimage
  • 13:44 klausman@cumin1001: START - Cookbook sre.dns.netbox
  • 13:43 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1033.eqiad.wmnet with OS bullseye
  • 13:43 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 13:43 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
  • 13:43 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 13:42 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1025.eqiad.wmnet
  • 13:42 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1025.eqiad.wmnet
  • 13:41 taavi@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1001.eqiad.wmnet
  • 13:38 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:38 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:38 ottomata: mw-page-content-change-enrich codfw - bump to 1.27.0 and set replicas to 12 while processing backlog - T347676
  • 13:34 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:34 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:34 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1025.eqiad.wmnet with OS bullseye
  • 13:34 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt-wdqs1003.eqiad.wmnet
  • 13:34 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:34 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - taavi@cumin1001"
  • 13:34 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1026.eqiad.wmnet
  • 13:33 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1026.eqiad.wmnet
  • 13:33 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - taavi@cumin1001"
  • 13:30 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:30 otto@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:30 taavi@cumin1001: START - Cookbook sre.dns.netbox
  • 13:27 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe2002.codfw.wmnet with OS bullseye
  • 13:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T343198)', diff saved to https://phabricator.wikimedia.org/P52807 and previous config saved to /var/cache/conftool/dbconfig/20231003-132733-arnaudb.json
  • 13:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 13:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 13:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T343198)', diff saved to https://phabricator.wikimedia.org/P52806 and previous config saved to /var/cache/conftool/dbconfig/20231003-132700-arnaudb.json
  • 13:23 taavi@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1003.eqiad.wmnet
  • 13:23 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt-wdqs1002.eqiad.wmnet
  • 13:23 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:23 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - taavi@cumin1001"
  • 13:22 samtar@deploy2002: Finished scap: Backport for arwiki: add importsources (T347563), add throttle rules for Ada Lovelace Day October 10, 2023 and fix throttle rule for UIUC Wikipedia edit-a-thon October 13, 2023 (T347719) (duration: 09m 03s)
  • 13:21 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - taavi@cumin1001"
  • 13:19 taavi@cumin1001: START - Cookbook sre.dns.netbox
  • 13:16 samtar@deploy2002: anzx and samtar: Continuing with sync
  • 13:14 samtar@deploy2002: anzx and samtar: Backport for arwiki: add importsources (T347563), add throttle rules for Ada Lovelace Day October 10, 2023 and fix throttle rule for UIUC Wikipedia edit-a-thon October 13, 2023 (T347719) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:13 samtar@deploy2002: Started scap: Backport for arwiki: add importsources (T347563), add throttle rules for Ada Lovelace Day October 10, 2023 and fix throttle rule for UIUC Wikipedia edit-a-thon October 13, 2023 (T347719)
  • 13:12 taavi@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1002.eqiad.wmnet
  • 13:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P52805 and previous config saved to /var/cache/conftool/dbconfig/20231003-131154-arnaudb.json
  • 13:10 samtar@deploy2002: Finished scap: Backport for New donor experience stream for apps event schema (duration: 08m 26s)
  • 13:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1025.eqiad.wmnet with reason: host reimage
  • 13:04 samtar@deploy2002: sharvaniharan and samtar: Continuing with sync
  • 13:03 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1025.eqiad.wmnet with reason: host reimage
  • 13:03 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe2001.codfw.wmnet with OS bullseye
  • 13:03 samtar@deploy2002: sharvaniharan and samtar: Backport for New donor experience stream for apps event schema synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:01 samtar@deploy2002: Started scap: Backport for New donor experience stream for apps event schema
  • 12:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P52804 and previous config saved to /var/cache/conftool/dbconfig/20231003-125647-arnaudb.json
  • 12:50 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1025.eqiad.wmnet with OS bullseye
  • 12:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: host reimage
  • 12:42 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: host reimage
  • 12:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T343198)', diff saved to https://phabricator.wikimedia.org/P52803 and previous config saved to /var/cache/conftool/dbconfig/20231003-124141-arnaudb.json
  • 12:23 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe2001.codfw.wmnet with OS bullseye
  • 11:54 fabfur: merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/963004 (T347837). `purged` daemon will be restarted by puppet in ulsfo in the next 30m
  • 11:51 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 11:33 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1004.eqiad.wmnet with reason: host reimage
  • 11:29 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:29 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack - aborrero@cumin1001"
  • 11:29 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1004.eqiad.wmnet with reason: host reimage
  • 11:29 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack - aborrero@cumin1001"
  • 11:26 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:11 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 10:54 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe1003.eqiad.wmnet with OS bullseye
  • 10:36 vgutierrez@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:36 vgutierrez@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix katran-test.svc.eqiad.wmnet IP allocation - vgutierrez@cumin1001"
  • 10:35 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: host reimage
  • 10:34 vgutierrez@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix katran-test.svc.eqiad.wmnet IP allocation - vgutierrez@cumin1001"
  • 10:32 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: host reimage
  • 10:32 vgutierrez@cumin1001: START - Cookbook sre.dns.netbox
  • 10:30 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 10:19 vgutierrez@cumin1001: START - Cookbook sre.dns.netbox
  • 10:15 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1003.eqiad.wmnet with OS bullseye
  • 09:50 claime: Uncordoned kubernetes2010.codfw.wmnet
  • 09:50 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes2010.codfw.wmnet
  • 09:49 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes2010.codfw.wmnet
  • 09:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe1002.eqiad.wmnet with OS bullseye
  • 09:42 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 09:42 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 09:38 ladsgroup@deploy2002: Finished scap: Creating fonwiki (T347935) (duration: 07m 34s)
  • 09:30 ladsgroup@deploy2002: Started scap: Creating fonwiki (T347935)
  • 09:28 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kubernetes2010.codfw.wmnet with reason: BIOS setting change
  • 09:28 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kubernetes2010.codfw.wmnet with reason: BIOS setting change
  • 09:27 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: host reimage
  • 09:26 claime: Draining kubernetes2010.codfw.wmnet for reboot to change BIOS setting
  • 09:24 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: host reimage
  • 09:07 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1002.eqiad.wmnet with OS bullseye
  • 09:06 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:06 isaranto@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:05 isaranto@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 08:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ml-staging2001.codfw.wmnet with reason: Check chassis internals for GPU hosting
  • 08:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ml-staging2001.codfw.wmnet with reason: Check chassis internals for GPU hosting
  • 08:26 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe1001.eqiad.wmnet with OS bullseye
  • 08:17 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:15 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 08:14 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:13 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 08:12 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 08:09 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 08:03 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: host reimage
  • 08:01 taavi: taavi@mwmaint2002 ~ $ mwscript resetAuthenticationThrottle.php --wiki=enwiki --signup --ip=155.232.7.202 # T347874
  • 07:59 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: host reimage
  • 07:57 taavi@deploy2002: Finished scap: T347874 and T347069 (duration: 29m 22s)
  • 07:42 taavi@deploy2002: taavi: Continuing with sync
  • 07:42 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1001.eqiad.wmnet with OS bullseye
  • 07:40 taavi@deploy2002: taavi: T347874 and T347069 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:27 taavi@deploy2002: Started scap: T347874 and T347069
  • 07:03 kart_: Updated MinT to 2023-09-28-043052-production (T343450, T341478)
  • 07:03 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 06:59 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 06:56 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 06:51 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 06:45 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 06:42 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 06:42 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 06:42 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 05:52 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on druid1009.eqiad.wmnet with reason: Downtime as we set up the host to join the druid and zookeeper cluster
  • 05:52 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on druid1009.eqiad.wmnet with reason: Downtime as we set up the host to join the druid and zookeeper cluster
  • 04:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 04:20 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 04:20 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 04:13 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 04:12 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 04:11 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 04:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 04:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 04:09 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 04:08 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 04:08 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 04:07 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 04:05 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 04:05 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 04:05 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 03:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T343198)', diff saved to https://phabricator.wikimedia.org/P52802 and previous config saved to /var/cache/conftool/dbconfig/20231003-034640-arnaudb.json
  • 03:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 03:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 02:41 krinkle@deploy2002: Finished scap: (no justification provided) (duration: 07m 34s)
  • 02:33 krinkle@deploy2002: Started scap: (no justification provided)
  • 02:17 krinkle@deploy2002: Synchronized docroot/noc/: (no justification provided) (duration: 08m 03s)
  • 01:48 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1033.eqiad.wmnet
  • 01:48 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1033.eqiad.wmnet
  • 01:35 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1033.eqiad.wmnet
  • 01:34 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1033.eqiad.wmnet
  • 01:33 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1025.eqiad.wmnet
  • 01:33 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1025.eqiad.wmnet
  • 01:21 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1025.eqiad.wmnet
  • 01:18 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1025.eqiad.wmnet
  • 01:06 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1025.eqiad.wmnet
  • 01:06 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1025.eqiad.wmnet
  • 00:39 ejegg: fundraising civicrm upgraded from c1b28287 to 995a3d5b
  • 00:38 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1033.eqiad.wmnet
  • 00:29 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1033.eqiad.wmnet
  • 00:28 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1033.eqiad.wmnet
  • 00:28 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1033.eqiad.wmnet
  • 00:28 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1025.eqiad.wmnet
  • 00:28 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1025.eqiad.wmnet

2023-10-02

  • 23:09 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1032.eqiad.wmnet with OS bullseye
  • 22:46 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1032.eqiad.wmnet with reason: host reimage
  • 22:43 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1032.eqiad.wmnet with reason: host reimage
  • 22:30 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1032.eqiad.wmnet with OS bullseye
  • 22:30 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1032.eqiad.wmnet with OS bullseye
  • 22:16 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1032.eqiad.wmnet with OS bullseye
  • 22:09 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1032.eqiad.wmnet
  • 22:01 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host restbase1032.eqiad.wmnet
  • 22:01 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1032.eqiad.wmnet
  • 22:00 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1032.eqiad.wmnet
  • 21:53 maryum: Deployed patch for T347704
  • 21:32 kindrobot: end UTC late backport window
  • 21:28 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1029.eqiad.wmnet
  • 21:28 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1029.eqiad.wmnet
  • 21:23 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1032.eqiad.wmnet
  • 21:23 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1032.eqiad.wmnet
  • 21:22 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1032.eqiad.wmnet
  • 21:22 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1032.eqiad.wmnet
  • 21:21 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1032.eqiad.wmnet
  • 21:21 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1032.eqiad.wmnet
  • 21:21 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1029.eqiad.wmnet with OS bullseye
  • 21:17 kindrobot@deploy2002: Finished scap: Backport for Ignore only site notices (T347645), HookUtils: Fix checking page props (T347878), Fix diff title escaping (T347578), Diff: Add missing .mw-diff-inline-moved selector (duration: 10m 06s)
  • 21:11 kindrobot@deploy2002: kindrobot and matmarex: Continuing with sync
  • 21:09 kindrobot@deploy2002: kindrobot and matmarex: Backport for Ignore only site notices (T347645), HookUtils: Fix checking page props (T347878), Fix diff title escaping (T347578), Diff: Add missing .mw-diff-inline-moved selector synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:07 kindrobot@deploy2002: Started scap: Backport for Ignore only site notices (T347645), HookUtils: Fix checking page props (T347878), Fix diff title escaping (T347578), Diff: Add missing .mw-diff-inline-moved selector
  • 20:59 ottomata: mw-page-content-change-enrich - CORRECTION - increase replicas to 20 to process backlog - T347676
  • 20:58 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:58 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:57 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1029.eqiad.wmnet with reason: host reimage
  • 20:57 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:57 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:56 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:56 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:56 ottomata: mw-page-content-change-enrich - increase replicas to 24 to process backlog - T347676
  • 20:54 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1029.eqiad.wmnet with reason: host reimage
  • 20:42 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1029.eqiad.wmnet with OS bullseye
  • 20:40 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1029.eqiad.wmnet with OS bullseye
  • 20:40 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1029.eqiad.wmnet with OS bullseye
  • 20:37 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1029.eqiad.wmnet with OS bullseye
  • 20:37 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1029.eqiad.wmnet with OS bullseye
  • 20:36 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1029.eqiad.wmnet with OS bullseye
  • 20:35 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1029.eqiad.wmnet with OS bullseye
  • 20:32 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1029.eqiad.wmnet with OS bullseye
  • 20:31 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
  • 20:27 ottomata: mw-page-content-change-enrich - increase replicas to 12 to process backlog - T347676
  • 20:27 kindrobot@deploy2002: Finished scap: Backport for Undeploy Reader Demographics 2 pilot survey (T345951), DiscussionTools: Disable timestamp links in production initially (duration: 08m 49s)
  • 20:27 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1029.eqiad.wmnet with OS bullseye
  • 20:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
  • 20:21 kindrobot@deploy2002: esanders and dani and kindrobot: Continuing with sync
  • 20:19 kindrobot@deploy2002: esanders and dani and kindrobot: Backport for Undeploy Reader Demographics 2 pilot survey (T345951), DiscussionTools: Disable timestamp links in production initially synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:18 kindrobot@deploy2002: Started scap: Backport for Undeploy Reader Demographics 2 pilot survey (T345951), DiscussionTools: Disable timestamp links in production initially
  • 20:13 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:13 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:12 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:12 eileen: process control revision changed from b370644b to 9760851c
  • 20:12 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:12 eileen: revision changed from b370644b to 9760851c
  • 20:11 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1029.eqiad.wmnet with OS bullseye
  • 20:11 kindrobot@deploy2002: Backport cancelled.
  • 20:01 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1029.eqiad.wmnet with OS bullseye
  • 20:01 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1029.eqiad.wmnet with OS bullseye
  • 19:54 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1029.eqiad.wmnet with OS bullseye
  • 19:53 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1029.eqiad.wmnet
  • 19:53 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1029.eqiad.wmnet
  • 19:53 moritzm: installing libvpx security updates
  • 19:41 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1029.eqiad.wmnet
  • 19:40 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1029.eqiad.wmnet
  • 19:40 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1024.eqiad.wmnet with OS bullseye
  • 19:38 eileen: civicrm upgraded from 7406cdf3 to c1b28287
  • 19:19 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1024.eqiad.wmnet with reason: host reimage
  • 19:16 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1024.eqiad.wmnet with reason: host reimage
  • 19:13 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-master1003']
  • 19:13 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003']
  • 19:11 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
  • 19:02 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1024.eqiad.wmnet with OS bullseye
  • 19:02 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1029.eqiad.wmnet
  • 19:02 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1029.eqiad.wmnet
  • 19:01 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
  • 19:00 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-master1003.eqiad.wmnet']
  • 19:00 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003.eqiad.wmnet']
  • 19:00 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
  • 19:00 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
  • 18:56 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1024.eqiad.wmnet
  • 18:56 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1024.eqiad.wmnet
  • 18:44 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1024.eqiad.wmnet
  • 18:44 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1024.eqiad.wmnet
  • 18:42 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1023.eqiad.wmnet
  • 18:42 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1023.eqiad.wmnet
  • 18:40 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1023.eqiad.wmnet with OS bullseye
  • 18:16 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1023.eqiad.wmnet with reason: host reimage
  • 18:13 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1023.eqiad.wmnet with reason: host reimage
  • 17:59 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1023.eqiad.wmnet with OS bullseye
  • 17:59 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1022.eqiad.wmnet
  • 17:59 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1022.eqiad.wmnet
  • 17:57 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 17:56 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 17:39 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
  • 17:39 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
  • 17:38 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1022.eqiad.wmnet with OS bullseye
  • 17:30 sukhe: A:dns-rec enable puppet and run agent
  • 17:24 sukhe: sudo cumin "A:dns-rec" "disable-puppet 'merging CR 962648'"
  • 17:18 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 17:18 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 17:17 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 17:17 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 17:17 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 17:17 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 17:12 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1022.eqiad.wmnet with reason: host reimage
  • 17:09 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1022.eqiad.wmnet with reason: host reimage
  • 17:00 fabfur: upgrade purged package to version 0.21+deb12u1 cp4052 (bookworm) (T347837)
  • 16:56 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1022.eqiad.wmnet with OS bullseye
  • 16:55 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1031.eqiad.wmnet with OS bullseye
  • 16:39 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T347624, testing new cookbook changes) xfer categories from wdqs2024.codfw.wmnet -> wdqs2025.codfw.wmnet, repooling both afterwards w/ encryption
  • 16:30 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer (T347624, testing new cookbook changes) xfer categories from wdqs2024.codfw.wmnet -> wdqs2025.codfw.wmnet, repooling both afterwards w/ encryption
  • 16:29 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1031.eqiad.wmnet with reason: host reimage
  • 16:26 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1031.eqiad.wmnet with reason: host reimage
  • 16:13 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1031.eqiad.wmnet with OS bullseye
  • 16:08 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1028.eqiad.wmnet with OS bullseye
  • 16:06 fabfur: importing into bookworm-wikimedia package purged_0.21+deb12u1_amd64 (T347837)
  • 15:44 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 15:43 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
  • 15:43 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1028.eqiad.wmnet with reason: host reimage
  • 15:40 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1028.eqiad.wmnet with reason: host reimage
  • 15:29 sukhe: enable puppet on A:dns-rec and force agent run
  • 15:28 joal@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:28 joal@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:27 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1028.eqiad.wmnet with OS bullseye
  • 15:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1028.eqiad.wmnet
  • 15:27 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1028.eqiad.wmnet
  • 15:24 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1021.eqiad.wmnet
  • 15:24 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1021.eqiad.wmnet
  • 15:23 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.failover (exit_code=0) Failover of gitlab from gitlab1003.wikimedia.org to gitlab2002.wikimedia.org
  • 15:20 jelto@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) https://gitlab-replica.wikimedia.org/ https://gitlab-replica-old.wikimedia.org/ on all recursors
  • 15:20 jelto@cumin1001: START - Cookbook sre.dns.wipe-cache https://gitlab-replica.wikimedia.org/ https://gitlab-replica-old.wikimedia.org/ on all recursors
  • 15:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1021.eqiad.wmnet with OS bullseye
  • 15:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
  • 14:55 elukey: restart kubelet on ml-serve1001 (high latencies registered)
  • 14:51 fabfur: upgrade purged package to version 0.21+deb11u1 on all cp hosts (T347837)
  • 14:48 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:48 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:47 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:47 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding new host ganeti-test2004 - jhancock@cumin2002"
  • 14:46 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding new host ganeti-test2004 - jhancock@cumin2002"
  • 14:44 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:40 stevemunene@cumin1001: END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99) for Druid public cluster: Roll restart of Druid jvm daemons.
  • 14:37 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1021.eqiad.wmnet with reason: host reimage
  • 14:34 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1021.eqiad.wmnet with reason: host reimage
  • 14:23 fabfur: importing into bullseye-wikimedia package purged_0.21+deb11u1_amd64 (T347837)
  • 14:20 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1021.eqiad.wmnet with OS bullseye
  • 14:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 14:18 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1020.eqiad.wmnet with OS bullseye
  • 14:17 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 14:17 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 14:15 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:09 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
  • 14:09 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
  • 14:03 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:01 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
  • 13:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1228.eqiad.wmnet with OS bullseye
  • 13:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 13:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS bullseye
  • 13:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 13:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 13:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1020.eqiad.wmnet with reason: host reimage
  • 13:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 13:52 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1020.eqiad.wmnet with reason: host reimage
  • 13:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1228.eqiad.wmnet with reason: host reimage
  • 13:40 taavi@deploy2002: Finished scap: Backport for Add 'testwikis' DB list to MWMultiVersion::DB_LISTS (T341110) (duration: 11m 15s)
  • 13:39 stevemunene@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
  • 13:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage
  • 13:39 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
  • 13:38 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
  • 13:38 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1020.eqiad.wmnet with OS bullseye
  • 13:38 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
  • 13:38 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
  • 13:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1228.eqiad.wmnet with reason: host reimage
  • 13:36 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
  • 13:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage
  • 13:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
  • 13:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1228.eqiad.wmnet with OS bullseye
  • 13:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS bullseye
  • 13:34 taavi@deploy2002: taavi and dreamyjazz: Continuing with sync
  • 13:30 taavi@deploy2002: taavi and dreamyjazz: Backport for Add 'testwikis' DB list to MWMultiVersion::DB_LISTS (T341110) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:29 taavi@deploy2002: Started scap: Backport for Add 'testwikis' DB list to MWMultiVersion::DB_LISTS (T341110)
  • 13:27 taavi@deploy2002: Sync cancelled.
  • 13:19 taavi@deploy2002: taavi and dreamyjazz: Backport for clienthints: Enable display on testwikis and four production wikis (T341110) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:15 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:13 sukhe: disable puppet on A:dns-rec to merge CR 961818
  • 13:11 taavi@deploy2002: Started scap: Backport for clienthints: Enable display on testwikis and four production wikis (T341110)
  • 13:01 jelto@cumin1001: START - Cookbook sre.gitlab.failover Failover of gitlab from gitlab1003.wikimedia.org to gitlab2002.wikimedia.org
  • 12:39 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bastion.bastioninfra-codfw1dev.codfw1dev.wmcloud.org on all recursors
  • 12:39 aborrero@cumin1001: START - Cookbook sre.dns.wipe-cache bastion.bastioninfra-codfw1dev.codfw1dev.wmcloud.org on all recursors
  • 12:34 aikochou@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 12:31 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bastion.bastioninfra-codfw1dev.codfw1dev.wmcloud.org on all recursors
  • 12:31 aborrero@cumin1001: START - Cookbook sre.dns.wipe-cache bastion.bastioninfra-codfw1dev.codfw1dev.wmcloud.org on all recursors
  • 12:29 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:29 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack codfw1dev - aborrero@cumin1001"
  • 12:25 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack codfw1dev - aborrero@cumin1001"
  • 12:22 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 12:18 aikochou@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 12:12 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
  • 12:12 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 12:09 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 12:04 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 11:56 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 11:55 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
  • 11:55 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
  • 11:55 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
  • 11:55 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
  • 11:54 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
  • 11:51 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/media-analytics: apply
  • 11:49 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:49 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:47 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bastion.bastioninfra-codfw1dev.codfw1dev.wmcloud.org on all recursors
  • 11:47 aborrero@cumin1001: START - Cookbook sre.dns.wipe-cache bastion.bastioninfra-codfw1dev.codfw1dev.wmcloud.org on all recursors
  • 11:46 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 11:45 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 11:42 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:42 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:40 hnowlan@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:38 hnowlan@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 11:35 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
  • 10:58 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
  • 10:58 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
  • 10:55 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
  • 10:54 jiji@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
  • 10:49 fabfur: swap purged on cp4040 to use UDS instead of TCP for Varnish (T347837)
  • 10:43 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
  • 10:43 jiji@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
  • 10:34 fabfur: depool cp4040 to test new purged version (T347837)
  • 09:48 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add codfw new switches - cmooney@cumin1001"
  • 09:47 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add codfw new switches - cmooney@cumin1001"
  • 09:06 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1006.eqiad.wmnet with OS bullseye
  • 08:34 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1006.eqiad.wmnet with reason: host reimage
  • 08:31 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1006.eqiad.wmnet with reason: host reimage
  • 08:24 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 08:21 taavi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcontrol1006
  • 08:21 taavi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcontrol1006
  • 08:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 08:17 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 07:49 godog: +150G to prometheus@k8s in codfw
  • 07:47 taavi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1006.eqiad.wmnet with OS bullseye
  • 07:46 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudcontrol1006 - taavi@cumin1001"
  • 07:45 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudcontrol1006 - taavi@cumin1001"
  • 07:37 joal@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 07:37 joal@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 07:36 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:36 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudcontrol1006 - taavi@cumin1001"
  • 07:35 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudcontrol1006 - taavi@cumin1001"
  • 07:32 taavi@cumin1001: START - Cookbook sre.dns.netbox
  • 07:31 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:31 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudcontrol1006 - taavi@cumin1001"
  • 07:30 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudcontrol1006 - taavi@cumin1001"
  • 07:28 taavi@cumin1001: START - Cookbook sre.dns.netbox
  • 05:11 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 05:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .

2023-10-01

  • 01:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T343198)', diff saved to https://phabricator.wikimedia.org/P52799 and previous config saved to /var/cache/conftool/dbconfig/20231001-013851-arnaudb.json
  • 01:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P52798 and previous config saved to /var/cache/conftool/dbconfig/20231001-012344-arnaudb.json
  • 01:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P52797 and previous config saved to /var/cache/conftool/dbconfig/20231001-010838-arnaudb.json
  • 00:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T343198)', diff saved to https://phabricator.wikimedia.org/P52796 and previous config saved to /var/cache/conftool/dbconfig/20231001-005332-arnaudb.json

