Server Admin Log/Archive 72

2023-10-31

23:59 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1111.eqiad.wmnet with OS bullseye
23:51 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1110.eqiad.wmnet with OS bullseye
23:43 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1112.eqiad.wmnet with reason: host reimage
23:41 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1111.eqiad.wmnet with reason: host reimage
23:38 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1112.eqiad.wmnet with reason: host reimage
23:38 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1111.eqiad.wmnet with reason: host reimage
23:33 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1110.eqiad.wmnet with reason: host reimage
23:30 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1110.eqiad.wmnet with reason: host reimage
23:23 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1112.eqiad.wmnet with OS bullseye
23:23 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1111.eqiad.wmnet with OS bullseye
23:23 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1112.eqiad.wmnet with OS bullseye
23:22 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1111.eqiad.wmnet with OS bullseye
23:15 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1107.eqiad.wmnet with OS bullseye
23:15 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1112.eqiad.wmnet with OS bullseye
23:15 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1111.eqiad.wmnet with OS bullseye
23:15 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1110.eqiad.wmnet with OS bullseye
23:15 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1111.eqiad.wmnet with OS bullseye
23:14 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1110.eqiad.wmnet with OS bullseye
23:14 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1112.eqiad.wmnet with OS bullseye
23:12 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1109.eqiad.wmnet with OS bullseye
23:09 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1108.eqiad.wmnet with OS bullseye
23:08 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1112.eqiad.wmnet with OS bullseye
23:08 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1111.eqiad.wmnet with OS bullseye
23:08 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1110.eqiad.wmnet with OS bullseye
23:08 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1112.eqiad.wmnet with OS bullseye
23:08 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1111.eqiad.wmnet with OS bullseye
23:08 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1110.eqiad.wmnet with OS bullseye
23:01 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1112.eqiad.wmnet with OS bullseye
23:01 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1111.eqiad.wmnet with OS bullseye
23:01 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1110.eqiad.wmnet with OS bullseye
22:57 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1107.eqiad.wmnet with reason: host reimage
22:54 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1109.eqiad.wmnet with reason: host reimage
22:53 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1107.eqiad.wmnet with reason: host reimage
22:52 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1108.eqiad.wmnet with reason: host reimage
22:49 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1109.eqiad.wmnet with reason: host reimage
22:48 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1108.eqiad.wmnet with reason: host reimage
22:38 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
22:38 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1107.eqiad.wmnet with OS bullseye
22:34 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1109.eqiad.wmnet with OS bullseye
22:33 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1109.eqiad.wmnet with OS bullseye
22:33 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1108.eqiad.wmnet with OS bullseye
22:33 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1108.eqiad.wmnet with OS bullseye
22:25 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
22:24 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1109.eqiad.wmnet with OS bullseye
22:24 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1107.eqiad.wmnet with OS bullseye
22:24 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1109.eqiad.wmnet with OS bullseye
22:24 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1108.eqiad.wmnet with OS bullseye
22:24 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1108.eqiad.wmnet with OS bullseye
22:19 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1109.eqiad.wmnet with OS bullseye
22:18 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1106.eqiad.wmnet with OS bullseye
22:17 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1108.eqiad.wmnet with OS bullseye
22:17 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1108.eqiad.wmnet with OS bullseye
22:17 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
22:16 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1107.eqiad.wmnet with OS bullseye
22:05 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1108.eqiad.wmnet with OS bullseye
22:02 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
22:02 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1107.eqiad.wmnet with OS bullseye
21:57 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1106.eqiad.wmnet with reason: host reimage
21:54 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1106.eqiad.wmnet with reason: host reimage
21:53 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
21:46 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1103.eqiad.wmnet
21:39 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1106.eqiad.wmnet with OS bullseye
21:38 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1106.eqiad.wmnet with OS bullseye
21:38 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1103.eqiad.wmnet
21:37 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cp1103.eqiad.wmnet
21:37 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1103.eqiad.wmnet
21:34 eileen: civicrm upgraded from 86a08564 to 31d53b57
21:28 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1106.eqiad.wmnet with OS bullseye
21:28 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1106.eqiad.wmnet with OS bullseye
21:21 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1106.eqiad.wmnet with OS bullseye
21:17 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1105.eqiad.wmnet with OS bullseye
21:16 eileen: civicrm upgraded from a458c2bb to 86a08564
20:58 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1105.eqiad.wmnet with reason: host reimage
20:55 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1105.eqiad.wmnet with reason: host reimage
20:40 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1105.eqiad.wmnet with OS bullseye
20:32 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1105.eqiad.wmnet with OS bullseye
20:16 TheresNoTime: close UTC late backport window
20:14 samtar@deploy2002: Finished scap: Backport for Deploy vector 2022 to non-English Wikibooks, etc (T349544) (duration: 10m 51s)
20:08 samtar@deploy2002: samtar and ksarabia: Continuing with sync
20:05 ryankemper@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:05 ryankemper@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
20:04 samtar@deploy2002: samtar and ksarabia: Backport for Deploy vector 2022 to non-English Wikibooks, etc (T349544) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:03 samtar@deploy2002: Started scap: Backport for Deploy vector 2022 to non-English Wikibooks, etc (T349544)
19:56 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
19:55 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
19:12 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1105.eqiad.wmnet with OS bullseye
19:12 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1105.eqiad.wmnet with OS bullseye
19:01 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1105.eqiad.wmnet with OS bullseye
18:59 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1104.eqiad.wmnet with OS bullseye
18:50 ejegg: restarted fundraising scheduled jobs
18:40 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1104.eqiad.wmnet with reason: host reimage
18:37 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1104.eqiad.wmnet with reason: host reimage
18:24 ejegg: disabled fundraising scheduled jobs for table alter
18:24 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.3 refs T348356
18:22 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1104.eqiad.wmnet with OS bullseye
18:10 ejegg: fundraising civicrm upgraded from 5862a3fc to a458c2bb
18:04 sukhe: reprepro -C component/dnsdist include bookworm-wikimedia dnsdist_1.8.2-1+wmf12u1_amd64.changes
17:59 taavi@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt-wdqs1002.mgmt.eqiad.wmnet with reboot policy FORCED
17:56 taavi@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt-wdqs1002.mgmt.eqiad.wmnet with reboot policy FORCED
17:52 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1104.eqiad.wmnet with OS bullseye
17:51 taavi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt-wdqs1002
17:51 taavi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt-wdqs1002
17:43 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
17:43 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
17:42 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
17:42 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
17:42 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
17:42 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
17:27 Krinkle: krinkle@deploy2002:/srv/mediawiki/private: fix untracked warning for readme.FatalErrorSettings.php
16:49 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
16:49 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
16:44 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1103.eqiad.wmnet with OS bullseye
16:35 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
16:34 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
16:31 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1104.eqiad.wmnet with OS bullseye
16:30 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1104.eqiad.wmnet with OS bullseye
16:27 taavi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm
16:25 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1103.eqiad.wmnet with reason: host reimage
16:23 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
16:23 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
16:22 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1103.eqiad.wmnet with reason: host reimage
16:20 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1104.eqiad.wmnet with OS bullseye
16:15 taavi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm
16:15 taavi@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt-wdqs1002.mgmt.eqiad.wmnet with reboot policy FORCED
16:12 taavi@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt-wdqs1002.mgmt.eqiad.wmnet with reboot policy FORCED
16:12 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:11 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudvirt-wdqs1002 - taavi@cumin1001"
16:10 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudvirt-wdqs1002 - taavi@cumin1001"
16:08 taavi@cumin1001: START - Cookbook sre.dns.netbox
16:06 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
16:04 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
15:52 arnaudb@cumin1001: dbctl commit (dc=all): 'discard db1131', diff saved to https://phabricator.wikimedia.org/P53120 and previous config saved to /var/cache/conftool/dbconfig/20231031-155253-arnaudb.json
15:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
15:42 arnaudb@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts db1131.eqiad.wmnet
15:42 arnaudb@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:42 arnaudb@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1131.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
15:41 arnaudb@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1131.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1001"
15:38 arnaudb@cumin1001: START - Cookbook sre.dns.netbox
15:33 arnaudb@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1131.eqiad.wmnet
15:29 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
15:28 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
15:26 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
15:25 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
15:25 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
15:24 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
15:23 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
15:22 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
15:22 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
15:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T343198)', diff saved to https://phabricator.wikimedia.org/P53119 and previous config saved to /var/cache/conftool/dbconfig/20231031-152105-arnaudb.json
15:11 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
15:11 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
15:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P53118 and previous config saved to /var/cache/conftool/dbconfig/20231031-150558-arnaudb.json
15:06 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
15:06 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
15:05 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
15:05 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
15:04 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
15:04 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
14:57 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
14:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P53117 and previous config saved to /var/cache/conftool/dbconfig/20231031-145052-arnaudb.json
14:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T343198)', diff saved to https://phabricator.wikimedia.org/P53116 and previous config saved to /var/cache/conftool/dbconfig/20231031-143545-arnaudb.json
14:13 sukhe: install4002:/etc/dhcp/automation/ttyS1-115200 rm cp4052.conf
14:06 sbassett: Deployed updated security mitigation for T348828
13:59 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
13:58 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
13:49 ejegg: fundraising civicrm upgraded from 71d26d3b to 5862a3fc
13:45 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
13:36 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
13:36 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
13:30 TheresNoTime: close UTC afternoon backport window
13:27 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
13:27 samtar@deploy2002: Finished scap: Backport for Roll-out Parsoid Kartographer support for all English language wikis (T342871) (duration: 10m 49s)
13:22 samtar@deploy2002: ihurbain and samtar: Continuing with sync
13:18 samtar@deploy2002: ihurbain and samtar: Backport for Roll-out Parsoid Kartographer support for all English language wikis (T342871) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:17 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
13:16 samtar@deploy2002: Started scap: Backport for Roll-out Parsoid Kartographer support for all English language wikis (T342871)
12:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 100%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53113 and previous config saved to /var/cache/conftool/dbconfig/20231031-125348-arnaudb.json
12:49 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 100%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53112 and previous config saved to /var/cache/conftool/dbconfig/20231031-124918-arnaudb.json
12:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 80%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53108 and previous config saved to /var/cache/conftool/dbconfig/20231031-122338-arnaudb.json
12:19 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 80%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53107 and previous config saved to /var/cache/conftool/dbconfig/20231031-121908-arnaudb.json
12:08 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 70%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53106 and previous config saved to /var/cache/conftool/dbconfig/20231031-120833-arnaudb.json
12:04 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 70%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53105 and previous config saved to /var/cache/conftool/dbconfig/20231031-120403-arnaudb.json
11:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 60%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53104 and previous config saved to /var/cache/conftool/dbconfig/20231031-115328-arnaudb.json
11:48 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 60%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53103 and previous config saved to /var/cache/conftool/dbconfig/20231031-114858-arnaudb.json
11:38 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 50%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53102 and previous config saved to /var/cache/conftool/dbconfig/20231031-113823-arnaudb.json
11:33 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 50%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53101 and previous config saved to /var/cache/conftool/dbconfig/20231031-113353-arnaudb.json
11:24 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1007.eqiad.wmnet with OS bookworm
11:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 40%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53099 and previous config saved to /var/cache/conftool/dbconfig/20231031-112318-arnaudb.json
11:18 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 40%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53098 and previous config saved to /var/cache/conftool/dbconfig/20231031-111849-arnaudb.json
11:08 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 30%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53097 and previous config saved to /var/cache/conftool/dbconfig/20231031-110813-arnaudb.json
11:03 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 30%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53096 and previous config saved to /var/cache/conftool/dbconfig/20231031-110344-arnaudb.json
10:53 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1007.eqiad.wmnet with reason: host reimage
10:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 20%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53095 and previous config saved to /var/cache/conftool/dbconfig/20231031-105308-arnaudb.json
10:50 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1007.eqiad.wmnet with reason: host reimage
10:48 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 20%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53094 and previous config saved to /var/cache/conftool/dbconfig/20231031-104839-arnaudb.json
10:38 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 10%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53093 and previous config saved to /var/cache/conftool/dbconfig/20231031-103804-arnaudb.json
10:37 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1007.eqiad.wmnet with OS bookworm
10:33 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 10%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53092 and previous config saved to /var/cache/conftool/dbconfig/20231031-103334-arnaudb.json
10:22 arnaudb@cumin1001: dbctl commit (dc=all): 'db1227 (re)pooling @ 5%: dh1227 host warmup', diff saved to https://phabricator.wikimedia.org/P53091 and previous config saved to /var/cache/conftool/dbconfig/20231031-102259-arnaudb.json
10:18 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 5%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53090 and previous config saved to /var/cache/conftool/dbconfig/20231031-101829-arnaudb.json
10:17 arnaudb@cumin1001: dbctl commit (dc=all): 'set db1230 as a depooled host', diff saved to https://phabricator.wikimedia.org/P53089 and previous config saved to /var/cache/conftool/dbconfig/20231031-101750-arnaudb.json
09:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T343198)', diff saved to https://phabricator.wikimedia.org/P53088 and previous config saved to /var/cache/conftool/dbconfig/20231031-095054-arnaudb.json
09:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
09:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
09:47 arnaudb@cumin1001: dbctl commit (dc=all): 'set db1230 as a depooled host', diff saved to https://phabricator.wikimedia.org/P53087 and previous config saved to /var/cache/conftool/dbconfig/20231031-094737-arnaudb.json
09:39 arnaudb@cumin1001: dbctl commit (dc=all): 'set db1230 as a depooled host', diff saved to https://phabricator.wikimedia.org/P53086 and previous config saved to /var/cache/conftool/dbconfig/20231031-093919-arnaudb.json
09:35 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 100%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53085 and previous config saved to /var/cache/conftool/dbconfig/20231031-093457-arnaudb.json
09:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Set ', diff saved to https://phabricator.wikimedia.org/P53084 and previous config saved to /var/cache/conftool/dbconfig/20231031-093448-arnaudb.json
09:01 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
09:00 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
08:57 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 (re)pooling @ 5%: db1230 host warmup', diff saved to https://phabricator.wikimedia.org/P53083 and previous config saved to /var/cache/conftool/dbconfig/20231031-085740-arnaudb.json
08:56 arnaudb@cumin1001: dbctl commit (dc=all): 'db1230 config append', diff saved to https://phabricator.wikimedia.org/P53082 and previous config saved to /var/cache/conftool/dbconfig/20231031-085615-arnaudb.json
08:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 90%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53081 and previous config saved to /var/cache/conftool/dbconfig/20231031-085346-arnaudb.json
08:38 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 75%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53080 and previous config saved to /var/cache/conftool/dbconfig/20231031-083841-arnaudb.json
08:23 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 60%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53079 and previous config saved to /var/cache/conftool/dbconfig/20231031-082336-arnaudb.json
08:08 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 45%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53078 and previous config saved to /var/cache/conftool/dbconfig/20231031-080832-arnaudb.json
07:53 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 30%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53077 and previous config saved to /var/cache/conftool/dbconfig/20231031-075327-arnaudb.json
07:38 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 (re)pooling @ 15%: Host warmup', diff saved to https://phabricator.wikimedia.org/P53076 and previous config saved to /var/cache/conftool/dbconfig/20231031-073822-arnaudb.json
07:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 weight rebalancing - depooled', diff saved to https://phabricator.wikimedia.org/P53075 and previous config saved to /var/cache/conftool/dbconfig/20231031-073652-arnaudb.json
07:33 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 weight rebalancing', diff saved to https://phabricator.wikimedia.org/P53074 and previous config saved to /var/cache/conftool/dbconfig/20231031-073312-arnaudb.json
07:30 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 depooling from API and pooling in db2140', diff saved to https://phabricator.wikimedia.org/P53073 and previous config saved to /var/cache/conftool/dbconfig/20231031-073023-arnaudb.json
07:19 arnaudb@cumin1001: dbctl commit (dc=all): 'db2179 weight mimic old db2140', diff saved to https://phabricator.wikimedia.org/P53072 and previous config saved to /var/cache/conftool/dbconfig/20231031-071938-arnaudb.json
07:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Promote db2140 to s4 primary and set section read-write T349820', diff saved to https://phabricator.wikimedia.org/P53071 and previous config saved to /var/cache/conftool/dbconfig/20231031-070549-arnaudb.json
07:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Set s4 codfw as read-only for maintenance - T349820', diff saved to https://phabricator.wikimedia.org/P53070 and previous config saved to /var/cache/conftool/dbconfig/20231031-070405-arnaudb.json
07:02 arnaudb: Starting s4 codfw failover from db2179 to db2140 - T349820
06:49 marostegui@deploy2002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc2014 to pc1 master" (duration: 07m 12s)
06:44 marostegui@deploy2002: marostegui: Continuing with sync
06:43 marostegui@deploy2002: marostegui: Backport for Revert "ProductionServices.php: Promote pc2014 to pc1 master" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
06:42 marostegui@deploy2002: Started scap: Backport for Revert "ProductionServices.php: Promote pc2014 to pc1 master"
06:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Set db2140 with weight 0 T349820', diff saved to https://phabricator.wikimedia.org/P53068 and previous config saved to /var/cache/conftool/dbconfig/20231031-063647-arnaudb.json
06:33 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 34 hosts with reason: Primary switchover s4 T349820
06:33 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 34 hosts with reason: Primary switchover s4 T349820
06:31 marostegui@deploy2002: Finished scap: Backport for ProductionServices.php: Promote pc2014 to pc1 master (duration: 06m 50s)
06:26 marostegui@deploy2002: marostegui: Continuing with sync
06:25 marostegui@deploy2002: marostegui: Backport for ProductionServices.php: Promote pc2014 to pc1 master synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
06:24 marostegui@deploy2002: Started scap: Backport for ProductionServices.php: Promote pc2014 to pc1 master
03:55 mwpresync@deploy2002: Pruned MediaWiki: 1.42.0-wmf.1 (duration: 02m 14s)
03:53 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.3 refs T348356 (duration: 50m 44s)
03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.3 refs T348356
00:46 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye
00:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
00:19 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye

2023-10-30

23:56 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
23:56 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye
23:50 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
21:22 sbassett: Deployed updated security mitigation for T348828
21:19 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for search-loader[2001-2002].codfw.wmnet,search-loader[1001-1002].eqiad.wmnet
21:19 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for search-loader[2001-2002].codfw.wmnet,search-loader[1001-1002].eqiad.wmnet
20:58 ejegg: re-enabled fundraising scheduled jobs after deployment
20:45 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
20:45 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
20:44 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
20:44 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
20:43 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
20:43 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
20:41 ejegg: fundraising civicrm upgraded from 2c79475e to 71d26d3b
20:40 ejegg: disable fundraising scheduled jobs for deployment
20:29 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
20:29 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
20:28 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
20:21 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
20:20 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns3004.wikimedia.org with OS bookworm
20:17 dancy@deploy2002: Finished scap: Backport for namespaces:mediawiki: add Extensions/Skins as alias of Extension/Skin (+ tallk) (T349970) (duration: 10m 09s)
20:11 dancy@deploy2002: dancy and rhinosf1: Continuing with sync
20:10 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
20:08 dancy@deploy2002: dancy and rhinosf1: Backport for namespaces:mediawiki: add Extensions/Skins as alias of Extension/Skin (+ tallk) (T349970) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:07 dancy@deploy2002: Started scap: Backport for namespaces:mediawiki: add Extensions/Skins as alias of Extension/Skin (+ tallk) (T349970)
19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns3004.wikimedia.org with reason: host reimage
19:47 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns3004.wikimedia.org with reason: host reimage
19:21 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns3004.wikimedia.org with OS bookworm
18:59 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
18:54 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
18:53 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye
18:52 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
18:38 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns3003.wikimedia.org with OS bookworm
18:36 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
18:35 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
18:34 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
18:34 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye
18:33 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: ping_offload
18:27 jbond: migrate ping_offload to puppet7
18:27 jbond@cumin1001: START - Cookbook sre.puppet.migrate-role for role: ping_offload
18:26 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
18:24 sukhe: racadm racreset cp1103.eqiad.wmnet
18:22 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye
18:20 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on search-loader[2001-2002].codfw.wmnet with reason: T346039
18:19 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on search-loader[2001-2002].codfw.wmnet with reason: T346039
18:16 bking@deploy2002: Finished deploy [search/mjolnir/deploy@daf8c32]: T346039 (duration: 00m 06s)
18:16 bking@deploy2002: Started deploy [search/mjolnir/deploy@daf8c32]: T346039
18:11 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
18:10 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
17:56 jbond: migrate bastionhost to puppet7
17:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS bookworm
17:42 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns3003.wikimedia.org with reason: host reimage
17:40 jbond: migrate pki::multirootca to puppet7
17:39 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns3003.wikimedia.org with reason: host reimage
17:27 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host pki2002.codfw.wmnet
17:23 jbond@cumin1001: START - Cookbook sre.puppet.migrate-host for host pki2002.codfw.wmnet
17:22 jbond: migrate pki2002 to puppet7
17:16 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye
17:14 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
17:12 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns3003.wikimedia.org with OS bookworm
17:10 jbond: migrate pki::root to puppet7
17:04 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
16:51 sukhe: running authdns-update for CR 969816
16:39 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp4052.ulsfo.wmnet with reason: depooled, reimaging
16:39 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp4052.ulsfo.wmnet with reason: depooled, reimaging
16:26 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1103.eqiad.wmnet with reason: host reimage
16:23 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1103.eqiad.wmnet with reason: host reimage
16:22 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1003.eqiad.wmnet with OS bookworm
16:22 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1001"
16:21 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1001"
16:16 jbond: migrate O:ganeti_test to puppet7
16:14 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host ganeti-test1002.eqiad.wmnet
16:07 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
16:07 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye
16:04 jbond: migrate ganeti-test1002.eqiad.wmnet to puppet7
16:03 jbond@cumin1001: START - Cookbook sre.puppet.migrate-host for host ganeti-test1002.eqiad.wmnet
16:02 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1003.eqiad.wmnet with reason: host reimage
15:58 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1003.eqiad.wmnet with reason: host reimage
15:57 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudvirt-wdqs1003 - taavi@cumin1001"
15:56 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudvirt-wdqs1003 - taavi@cumin1001"
15:55 jbond: migrate failoid to puppet7
15:51 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
15:51 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS bullseye
15:49 jbond: move builder to puppet7
15:49 jbond: move cluster::unprivmanagement to puppet7
15:49 jbond: move config_master to puppet7
15:43 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
15:42 taavi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1003.eqiad.wmnet with OS bookworm
15:33 taavi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt-wdqs1003
15:33 taavi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt-wdqs1003
15:30 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:30 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudvirt-wdqs1003 - taavi@cumin1001"
15:29 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudvirt-wdqs1003 - taavi@cumin1001"
15:29 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt-wdqs1003
15:27 taavi@cumin1001: START - Cookbook sre.dns.netbox
15:21 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt-wdqs1003
14:41 bking@deploy2002: Finished deploy [search/mjolnir/deploy@daf8c32]: T346039 (duration: 00m 05s)
14:41 bking@deploy2002: Started deploy [search/mjolnir/deploy@daf8c32]: T346039
14:38 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on search-loader2001.codfw.wmnet with reason: T346039
14:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
14:37 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on search-loader2001.codfw.wmnet with reason: T346039
14:36 inflatador: bking@search-loader2001 disabling services as part of bullseye migration T346039
14:34 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
14:32 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
14:31 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
14:06 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
12:55 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1130.eqiad.wmnet onto db1230.eqiad.wmnet
12:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1217.eqiad.wmnet with OS bookworm
12:28 marostegui@cumin1001: dbctl commit (dc=all): 'New host', diff saved to https://phabricator.wikimedia.org/P53065 and previous config saved to /var/cache/conftool/dbconfig/20231030-122855-marostegui.json
12:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1217.eqiad.wmnet with reason: host reimage
12:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1217.eqiad.wmnet with reason: host reimage
12:11 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1217.eqiad.wmnet with OS bookworm
11:52 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1130.eqiad.wmnet onto db1230.eqiad.wmnet
11:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Adding db1230 depooled, depooling db1130', diff saved to https://phabricator.wikimedia.org/P53064 and previous config saved to /var/cache/conftool/dbconfig/20231030-113401-arnaudb.json
11:28 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: provisionning db1230.eqiad.wmnet - T344036
11:28 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: provisionning db1230.eqiad.wmnet - T344036
11:28 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: provisionning db1230.eqiad.wmnet - T344036
11:28 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: provisionning db1230.eqiad.wmnet - T344036
09:42 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@af33784] (releasing): (no justification provided) (duration: 00m 40s)
09:42 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@af33784] (releasing): (no justification provided)
08:29 vgutierrez: switched to digicert-2023 in esams, eqsin and drmrs - T341119
08:17 wmde-fisch@deploy2002: Finished scap: Backport for Cleanup Kartographer Nearby flags (T332785) (duration: 07m 35s)
08:12 wmde-fisch@deploy2002: wmde-fisch: Continuing with sync
08:11 wmde-fisch@deploy2002: wmde-fisch: Backport for Cleanup Kartographer Nearby flags (T332785) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:10 wmde-fisch@deploy2002: Started scap: Backport for Cleanup Kartographer Nearby flags (T332785)
08:10 vgutierrez: triggering a puppet run on cp hosts in esams, eqsin and drmrs to switch to the new unified digicert certificates - T341119
08:06 vgutierrez: repool cp5025 - T341119
08:06 marostegui@deploy2002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master" (duration: 06m 41s)
08:01 marostegui@deploy2002: marostegui: Continuing with sync
08:00 marostegui@deploy2002: marostegui: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:59 marostegui@deploy2002: Started scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master"
07:52 vgutierrez: depool cp5025 to perform some digicert-2023 related sanity checks - T341119
07:49 marostegui@deploy2002: Finished scap: Backport for ProductionServices.php: Promote pc1014 to pc1 master (duration: 06m 36s)
07:48 marostegui@deploy2002: marostegui: Continuing with sync
07:44 marostegui@deploy2002: marostegui: Backport for ProductionServices.php: Promote pc1014 to pc1 master synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:43 marostegui@deploy2002: Started scap: Backport for ProductionServices.php: Promote pc1014 to pc1 master
07:35 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance
07:34 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance
07:29 marostegui@deploy2002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master" (duration: 06m 33s)
07:24 marostegui@deploy2002: marostegui: Continuing with sync
07:24 marostegui@deploy2002: marostegui: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:22 marostegui@deploy2002: Started scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc1 master"
07:22 marostegui@deploy2002: Finished scap: Backport for ProductionServices.php: Promote pc1014 to pc1 master (duration: 14m 04s)
07:18 elukey: arm keyholder on acmechief2002 and deploy1002
07:16 marostegui@deploy2002: marostegui: Continuing with sync
07:16 marostegui@deploy2002: marostegui: Backport for ProductionServices.php: Promote pc1014 to pc1 master synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:08 marostegui@deploy2002: Started scap: Backport for ProductionServices.php: Promote pc1014 to pc1 master

2023-10-28

21:25 fabfur: re-pooled cp1089 and cp3069
21:05 fabfur: depooled cp1089 and cp3069 to restart varnish|haproxy and let purged process incoming messages
20:20 fabfur: restarted purged on cp1089, cp6005, cp3069
19:46 fabfur: restarted purged on cp1078

2023-10-27

22:47 rzl: reprepro -C main include bullseye-wikimedia k8s-controller-sidecars_1.0.2-1_source.changes
22:05 ejegg: fundraising civicrm upgraded from 74781efd to 2c79475e
15:38 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2004.codfw.wmnet with OS bullseye
15:38 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
15:21 herron: power cycled titan1001
14:59 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
14:42 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2004.codfw.wmnet with reason: host reimage
14:39 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2004.codfw.wmnet with reason: host reimage
14:19 topranks: announcing internal core routes to esams asw's to test policy T344547
14:19 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
14:18 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
14:12 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1004.mgmt.eqiad.wmnet with reboot policy FORCED
14:12 jclark@cumin1001: START - Cookbook sre.hosts.provision for host sretest1004.mgmt.eqiad.wmnet with reboot policy FORCED
14:04 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
14:04 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
14:04 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
14:03 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
14:03 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
14:02 jayme@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
13:38 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host acmechief2002.codfw.wmnet
13:38 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest2004.codfw.wmnet with OS bullseye
13:37 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2004.codfw.wmnet with OS bullseye
13:36 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:36 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: change sretest2004 DNS - cmooney@cumin1001"
13:35 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: change sretest2004 DNS - cmooney@cumin1001"
13:33 cmooney@cumin1001: START - Cookbook sre.dns.netbox
13:31 jbond@cumin1001: START - Cookbook sre.puppet.migrate-host for host acmechief2002.codfw.wmnet
13:27 jbond@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host acmechief2002.codfw.wmnet
13:27 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host acmechief2002.codfw.wmnet with OS bookworm
13:00 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest2004.codfw.wmnet with OS bullseye
12:55 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:54 cmooney@cumin1001: START - Cookbook sre.dns.netbox
12:41 jayme: updated mwdebug1001 to icu67 - T345561
12:17 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on acmechief2002.codfw.wmnet with reason: host reimage
12:14 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on acmechief2002.codfw.wmnet with reason: host reimage
11:52 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1102.eqiad.wmnet with OS bullseye
11:34 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1102.eqiad.wmnet with reason: host reimage
11:31 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1102.eqiad.wmnet with reason: host reimage
11:31 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host acmechief2002.codfw.wmnet with OS bookworm
11:30 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM acmechief2002.codfw.wmnet - jbond@cumin1001"
11:29 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM acmechief2002.codfw.wmnet - jbond@cumin1001"
11:29 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) acmechief2002.codfw.wmnet on all recursors
11:29 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache acmechief2002.codfw.wmnet on all recursors
11:29 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:29 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM acmechief2002.codfw.wmnet - jbond@cumin1001"
11:28 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM acmechief2002.codfw.wmnet - jbond@cumin1001"
11:26 jbond@cumin1001: START - Cookbook sre.dns.netbox
11:26 jbond@cumin1001: START - Cookbook sre.ganeti.makevm for new host acmechief2002.codfw.wmnet
11:18 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1102.eqiad.wmnet with OS bullseye
11:17 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1102.eqiad.wmnet with OS bullseye
11:08 volans@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
11:08 volans@cumin2002: START - Cookbook sre.ganeti.resource-report
11:01 jbond@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
11:01 jbond@cumin2002: START - Cookbook sre.ganeti.resource-report
11:00 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
10:48 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
10:48 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
10:48 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
10:45 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
10:45 jiji@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
10:44 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
10:40 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1102.eqiad.wmnet with OS bullseye
10:36 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1101.eqiad.wmnet with OS bullseye
10:20 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt-wdqs1001.eqiad.wmnet
10:20 taavi@cumin1001: START - Cookbook sre.hosts.remove-downtime for cloudvirt-wdqs1001.eqiad.wmnet
10:17 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1101.eqiad.wmnet with reason: host reimage
10:17 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
10:14 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1101.eqiad.wmnet with reason: host reimage
10:14 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
10:14 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
10:13 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
09:59 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1101.eqiad.wmnet with OS bullseye
09:59 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1101.eqiad.wmnet with OS bullseye
09:34 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1101.eqiad.wmnet with OS bullseye
09:19 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
09:19 btullis@cumin1001: Added views for new wiki: tlywiki T345169
09:02 moritzm: deployment-prep app servers are now using ICU67/Unicode 13
08:49 moritzm: uploaded libxml2 2.9.4+dfsg1-7+deb10u6+icu67+wmf1 to component/icu67 for buster-wikimedia (rebase of the ICU compat patches on top of the latest buster security update for libxml2) T345561
08:48 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
08:41 moritzm: downgrading dh-python on build2001 to the version which is in Bullseye. Before, 5.20230130~bpo11+1 was installed from bullseye-backports, but that version has dropped the python2 sequence we still need for some Buster builds
08:25 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudmetrics1004.eqiad.wmnet with OS bookworm
08:10 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudmetrics1004.eqiad.wmnet with reason: host reimage
08:07 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudmetrics1004.eqiad.wmnet with reason: host reimage
07:55 taavi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudmetrics1004.eqiad.wmnet with OS bookworm
07:54 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudmetrics1003.eqiad.wmnet with OS bookworm
07:54 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1004.mgmt.eqiad.wmnet with reboot policy FORCED
07:48 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudmetrics1004.eqiad.wmnet with reason: cloudmetrics1003 reimage
07:48 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudmetrics1004.eqiad.wmnet with reason: cloudmetrics1003 reimage
07:39 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudmetrics1003.eqiad.wmnet with reason: host reimage
07:36 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudmetrics1003.eqiad.wmnet with reason: host reimage
07:32 ayounsi@cumin1001: START - Cookbook sre.hosts.provision for host sretest1004.mgmt.eqiad.wmnet with reboot policy FORCED
07:24 taavi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudmetrics1003.eqiad.wmnet with OS bookworm
06:12 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2003.codfw.wmnet with OS bullseye
01:49 cstone: civicrm upgraded from 70e0b88d to 74781efd

2023-10-26

22:49 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2006.wikimedia.org with OS bookworm
22:10 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2006.wikimedia.org with reason: host reimage
22:07 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2006.wikimedia.org with reason: host reimage
21:47 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns2006.wikimedia.org with OS bookworm
21:45 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:45 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
21:32 cstone: payments-wiki upgraded from f7407053 to 04428d6e
21:16 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: still trying to get nova to schedule hosts there
21:16 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: still trying to get nova to schedule hosts there
21:12 taavi@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudvirt-wdqs1001.eqiad.wmnet
21:00 taavi@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1001.eqiad.wmnet
20:45 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
20:45 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1001"
20:44 cstone: payments-wiki upgraded from f7407053 to 99b330be
20:44 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1001"
20:42 brennen: end of utc late backport & config window
20:42 brennen@deploy2002: Finished scap: Backport for OIDC: Return instead of null for email in profile (T283456) (duration: 07m 25s)
20:41 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2005.wikimedia.org with OS bookworm
20:37 brennen@deploy2002: brennen and tgr: Continuing with sync
20:36 brennen@deploy2002: brennen and tgr: Backport for OIDC: Return instead of null for email in profile (T283456) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:35 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudvirt-wdqs1001 - taavi@cumin1001"
20:34 brennen@deploy2002: Started scap: Backport for OIDC: Return instead of null for email in profile (T283456)
20:34 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudvirt-wdqs1001 - taavi@cumin1001"
20:34 brennen@deploy2002: Finished scap: Backport for Deploy pilot survey on metawiki (T349854) (duration: 08m 56s)
20:31 bvibber: brion running video transcode backfill via mwmaint2002 (requeueTranscodes.php) + job queue
20:29 brennen@deploy2002: dani and brennen: Continuing with sync
20:26 brennen@deploy2002: dani and brennen: Backport for Deploy pilot survey on metawiki (T349854) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:25 brennen@deploy2002: Started scap: Backport for Deploy pilot survey on metawiki (T349854)
20:23 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2005.wikimedia.org with reason: host reimage
20:20 brennen@deploy2002: Finished scap: Backport for "Soft-launch" iOS-compatible HLS video transcodes (T68722) (duration: 08m 29s)
20:19 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2005.wikimedia.org with reason: host reimage
20:15 brennen@deploy2002: brennen and brion: Continuing with sync
20:14 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: host reimage
20:13 brennen@deploy2002: brennen and brion: Backport for "Soft-launch" iOS-compatible HLS video transcodes (T68722) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:12 brennen@deploy2002: Started scap: Backport for "Soft-launch" iOS-compatible HLS video transcodes (T68722)
20:11 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1001.eqiad.wmnet with reason: host reimage
20:03 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns2005.wikimedia.org with OS bookworm
19:59 taavi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
19:59 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt-wdqs1001.mgmt.eqiad.wmnet with reboot policy FORCED
19:43 taavi@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt-wdqs1001.mgmt.eqiad.wmnet with reboot policy FORCED
19:41 taavi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
19:38 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2004.wikimedia.org with OS bookworm
19:30 taavi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
19:29 taavi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt-wdqs1001
19:29 taavi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt-wdqs1001
19:28 taavi@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt-wdqs1001
19:28 taavi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt-wdqs1001
19:08 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2004.wikimedia.org with reason: host reimage
19:05 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2004.wikimedia.org with reason: host reimage
18:44 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns2004.wikimedia.org with OS bookworm
18:07 dancy@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.2 refs T348355
17:53 sukhe: sudo cumin -b1 -s300 'A:dns-rec and not A:codfw' 'systemctl restart pdns-recursor.service'
17:36 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:36 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudvirt-wdqs1001 - taavi@cumin1001"
17:35 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudvirt-wdqs1001 - taavi@cumin1001"
17:32 taavi@cumin1001: START - Cookbook sre.dns.netbox
17:19 stevemunene@cumin1001: END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99) for Druid public cluster: Roll restart of Druid jvm daemons.
17:17 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
17:01 sukhe: sudo cumin -b1 -s30 'A:dns-rec and not A:codfw' 'systemctl restart haproxy.service'
16:18 stevemunene@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
16:05 hnowlan@deploy2002: Finished deploy [restbase/deploy@c461bad]: Adding fonwiki T347940 (duration: 16m 53s)
16:04 sukhe: sudo cumin -b1 -s300 'A:dns-rec and A:edges' 'systemctl restart ntp.service'
15:48 hnowlan@deploy2002: Started deploy [restbase/deploy@c461bad]: Adding fonwiki T347940
15:42 sukhe: sudo cumin -b1 -s600 'A:dns-rec and (A:eqiad or A:codfw)' 'systemctl restart ntp.service'
15:42 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough and A:wikidough
15:35 jgiannelos@deploy2002: Finished deploy [restbase/deploy@4c14785]: (no justification provided) (duration: 13m 21s)
15:30 XioNoX: test add BGP session between ssw1-e1-eqiad and lsw1-e8-eqiad
15:22 jgiannelos@deploy2002: Started deploy [restbase/deploy@4c14785]: (no justification provided)
15:15 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
15:12 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2003.codfw.wmnet with reason: host reimage
15:09 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2003.codfw.wmnet with reason: host reimage
14:53 Lucas_WMDE: UTC afternoon backport+config window (belatedly) done
14:52 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for CX3 Build 0.2.0+20231026 (T348563 T308836) (duration: 14m 01s)
14:49 sukhe@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough and A:wikidough
14:46 lucaswerkmeister-wmde@deploy2002: kartik and lucaswerkmeister-wmde: Continuing with sync
14:42 jgiannelos@deploy2002: Finished deploy [restbase/deploy@ff46322]: (no justification provided) (duration: 01m 38s)
14:40 jgiannelos@deploy2002: Started deploy [restbase/deploy@ff46322]: (no justification provided)
14:39 lucaswerkmeister-wmde@deploy2002: kartik and lucaswerkmeister-wmde: Backport for CX3 Build 0.2.0+20231026 (T348563 T308836) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:38 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for CX3 Build 0.2.0+20231026 (T348563 T308836)
14:36 filippo@deploy2002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
14:36 filippo@deploy2002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
14:36 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
14:36 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
14:36 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
14:35 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
14:33 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host sretest2003.codfw.wmnet with OS bullseye
14:23 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Remove broken QUnit test (T349485) (duration: 06m 53s)
14:20 ejegg: donorwiki upgraded from 894eacce to f7407053
14:17 lucaswerkmeister-wmde@deploy2002: abi and lucaswerkmeister-wmde: Continuing with sync
14:17 lucaswerkmeister-wmde@deploy2002: abi and lucaswerkmeister-wmde: Backport for Remove broken QUnit test (T349485) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:16 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Remove broken QUnit test (T349485)
14:14 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/similar-users: apply
14:14 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/similar-users: apply
14:14 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/similar-users: apply
14:09 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/similar-users: apply
14:09 jayme@deploy2002: helmfile [staging] START helmfile.d/services/similar-users: apply
13:56 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for cirrus: disable canary events for update & error streams (duration: 07m 19s)
13:51 lucaswerkmeister-wmde@deploy2002: dcausse and lucaswerkmeister-wmde: Continuing with sync
13:50 lucaswerkmeister-wmde@deploy2002: dcausse and lucaswerkmeister-wmde: Backport for cirrus: disable canary events for update & error streams synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:49 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for cirrus: disable canary events for update & error streams
13:46 moritzm: installing cpio security updates
13:46 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for CX3 Build 0.2.0+20231026 (T348563 T308836) (duration: 14m 48s)
13:40 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and kartik: Continuing with sync
13:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and kartik: Backport for CX3 Build 0.2.0+20231026 (T348563 T308836) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:32 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
13:31 moritzm: installing curl security updates on buster
13:31 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for CX3 Build 0.2.0+20231026 (T348563 T308836)
13:30 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Add throttle rule for Edit-a-Thon on 2023-11-03 (T349234) (duration: 06m 43s)
13:27 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-eqiad
13:26 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:maps-replica-codfw
13:25 lucaswerkmeister-wmde@deploy2002: zoranzoki21 and lucaswerkmeister-wmde: Continuing with sync
13:24 lucaswerkmeister-wmde@deploy2002: zoranzoki21 and lucaswerkmeister-wmde: Backport for Add throttle rule for Edit-a-Thon on 2023-11-03 (T349234) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:23 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Add throttle rule for Edit-a-Thon on 2023-11-03 (T349234)
13:21 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-codfw
13:21 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Enable block feature for AbuseFilter on srwiki (T349727) (duration: 10m 23s)
13:20 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
13:20 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
13:15 lucaswerkmeister-wmde@deploy2002: zoranzoki21 and lucaswerkmeister-wmde: Continuing with sync
13:15 moritzm: installing poppler security updates
13:11 lucaswerkmeister-wmde@deploy2002: zoranzoki21 and lucaswerkmeister-wmde: Backport for Enable block feature for AbuseFilter on srwiki (T349727) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:10 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Enable block feature for AbuseFilter on srwiki (T349727)
13:04 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
12:27 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance
12:26 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance
11:04 kevinbazira@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
11:03 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
10:58 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
10:51 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
10:51 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
10:51 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
10:40 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
10:30 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
10:30 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
10:25 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
10:25 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
10:20 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
10:20 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
10:10 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
10:10 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
09:29 dcausse: erratum (replace wdqs1009 with wdqs2009 in the above msg): depooling and restarting blazegraph on wdqs2009 (stuck since 2023-10-12)
09:28 dcausse: depooling and restarting blazegraph on wdqs1009 (stuck since 2023-10-12)
09:23 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1009.eqiad.wmnet with OS bullseye
09:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
09:14 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
09:06 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1009.eqiad.wmnet with reason: host reimage
09:03 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1009.eqiad.wmnet with reason: host reimage
08:50 brouberol@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-jumbo1009.eqiad.wmnet with OS bullseye
08:49 urbanecm: mwmaint2002: `foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/refreshUserImpactData.php --registeredWithin=1year --editedWithin=2week --hasEditsAtLeast=3 --ignoreIfUpdatedWithin=1second --verbose --use-job-queue` (testing T344428; after enabling backend on all Wikipedias)
08:48 urbanecm@deploy2002: Finished scap: Backport for Growth: Enable new Impact backend everywhere (T344143) (duration: 09m 29s)
08:43 urbanecm@deploy2002: urbanecm: Continuing with sync
08:40 urbanecm@deploy2002: urbanecm: Backport for Growth: Enable new Impact backend everywhere (T344143) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:40 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
08:40 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1008.eqiad.wmnet with OS bullseye
08:39 urbanecm@deploy2002: Started scap: Backport for Growth: Enable new Impact backend everywhere (T344143)
08:32 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
08:32 urbanecm@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
08:31 urbanecm@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
08:29 urbanecm@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
08:28 urbanecm@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
08:28 urbanecm@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
08:27 urbanecm@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
08:24 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1008.eqiad.wmnet with reason: host reimage
08:21 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1008.eqiad.wmnet with reason: host reimage
08:07 brouberol@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-jumbo1008.eqiad.wmnet with OS bullseye
08:02 godog: restart prometheus k8s k8s-aux - T343529
07:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15133
07:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15133
07:36 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
07:32 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
07:31 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
07:23 jelto@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
07:21 apergos: UTC morning backport and config window closed
07:19 kartik@deploy2002: Finished scap: Backport for testwiki: Enable Section translation on some Wikipedias with potential to be supported with MinT (T345267) (duration: 13m 11s)
07:13 kartik@deploy2002: kartik: Continuing with sync
07:08 kartik@deploy2002: kartik: Backport for testwiki: Enable Section translation on some Wikipedias with potential to be supported with MinT (T345267) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:06 kartik@deploy2002: Started scap: Backport for testwiki: Enable Section translation on some Wikipedias with potential to be supported with MinT (T345267)
06:52 moritzm: installing openssl security updates
06:40 _joe_: rebuilding the base httpd image for mediawiki to pick up glogger changes
04:31 cstone: civicrm upgraded from 16175067 to 70e0b88d
01:35 cstone: payments-wiki upgraded from 382a5a70 to f7407053

2023-10-25

22:28 jforrester@deploy2002: Finished scap: Backport for diff: Fix LinkRenderer method call (T349726) (duration: 07m 21s)
22:22 jforrester@deploy2002: jforrester and umherirrender: Continuing with sync
22:22 jforrester@deploy2002: jforrester and umherirrender: Backport for diff: Fix LinkRenderer method call (T349726) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:20 jforrester@deploy2002: Started scap: Backport for diff: Fix LinkRenderer method call (T349726)
21:01 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
21:00 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
21:00 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
20:59 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
20:58 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
20:57 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
20:25 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:23 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
20:20 ejegg: payments-wiki upgraded from 7575f0e6 to 382a5a70
20:11 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:10 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
20:04 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns1006.wikimedia.org with OS bookworm
19:57 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase1018.eqiad.wmnet
19:57 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:57 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase1018.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
19:56 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase1018.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
19:50 eevans@cumin1001: START - Cookbook sre.dns.netbox
19:44 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:44 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:44 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns1006.wikimedia.org with reason: host reimage
19:41 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns1006.wikimedia.org with reason: host reimage
19:40 eevans@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase1018.eqiad.wmnet
19:36 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase1017.eqiad.wmnet
19:36 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:36 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase1017.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
19:35 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase1017.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
19:33 eevans@cumin1001: START - Cookbook sre.dns.netbox
19:27 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns1006.wikimedia.org with OS bookworm
19:25 eevans@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase1017.eqiad.wmnet
19:20 sukhe: sukhe@cumin2002:~$ sudo cumin 'A:dns-rec' "enable-puppet 'wait before enabling'"
19:19 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase1016.eqiad.wmnet
19:19 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:19 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase1016.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
19:18 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase1016.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
19:17 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:16 cmooney@cumin1001: START - Cookbook sre.dns.netbox
19:16 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
19:16 cmooney@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: check if it makes vlan1054 records - cmooney@cumin1001"
19:14 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: check if it makes vlan1054 records - cmooney@cumin1001"
19:12 cmooney@cumin1001: START - Cookbook sre.dns.netbox
18:33 dancy@deploy2002: Synchronized php: group1 wikis to 1.42.0-wmf.2 refs T348355 (duration: 05m 52s)
18:32 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:32 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
18:28 dancy@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.2 refs T348355
18:17 eevans@cumin1001: START - Cookbook sre.dns.netbox
18:11 eevans@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase1016.eqiad.wmnet
18:04 ejegg: fundraising civicrm upgraded from 6cfae26a to 16175067
17:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns1005.wikimedia.org with OS bookworm
17:21 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
17:20 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
17:15 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
17:15 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
17:10 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
17:09 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
17:04 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
17:04 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns1005.wikimedia.org with reason: host reimage
17:04 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
17:02 ottomata: temporarily increasing log level to trace for eventgate-logging-external in eqiad canary release only - T347477
16:59 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns1005.wikimedia.org with reason: host reimage
16:47 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
16:46 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
16:46 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
16:45 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns1005.wikimedia.org with OS bookworm
16:45 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
16:44 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
16:44 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
16:07 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
15:08 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns1004.wikimedia.org with OS bookworm
14:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns1004.wikimedia.org with reason: host reimage
14:39 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns1004.wikimedia.org with reason: host reimage
14:30 jforrester@deploy2002: sync-world aborted: Backport for Allow logged out users to run FunctionEvaluator widget (T301670 T349055 T349057) (duration: 55m 10s)
14:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1004.wikimedia.org with OS bookworm
14:27 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns1004.wikimedia.org with OS bookworm
14:22 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1100.eqiad.wmnet with OS bullseye
14:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1007.eqiad.wmnet with OS bullseye
14:09 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1004.wikimedia.org with OS bookworm
14:02 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1100.eqiad.wmnet with reason: host reimage
14:02 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host deploy1002
14:02 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host deploy1002
13:59 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1100.eqiad.wmnet with reason: host reimage
13:55 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1007.eqiad.wmnet with reason: host reimage
13:54 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-tool1010
13:54 jforrester@deploy2002: jforrester: Continuing with sync
13:53 jforrester@deploy2002: jforrester: Backport for Allow logged out users to run FunctionEvaluator widget (T301670 T349055 T349057) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:52 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host an-tool1010
13:52 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 15 hosts with reason: not pooled, reimaging in progress
13:51 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1007.eqiad.wmnet with reason: host reimage
13:51 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 15 hosts with reason: not pooled, reimaging in progress
13:42 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1100.eqiad.wmnet with OS bullseye
13:36 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
13:35 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
13:35 jforrester@deploy2002: Started scap: Backport for Allow logged out users to run FunctionEvaluator widget (T301670 T349055 T349057)
13:29 jforrester@deploy2002: Finished scap: Backport for Remove no-op $wgHiddenPrefs[] = 'prefershttps' (duration: 06m 54s)
13:25 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on an-tool1010.eqiad.wmnet with reason: Moving an-tool1010
13:25 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on an-tool1010.eqiad.wmnet with reason: Moving an-tool1010
13:24 jforrester@deploy2002: matmarex and jforrester: Continuing with sync
13:24 jforrester@deploy2002: matmarex and jforrester: Backport for Remove no-op $wgHiddenPrefs[] = 'prefershttps' synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:22 jforrester@deploy2002: Started scap: Backport for Remove no-op $wgHiddenPrefs[] = 'prefershttps'
13:21 jforrester@deploy2002: Finished scap: Backport for [wikifunctions] Allow logged-out users to run approved functions (T349055) (duration: 07m 59s)
13:16 jforrester@deploy2002: jforrester: Continuing with sync
13:14 jforrester@deploy2002: jforrester: Backport for [wikifunctions] Allow logged-out users to run approved functions (T349055) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:13 jforrester@deploy2002: Started scap: Backport for [wikifunctions] Allow logged-out users to run approved functions (T349055)
13:11 jforrester@deploy2002: Finished scap: Backport for ExtensionDistributor: Add REL1_41 as the development snapshot (T346929) (duration: 07m 01s)
13:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1017.eqiad.wmnet
13:06 jforrester@deploy2002: jforrester: Continuing with sync
13:05 jforrester@deploy2002: jforrester: Backport for ExtensionDistributor: Add REL1_41 as the development snapshot (T346929) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:04 jforrester@deploy2002: Started scap: Backport for ExtensionDistributor: Add REL1_41 as the development snapshot (T346929)
13:01 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
12:59 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1017.eqiad.wmnet
10:56 urbanecm: mwmaint2002: foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/refreshUserImpactData.php --registeredWithin=1year --editedWithin=2week --hasEditsAtLeast=3 --ignoreIfUpdatedWithin=1second --verbose --use-job-queue (T344428; all wikis, higher file limit)
10:24 urbanecm: mwmaint2002: foreachwikiindblist /srv/mediawiki/dblists/growth-biggest.dblist extensions/GrowthExperiments/maintenance/refreshUserImpactData.php --registeredWithin=1year --editedWithin=2week --hasEditsAtLeast=3 --ignoreIfUpdatedWithin=1second --verbose --use-job-queue (T344428; with higher file limit)
10:02 taavi: import kubernetes 1.23 packages for debian bookworm T284656
09:53 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-jumbo1007.eqiad.wmnet with OS bullseye
09:50 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
09:48 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
09:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P53053 and previous config saved to /var/cache/conftool/dbconfig/20231025-090648-arnaudb.json
08:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 90%: Maint over', diff saved to https://phabricator.wikimedia.org/P53052 and previous config saved to /var/cache/conftool/dbconfig/20231025-085143-arnaudb.json
08:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 80%: Maint over', diff saved to https://phabricator.wikimedia.org/P53051 and previous config saved to /var/cache/conftool/dbconfig/20231025-083638-arnaudb.json
08:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 70%: Maint over', diff saved to https://phabricator.wikimedia.org/P53050 and previous config saved to /var/cache/conftool/dbconfig/20231025-082133-arnaudb.json
08:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 60%: Maint over', diff saved to https://phabricator.wikimedia.org/P53049 and previous config saved to /var/cache/conftool/dbconfig/20231025-080628-arnaudb.json
07:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 50%: Maint over', diff saved to https://phabricator.wikimedia.org/P53048 and previous config saved to /var/cache/conftool/dbconfig/20231025-075123-arnaudb.json
07:36 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 40%: Maint over', diff saved to https://phabricator.wikimedia.org/P53047 and previous config saved to /var/cache/conftool/dbconfig/20231025-073618-arnaudb.json
07:21 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 30%: Maint over', diff saved to https://phabricator.wikimedia.org/P53046 and previous config saved to /var/cache/conftool/dbconfig/20231025-072113-arnaudb.json
07:06 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 20%: Maint over', diff saved to https://phabricator.wikimedia.org/P53045 and previous config saved to /var/cache/conftool/dbconfig/20231025-070608-arnaudb.json
06:51 arnaudb@cumin1001: dbctl commit (dc=all): 'db1231 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P53044 and previous config saved to /var/cache/conftool/dbconfig/20231025-065103-arnaudb.json
06:50 arnaudb: repooling db1231

2023-10-24

21:58 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:58 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
21:09 sukhe: running authdns-update for CR 968354
21:08 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5004.wikimedia.org with OS bookworm
21:06 jdrewniak@deploy2002: Finished scap: Backport for Disable Parsoid internal REST API everywhere except on Parsoid cluster (T334980) (duration: 12m 39s)
21:00 jdrewniak@deploy2002: jdrewniak and cscott: Continuing with sync
20:54 jdrewniak@deploy2002: jdrewniak and cscott: Backport for Disable Parsoid internal REST API everywhere except on Parsoid cluster (T334980) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:53 jdrewniak@deploy2002: Started scap: Backport for Disable Parsoid internal REST API everywhere except on Parsoid cluster (T334980)
20:49 jdrewniak@deploy2002: Finished scap: Backport for Enable Vector readability survey on select wikis (T349232), Follow-up to 74b5834: Add language prefix to Readability survey (T349232), Follow-up to 74b5834: Add language prefix to Readability survey (T349232) (duration: 06m 57s)
20:44 jdrewniak@deploy2002: jdrewniak: Continuing with sync
20:44 jdrewniak@deploy2002: jdrewniak: Backport for Enable Vector readability survey on select wikis (T349232), Follow-up to 74b5834: Add language prefix to Readability survey (T349232), Follow-up to 74b5834: Add language prefix to Readability survey (T349232) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:42 jdrewniak@deploy2002: Started scap: Backport for Enable Vector readability survey on select wikis (T349232), Follow-up to 74b5834: Add language prefix to Readability survey (T349232), Follow-up to 74b5834: Add language prefix to Readability survey (T349232)
20:24 jdrewniak@deploy2002: Finished scap: Backport for Update comment about EditAttemptStep instruments, CentralAuth: Clarify why we don't use second-level domain for some wikis (T257852), Remove unused VisualEditor config settings (T344757 T344759), [noop] Explain more thoroughly how the '-' prefix works (duration: 07m 21s)
20:18 jdrewniak@deploy2002: tgr and matmarex and jdrewniak: Continuing with sync
20:18 jdrewniak@deploy2002: tgr and matmarex and jdrewniak: Backport for Update comment about EditAttemptStep instruments, CentralAuth: Clarify why we don't use second-level domain for some wikis (T257852), Remove unused VisualEditor config settings (T344757 T344759), [noop] Explain more thoroughly how the '-' prefix works synced to the testservers (htt
20:16 jdrewniak@deploy2002: Started scap: Backport for Update comment about EditAttemptStep instruments, CentralAuth: Clarify why we don't use second-level domain for some wikis (T257852), Remove unused VisualEditor config settings (T344757 T344759), [noop] Explain more thoroughly how the '-' prefix works
20:14 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
20:10 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
19:57 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@c585842]: T346373: Update mjolnir to use python 3.10 (duration: 00m 28s)
19:56 ebernhardson@deploy2002: Started deploy [airflow-dags/search@c585842]: T346373: Update mjolnir to use python 3.10
19:49 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
19:47 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
19:47 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
19:45 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
19:45 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
19:43 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
19:43 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
19:23 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns5004.wikimedia.org with OS bookworm
19:00 andrew@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
19:00 andrew@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
18:59 andrew@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
18:59 andrew@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
18:55 andrew@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
18:55 andrew@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
18:54 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
18:54 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5003.wikimedia.org with OS bookworm
18:50 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
18:50 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
18:50 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
18:48 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
18:48 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
18:48 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
18:47 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
18:47 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
18:47 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
18:42 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
18:42 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
18:42 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
18:42 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
18:41 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
18:41 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase2012.codfw.wmnet
18:41 eevans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:41 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase2012.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
18:41 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
18:39 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase2012.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1001"
18:39 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
18:38 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
18:37 eevans@cumin1001: START - Cookbook sre.dns.netbox
18:31 eevans@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase2012.codfw.wmnet
18:24 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
18:23 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
18:18 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
18:18 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
18:16 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
18:15 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
18:13 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
18:13 dancy@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.2 refs T348355
18:13 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
18:03 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
18:00 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
17:50 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
17:49 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5003.wikimedia.org with reason: host reimage
17:46 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5003.wikimedia.org with reason: host reimage
17:41 ejegg: fundraising civicrm upgraded from 8e8ffec0 to 6cfae26a
16:59 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns5003.wikimedia.org with OS bookworm
16:46 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@cc56357]: Deploying latest DAGs to analytics Airflow instance (duration: 01m 55s)
16:44 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@cc56357]: Deploying latest DAGs to analytics Airflow instance
15:48 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1100.eqiad.wmnet with OS bullseye
15:32 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1100.eqiad.wmnet with reason: host reimage
15:26 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1100.eqiad.wmnet with reason: host reimage
15:22 godog: clean up overlapping blocks from thanos for instance 'cloud'
15:11 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1100.eqiad.wmnet with OS bullseye
15:10 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1100.eqiad.wmnet with OS bullseye
14:59 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1100.eqiad.wmnet with OS bullseye
14:58 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1100.eqiad.wmnet with OS bullseye
14:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1016.eqiad.wmnet
14:50 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1016.eqiad.wmnet
14:48 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp1100.eqiad.wmnet with OS bullseye
14:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Adding db1227 depooled', diff saved to https://phabricator.wikimedia.org/P53041 and previous config saved to /var/cache/conftool/dbconfig/20231024-143204-arnaudb.json
14:01 TheresNoTime: close backport window
14:00 samtar@deploy2002: Finished scap: Backport for Fix typo (undefined event) (T349271) (duration: 09m 26s)
13:55 samtar@deploy2002: samtar and cparle: Continuing with sync
13:52 samtar@deploy2002: samtar and cparle: Backport for Fix typo (undefined event) (T349271) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:51 samtar@deploy2002: Started scap: Backport for Fix typo (undefined event) (T349271)
13:43 samtar@deploy2002: Finished scap: Backport for Add stream config for iOS schema (T347122) (duration: 07m 52s)
13:38 samtar@deploy2002: samtar and tsev: Continuing with sync
13:37 samtar@deploy2002: samtar and tsev: Backport for Add stream config for iOS schema (T347122) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:36 samtar@deploy2002: Started scap: Backport for Add stream config for iOS schema (T347122)
13:34 samtar@deploy2002: Finished scap: Backport for cirrus: add wgCirrusSearchUseEventBusBridge and enable it on testwiki (T325565) (duration: 06m 55s)
13:31 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
13:30 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
13:30 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
13:30 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
13:30 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
13:29 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
13:28 samtar@deploy2002: samtar and dcausse: Continuing with sync
13:28 samtar@deploy2002: samtar and dcausse: Backport for cirrus: add wgCirrusSearchUseEventBusBridge and enable it on testwiki (T325565) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:27 samtar@deploy2002: Started scap: Backport for cirrus: add wgCirrusSearchUseEventBusBridge and enable it on testwiki (T325565)
13:25 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
13:25 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
13:24 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
13:24 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
13:24 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
13:23 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
13:23 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
13:22 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
13:22 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
13:22 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
13:22 samtar@deploy2002: Finished scap: Backport for cirrus: add the mediawiki.cirrussearch.page_rerender.v1 stream (T325565) (duration: 07m 45s)
13:17 samtar@deploy2002: samtar and dcausse: Continuing with sync
13:15 samtar@deploy2002: samtar and dcausse: Backport for cirrus: add the mediawiki.cirrussearch.page_rerender.v1 stream (T325565) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:14 samtar@deploy2002: Started scap: Backport for cirrus: add the mediawiki.cirrussearch.page_rerender.v1 stream (T325565)
13:10 samtar@deploy2002: Finished scap: Backport for Increase Lua memory limit to 100MB on Wiktionary only (T165935) (duration: 07m 51s)
13:05 samtar@deploy2002: samtar and tstarling: Continuing with sync
13:04 samtar@deploy2002: samtar and tstarling: Backport for Increase Lua memory limit to 100MB on Wiktionary only (T165935) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:03 samtar@deploy2002: Started scap: Backport for Increase Lua memory limit to 100MB on Wiktionary only (T165935)
12:41 jbond: migrate idp_test to puppet7
11:17 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
11:17 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
11:16 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
11:15 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
11:13 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
11:12 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
11:12 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
11:12 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
11:11 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
11:11 jiji@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
11:09 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
11:08 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
11:08 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
11:08 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
11:08 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
11:07 jiji@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
11:05 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
11:05 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
11:04 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
11:04 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
10:59 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
10:59 jiji@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
10:58 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
10:57 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
10:57 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
10:57 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
10:57 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
10:56 jiji@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
10:54 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
10:53 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply
10:47 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
10:46 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
10:44 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
10:43 jiji@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
10:43 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on an-test-client1002.eqiad.wmnet with reason: Cold booting with ganeti to increase RAM
10:42 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on an-test-client1002.eqiad.wmnet with reason: Cold booting with ganeti to increase RAM
10:42 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
10:41 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:40 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
10:39 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
10:27 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: apply
10:27 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
10:26 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/recommendation-api: apply
10:26 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
10:15 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: apply
10:14 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: apply
10:10 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: apply
10:10 jiji@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: apply
10:08 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
10:07 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
10:04 jnuche@deploy2002: Pruned MediaWiki: 1.41.0-wmf.30 (duration: 02m 08s)
10:02 jnuche@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.2 refs T348355 (duration: 25m 27s)
09:49 kevinbazira@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
09:48 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
09:45 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
09:43 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P53039 and previous config saved to /var/cache/conftool/dbconfig/20231024-094329-arnaudb.json
09:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
09:39 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
09:36 jnuche@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.2 refs T348355
09:28 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 90%: Maint over', diff saved to https://phabricator.wikimedia.org/P53038 and previous config saved to /var/cache/conftool/dbconfig/20231024-092824-arnaudb.json
09:16 vgutierrez: upload golang-github-florianl-go-tc to apt.wm.o (bookworm) - T348837
09:13 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 80%: Maint over', diff saved to https://phabricator.wikimedia.org/P53037 and previous config saved to /var/cache/conftool/dbconfig/20231024-091319-arnaudb.json
09:11 taavi: restart ferm on deploy1002 T349587
09:04 taavi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host deploy1002
09:03 taavi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host deploy1002
08:58 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 70%: Maint over', diff saved to https://phabricator.wikimedia.org/P53036 and previous config saved to /var/cache/conftool/dbconfig/20231024-085815-arnaudb.json
08:43 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 60%: Maint over', diff saved to https://phabricator.wikimedia.org/P53035 and previous config saved to /var/cache/conftool/dbconfig/20231024-084310-arnaudb.json
08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1127.eqiad.wmnet onto db1227.eqiad.wmnet
08:28 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 50%: Maint over', diff saved to https://phabricator.wikimedia.org/P53034 and previous config saved to /var/cache/conftool/dbconfig/20231024-082805-arnaudb.json
08:13 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 40%: Maint over', diff saved to https://phabricator.wikimedia.org/P53033 and previous config saved to /var/cache/conftool/dbconfig/20231024-081300-arnaudb.json
07:57 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 30%: Maint over', diff saved to https://phabricator.wikimedia.org/P53032 and previous config saved to /var/cache/conftool/dbconfig/20231024-075755-arnaudb.json
07:42 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 20%: Maint over', diff saved to https://phabricator.wikimedia.org/P53031 and previous config saved to /var/cache/conftool/dbconfig/20231024-074250-arnaudb.json
07:27 arnaudb@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P53030 and previous config saved to /var/cache/conftool/dbconfig/20231024-072745-arnaudb.json
07:27 arnaudb: repool db2109
07:08 marostegui@cumin1001: START - Cookbook sre.mysql.clone of db1127.eqiad.wmnet onto db1227.eqiad.wmnet
06:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: provisioning db1227 - T344036
06:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: provisioning db1227 - T344036
06:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: provisioning db1227 - T344036
06:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: provisioning db1227 - T344036
06:54 godog: +50G to prometheus/analytics in eqiad
06:45 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
06:45 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ipv6 from pc1015 - marostegui@cumin1001"
06:44 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ipv6 from pc1015 - marostegui@cumin1001"
06:42 marostegui@cumin1001: START - Cookbook sre.dns.netbox
06:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e8-eqiad
06:33 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-e8-eqiad
06:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15435
06:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15435
05:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 39180
05:20 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 39180
03:51 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.2 refs T348355 (duration: 47m 53s)
03:03 mwpresync@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.2 refs T348355

2023-10-23

23:05 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
23:05 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
22:58 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
22:58 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
22:55 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
22:54 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
21:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm
21:05 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
21:04 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
20:44 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
20:41 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
20:18 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm
19:50 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
19:49 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
18:45 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns6002.wikimedia.org with OS bookworm
18:33 bking@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
18:32 bking@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
18:31 herron: sretest1001:~/tmp/backfill$ promtool tsdb create-blocks-from rules --start 1672531200 --end 1698080718 --url http://prometheus.svc.eqiad.wmnet/ops/ logstash-requests.yaml T349521
18:19 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
18:18 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
18:14 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
18:13 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
18:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns6002.wikimedia.org with reason: host reimage
18:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns6002.wikimedia.org with reason: host reimage
18:00 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
18:00 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
17:59 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
17:59 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
17:57 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
17:56 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
17:44 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns6002.wikimedia.org with OS bookworm
17:41 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
17:40 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
17:26 ejegg: fundraising python tools upgraded from e56ae8ae to 9e84c689
17:25 ejegg: standalone (IPN listener) SmashPig upgraded from e27dfbce to c5b12dc3
16:57 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
16:56 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
16:47 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
16:46 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
16:09 marostegui@cumin1001: dbctl commit (dc=all): 'New host being setup', diff saved to https://phabricator.wikimedia.org/P53029 and previous config saved to /var/cache/conftool/dbconfig/20231023-160926-marostegui.json
16:08 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
16:08 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
15:52 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
15:51 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
15:05 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
15:05 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
14:56 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
14:55 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:55 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
14:55 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
14:55 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
14:55 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:55 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:54 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:53 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1021']
14:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Depool db1127', diff saved to https://phabricator.wikimedia.org/P53028 and previous config saved to /var/cache/conftool/dbconfig/20231023-145101-arnaudb.json
14:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Provision db1227 depooled as a candidate master for s7', diff saved to https://phabricator.wikimedia.org/P53027 and previous config saved to /var/cache/conftool/dbconfig/20231023-145011-arnaudb.json
14:48 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
14:48 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: provisioning db1227 - T344036
14:48 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: provisioning db1227 - T344036
14:47 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: provisioning db1227 - T344036
14:47 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: provisioning db1227 - T344036
14:46 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1021']
14:42 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1021']
14:41 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1021']
14:30 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
14:26 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
14:26 jayme: switched mw-api-int (mw-on-k8s) to certmanager certificates - T300033
14:26 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
14:25 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
14:24 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
14:14 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
14:14 jayme: switched mw-api-ext (mw-on-k8s) to certmanager certificates - T300033
14:13 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
14:13 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
14:12 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
14:06 jayme@deploy2002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
14:06 jayme@deploy2002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
14:06 jayme@deploy2002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
14:06 jayme@deploy2002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
14:06 jayme: switched mw-jobrunner (mw-on-k8s) to certmanager certificates - T300033
14:05 jayme@deploy2002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
14:05 jayme@deploy2002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
14:05 jayme@deploy2002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
14:05 jayme@deploy2002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
13:53 urbanecm@deploy2002: Finished scap: Backport for Stop writing to $wgCentralAuthCookieDomain in 'EnterMobileMode' hook (duration: 15m 50s)
13:52 moritzm: installing batik security updates
13:48 urbanecm@deploy2002: urbanecm and matmarex: Continuing with sync
13:38 urbanecm@deploy2002: urbanecm and matmarex: Backport for Stop writing to $wgCentralAuthCookieDomain in 'EnterMobileMode' hook synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:37 urbanecm@deploy2002: Started scap: Backport for Stop writing to $wgCentralAuthCookieDomain in 'EnterMobileMode' hook
13:37 urbanecm@deploy2002: Finished scap: Backport for New stream for Android Patroller tasks feature (T348816) (duration: 06m 54s)
13:31 urbanecm@deploy2002: urbanecm and sharvaniharan: Continuing with sync
13:31 urbanecm@deploy2002: urbanecm and sharvaniharan: Backport for New stream for Android Patroller tasks feature (T348816) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:30 urbanecm@deploy2002: Started scap: Backport for New stream for Android Patroller tasks feature (T348816)
13:29 urbanecm@deploy2002: Finished scap: Backport for Remove 'currentProto'/'finalProto'/'proto' business (T348852), Remove unused $wgIncludeLegacyJavaScript, Remove $wgApiFrameOptions override for enwiki and zhwiki (T131183) (duration: 11m 56s)
13:23 urbanecm@deploy2002: matmarex and urbanecm: Continuing with sync
13:18 urbanecm@deploy2002: matmarex and urbanecm: Backport for Remove 'currentProto'/'finalProto'/'proto' business (T348852), Remove unused $wgIncludeLegacyJavaScript, Remove $wgApiFrameOptions override for enwiki and zhwiki (T131183) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:17 urbanecm@deploy2002: Started scap: Backport for Remove 'currentProto'/'finalProto'/'proto' business (T348852), Remove unused $wgIncludeLegacyJavaScript, Remove $wgApiFrameOptions override for enwiki and zhwiki (T131183)
13:16 urbanecm@deploy2002: Finished scap: Backport for wikidatawiki: Switch property for determining Lexeme language code (T348923) (duration: 12m 50s)
13:11 urbanecm@deploy2002: migr and urbanecm: Continuing with sync
13:05 urbanecm@deploy2002: migr and urbanecm: Backport for wikidatawiki: Switch property for determining Lexeme language code (T348923) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:04 moritzm: installing libxpm security updates on buster
13:04 urbanecm@deploy2002: Started scap: Backport for wikidatawiki: Switch property for determining Lexeme language code (T348923)
12:41 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
12:40 jayme: switched mw-web (mw-on-k8s) to certmanager certificates - T300033
12:40 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
12:40 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
12:39 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
12:33 moritzm: installing libx11 security updates
12:16 arnaudb@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1131.eqiad.wmnet onto db1231.eqiad.wmnet
11:49 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@054e07d] (releasing): (no justification provided) (duration: 00m 42s)
11:49 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@054e07d] (releasing): (no justification provided)
11:49 moritzm: added Balthazar to pwstore
11:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Server not yet in production use
11:33 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Server not yet in production use
10:51 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafka-jumbo1001.eqiad.wmnet
10:51 brouberol@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:51 brouberol@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001"
10:51 arnaudb@cumin1001: START - Cookbook sre.mysql.clone of db1131.eqiad.wmnet onto db1231.eqiad.wmnet
10:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Depool db1131 T344036', diff saved to https://phabricator.wikimedia.org/P53025 and previous config saved to /var/cache/conftool/dbconfig/20231023-105036-arnaudb.json
10:50 brouberol@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001"
10:41 jayme: switched mw-debug (mw-on-k8s) to certmanager certificates - T300033
10:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-client1002.eqiad.wmnet
10:40 brouberol@cumin1001: START - Cookbook sre.dns.netbox
10:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: provisioning - T344036
10:37 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: provisioning - T344036
10:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: provisioning - T344036
10:37 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: provisioning - T344036
10:36 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-client1002.eqiad.wmnet
10:35 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
10:34 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:34 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
10:34 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
10:32 brouberol@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafka-jumbo1001.eqiad.wmnet
10:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Provision db1231 depooled as a candidate master for s6', diff saved to https://phabricator.wikimedia.org/P53024 and previous config saved to /var/cache/conftool/dbconfig/20231023-103202-arnaudb.json
10:31 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
10:29 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:29 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: test - jbond@cumin1001"
10:28 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: test - jbond@cumin1001"
10:26 jbond@cumin1001: START - Cookbook sre.dns.netbox
10:26 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:26 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: test - jbond@cumin1001"
10:25 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: test - jbond@cumin1001"
10:23 jbond@cumin1001: START - Cookbook sre.dns.netbox
10:20 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:19 jbond@cumin1001: START - Cookbook sre.dns.netbox
10:13 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafka-jumbo1002.eqiad.wmnet
10:13 brouberol@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:13 brouberol@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001"
10:12 brouberol@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001"
10:11 taavi: reprepro: drop thirdparty/kubeadm-k8s-1-22 component
10:10 brouberol@cumin1001: START - Cookbook sre.dns.netbox
10:04 brouberol@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafka-jumbo1002.eqiad.wmnet
10:02 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
10:02 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
09:57 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafka-jumbo1003.eqiad.wmnet
09:57 brouberol@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:55 brouberol@cumin1001: START - Cookbook sre.dns.netbox
09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add A/PTR for lsw1-f8/ssw links - ayounsi@cumin1001"
09:54 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add A/PTR for lsw1-f8/ssw links - ayounsi@cumin1001"
09:51 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
09:50 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
09:50 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
09:49 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
09:49 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
09:48 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
09:48 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
09:47 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
09:47 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
09:37 brouberol@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: kafka-jumbo1004.eqiad.wmnet
09:37 brouberol@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: kafka-jumbo1004.eqiad.wmnet
09:36 brouberol@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001 - brouberol@cumin1001 - T336044"
09:35 brouberol@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001 - brouberol@cumin1001 - T336044"
09:32 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
09:32 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
09:31 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
09:31 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
09:28 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
09:21 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
09:21 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
09:19 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
09:18 elukey@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
09:18 elukey@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
09:17 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
09:13 brouberol@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001"
09:00 brouberol@cumin1001: START - Cookbook sre.dns.netbox
08:55 brouberol@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafka-jumbo1004.eqiad.wmnet
08:52 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafka-jumbo1005.eqiad.wmnet
08:52 brouberol@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:52 brouberol@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001"
08:51 brouberol@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001"
08:38 brouberol@cumin1001: START - Cookbook sre.dns.netbox
08:33 brouberol@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafka-jumbo1005.eqiad.wmnet
08:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudmetrics1003.eqiad.wmnet
08:24 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafka-jumbo1006.eqiad.wmnet
08:24 brouberol@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:24 brouberol@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1006.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001"
08:21 brouberol@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1006.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1001"
08:19 brouberol@cumin1001: START - Cookbook sre.dns.netbox
08:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudmetrics1003.eqiad.wmnet
08:14 brouberol@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafka-jumbo1006.eqiad.wmnet
08:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudmetrics1004.eqiad.wmnet
08:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudmetrics1004.eqiad.wmnet
08:01 moritzm: installing Linux kernel updates for Buster 5.10 backport
07:42 taavi: mwscript purgeList.php enwiki <<< "https://en.wikipedia.org/static/images/project-logos/knwiktionary.png" (and for 1.5x and 2x variants)
07:36 hashar: Upgrading CI Jenkins # T349282
07:26 taavi@deploy2002: Finished scap: Backport for knwiktionary: update logo (T349036), dewiktionary: add tagline (T348978), hiwikisource: Adjust width-height ratio of logo to fix display issue (T310961) (duration: 16m 59s)
07:22 elukey@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
07:22 elukey@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
07:21 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
07:20 taavi@deploy2002: taavi and anzx: Continuing with sync
07:17 taavi@deploy2002: taavi and anzx: Backport for knwiktionary: update logo (T349036), dewiktionary: add tagline (T348978), hiwikisource: Adjust width-height ratio of logo to fix display issue (T310961) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:09 taavi@deploy2002: Started scap: Backport for knwiktionary: update logo (T349036), dewiktionary: add tagline (T348978), hiwikisource: Adjust width-height ratio of logo to fix display issue (T310961)

2023-10-21

00:10 krinkle@deploy2002: Synchronized wmf-config/logging.php: (no justification provided) (duration: 06m 03s)

2023-10-20

22:47 cstone: civicrm upgraded from ca081c11 to 8e8ffec0
21:39 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:38 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
21:33 ejegg: fundraising civicrm upgraded from 1263a91b to ca081c11
21:06 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
21:06 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
20:21 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
20:20 ejegg: fundraising civicrm upgraded from e57425a9 to 1263a91b
20:20 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
20:19 bvibber: brion running requeueTranscodes.php on mwmaint2002 for audio and video transcode backfill, will use some jobqueue cpu but should be nicely throttled
20:05 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:05 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:46 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:44 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:35 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:35 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:08 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
19:07 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:07 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:06 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
19:06 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
19:05 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:05 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
19:05 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
18:57 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:56 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
18:43 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:42 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
18:42 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:41 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
17:36 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:36 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
15:59 topranks: Disabling BGP from asw1-by27-esams to cr1-esams to move BGP peers to new group T349125
15:55 topranks: Disabling BGP from asw1-by27-esams to cr2-esams to move BGP peers to new group T349125
15:47 topranks: Disabling BGP from asw1-bw27-esams to cr2-esams to move BGP peers to new group T349125
15:39 topranks: Disabling BGP from asw1-bw27-esams to cr1-esams to move BGP peers to new group T349125
15:37 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@fd88cfa]: Update kafka hosts mjolnir communicates with (duration: 00m 27s)
15:36 ebernhardson@deploy2002: Started deploy [airflow-dags/search@fd88cfa]: Update kafka hosts mjolnir communicates with
15:26 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 12 hosts with reason: changing bgp config on esams switches
15:25 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 12 hosts with reason: changing bgp config on esams switches
15:18 topranks: Disabling BGP from asw1-b13-drmrs to cr1-drmrs to move BGP peers to new group T349125
15:16 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
15:15 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
15:13 topranks: Disabling BGP from asw1-b13-drmrs to cr2-drmrs to move BGP peers to new group T349125
15:09 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 12 hosts with reason: changing bgp config on drmrs switches
15:09 topranks: Disabling BGP from asw1-b12-drmrs to cr2-drmrs to move BGP peers to new group T349125
15:08 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 12 hosts with reason: changing bgp config on drmrs switches
14:57 topranks: Disabling BGP from asw1-b12-drmrs to cr1-drmrs to move BGP peers to new group T349125
14:49 ejegg: payments-wiki upgraded from 87cda414 to 7575f0e6
14:33 topranks: Disabling BGP from ssw1-f1-eqiad to cr2-eqiad to move BGP peers to new group T349125
away: fundraising civicrm upgraded from f11ad380 to e57425a9
13:19 topranks: Disabling BGP from ssw1-e1-eqiad to cr1-eqiad to move BGP peers to new group T349125
12:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1003.eqiad.wmnet with OS bullseye
11:55 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
11:52 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
11:42 jynus: refactoring tables @ db1164[bbackups] T349360
11:37 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bullseye
11:36 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host moss-be1003.eqiad.wmnet with OS bullseye
10:46 kevinbazira@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
10:46 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
10:39 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
10:19 godog: powercycle titan1001
10:13 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:13 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add A/PTR for lsw1-f8/ssw links - ayounsi@cumin1001"
10:12 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add A/PTR for lsw1-f8/ssw links - ayounsi@cumin1001"
10:04 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
09:58 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
08:45 brouberol@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts kafka-jumbo1006.eqiad.wmnet
08:43 brouberol@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafka-jumbo1006.eqiad.wmnet
07:43 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
07:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add A/PTR for lsw1-f8/ssw links - ayounsi@cumin1001"
07:26 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add A/PTR for lsw1-f8/ssw links - ayounsi@cumin1001"
07:25 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on etherpad1003.eqiad.wmnet with reason: Reboot to use new CPU and memory config
07:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on etherpad1003.eqiad.wmnet with reason: Reboot to use new CPU and memory config
07:22 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
07:21 jelto: increase etherpad1003 CPU and memory (1CPU,1GB -> 2CPU,2GB) - T348386
06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1119 from dbctl T349272', diff saved to https://phabricator.wikimedia.org/P53021 and previous config saved to /var/cache/conftool/dbconfig/20231020-061822-marostegui.json
03:15 tstarling@deploy2002: Synchronized wmf-config/InitialiseSettings.php: Enable source maps everywhere T47514 (duration: 06m 26s)
03:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bullseye

2023-10-19

22:37 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host sretest2003.mgmt.codfw.wmnet with reboot policy FORCED
22:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2003.mgmt.codfw.wmnet with reboot policy FORCED
21:32 hmonroy@deploy2002: Finished scap: Backport for PhonosButton: use text() instead of append() (T349312) (duration: 06m 48s)
21:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host moss-be1003.eqiad.wmnet with OS bullseye
21:27 hmonroy@deploy2002: hmonroy: Continuing with sync
21:27 hmonroy@deploy2002: hmonroy: Backport for PhonosButton: use text() instead of append() (T349312) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:25 hmonroy@deploy2002: Started scap: Backport for PhonosButton: use text() instead of append() (T349312)
21:12 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bullseye
20:55 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host moss-be1003.eqiad.wmnet with OS bullseye
20:39 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bullseye
20:02 brennen: utc late backport window: no patches
18:22 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
18:22 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
18:09 brennen@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.1 refs T348354
17:33 urandom: Decommissioning Cassandra, restbase1018-{a,b,c} — T328490
16:50 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
16:49 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
16:17 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
16:16 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
16:16 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
16:15 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
16:15 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
16:14 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
15:42 jgiannelos@deploy2002: Finished deploy [restbase/deploy@a311c5d]: (no justification provided) (duration: 00m 54s)
15:41 jgiannelos@deploy2002: Started deploy [restbase/deploy@a311c5d]: (no justification provided)
15:30 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:30 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
15:25 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host sretest2003.mgmt.codfw.wmnet with reboot policy FORCED
15:15 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on kafka-jumbo1006.eqiad.wmnet with reason: host is being decommissioned
15:15 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on kafka-jumbo1006.eqiad.wmnet with reason: host is being decommissioned
15:15 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on kafka-jumbo1005.eqiad.wmnet with reason: host is being decommissioned
15:14 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on kafka-jumbo1005.eqiad.wmnet with reason: host is being decommissioned
15:14 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on kafka-jumbo1004.eqiad.wmnet with reason: host is being decommissioned
15:14 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on kafka-jumbo1004.eqiad.wmnet with reason: host is being decommissioned
15:14 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on kafka-jumbo1003.eqiad.wmnet with reason: host is being decommissioned
15:13 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on kafka-jumbo1003.eqiad.wmnet with reason: host is being decommissioned
15:13 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on kafka-jumbo1002.eqiad.wmnet with reason: host is being decommissioned
15:13 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on kafka-jumbo1002.eqiad.wmnet with reason: host is being decommissioned
15:13 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on kafka-jumbo1001.eqiad.wmnet with reason: host is being decommissioned
15:13 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on kafka-jumbo1001.eqiad.wmnet with reason: host is being decommissioned
15:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudnet1008-dev']
15:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudnet1007-dev']
15:09 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet1007-dev']
15:09 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet1008-dev']
15:08 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudnet1008-dev.eqiad.wmnet']
15:08 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudnet1007-dev.eqiad.wmnet']
15:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
15:08 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
15:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
15:08 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
15:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol1009-dev.eqiad.wmnet']
15:06 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
15:05 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
15:04 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
15:04 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
14:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet1007-dev.eqiad.wmnet']
14:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet1008-dev.eqiad.wmnet']
14:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
14:58 elukey: powercycle titan1001
14:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1009-dev.eqiad.wmnet']
14:57 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1010-dev.eqiad.wmnet']
14:56 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1010-dev']
14:56 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1010-dev']
14:55 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1009-dev']
14:55 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1009-dev']
14:55 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1010-dev']
14:55 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1009-dev']
14:55 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1009-dev']
14:55 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1010-dev']
14:54 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudnet1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
14:51 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol1009-dev.mgmt.eqiad.wmnet with reboot policy FORCED
14:51 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol1010-dev.mgmt.eqiad.wmnet with reboot policy FORCED
14:51 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
14:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
14:44 elukey: powercycle titan1001
14:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
14:38 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
14:35 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
14:34 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
14:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudnet1007-dev.mgmt.eqiad.wmnet with reboot policy FORCED
14:32 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudnet1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
14:31 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1009-dev.mgmt.eqiad.wmnet with reboot policy FORCED
14:31 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1010-dev.mgmt.eqiad.wmnet with reboot policy FORCED
14:31 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
14:29 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:28 jclark@cumin1001: START - Cookbook sre.dns.netbox
14:21 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
14:17 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1010-dev.mgmt.eqiad.wmnet with reboot policy FORCED
14:17 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1009-dev.mgmt.eqiad.wmnet with reboot policy FORCED
14:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
14:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
14:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1010-dev.mgmt.eqiad.wmnet with reboot policy FORCED
14:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
14:12 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1010-dev.mgmt.eqiad.wmnet with reboot policy FORCED
14:12 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1009-dev.mgmt.eqiad.wmnet with reboot policy FORCED
14:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
14:05 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudnet1007-dev.mgmt.eqiad.wmnet with reboot policy FORCED
14:04 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1010-dev.mgmt.eqiad.wmnet with reboot policy FORCED
14:03 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1009-dev.mgmt.eqiad.wmnet with reboot policy FORCED
14:03 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
14:01 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:01 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudcontrol100[8-10]-dev cloudnet100[7-8]-dev - jclark@cumin1001"
14:00 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudcontrol100[8-10]-dev cloudnet100[7-8]-dev - jclark@cumin1001"
13:58 jclark@cumin1001: START - Cookbook sre.dns.netbox
13:48 wmde-fisch@deploy2002: Finished scap: Backport for Revert "Revert "Workaround to center search terms label"" (T252346) (duration: 07m 50s)
13:43 wmde-fisch@deploy2002: wmde-fisch: Continuing with sync
13:42 wmde-fisch@deploy2002: wmde-fisch: Backport for Revert "Revert "Workaround to center search terms label"" (T252346) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:41 wmde-fisch@deploy2002: Started scap: Backport for Revert "Revert "Workaround to center search terms label"" (T252346)
13:00 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:00 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: noop - volans@cumin1001"
12:59 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: noop - volans@cumin1001"
12:52 volans@cumin1001: START - Cookbook sre.dns.netbox
12:50 volans@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
12:50 volans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
12:50 volans@cumin2002: START - Cookbook sre.dns.netbox
12:50 volans@cumin1001: START - Cookbook sre.dns.netbox
11:47 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
11:46 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@6f09297] (releasing): (no justification provided) (duration: 01m 08s)
11:44 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@6f09297] (releasing): (no justification provided)
11:30 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
08:36 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
07:33 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
07:20 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on db2109.codfw.wmnet with reason: db2109 downtime while repooling
07:20 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on db2109.codfw.wmnet with reason: db2109 downtime while repooling
07:17 tgr: UTC morning deploys done
07:16 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
07:13 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
06:57 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
06:34 volans: enabled distributed locking support in spicerack/cookbooks T341973
06:32 volans@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1001.eqiad.wmnet
06:32 volans@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest1001.eqiad.wmnet
06:31 volans@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1001.eqiad.wmnet
06:31 volans@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest1001.eqiad.wmnet
05:14 tchin@deploy2002: Finished deploy [airflow-dags/analytics@60950f6]: Deploying airflow [data-engineering/airflow-dags@60950f6b] (duration: 01m 12s)
05:12 tchin@deploy2002: Started deploy [airflow-dags/analytics@60950f6]: Deploying airflow [data-engineering/airflow-dags@60950f6b]

2023-10-18

23:58 eileen: civicrm upgraded from 4a5634ed to f11ad380
22:12 eileen: civicrm upgraded from 52202980 to 4a5634ed
21:58 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2003.mgmt.codfw.wmnet with reboot policy FORCED
21:54 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host sretest2003.mgmt.codfw.wmnet with reboot policy FORCED
21:44 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2003.mgmt.codfw.wmnet with reboot policy FORCED
21:35 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host sretest2003.mgmt.codfw.wmnet with reboot policy FORCED
21:23 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1229.eqiad.wmnet with OS bullseye
21:23 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
21:16 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
21:08 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:08 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
21:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1229.eqiad.wmnet with reason: host reimage
20:59 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1229.eqiad.wmnet with reason: host reimage
20:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
20:46 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
20:44 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
20:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
20:43 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
20:42 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
19:40 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
19:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
19:33 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
19:30 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
19:25 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:25 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:16 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns6001.wikimedia.org with OS bookworm
19:02 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
19:00 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1110.eqiad.wmnet with OS bullseye
19:00 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
19:00 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1104.eqiad.wmnet with OS bullseye
19:00 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
18:45 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns6001.wikimedia.org with reason: host reimage
18:41 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns6001.wikimedia.org with reason: host reimage
18:36 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:36 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
18:35 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:35 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
18:33 brennen@deploy2002: Synchronized php: group1 wikis to 1.42.0-wmf.1 refs T348354 (duration: 05m 40s)
18:28 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.1 refs T348354
18:20 brennen: train 1.42.0-wmf.1 (T348354): logs clean and no blockers, rolling to group1
18:17 brennen@deploy2002: Finished scap: Backport for Fix Typo in OS Dark Mode field (T346106) (duration: 13m 46s)
18:17 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns6001.wikimedia.org with OS bookworm
18:12 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
18:12 brennen@deploy2002: brennen and jdlrobson: Continuing with sync
18:05 brennen@deploy2002: brennen and jdlrobson: Backport for Fix Typo in OS Dark Mode field (T346106) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
18:03 brennen@deploy2002: Started scap: Backport for Fix Typo in OS Dark Mode field (T346106)
17:55 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1110.eqiad.wmnet with reason: host reimage
17:52 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
17:52 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
17:51 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1110.eqiad.wmnet with reason: host reimage
17:47 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
17:46 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
17:44 sukhe: running authdns-update for CR 966573
17:43 tchin@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
17:43 tchin@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
17:42 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
17:34 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1110.eqiad.wmnet with OS bullseye
17:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1110.mgmt.eqiad.wmnet with reboot policy FORCED
17:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1110.mgmt.eqiad.wmnet with reboot policy FORCED
17:29 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1110
17:28 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1110
17:27 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:26 jclark@cumin1001: START - Cookbook sre.dns.netbox
17:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1104.eqiad.wmnet with reason: host reimage
17:24 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1110']
17:23 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1110']
17:22 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1104.eqiad.wmnet with reason: host reimage
17:13 XioNoX: restart turnilo to pick up UI change
17:12 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1110.eqiad.wmnet with OS bullseye
17:07 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance
17:07 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance
17:05 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1104.eqiad.wmnet with OS bullseye
17:04 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1104']
17:04 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1104']
17:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1100.eqiad.wmnet with OS bullseye
17:04 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
17:03 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
17:01 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1101.eqiad.wmnet with OS bullseye
17:01 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
17:00 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
16:56 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1102.eqiad.wmnet with OS bullseye
16:56 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
16:54 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
16:46 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1100.eqiad.wmnet with reason: host reimage
16:43 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1101.eqiad.wmnet with reason: host reimage
16:40 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1102.eqiad.wmnet with reason: host reimage
16:40 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1101.eqiad.wmnet with reason: host reimage
16:39 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1100.eqiad.wmnet with reason: host reimage
16:37 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1102.eqiad.wmnet with reason: host reimage
16:33 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1010.eqiad.wmnet with OS bullseye
16:30 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
16:30 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
16:29 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
16:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1103.eqiad.wmnet with OS bullseye
16:28 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
16:28 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
16:28 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
16:26 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
16:25 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1102.eqiad.wmnet with OS bullseye
16:24 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
16:24 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1102']
16:23 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1101.eqiad.wmnet with OS bullseye
16:22 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1100.eqiad.wmnet with OS bullseye
16:22 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1100']
16:20 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100']
16:20 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1101']
16:20 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1101']
16:19 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1110.eqiad.wmnet with OS bullseye
16:19 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1110']
16:18 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1110']
16:18 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
16:18 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
16:18 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:18 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cp1102 - jclark@cumin1001"
16:17 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
16:17 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
16:17 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cp1102 - jclark@cumin1001"
16:17 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
16:16 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
16:15 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
16:15 jclark@cumin1001: START - Cookbook sre.dns.netbox
16:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
16:14 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cp1110']
16:14 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1110']
16:14 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1102
16:13 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1102
16:11 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1010.eqiad.wmnet with reason: host reimage
16:10 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1110']
16:08 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1110']
16:08 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1103.eqiad.wmnet with reason: host reimage
16:08 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1010.eqiad.wmnet with reason: host reimage
16:08 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1102']
16:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
16:07 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1102
16:07 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1105.eqiad.wmnet with OS bullseye
16:07 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
16:06 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
16:06 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1102
16:05 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1103.eqiad.wmnet with reason: host reimage
16:05 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1102']
16:04 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
16:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1106.eqiad.wmnet with OS bullseye
16:02 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
16:00 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
15:57 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1110.eqiad.wmnet with OS bullseye
15:53 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:52 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
15:51 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1102']
15:51 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
15:50 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1107.eqiad.wmnet with OS bullseye
15:50 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
15:50 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS bullseye
15:50 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1010.eqiad.wmnet with OS bullseye
15:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1104']
15:49 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
15:49 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1105.eqiad.wmnet with reason: host reimage
15:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1104']
15:47 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1010.eqiad.wmnet with OS bullseye
15:46 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1105.eqiad.wmnet with reason: host reimage
15:46 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1111.eqiad.wmnet with OS bullseye
15:46 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
15:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1106.eqiad.wmnet with reason: host reimage
15:43 inflatador: bking@deploy2002 destroy dse-k8s-services instance of rdf-streaming-updater T349095
15:40 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1106.eqiad.wmnet with reason: host reimage
15:40 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
15:32 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1107.eqiad.wmnet with reason: host reimage
15:29 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1105.eqiad.wmnet with OS bullseye
15:29 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1105']
15:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1105']
15:28 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:28 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
15:28 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1107.eqiad.wmnet with reason: host reimage
15:26 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1106.eqiad.wmnet with OS bullseye
15:25 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1106']
15:23 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1111.eqiad.wmnet with reason: host reimage
15:20 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1111.eqiad.wmnet with reason: host reimage
15:19 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1106']
15:13 dancy@deploy2002: Finished deploy [releng/jenkins-deploy@2cf7af2] (releasing): (no justification provided) (duration: 00m 44s)
15:12 dancy@deploy2002: Started deploy [releng/jenkins-deploy@2cf7af2] (releasing): (no justification provided)
15:10 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
15:09 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1107']
15:09 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1107']
15:07 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1114.eqiad.wmnet with OS bullseye
15:07 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
15:06 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
15:04 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1110.eqiad.wmnet with OS bullseye
15:03 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1110']
15:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1111.eqiad.wmnet with OS bullseye
15:02 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1111']
15:02 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1010.eqiad.wmnet with OS bullseye
15:02 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
15:01 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1111']
15:01 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
15:01 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:00 jclark@cumin1001: START - Cookbook sre.dns.netbox
14:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1111']
14:59 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs1010.eqiad.wmnet with OS bullseye
14:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
14:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1111']
14:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
14:59 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1111
14:58 elukey: powercycle titan1001 (no mgmt console / tty available, no host metrics, no ssh)
14:57 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1111
14:57 volans@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1001.eqiad.wmnet
14:57 volans@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest1001.eqiad.wmnet
14:57 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1111']
14:57 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
14:56 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1111']
14:56 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
14:56 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1110']
14:51 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1108.eqiad.wmnet with OS bullseye
14:51 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
14:49 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1114.eqiad.wmnet with reason: host reimage
14:46 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1114.eqiad.wmnet with reason: host reimage
14:44 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host aqs1010.eqiad.wmnet with OS bullseye
14:40 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
14:31 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1114.eqiad.wmnet with OS bullseye
14:25 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1114']
14:24 ejegg: fundraising civicrm upgraded from d8fe92e3 to 52202980
14:23 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1108.eqiad.wmnet with reason: host reimage
14:20 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1108.eqiad.wmnet with reason: host reimage
14:18 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1114']
14:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1108.eqiad.wmnet with OS bullseye
13:58 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
13:23 volans@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1001.eqiad.wmnet
13:23 volans@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest1001.eqiad.wmnet
13:14 volans: uploaded spicerack_8.0.2 to apt.wikimedia.org bullseye-wikimedia
13:10 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
13:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
13:06 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
13:06 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
13:05 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
13:05 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
13:04 sukhe: running authdns-update for CR 966243
13:04 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
13:04 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
13:03 arnaudb@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P53008 and previous config saved to /var/cache/conftool/dbconfig/20231018-130343-arnaudb.json
13:03 arnaudb@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P53007 and previous config saved to /var/cache/conftool/dbconfig/20231018-130325-arnaudb.json
12:59 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
12:59 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
12:52 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
12:51 jbond: upload puppet_7.23.0-1~debu11u1 (bullseye backport)
12:48 arnaudb@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P53006 and previous config saved to /var/cache/conftool/dbconfig/20231018-124838-arnaudb.json
12:48 arnaudb@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P53005 and previous config saved to /var/cache/conftool/dbconfig/20231018-124820-arnaudb.json
12:44 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
12:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
12:44 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
12:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
12:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2109.codfw.wmnet with reason: db2109 downtime while repooling
12:39 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2109.codfw.wmnet with reason: db2109 downtime while repooling
12:38 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
12:37 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
12:33 arnaudb@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P53004 and previous config saved to /var/cache/conftool/dbconfig/20231018-123333-arnaudb.json
12:33 arnaudb@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P53003 and previous config saved to /var/cache/conftool/dbconfig/20231018-123315-arnaudb.json
12:18 arnaudb@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P53002 and previous config saved to /var/cache/conftool/dbconfig/20231018-121828-arnaudb.json
12:18 arnaudb@cumin1001: dbctl commit (dc=all): 'db2161 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P53001 and previous config saved to /var/cache/conftool/dbconfig/20231018-121811-arnaudb.json
12:17 arnaudb: repool db2161 and db1126
11:51 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1009.eqiad.wmnet
11:44 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1009.eqiad.wmnet
11:43 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm
11:34 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
11:31 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
11:29 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
11:29 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
11:23 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
11:21 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
11:20 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
11:16 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
11:16 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
11:14 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudbackup1002-dev.eqiad.wmnet with reason: host reimage
11:12 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudbackup1002-dev.eqiad.wmnet with reason: host reimage
11:11 ladsgroup@deploy2002: Finished scap: Backport for Set s6 and s8 to write both for pagelinks migration (T345732) (duration: 10m 10s)
11:08 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
11:05 ladsgroup@deploy2002: ladsgroup: Continuing with sync
11:02 ladsgroup@deploy2002: ladsgroup: Backport for Set s6 and s8 to write both for pagelinks migration (T345732) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:01 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm
11:01 ladsgroup@deploy2002: Started scap: Backport for Set s6 and s8 to write both for pagelinks migration (T345732)
10:40 volans: re-enabled puppet on the cumin hosts. installed spicerack 8.0.1 on the cumin hosts
10:37 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS bullseye
10:35 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1007.eqiad.wmnet
10:32 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm
10:28 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
10:19 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudbackup1002-dev.eqiad.wmnet with reason: host reimage
10:16 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudbackup1002-dev.eqiad.wmnet with reason: host reimage
10:09 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
10:07 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm
10:03 volans@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
09:54 volans@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
09:52 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on stat1009.eqiad.wmnet with reason: Extending downtime for stat1009
09:52 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on stat1009.eqiad.wmnet with reason: Extending downtime for stat1009
09:48 volans@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1001.eqiad.wmnet
09:47 volans@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest1001.eqiad.wmnet
09:25 volans: uploaded spicerack_8.0.1 to apt.wikimedia.org bullseye-wikimedia
09:23 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
09:23 jynus: aborting backup of es1022, es1025 (there was already another backup running)
09:23 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm
09:22 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
09:21 jynus: starting new backup of es1022, es1025 (new clusters only)
09:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1006.eqiad.wmnet
09:20 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
09:19 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
09:17 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
09:17 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
09:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on stat1009.eqiad.wmnet with reason: Moving /home to /srv/home on stat1009 and rebooting
09:16 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on stat1009.eqiad.wmnet with reason: Moving /home to /srv/home on stat1009 and rebooting
09:14 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1007.eqiad.wmnet
09:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1006.eqiad.wmnet
09:13 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1004.eqiad.wmnet
09:10 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudbackup1002-dev.eqiad.wmnet with reason: host reimage
09:06 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudbackup1002-dev.eqiad.wmnet with reason: host reimage
09:05 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1004.eqiad.wmnet
09:02 aqu@deploy2002: Finished deploy [airflow-dags/analytics@c17c91c]: Fix following yesterday weekly train deploy - Second try [airflow-dags@c17c91ce] (duration: 00m 06s)
09:02 aqu@deploy2002: Started deploy [airflow-dags/analytics@c17c91c]: Fix following yesterday weekly train deploy - Second try [airflow-dags@c17c91ce]
09:01 aqu@deploy2002: deploy aborted: Fix following yesterday weekly train deploy [airflow-dags@c17c91ce] (duration: 01m 10s)
09:00 aqu@deploy2002: Started deploy [airflow-dags/analytics@c17c91c]: Fix following yesterday weekly train deploy [airflow-dags@c17c91ce]
08:54 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1002-dev.eqiad.wmnet with OS bookworm
08:51 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
08:40 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
08:18 volans@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1001.eqiad.wmnet
08:14 volans@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest1001.eqiad.wmnet
08:08 volans@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1001.eqiad.wmnet
08:06 volans@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest1001.eqiad.wmnet
08:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add A/PTR for lsw1-e8/ssw links - ayounsi@cumin1001"
08:02 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add A/PTR for lsw1-e8/ssw links - ayounsi@cumin1001"
07:54 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
07:47 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
07:46 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2132.codfw.wmnet with OS bookworm
07:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2132.codfw.wmnet with reason: host reimage
07:37 volans: temporarily disabled puppet on the A:cumin hosts to deploy and test spicerack v8.0.0
07:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2132.codfw.wmnet with reason: host reimage
07:28 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
07:28 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
07:28 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
07:28 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
07:27 filippo@deploy2002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
07:27 filippo@deploy2002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
07:20 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2132.codfw.wmnet with OS bookworm
07:06 aqu@deploy2002: Finished deploy [airflow-dags/analytics@5dcce3b]: Add missing MR in yesterday weekly train (run 2) [airflow-dags@5dcce3bd] (duration: 00m 07s)
07:05 aqu@deploy2002: Started deploy [airflow-dags/analytics@5dcce3b]: Add missing MR in yesterday weekly train (run 2) [airflow-dags@5dcce3bd]
07:05 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@be05071]: (no justification provided) (duration: 00m 06s)
07:05 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@be05071]: (no justification provided)
07:04 aqu@deploy2002: deploy aborted: Add missing MR in yesterday weekly train [airflow-dags@5dcce3bd] (duration: 03m 52s)
07:00 aqu@deploy2002: Started deploy [airflow-dags/analytics@5dcce3b]: Add missing MR in yesterday weekly train [airflow-dags@5dcce3bd]
07:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2160.codfw.wmnet with OS bookworm
06:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2160.codfw.wmnet with reason: host reimage
06:38 XioNoX: push pfw policies - T349101
06:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2160.codfw.wmnet with reason: host reimage
06:16 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2160.codfw.wmnet with OS bookworm
06:08 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2160.codfw.wmnet with OS bookworm
05:57 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2160.codfw.wmnet with OS bookworm
01:22 eileen: civicrm upgraded from da11d010 to d8fe92e3

2023-10-17

22:03 herron: pyrra.wm.o upgraded to 0.7.1 T302995
21:32 catrope@deploy2002: backport Cancelled
21:10 inflatador: bking@cumin1001 repool wdqs eqiad after rdf-streaming-updater fix
21:05 catrope@deploy2002: Finished scap: Backport for Add language prefix to Readability survey (T347208) (duration: 13m 03s)
21:00 catrope@deploy2002: catrope and jdrewniak: Continuing with sync
20:53 catrope@deploy2002: catrope and jdrewniak: Backport for Add language prefix to Readability survey (T347208) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:52 inflatador: bking@cumin1001 depool wdqs eqiad due to rdf-streaming-updater failure
20:52 catrope@deploy2002: Started scap: Backport for Add language prefix to Readability survey (T347208)
20:36 volans: uploaded spicerack_8.0.0 to apt.wikimedia.org bullseye-wikimedia
20:36 eevans@deploy2002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
20:36 eevans@deploy2002: helmfile [codfw] START helmfile.d/services/sessionstore: apply
20:35 eevans@deploy2002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
20:34 eevans@deploy2002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
20:31 eevans@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
20:31 eevans@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
20:29 catrope@deploy2002: Finished scap: Backport for Fixes incorrect Hebrew logo and applies gotwiki (T341253 T341251) (duration: 09m 59s)
20:27 eevans@deploy2002: helmfile [codfw] DONE helmfile.d/services/echostore: apply
20:26 eevans@deploy2002: helmfile [codfw] START helmfile.d/services/echostore: apply
20:24 eevans@deploy2002: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
20:24 eevans@deploy2002: helmfile [eqiad] START helmfile.d/services/echostore: apply
20:24 catrope@deploy2002: jdlrobson and catrope: Continuing with sync
20:21 eevans@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply
20:21 eevans@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply
20:20 catrope@deploy2002: jdlrobson and catrope: Backport for Fixes incorrect Hebrew logo and applies gotwiki (T341253 T341251) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:19 catrope@deploy2002: Started scap: Backport for Fixes incorrect Hebrew logo and applies gotwiki (T341253 T341251)
20:16 catrope@deploy2002: Finished scap: Backport for Wordmark for blk wiktionary and got wikipedia (T341253 T341257) (duration: 11m 17s)
20:11 catrope@deploy2002: catrope and jdlrobson: Continuing with sync
20:06 catrope@deploy2002: catrope and jdlrobson: Backport for Wordmark for blk wiktionary and got wikipedia (T341253 T341257) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:05 catrope@deploy2002: Started scap: Backport for Wordmark for blk wiktionary and got wikipedia (T341253 T341257)
18:46 hashar@deploy2002: Finished scap: Backport for logging: reorder wmgMonologProcessors entries (T349086) (duration: 08m 14s)
18:43 hashar@deploy2002: hashar: Continuing with sync
18:39 hashar@deploy2002: hashar: Backport for logging: reorder wmgMonologProcessors entries (T349086) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
18:38 hashar@deploy2002: Started scap: Backport for logging: reorder wmgMonologProcessors entries (T349086)
18:25 brennen@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.1 refs T348354
18:18 brennen: train 1.42.0-wmf.1 (T348354): blockers resolved, rolling to group0
18:16 brennen@deploy2002: Finished scap: Backport for Pass full content to Parsoid for redirect pages (T349087) (duration: 07m 42s)
18:11 brennen@deploy2002: brennen: Continuing with sync
18:09 brennen@deploy2002: brennen: Backport for Pass full content to Parsoid for redirect pages (T349087) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
18:08 brennen@deploy2002: Started scap: Backport for Pass full content to Parsoid for redirect pages (T349087)
17:05 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance
17:05 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance
16:22 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:22 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
15:50 sukhe: running authdns-update for CR 966564
15:09 brennen@deploy2002: Finished deploy [phabricator/deployment@745d703]: deploy to phab1004 for T349038 (duration: 00m 57s)
15:08 brennen@deploy2002: Started deploy [phabricator/deployment@745d703]: deploy to phab1004 for T349038
15:07 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
15:07 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
15:07 brennen@deploy2002: Finished deploy [phabricator/deployment@745d703]: test deploy to phab2002 for T349038 (duration: 00m 33s)
15:06 brennen@deploy2002: Started deploy [phabricator/deployment@745d703]: test deploy to phab2002 for T349038
15:04 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator maintenance
15:03 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator maintenance
15:03 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator maintenance
15:03 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator maintenance
15:03 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
15:02 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
14:59 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
14:58 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
14:28 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
14:28 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
14:24 denisse@deploy2002: Finished deploy [performance/navtiming@2e17c67]: (no justification provided) (duration: 00m 05s)
14:24 denisse@deploy2002: Started deploy [performance/navtiming@2e17c67]: (no justification provided)
14:11 jdrewniak@deploy2002: Finished scap: Backport for ParserOutputAccess: Fix local cache when page is edited within the process (T349033) (duration: 15m 56s)
14:05 jdrewniak@deploy2002: jdrewniak: Continuing with sync
14:03 tchin@deploy2002: Finished deploy [airflow-dags/analytics_test@be05071]: Regular analytics weekly train (duration: 00m 06s)
14:03 tchin@deploy2002: Started deploy [airflow-dags/analytics_test@be05071]: Regular analytics weekly train
14:01 tchin@deploy2002: Finished deploy [airflow-dags/analytics@fae5764]: (no justification provided) (duration: 01m 22s)
13:59 tchin@deploy2002: Started deploy [airflow-dags/analytics@fae5764]: (no justification provided)
13:56 jdrewniak@deploy2002: jdrewniak: Backport for ParserOutputAccess: Fix local cache when page is edited within the process (T349033) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:55 jdrewniak@deploy2002: Started scap: Backport for ParserOutputAccess: Fix local cache when page is edited within the process (T349033)
13:52 tchin@deploy2002: Finished deploy [analytics/refinery@0d09fbd] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0d09fbdc] (duration: 02m 59s)
13:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1225.eqiad.wmnet with reason: db1225 downtime for restoration
13:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1225.eqiad.wmnet with reason: db1225 downtime for restoration
13:49 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
13:49 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
13:49 tchin@deploy2002: Started deploy [analytics/refinery@0d09fbd] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0d09fbdc]
13:49 tchin@deploy2002: Finished deploy [analytics/refinery@0d09fbd] (thin): Regular analytics weekly train THIN [analytics/refinery@0d09fbdc] (duration: 00m 07s)
13:49 tchin@deploy2002: Started deploy [analytics/refinery@0d09fbd] (thin): Regular analytics weekly train THIN [analytics/refinery@0d09fbdc]
13:48 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
13:48 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
13:48 tchin@deploy2002: Finished deploy [analytics/refinery@0d09fbd]: Regular analytics weekly train [analytics/refinery@0d09fbdc] (duration: 07m 24s)
13:47 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2160.codfw.wmnet with OS bookworm
13:46 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
13:46 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
13:40 tchin@deploy2002: Started deploy [analytics/refinery@0d09fbd]: Regular analytics weekly train [analytics/refinery@0d09fbdc]
13:40 jdrewniak@deploy2002: Finished scap: Backport for Enable Vector readability survey on select wikis (T347208) (duration: 09m 50s)
13:34 jdrewniak@deploy2002: jdrewniak: Continuing with sync
13:32 jdrewniak@deploy2002: jdrewniak: Backport for Enable Vector readability survey on select wikis (T347208) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:30 jdrewniak@deploy2002: Started scap: Backport for Enable Vector readability survey on select wikis (T347208)
13:26 jdrewniak@deploy2002: Backport cancelled.
13:15 jdrewniak@deploy2002: Backport cancelled.
12:59 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2160.codfw.wmnet with OS bookworm
12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1119 T339185', diff saved to https://phabricator.wikimedia.org/P52995 and previous config saved to /var/cache/conftool/dbconfig/20231017-124916-root.json
12:28 urandom: Starting Cassandra decommission(s) of restbase1017 —
11:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T343198)', diff saved to https://phabricator.wikimedia.org/P52994 and previous config saved to /var/cache/conftool/dbconfig/20231017-115217-arnaudb.json
11:39 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
11:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Depool db1126 T349077', diff saved to https://phabricator.wikimedia.org/P52993 and previous config saved to /var/cache/conftool/dbconfig/20231017-113809-arnaudb.json
11:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P52992 and previous config saved to /var/cache/conftool/dbconfig/20231017-113711-arnaudb.json
11:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Set db1126 with weight 275 T349077', diff saved to https://phabricator.wikimedia.org/P52991 and previous config saved to /var/cache/conftool/dbconfig/20231017-113432-arnaudb.json
11:29 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
11:27 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
11:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P52990 and previous config saved to /var/cache/conftool/dbconfig/20231017-112204-arnaudb.json
11:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Promote db1209 to s8 primary T349077', diff saved to https://phabricator.wikimedia.org/P52989 and previous config saved to /var/cache/conftool/dbconfig/20231017-111720-arnaudb.json
11:12 arnaudb: Starting s8 eqiad failover from db1126 to db1209 - T349077
11:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T343198)', diff saved to https://phabricator.wikimedia.org/P52988 and previous config saved to /var/cache/conftool/dbconfig/20231017-110658-arnaudb.json
11:00 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
10:59 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
10:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Set db1209 with weight 0 T349077', diff saved to https://phabricator.wikimedia.org/P52987 and previous config saved to /var/cache/conftool/dbconfig/20231017-104839-arnaudb.json
10:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s8 T349077
10:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: Primary switchover s8 T349077
10:28 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
10:28 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
09:59 hashar: Deleted operations-puppet-catalog-compiler Jenkins job to replace it with a new job letting one pick the Puppet version(s) to compile against | T236373
09:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
09:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
09:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for an-airflow[1002,1004-1006].eqiad.wmnet,an-launcher1002.eqiad.wmnet
09:58 btullis@cumin1001: START - Cookbook sre.hosts.remove-downtime for an-airflow[1002,1004-1006].eqiad.wmnet,an-launcher1002.eqiad.wmnet
09:48 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-db1001.eqiad.wmnet
09:48 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on an-airflow[1002,1004-1006].eqiad.wmnet,an-launcher1002.eqiad.wmnet with reason: Rebooting Airflow instances for T344671
09:47 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on an-airflow[1002,1004-1006].eqiad.wmnet,an-launcher1002.eqiad.wmnet with reason: Rebooting Airflow instances for T344671
09:42 btullis@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host an-airflow1007.eqiad.wmnet
09:42 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-db1001.eqiad.wmnet
09:36 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@b010dae]: (no justification provided) (duration: 00m 46s)
09:35 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@b010dae]: (no justification provided)
09:33 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-airflow1007.eqiad.wmnet
09:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1002.eqiad.wmnet
09:28 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-airflow1002.eqiad.wmnet
09:28 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1005.eqiad.wmnet
09:26 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
09:26 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
09:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-airflow1005.eqiad.wmnet
09:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1004.eqiad.wmnet
09:21 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
09:20 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
09:20 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-airflow1004.eqiad.wmnet
09:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1006.eqiad.wmnet
09:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
09:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
09:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-airflow1006.eqiad.wmnet
09:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on an-airflow[1002,1004-1006].eqiad.wmnet,an-launcher1002.eqiad.wmnet with reason: Rebooting Airflow instances for T344671
09:12 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on an-airflow[1002,1004-1006].eqiad.wmnet,an-launcher1002.eqiad.wmnet with reason: Rebooting Airflow instances for T344671
08:38 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
08:35 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
08:32 XioNoX: push pfw policies - T348576
07:26 hashar@deploy2002: Finished deploy [gerrit/gerrit@578be93]: wm-checks-api: filter out Zuul start messages | T348920 (duration: 00m 07s)
07:26 hashar@deploy2002: Started deploy [gerrit/gerrit@578be93]: wm-checks-api: filter out Zuul start messages | T348920
07:23 hashar@deploy2002: Finished deploy [gerrit/gerrit@1153a16]: wm-checks-api: filter out Zuul start messages | T348920 (duration: 00m 05s)
07:22 hashar@deploy2002: Started deploy [gerrit/gerrit@1153a16]: wm-checks-api: filter out Zuul start messages | T348920
06:06 isaranto@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2161 T349053', diff saved to https://phabricator.wikimedia.org/P52986 and previous config saved to /var/cache/conftool/dbconfig/20231017-060214-root.json
06:06 isaranto@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
06:02 isaranto@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync
06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2165 to s8 primary and set section read-write T349053', diff saved to https://phabricator.wikimedia.org/P52985 and previous config saved to /var/cache/conftool/dbconfig/20231017-060047-root.json
06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s8 codfw as read-only for maintenance - T349053', diff saved to https://phabricator.wikimedia.org/P52984 and previous config saved to /var/cache/conftool/dbconfig/20231017-060021-root.json
06:00 marostegui: Starting s8 codfw failover from db2161 to db2165 - T349053
05:59 kart_: Update MinT to 2023-10-16-101614-production (T333969, T336683, T348097)
05:36 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
05:31 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
05:29 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
05:19 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
05:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s8 T349053
05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2165 with weight 0 T349053', diff saved to https://phabricator.wikimedia.org/P52983 and previous config saved to /var/cache/conftool/dbconfig/20231017-051723-root.json
05:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: Primary switchover s8 T349053
03:55 mwpresync@deploy2002: Pruned MediaWiki: 1.41.0-wmf.29 (duration: 02m 15s)
03:53 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.1 refs T348354 (duration: 50m 15s)
03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.1 refs T348354
02:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T343198)', diff saved to https://phabricator.wikimedia.org/P52982 and previous config saved to /var/cache/conftool/dbconfig/20231017-021040-arnaudb.json
02:10 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
02:10 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
02:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T343198)', diff saved to https://phabricator.wikimedia.org/P52981 and previous config saved to /var/cache/conftool/dbconfig/20231017-021018-arnaudb.json
01:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P52980 and previous config saved to /var/cache/conftool/dbconfig/20231017-015511-arnaudb.json
01:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P52979 and previous config saved to /var/cache/conftool/dbconfig/20231017-014005-arnaudb.json
01:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T343198)', diff saved to https://phabricator.wikimedia.org/P52978 and previous config saved to /var/cache/conftool/dbconfig/20231017-012459-arnaudb.json

2023-10-16

22:04 maryum: deployed security patch for T347742
21:53 maryum: deployed security patch for T347708
21:40 maryum: deployed security patch for T348343
21:04 sbassett: deployed security mitigation for T348828
20:55 cjming: end of UTC late backport window
20:53 cjming@deploy2002: Finished scap: Backport for wordmarks/taglines for Wiktionary projects (T341257) (duration: 07m 17s)
20:47 cjming@deploy2002: jdlrobson and cjming: Continuing with sync
20:46 cjming@deploy2002: jdlrobson and cjming: Backport for wordmarks/taglines for Wiktionary projects (T341257) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:45 cjming@deploy2002: Started scap: Backport for wordmarks/taglines for Wiktionary projects (T341257)
20:44 cjming@deploy2002: Finished scap: Backport for Update logos for remaining Wikisource projects (T343753) (duration: 07m 50s)
20:39 cjming@deploy2002: jdlrobson and cjming: Continuing with sync
20:37 cjming@deploy2002: jdlrobson and cjming: Backport for Update logos for remaining Wikisource projects (T343753) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:36 cjming@deploy2002: Started scap: Backport for Update logos for remaining Wikisource projects (T343753)
20:35 cjming@deploy2002: Finished scap: Backport for Fixes Thai Wikinews wordmark and sewikimedia (T348757 T347534) (duration: 07m 08s)
20:30 cjming@deploy2002: cjming and jdlrobson: Continuing with sync
20:29 cjming@deploy2002: cjming and jdlrobson: Backport for Fixes Thai Wikinews wordmark and sewikimedia (T348757 T347534) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:28 cjming@deploy2002: Started scap: Backport for Fixes Thai Wikinews wordmark and sewikimedia (T348757 T347534)
20:26 cjming@deploy2002: Finished scap: Backport for Merge ReplyWidget[Plain/Visual] modules (T348834) (duration: 07m 23s)
20:21 cjming@deploy2002: kemayo and cjming: Continuing with sync
20:20 cjming@deploy2002: kemayo and cjming: Backport for Merge ReplyWidget[Plain/Visual] modules (T348834) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:19 cjming@deploy2002: Started scap: Backport for Merge ReplyWidget[Plain/Visual] modules (T348834)
20:18 cjming@deploy2002: Finished scap: Backport for Enable display of Client Hints data on all wikis (T341110 T337942) (duration: 08m 17s)
20:13 cjming@deploy2002: dreamyjazz and cjming: Continuing with sync
20:11 cjming@deploy2002: dreamyjazz and cjming: Backport for Enable display of Client Hints data on all wikis (T341110 T337942) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:10 cjming@deploy2002: Started scap: Backport for Enable display of Client Hints data on all wikis (T341110 T337942)
19:55 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:55 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:42 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:42 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:30 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:30 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:27 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:27 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:23 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts aqs1010.eqiad.wmnet
19:23 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts aqs1010.eqiad.wmnet
19:20 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:20 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:17 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts aqs1010.eqiad.wmnet
19:13 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:12 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:09 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts aqs1010.eqiad.wmnet
18:51 sukhe: exiqgrep -i -r <redacted> | xargs exim -Mrm
18:41 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
18:27 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
18:27 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
18:27 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
18:20 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:19 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
18:06 ejegg: fundraising python tools upgraded from 7c6a28e0 to e56ae8ae
17:59 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance
17:59 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-airflow1007.eqiad.wmnet with reason: Downtime as we setup the new WMDE Airflow instance
17:55 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:55 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
17:41 denisse: Upgrading navtiming on the webperf hosts in the beta cluster
17:14 ejegg: fundraising python tools upgraded from 0c17296c to 7c6a28e0
16:48 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
16:46 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
16:43 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:42 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
16:25 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:23 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
16:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T343198)', diff saved to https://phabricator.wikimedia.org/P52975 and previous config saved to /var/cache/conftool/dbconfig/20231016-161829-arnaudb.json
16:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
16:18 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
16:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T343198)', diff saved to https://phabricator.wikimedia.org/P52974 and previous config saved to /var/cache/conftool/dbconfig/20231016-161806-arnaudb.json
16:10 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
16:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P52973 and previous config saved to /var/cache/conftool/dbconfig/20231016-160300-arnaudb.json
15:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P52972 and previous config saved to /var/cache/conftool/dbconfig/20231016-154754-arnaudb.json
15:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T343198)', diff saved to https://phabricator.wikimedia.org/P52971 and previous config saved to /var/cache/conftool/dbconfig/20231016-153247-arnaudb.json
15:10 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for sessionstore2001.codfw.wmnet
15:10 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for sessionstore2001.codfw.wmnet
15:08 sukhe: running authdns-update
15:03 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
14:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4003.wikimedia.org with OS bookworm
14:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on sessionstore2001.codfw.wmnet with reason: Moving host — T348142
14:54 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on sessionstore2001.codfw.wmnet with reason: Moving host — T348142
14:42 ejegg: Standalone (IPN listener) SmashPig upgraded from 211284b9 to e27dfbce
14:35 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:34 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:34 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:33 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:33 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:33 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:30 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:30 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:28 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
14:26 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
14:25 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
14:23 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:22 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:22 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:21 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:20 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:20 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:18 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:17 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:17 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:16 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:16 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:15 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:10 ladsgroup@deploy2002: Finished scap: Backport for Disable DoubleWiki extension everywhere (T344544) (duration: 08m 09s)
14:05 ladsgroup@deploy2002: ladsgroup: Continuing with sync
14:03 ladsgroup@deploy2002: ladsgroup: Backport for Disable DoubleWiki extension everywhere (T344544) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:02 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bookworm
14:02 ladsgroup@deploy2002: Started scap: Backport for Disable DoubleWiki extension everywhere (T344544)
13:53 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
13:52 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
13:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
13:52 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
13:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
13:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
13:42 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:41 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
13:41 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
13:40 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
13:39 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
13:39 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
13:38 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
13:37 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
13:36 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
13:36 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
13:36 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
13:36 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
13:36 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
13:35 jayme@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
13:34 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:34 TheresNoTime: close UTC afternoon backport window
13:34 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
13:34 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:34 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
13:34 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
13:33 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
13:33 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
13:33 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
13:33 jayme@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
13:33 jayme@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
13:33 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
13:33 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
13:33 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
13:32 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
13:30 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
13:30 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
13:30 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
13:30 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
13:14 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
13:14 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
13:12 samtar@deploy2002: Finished scap: Backport for fix incubatorwiki wordmark (T348577), update throttle rule for UIUC Wikipedia edit-a-thon November 13, 2023 and remove old throttle rules (T346043) (duration: 08m 08s)
13:07 samtar@deploy2002: samtar and anzx: Continuing with sync
13:05 samtar@deploy2002: samtar and anzx: Backport for fix incubatorwiki wordmark (T348577), update throttle rule for UIUC Wikipedia edit-a-thon November 13, 2023 and remove old throttle rules (T346043) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:05 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
13:04 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
13:04 samtar@deploy2002: Started scap: Backport for fix incubatorwiki wordmark (T348577), update throttle rule for UIUC Wikipedia edit-a-thon November 13, 2023 and remove old throttle rules (T346043)
12:35 ladsgroup@deploy2002: Finished scap: Backport for Switch ES cluster to cluster28 and cluster29 (T342685) (duration: 18m 52s)
12:29 ladsgroup@deploy2002: ladsgroup: Continuing with sync
12:17 ladsgroup@deploy2002: ladsgroup: Backport for Switch ES cluster to cluster28 and cluster29 (T342685) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:16 ladsgroup@deploy2002: Started scap: Backport for Switch ES cluster to cluster28 and cluster29 (T342685)
11:15 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
11:12 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
11:10 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
11:07 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
11:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-coord1001.eqiad.wmnet
10:56 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-coord1001.eqiad.wmnet
10:18 ladsgroup@deploy2002: Finished scap: Backport for Change default of pagelinks to write both (T345732) (duration: 07m 44s)
10:12 ladsgroup@deploy2002: ladsgroup: Continuing with sync
10:11 ladsgroup@deploy2002: ladsgroup: Backport for Change default of pagelinks to write both (T345732) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:10 ladsgroup@deploy2002: Started scap: Backport for Change default of pagelinks to write both (T345732)
10:06 ladsgroup@deploy2002: Finished scap: Backport for Enable pagelinks migration WRITE BOTH on some more wikis (T345732) (duration: 09m 19s)
10:01 ladsgroup@deploy2002: ladsgroup: Continuing with sync
09:58 ladsgroup@deploy2002: ladsgroup: Backport for Enable pagelinks migration WRITE BOTH on some more wikis (T345732) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:57 ladsgroup@deploy2002: Started scap: Backport for Enable pagelinks migration WRITE BOTH on some more wikis (T345732)
09:52 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
09:52 phuedx@deploy2002: Finished scap: Backport for Revert "Introduce Web Accessibility Features and Submodule" (duration: 10m 04s)
09:52 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
09:51 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
09:47 phuedx@deploy2002: phuedx: Continuing with sync
09:43 phuedx@deploy2002: phuedx: Backport for Revert "Introduce Web Accessibility Features and Submodule" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:42 phuedx@deploy2002: Started scap: Backport for Revert "Introduce Web Accessibility Features and Submodule"
09:38 phuedx@deploy2002: backport Cancelled
09:00 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host an-test-master1001.eqiad.wmnet
08:56 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
08:52 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
08:51 brouberol@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
08:48 brouberol@deploy2002: helmfile [staging] START helmfile.d/services/datahub: apply on main
08:46 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-master1001.eqiad.wmnet
08:44 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
08:44 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
08:44 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
08:43 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
08:43 brouberol@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
08:42 brouberol@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
08:41 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
08:40 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
08:40 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
08:39 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
08:38 brouberol@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
08:38 brouberol@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
08:36 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
08:35 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
08:35 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
08:35 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
08:35 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
08:34 brouberol@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
08:34 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
08:34 brouberol@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
08:34 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
08:10 hashar@deploy2002: Finished scap: Backport for Don't try to lock to serialize m3u8 file writes (T348689 T348667 T348375 T348753) (duration: 27m 04s)
07:58 hashar@deploy2002: jforrester and hashar: Continuing with sync
07:57 hashar@deploy2002: jforrester and hashar: Backport for Don't try to lock to serialize m3u8 file writes (T348689 T348667 T348375 T348753) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:55 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
07:54 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
07:54 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
07:53 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
07:43 hashar@deploy2002: Started scap: Backport for Don't try to lock to serialize m3u8 file writes (T348689 T348667 T348375 T348753)
07:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T343198)', diff saved to https://phabricator.wikimedia.org/P52968 and previous config saved to /var/cache/conftool/dbconfig/20231016-073731-arnaudb.json
07:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
07:37 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
07:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
07:36 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
07:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T343198)', diff saved to https://phabricator.wikimedia.org/P52967 and previous config saved to /var/cache/conftool/dbconfig/20231016-073653-arnaudb.json
07:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P52966 and previous config saved to /var/cache/conftool/dbconfig/20231016-072147-arnaudb.json
07:17 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
07:17 elukey@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync
07:15 aqu@deploy2002: Finished deploy [analytics/refinery@1baf3be] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@1baf3be2] (duration: 02m 51s)
07:12 aqu@deploy2002: Started deploy [analytics/refinery@1baf3be] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@1baf3be2]
07:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P52965 and previous config saved to /var/cache/conftool/dbconfig/20231016-070640-arnaudb.json
06:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T343198)', diff saved to https://phabricator.wikimedia.org/P52964 and previous config saved to /var/cache/conftool/dbconfig/20231016-065134-arnaudb.json
05:41 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
05:41 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
05:40 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
05:40 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
05:39 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
05:38 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
05:36 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
05:35 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
05:34 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
05:33 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
05:33 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
05:33 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
05:32 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
05:32 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
05:32 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
05:32 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
05:31 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
05:31 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .

2023-10-15

22:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T343198)', diff saved to https://phabricator.wikimedia.org/P52963 and previous config saved to /var/cache/conftool/dbconfig/20231015-222435-arnaudb.json
22:24 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
22:24 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
22:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T343198)', diff saved to https://phabricator.wikimedia.org/P52962 and previous config saved to /var/cache/conftool/dbconfig/20231015-222414-arnaudb.json
22:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P52961 and previous config saved to /var/cache/conftool/dbconfig/20231015-220907-arnaudb.json
21:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P52960 and previous config saved to /var/cache/conftool/dbconfig/20231015-215401-arnaudb.json
21:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T343198)', diff saved to https://phabricator.wikimedia.org/P52959 and previous config saved to /var/cache/conftool/dbconfig/20231015-213855-arnaudb.json
19:10 urandom: starting Cassandra decommission of restbase1016-b — T328490
14:35 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
14:32 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
14:31 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
14:31 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
14:31 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
14:30 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
14:30 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
13:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T343198)', diff saved to https://phabricator.wikimedia.org/P52958 and previous config saved to /var/cache/conftool/dbconfig/20231015-130027-arnaudb.json
13:00 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
13:00 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
13:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T343198)', diff saved to https://phabricator.wikimedia.org/P52957 and previous config saved to /var/cache/conftool/dbconfig/20231015-130005-arnaudb.json
12:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P52956 and previous config saved to /var/cache/conftool/dbconfig/20231015-124459-arnaudb.json
12:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P52955 and previous config saved to /var/cache/conftool/dbconfig/20231015-122953-arnaudb.json
12:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T343198)', diff saved to https://phabricator.wikimedia.org/P52954 and previous config saved to /var/cache/conftool/dbconfig/20231015-121446-arnaudb.json
11:03 hashar@deploy2002: Finished deploy [integration/docroot@096f637]: (no justification provided) (duration: 00m 05s)
11:03 hashar@deploy2002: Started deploy [integration/docroot@096f637]: (no justification provided)
03:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T343198)', diff saved to https://phabricator.wikimedia.org/P52953 and previous config saved to /var/cache/conftool/dbconfig/20231015-035420-arnaudb.json
03:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
03:53 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
03:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T343198)', diff saved to https://phabricator.wikimedia.org/P52952 and previous config saved to /var/cache/conftool/dbconfig/20231015-035347-arnaudb.json
03:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P52951 and previous config saved to /var/cache/conftool/dbconfig/20231015-033841-arnaudb.json
03:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P52950 and previous config saved to /var/cache/conftool/dbconfig/20231015-032335-arnaudb.json
03:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T343198)', diff saved to https://phabricator.wikimedia.org/P52949 and previous config saved to /var/cache/conftool/dbconfig/20231015-030828-arnaudb.json

2023-10-14

18:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T343198)', diff saved to https://phabricator.wikimedia.org/P52948 and previous config saved to /var/cache/conftool/dbconfig/20231014-184517-arnaudb.json
18:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
18:45 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
18:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T343198)', diff saved to https://phabricator.wikimedia.org/P52947 and previous config saved to /var/cache/conftool/dbconfig/20231014-184455-arnaudb.json
18:30 urandom: starting Cassandra decommission of restbase1016-a — T328490
18:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P52946 and previous config saved to /var/cache/conftool/dbconfig/20231014-182949-arnaudb.json
18:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P52945 and previous config saved to /var/cache/conftool/dbconfig/20231014-181442-arnaudb.json
17:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T343198)', diff saved to https://phabricator.wikimedia.org/P52944 and previous config saved to /var/cache/conftool/dbconfig/20231014-175936-arnaudb.json
17:34 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
17:34 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
17:33 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
17:33 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
17:32 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
17:32 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
09:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T343198)', diff saved to https://phabricator.wikimedia.org/P52943 and previous config saved to /var/cache/conftool/dbconfig/20231014-091542-arnaudb.json
09:15 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
09:15 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
02:29 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
02:22 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
02:22 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
02:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T343198)', diff saved to https://phabricator.wikimedia.org/P52942 and previous config saved to /var/cache/conftool/dbconfig/20231014-022208-arnaudb.json
02:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P52941 and previous config saved to /var/cache/conftool/dbconfig/20231014-020701-arnaudb.json
01:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P52940 and previous config saved to /var/cache/conftool/dbconfig/20231014-015154-arnaudb.json
01:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T343198)', diff saved to https://phabricator.wikimedia.org/P52939 and previous config saved to /var/cache/conftool/dbconfig/20231014-013648-arnaudb.json
00:04 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)

2023-10-13

23:56 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
22:21 ejegg: fundraising civicrm upgraded from c5f54d97 to e71ccffb
21:32 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1107.eqiad.wmnet with OS bullseye
21:32 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1106.eqiad.wmnet with OS bullseye
21:29 hashar@deploy2002: Finished deploy [integration/docroot@096f637]: Expand Purtle doc card (duration: 00m 05s)
21:29 hashar@deploy2002: Started deploy [integration/docroot@096f637]: Expand Purtle doc card
21:29 hashar@deploy2002: Finished deploy [integration/docroot@504d455]: Fix php-session-serializer tagline (duration: 00m 06s)
21:28 hashar@deploy2002: Started deploy [integration/docroot@504d455]: Fix php-session-serializer tagline
20:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1111.eqiad.wmnet with OS bullseye
20:41 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1114.eqiad.wmnet with OS bullseye
20:29 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1113.eqiad.wmnet with OS bullseye
20:29 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
20:26 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
20:24 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1108']
20:23 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1108']
20:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1108.eqiad.wmnet with OS bullseye
20:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1107.eqiad.wmnet with OS bullseye
20:12 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1107.eqiad.wmnet with reason: host reimage
20:12 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
20:12 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1106.eqiad.wmnet with OS bullseye
20:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1112.eqiad.wmnet with OS bullseye
20:11 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
20:10 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1113.eqiad.wmnet with reason: host reimage
20:07 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1107.eqiad.wmnet with reason: host reimage
20:07 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
20:06 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1113.eqiad.wmnet with reason: host reimage
20:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1109.eqiad.wmnet with OS bullseye
20:04 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
20:03 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
19:57 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1111.eqiad.wmnet with OS bullseye
19:56 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1102']
19:56 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
19:55 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1102
19:54 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1102
19:53 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1102']
19:53 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
19:52 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS bullseye
19:49 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1112.eqiad.wmnet with reason: host reimage
19:48 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1113.eqiad.wmnet with OS bullseye
19:47 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1109.eqiad.wmnet with reason: host reimage
19:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
19:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1102']
19:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
19:39 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cp1113']
19:38 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1102
19:37 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1102
19:35 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1102
19:35 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1102
19:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1108.eqiad.wmnet with OS bullseye
19:28 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1109.eqiad.wmnet with OS bullseye
19:27 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1112']
19:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1112']
19:25 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1102']
19:24 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
19:24 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1102']
19:24 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
19:24 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1112.eqiad.wmnet with OS bullseye
19:24 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp1114']
19:23 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1112']
19:23 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1112']
19:23 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1112']
19:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1112']
19:22 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1113']
19:21 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1114.eqiad.wmnet with OS bullseye
19:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1110']
19:20 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1110']
19:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1111']
19:19 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
19:18 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1112']
19:18 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1112']
19:17 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1113']
19:17 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1114']
19:17 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1114']
19:14 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@520fa55]: (no justification provided) (duration: 00m 23s)
19:14 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1113']
19:14 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@520fa55]: (no justification provided)
19:14 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1112']
19:08 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1112']
19:08 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1112']
19:07 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1112']
19:07 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1111']
19:04 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
19:03 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1110']
19:00 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1110']
18:58 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1109']
18:52 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1109']
18:06 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1108']
18:06 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1104']
18:03 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1115.eqiad.wmnet with OS bullseye
18:03 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
18:02 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1107']
18:00 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1108']
18:00 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@520fa55]: (no justification provided) (duration: 00m 59s)
17:59 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@520fa55]: (no justification provided)
17:56 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1107']
17:55 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1106']
17:55 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1106']
17:54 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1105']
17:53 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1105']
17:53 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1104']
17:52 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1103']
17:50 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1101']
17:48 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1100']
17:46 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1103']
17:46 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
17:46 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1102']
17:45 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
17:44 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1101']
17:42 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100']
17:29 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1115.eqiad.wmnet with reason: host reimage
17:26 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1105']
17:26 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@520fa55]: (no justification provided) (duration: 01m 01s)
17:25 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1115.eqiad.wmnet with reason: host reimage
17:25 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@520fa55]: (no justification provided)
17:16 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1105']
17:16 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1105']
17:15 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1105']
17:14 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1105']
17:10 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS bullseye
17:08 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1115']
17:01 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1115']
16:58 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1105']
16:57 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1105.mgmt.eqiad.wmnet with reboot policy FORCED
16:49 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1105.mgmt.eqiad.wmnet with reboot policy FORCED
16:43 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1105.mgmt.eqiad.wmnet with reboot policy FORCED
16:43 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1105.mgmt.eqiad.wmnet with reboot policy FORCED
16:42 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1105']
16:42 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1105']
16:41 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1105']
16:41 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1105']
16:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1105.mgmt.eqiad.wmnet with reboot policy FORCED
16:29 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1008.wikimedia.org with OS bullseye
16:29 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
16:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T343198)', diff saved to https://phabricator.wikimedia.org/P52936 and previous config saved to /var/cache/conftool/dbconfig/20231013-162902-arnaudb.json
16:28 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
16:28 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
16:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T343198)', diff saved to https://phabricator.wikimedia.org/P52935 and previous config saved to /var/cache/conftool/dbconfig/20231013-162840-arnaudb.json
16:24 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1114']
16:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P52934 and previous config saved to /var/cache/conftool/dbconfig/20231013-161333-arnaudb.json
16:12 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1114']
16:11 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1113']
16:10 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1115']
16:06 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1114.mgmt.eqiad.wmnet with reboot policy FORCED
16:00 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1115']
15:59 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1113']
15:59 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1115.mgmt.eqiad.wmnet with reboot policy FORCED
15:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P52933 and previous config saved to /var/cache/conftool/dbconfig/20231013-155827-arnaudb.json
15:55 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1111']
15:55 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1113.mgmt.eqiad.wmnet with reboot policy FORCED
15:55 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
15:54 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1112']
15:45 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1111']
15:45 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1110']
15:44 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1112']
15:44 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1112']
15:44 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1112']
15:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T343198)', diff saved to https://phabricator.wikimedia.org/P52932 and previous config saved to /var/cache/conftool/dbconfig/20231013-154321-arnaudb.json
15:43 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1112.mgmt.eqiad.wmnet with reboot policy FORCED
15:41 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1115.mgmt.eqiad.wmnet with reboot policy FORCED
15:41 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1114.mgmt.eqiad.wmnet with reboot policy FORCED
15:40 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1115
15:40 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1114
15:39 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1115
15:39 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1114
15:37 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1113.mgmt.eqiad.wmnet with reboot policy FORCED
15:35 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1111']
15:35 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1111.mgmt.eqiad.wmnet with reboot policy FORCED
15:35 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1110']
15:33 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1109']
15:32 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1113
15:32 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1110.mgmt.eqiad.wmnet with reboot policy FORCED
15:32 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1108']
15:32 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1107']
15:31 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1113
15:25 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1112.mgmt.eqiad.wmnet with reboot policy FORCED
15:23 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1109']
15:23 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1106']
15:22 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1108']
15:21 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1107']
15:20 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1109.mgmt.eqiad.wmnet with reboot policy FORCED
15:19 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1107.mgmt.eqiad.wmnet with reboot policy FORCED
15:19 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1108.mgmt.eqiad.wmnet with reboot policy FORCED
15:18 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1107.mgmt.eqiad.wmnet with reboot policy FORCED
15:16 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1112
15:16 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1107.mgmt.eqiad.wmnet with reboot policy FORCED
15:15 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1112
15:15 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1111.mgmt.eqiad.wmnet with reboot policy FORCED
15:12 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1106']
15:12 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1106.mgmt.eqiad.wmnet with reboot policy FORCED
15:10 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
15:10 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1111
15:08 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1110.mgmt.eqiad.wmnet with reboot policy FORCED
15:07 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1111
15:07 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1110
15:06 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1110
15:02 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1109.mgmt.eqiad.wmnet with reboot policy FORCED
15:01 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1109
14:59 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1109
14:58 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1108.mgmt.eqiad.wmnet with reboot policy FORCED
14:56 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1108
14:55 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1107.mgmt.eqiad.wmnet with reboot policy FORCED
14:55 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1107
14:54 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1108
14:53 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1107
14:51 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1106.mgmt.eqiad.wmnet with reboot policy FORCED
14:51 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1106
14:51 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1106
14:43 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
14:39 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be2003.codfw.wmnet with OS bookworm
14:30 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1106
14:30 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1106
14:29 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1008.wikimedia.org with reason: host reimage
14:28 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1106.mgmt.eqiad.wmnet with reboot policy FORCED
14:26 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1008.wikimedia.org with reason: host reimage
14:21 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be2003.codfw.wmnet with reason: host reimage
14:19 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1105.mgmt.eqiad.wmnet with reboot policy FORCED
14:18 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be2003.codfw.wmnet with reason: host reimage
14:17 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1105.mgmt.eqiad.wmnet with reboot policy FORCED
14:17 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1105
14:17 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1105
14:12 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1008.wikimedia.org with OS bullseye
14:06 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
14:03 sukhe: remove redundant 208.80.154.238/32 dev from /e/n/i on A:dns-rec and A:eqiad (superseded by label lo:anycast): T348041
13:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
13:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
13:20 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bookworm
13:07 mvernon@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host moss-be2003.codfw.wmnet with OS bookworm
13:04 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bookworm
13:04 mvernon@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host moss-be2003.codfw.wmnet with OS bookworm
13:04 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
12:53 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@9a8cfd2]: (no justification provided) (duration: 00m 39s)
12:52 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@9a8cfd2]: (no justification provided)
12:52 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@9a8cfd2]: (no justification provided) (duration: 01m 26s)
12:50 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@9a8cfd2]: (no justification provided)
12:47 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bookworm
11:53 urandom: starting decommission of restbase2012-c — T328490
11:07 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
10:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
10:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
10:29 ladsgroup@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
10:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
09:10 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
08:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15133
07:54 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
06:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15133
06:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 150552
06:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 150552
06:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T343198)', diff saved to https://phabricator.wikimedia.org/P52925 and previous config saved to /var/cache/conftool/dbconfig/20231013-064400-arnaudb.json
06:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
06:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
06:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T343198)', diff saved to https://phabricator.wikimedia.org/P52924 and previous config saved to /var/cache/conftool/dbconfig/20231013-064328-arnaudb.json
06:43 moritzm: installing Linux 5.10.197 updates from Bullseye point release (no reboots, just installing the new kernels)
06:39 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync
06:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: setup in progress
06:38 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: setup in progress
06:38 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: sync
06:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on debmonitor2002.codfw.wmnet with reason: setup in progress
06:38 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on debmonitor2002.codfw.wmnet with reason: setup in progress
06:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P52923 and previous config saved to /var/cache/conftool/dbconfig/20231013-062821-arnaudb.json
06:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P52922 and previous config saved to /var/cache/conftool/dbconfig/20231013-061315-arnaudb.json
05:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T343198)', diff saved to https://phabricator.wikimedia.org/P52921 and previous config saved to /var/cache/conftool/dbconfig/20231013-055809-arnaudb.json
03:20 TimStarling: on non-CentralAuth wikis, created the loginnotify_seen_net table T346989
03:08 TimStarling: on x1 wikishared, created loginnotify_seen_net table T346989
01:11 cstone: payments-wiki upgraded from aa5cd24d to 7f4da789

2023-10-12

21:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1008.wikimedia.org with OS bullseye
21:27 thcipriani@deploy2002: Finished scap: Backport for Set UseParserMigration true in wmf-config (T333179) (duration: 15m 20s)
21:22 thcipriani@deploy2002: sbailey and thcipriani: Continuing with sync
21:13 thcipriani@deploy2002: sbailey and thcipriani: Backport for Set UseParserMigration true in wmf-config (T333179) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:12 thcipriani@deploy2002: Started scap: Backport for Set UseParserMigration true in wmf-config (T333179)
21:10 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
21:10 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
21:10 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
21:10 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
21:09 bking@cumin1001: START - Cookbook sre.wdqs.data-reload
21:09 thcipriani: mwmaint2002:foreachwikiindblist 'group2 & s6' extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --current --all --touched-after=20230613000000 | tee /tmp/persistentRevisionThreadItems-s6.log
21:09 thcipriani: mwmaint2002:foreachwikiindblist 'group2 & s7' extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --current --all --touched-after=20230613000000 | tee /tmp/persistentRevisionThreadItems-s7.log
21:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2103 (T343198)', diff saved to https://phabricator.wikimedia.org/P52920 and previous config saved to /var/cache/conftool/dbconfig/20231012-210646-arnaudb.json
21:06 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
21:06 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2103.codfw.wmnet with reason: Maintenance
21:06 thcipriani@deploy2002: Finished scap: Backport for Enable wgDiscussionToolsEnablePermalinksBackend on s7 group2 (T315353) (duration: 07m 55s)
21:00 thcipriani@deploy2002: thcipriani and matmarex: Continuing with sync
20:59 thcipriani@deploy2002: thcipriani and matmarex: Backport for Enable wgDiscussionToolsEnablePermalinksBackend on s7 group2 (T315353) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:58 thcipriani@deploy2002: Started scap: Backport for Enable wgDiscussionToolsEnablePermalinksBackend on s7 group2 (T315353)
20:50 dr0ptp4kt@deploy2002: Finished scap: Backport for Revert "Growth: Enable Welcome survey user research for enwiki" (T342353) (duration: 08m 32s)
20:45 dr0ptp4kt@deploy2002: dr0ptp4kt and urbanecm: Continuing with sync
20:43 dr0ptp4kt@deploy2002: dr0ptp4kt and urbanecm: Backport for Revert "Growth: Enable Welcome survey user research for enwiki" (T342353) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:41 dr0ptp4kt@deploy2002: Started scap: Backport for Revert "Growth: Enable Welcome survey user research for enwiki" (T342353)
20:38 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1008.wikimedia.org with OS bullseye
20:37 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1008
20:37 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1008
20:26 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
20:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
20:26 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.wikimedia.org with OS bullseye
20:26 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
20:26 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.wikimedia.org with OS bullseye
20:25 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
20:22 dr0ptp4kt@deploy2002: Finished scap: Backport for Allow FundraiseUp scripts in Donatewiki CSP (T345379) (duration: 07m 40s)
20:21 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
20:17 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
20:17 dr0ptp4kt@deploy2002: dr0ptp4kt and ejegg: Continuing with sync
20:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
20:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
20:16 dr0ptp4kt@deploy2002: dr0ptp4kt and ejegg: Backport for Allow FundraiseUp scripts in Donatewiki CSP (T345379) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
20:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
20:15 dr0ptp4kt@deploy2002: Started scap: Backport for Allow FundraiseUp scripts in Donatewiki CSP (T345379)
20:10 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
20:10 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
20:10 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
20:10 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
20:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
20:09 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
20:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
20:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
20:06 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
20:06 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
20:06 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
20:06 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
20:05 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
20:05 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
20:05 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.wikimedia.org with reason: host reimage
20:04 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
20:04 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
20:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.wikimedia.org with reason: host reimage
20:00 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.wikimedia.org with reason: host reimage
19:59 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.wikimedia.org with reason: host reimage
19:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
19:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
19:58 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:57 jclark@cumin1001: START - Cookbook sre.dns.netbox
19:56 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
19:56 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
19:55 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
19:55 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
19:55 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudelastic1008.mgmt.eqiad.wmnet with reboot policy FORCED
19:47 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1008
19:47 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1008
19:46 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudelastic1008.mgmt.eqiad.wmnet with reboot policy FORCED
19:46 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
19:45 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
19:45 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1010.wikimedia.org with OS bullseye
19:45 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1009.wikimedia.org with OS bullseye
19:43 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
19:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
19:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
19:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
19:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1008']
19:43 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1009']
19:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
19:38 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1010']
19:38 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
19:37 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
19:36 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
19:36 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
19:35 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
19:35 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
19:34 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
19:34 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
19:33 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1009']
19:31 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1010']
19:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1008.wikimedia.org with OS bullseye
19:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1009.wikimedia.org with OS bullseye
19:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1010.wikimedia.org with OS bullseye
19:02 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on wdqs[1022-1024].eqiad.wmnet with reason: new graph split hosts T347505
19:01 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on wdqs[1022-1024].eqiad.wmnet with reason: new graph split hosts T347505
17:57 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1106.mgmt.eqiad.wmnet with reboot policy FORCED
17:53 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1102']
17:43 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1008.wikimedia.org with OS bullseye
17:43 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1009.wikimedia.org with OS bullseye
17:43 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1010.wikimedia.org with OS bullseye
17:37 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010
17:36 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010
17:35 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009
17:35 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1008.wikimedia.org with OS bullseye
17:35 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1010.wikimedia.org with OS bullseye
17:35 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1009.wikimedia.org with OS bullseye
17:34 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009
17:33 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1008
17:32 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:32 bking@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
17:31 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1008
17:27 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1106
17:26 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1106
17:23 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1105.mgmt.eqiad.wmnet with reboot policy FORCED
17:22 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1105
17:21 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1105
17:19 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1102']
17:13 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1067.eqiad.wmnet with OS bullseye
17:13 pt1979@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1001"
17:12 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1104']
17:12 pt1979@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1001"
16:57 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1067.eqiad.wmnet with reason: host reimage
16:55 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1064.eqiad.wmnet with OS bullseye
16:55 pt1979@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1001"
16:53 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1067.eqiad.wmnet with reason: host reimage
16:50 pt1979@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1001"
16:41 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
16:35 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
16:33 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1103']
16:31 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
16:28 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bullseye
16:28 pt1979@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1001"
16:27 pt1979@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1001"
16:27 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1104']
16:26 sukhe: enable puppet on A:dns-rec and force agent run: T348041
16:25 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1104.mgmt.eqiad.wmnet with reboot policy FORCED
16:24 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1104.mgmt.eqiad.wmnet with reboot policy FORCED
16:22 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1103']
16:19 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
16:19 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1103.mgmt.eqiad.wmnet with reboot policy FORCED
16:17 sukhe: disable puppet on A:dns-rec to roll out CR: 965187 T348041
16:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1008.wikimedia.org with OS bullseye
16:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1010.wikimedia.org with OS bullseye
16:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1009.wikimedia.org with OS bullseye
16:14 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:14 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cp1101 - jclark@cumin1001"
16:13 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cp1101 - jclark@cumin1001"
16:12 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage
16:11 jclark@cumin1001: START - Cookbook sre.dns.netbox
16:09 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage
16:09 moritzm: installing batik security updates
16:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on ldap-rw[1001,2001].wikimedia.org with reason: setup in progress
16:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on ldap-rw[1001,2001].wikimedia.org with reason: setup in progress
16:03 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2001.codfw.wmnet
15:57 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
15:56 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bullseye
15:56 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2001.codfw.wmnet
15:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2002.codfw.wmnet
15:48 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1102.mgmt.eqiad.wmnet with reboot policy FORCED
15:48 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2002.codfw.wmnet
15:46 moritzm: restart FPM on mediawiki canaries to pick up new libxpm
15:44 moritzm: installing libxpm security updates
15:42 Lucas_WMDE: (mostly?) Finished scap: Backport for specials: Use correct title in NewPagesPager (T348665) (duration: 07m 13s) – scap failed in the purgeMessageBlobStore step (php-fpm-restarts finished)
15:35 lucaswerkmeister-wmde@deploy2002: jforrester and lucaswerkmeister-wmde: Continuing with sync
15:34 lucaswerkmeister-wmde@deploy2002: jforrester and lucaswerkmeister-wmde: Backport for specials: Use correct title in NewPagesPager (T348665) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:33 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for specials: Use correct title in NewPagesPager (T348665)
15:31 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1007.wikimedia.org with OS bullseye
15:31 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
15:30 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
15:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16591
15:16 lucaswerkmeister-wmde@deploy2002: Backport cancelled.
15:14 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1007.wikimedia.org with reason: host reimage
15:11 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1007.wikimedia.org with reason: host reimage
15:08 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1064.eqiad.wmnet with OS bullseye
15:04 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-be2003.codfw.wmnet with OS bookworm
15:00 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2003.codfw.wmnet
15:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16591
14:57 sukhe: stopping gdnsd on dns2006 to simulate bird prefix withdrawal
14:57 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.wikimedia.org with OS bullseye
14:56 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1007.wikimedia.org with OS bullseye
14:56 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.wikimedia.org with OS bullseye
14:53 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2003.codfw.wmnet
14:52 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1003.eqiad.wmnet
14:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 35008
14:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 35008
14:51 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudelastic1007.mgmt.eqiad.wmnet with reboot policy FORCED
14:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudelastic1007.mgmt.eqiad.wmnet with reboot policy FORCED
14:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12200
14:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12200
14:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28458
14:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 28458
14:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 400474
14:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 400474
14:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 398196
14:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 398196
14:47 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:47 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudelastic1007 - jclark@cumin1001"
14:46 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudelastic1007 - jclark@cumin1001"
14:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3267
14:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3267
14:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 30132
14:45 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1003.eqiad.wmnet
14:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 30132
14:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15703
14:44 jclark@cumin1001: START - Cookbook sre.dns.netbox
14:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15703
14:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 25542
14:42 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudelastic1007.mgmt.eqiad.wmnet with reboot policy FORCED
14:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 25542
14:42 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1002.eqiad.wmnet
14:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15435
14:38 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudelastic1007.mgmt.eqiad.wmnet with reboot policy FORCED
14:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15435
14:37 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudelastic1007.mgmt.eqiad.wmnet with reboot policy FORCED
14:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 46562
14:35 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1002.eqiad.wmnet
14:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 46562
14:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6412
14:34 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudelastic1007.mgmt.eqiad.wmnet with reboot policy FORCED
14:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6412
14:33 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams: sync
14:33 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams: sync
14:32 sukhe: completed restarts of pdns-recursor in doh* and dns*
14:30 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
14:23 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
14:17 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: sync
14:16 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: sync
14:16 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: sync
14:15 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: sync
14:12 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bookworm
14:11 urbanecm: mwmaint2002: stop previous instance of `refreshLinkRecommendations` maintenance job (T348719)
14:07 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
14:04 sukhe: sudo cumin -b1 -s120 'A:dns-rec and not P{dns6002*}' 'systemctl restart pdns-recursor.service'
14:03 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.30 refs T347081
14:00 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough and A:wikidough
13:50 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
13:50 bking@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
13:50 bking@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
13:49 bking@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
13:43 sukhe: remove old ns2 IP 91.198.174.239/32 from /e/n/i on A:dns-rec: T329219
13:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 54994
13:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 54994
13:35 sukhe: remove redundant 208.80.153.231/32 from /e/n/i on A:dns-rec and A:codfw (superseded by label lo:anycast): T348041
13:34 kartik@deploy2002: Finished scap: Backport for Add Akan language (T333765) (duration: 09m 39s)
13:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 139901
13:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 139901
13:28 kartik@deploy2002: kartik and srishakatux: Continuing with sync
13:26 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host archiva1002.wikimedia.org
13:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15133
13:25 kartik@deploy2002: kartik and srishakatux: Backport for Add Akan language (T333765) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:25 kartik@deploy2002: Started scap: Backport for Add Akan language (T333765)
13:24 sukhe@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough and A:wikidough
13:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15133
13:23 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host archiva1002.wikimedia.org
13:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 40317
13:19 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 40317
13:18 hashar@deploy2002: Finished scap: Backport for LinkRecommendationUpdater: Update $linkRecommendationTaskType declaration (T348719) (duration: 06m 51s)
13:13 hashar@deploy2002: phuedx and hashar: Continuing with sync
13:13 hashar@deploy2002: phuedx and hashar: Backport for LinkRecommendationUpdater: Update $linkRecommendationTaskType declaration (T348719) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:11 hashar@deploy2002: Started scap: Backport for LinkRecommendationUpdater: Update $linkRecommendationTaskType declaration (T348719)
12:26 jayme: re-enable puppet on A:cp - T347544
12:18 jayme: disable puppet on A:cp - T347544
12:16 jayme: disable puppet on A:cp-text - T347544
11:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
11:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
11:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
11:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
11:37 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
11:36 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
11:34 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
11:33 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
11:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: testing
11:30 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: testing
11:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
11:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
11:21 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
11:20 jayme@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
10:52 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: sync
10:51 elukey@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: sync
10:50 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: sync
10:49 elukey@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: sync
10:26 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: sync
10:26 elukey@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: sync
10:26 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: sync
10:15 elukey@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: sync
10:13 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: sync
10:03 elukey@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: sync
09:40 fabfur: repooling cp4040 (depooled for T347837 and forgot to repool)
09:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1002.eqiad.wmnet
09:31 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-coord1002.eqiad.wmnet
09:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for an-master1002.eqiad.wmnet
09:31 btullis@cumin1001: START - Cookbook sre.hosts.remove-downtime for an-master1002.eqiad.wmnet
09:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on an-master1002.eqiad.wmnet with reason: Rebooting misbehaving an-master1002
09:16 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on an-master1002.eqiad.wmnet with reason: Rebooting misbehaving an-master1002
08:53 hashar@deploy2002: rebuilt and synchronized wikiversions files: Revert "group2 wikis to 1.41.0-wmf.30" # T347081
08:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 56099
08:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 56099
08:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 38195
08:41 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 38195
08:40 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 38195
08:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 38195
08:38 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
08:38 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
08:38 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
08:38 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
08:35 godog: add 200G to prometheus/ops in eqiad
08:28 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.30 refs T347081
08:15 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
06:59 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Arturo Borrero Gonzalez out of all services on: 2156 hosts
06:58 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Arturo Borrero Gonzalez out of all services on: 2156 hosts
06:46 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
00:09 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1063.eqiad.wmnet with OS bullseye

2023-10-11

23:23 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bullseye
23:22 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1064.eqiad.wmnet with OS bullseye
23:09 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
23:05 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host moss-be2003.codfw.wmnet with OS bullseye
22:47 eileen: civicrm upgraded from f2f1e23e to ceaeaa19
22:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bullseye
22:18 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:18 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
22:15 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:15 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
22:05 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir1001.eqiad.wmnet with OS bookworm
21:49 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir1001.eqiad.wmnet with reason: host reimage
21:47 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir1001.eqiad.wmnet with reason: host reimage
21:30 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for apifeatureusage2001.codfw.wmnet,apifeatureusage1001.eqiad.wmnet
21:30 ryankemper@cumin1001: START - Cookbook sre.hosts.remove-downtime for apifeatureusage2001.codfw.wmnet,apifeatureusage1001.eqiad.wmnet
21:30 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir1001.eqiad.wmnet with OS bookworm
21:26 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir1002.eqiad.wmnet with OS bookworm
21:20 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on apifeatureusage2001.codfw.wmnet with reason: reboot T348418
21:20 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on apifeatureusage2001.codfw.wmnet with reason: reboot T348418
21:11 ryankemper: T348418 Rebooting `apifeatureusage1001.eqiad.wmnet`
21:09 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir1002.eqiad.wmnet with reason: host reimage
21:07 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir1002.eqiad.wmnet with reason: host reimage
21:06 taavi@deploy2002: Finished scap: Backport for Set WRITE_NEW for CA wikis on OATHAuth multiple devices (T242031) (duration: 10m 33s)
21:01 taavi@deploy2002: taavi: Continuing with sync
20:57 taavi@deploy2002: taavi: Backport for Set WRITE_NEW for CA wikis on OATHAuth multiple devices (T242031) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:55 taavi@deploy2002: Started scap: Backport for Set WRITE_NEW for CA wikis on OATHAuth multiple devices (T242031)
20:54 cstone: payments-wiki upgraded from d6ad0376 to aa5cd24d
20:54 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir1002.eqiad.wmnet with OS bookworm
20:45 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:45 bking@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
20:44 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:43 bking@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
20:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir2001.codfw.wmnet with OS bookworm
20:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir2001.codfw.wmnet with reason: host reimage
20:22 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir2001.codfw.wmnet with reason: host reimage
20:19 samtar@deploy2002: Finished scap: Backport for Remove override to allow mobile edit notices to display on all wikis (T316178) (duration: 08m 18s)
20:14 samtar@deploy2002: kemayo and samtar: Continuing with sync
20:13 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bullseye
20:13 samtar@deploy2002: kemayo and samtar: Backport for Remove override to allow mobile edit notices to display on all wikis (T316178) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:12 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-be2003.codfw.wmnet with OS bullseye
20:11 samtar@deploy2002: Started scap: Backport for Remove override to allow mobile edit notices to display on all wikis (T316178)
20:11 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bullseye
20:09 samtar@deploy2002: Finished scap: Backport for Enable Edit Check on initial partner wikis (T347908) (duration: 07m 32s)
20:07 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:07 bking@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
20:04 bking@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
20:04 bking@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
20:04 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir2001.codfw.wmnet with OS bookworm
20:04 samtar@deploy2002: samtar and kemayo: Continuing with sync
20:04 bking@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
20:04 bking@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
20:03 samtar@deploy2002: samtar and kemayo: Backport for Enable Edit Check on initial partner wikis (T347908) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:03 bking@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
20:03 bking@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
20:03 bking@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
20:02 bking@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
20:02 samtar@deploy2002: Started scap: Backport for Enable Edit Check on initial partner wikis (T347908)
20:00 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1104']
20:00 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1104']
19:54 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir2002.codfw.wmnet with OS bookworm
19:52 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1103.mgmt.eqiad.wmnet with reboot policy FORCED
19:44 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1102.mgmt.eqiad.wmnet with reboot policy FORCED
19:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir2002.codfw.wmnet with reason: host reimage
19:37 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir2002.codfw.wmnet with reason: host reimage
19:12 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir2002.codfw.wmnet with OS bookworm
19:10 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir3003.esams.wmnet with OS bookworm
19:08 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1101']
19:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T343198)', diff saved to https://phabricator.wikimedia.org/P52914 and previous config saved to /var/cache/conftool/dbconfig/20231011-190408-arnaudb.json
18:49 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host stat1011.eqiad.wmnet with OS bullseye
18:49 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
18:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P52913 and previous config saved to /var/cache/conftool/dbconfig/20231011-184902-arnaudb.json
18:48 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
18:46 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir3003.esams.wmnet with reason: host reimage
18:43 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir3003.esams.wmnet with reason: host reimage
18:36 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
18:36 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
18:36 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
18:36 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
18:35 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
18:35 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
18:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P52911 and previous config saved to /var/cache/conftool/dbconfig/20231011-183355-arnaudb.json
18:33 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
18:33 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
18:33 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
18:32 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
18:31 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
18:31 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
18:25 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:24 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
18:24 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
18:23 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
18:23 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
18:23 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on stat1011.eqiad.wmnet with reason: host reimage
18:22 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
18:21 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
18:21 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
18:19 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on stat1011.eqiad.wmnet with reason: host reimage
18:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T343198)', diff saved to https://phabricator.wikimedia.org/P52910 and previous config saved to /var/cache/conftool/dbconfig/20231011-181849-arnaudb.json
18:18 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir3003.esams.wmnet with OS bookworm
18:08 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
18:07 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
18:07 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
18:07 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
18:05 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
18:04 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
17:56 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
17:55 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir3004.esams.wmnet with OS bookworm
17:55 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
17:47 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
17:47 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
17:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir3004.esams.wmnet with reason: host reimage
17:28 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir3004.esams.wmnet with reason: host reimage
17:27 sukhe: repool cp2030 for service=cdn
17:03 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir3004.esams.wmnet with OS bookworm
16:57 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
16:57 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
16:53 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host stat1011.eqiad.wmnet with OS bullseye
16:48 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host stat1011.eqiad.wmnet with OS bullseye
16:48 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host stat1011.eqiad.wmnet with OS bullseye
16:47 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['stat1011']
16:46 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['stat1011']
16:44 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host stat1011.eqiad.wmnet with OS bullseye
16:44 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host stat1011.eqiad.wmnet with OS bullseye
16:43 taavi@deploy2002: Finished scap: Backport for Don't double-escape link contents (T348669) (duration: 07m 35s)
16:38 taavi@deploy2002: taavi: Continuing with sync
16:37 taavi@deploy2002: taavi: Backport for Don't double-escape link contents (T348669) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:36 taavi@deploy2002: Started scap: Backport for Don't double-escape link contents (T348669)
16:29 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir5001.eqsin.wmnet with OS bookworm
15:57 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir5001.eqsin.wmnet with reason: host reimage
15:54 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir5001.eqsin.wmnet with reason: host reimage
15:53 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mw-wikifunctions.discovery.wmnet on codfw recursors
15:53 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache mw-wikifunctions.discovery.wmnet on codfw recursors
15:53 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mw-wikifunctions.discovery.wmnet on eqiad recursors
15:53 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache mw-wikifunctions.discovery.wmnet on eqiad recursors
15:52 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host stat1011.eqiad.wmnet with OS bullseye
15:52 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host stat1011.eqiad.wmnet with OS bullseye
15:25 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1064.eqiad.wmnet with OS bullseye
15:25 vgutierrez: depool ncredir5001
15:23 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
15:22 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
15:22 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
15:21 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
15:20 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
15:20 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
15:18 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir5001.eqsin.wmnet with OS bookworm
15:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on apt1002.wikimedia.org with reason: setup in progress
15:04 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on apt1002.wikimedia.org with reason: setup in progress
14:55 jayme: restarting pybal on lvs1019 and lvs2013
14:52 jayme: restarting pybal on lvs1020 and lvs2014
14:49 jayme: running puppet on 'O:lvs::balancer'
14:45 jayme: disabling puppet on 'P{O:lvs::balancer} and (A:codfw or A:eqiad)'
14:28 claime: Running authdns-update - T348631
14:25 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
14:25 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
14:25 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
14:25 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
14:24 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
14:23 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
14:22 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1101']
14:21 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
14:21 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
14:21 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
14:18 moritzm: installing curl security updates on bullseye/bookworm
14:17 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
14:15 jayme@deploy2002: Finished scap: (no justification provided) (duration: 02m 15s)
14:13 jayme@deploy2002: Started scap: (no justification provided)
14:07 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
14:06 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Edit check: Simplify "experience" config to "maximumEditcount" (duration: 07m 13s)
14:05 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
14:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and kemayo: Continuing with sync
14:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and kemayo: Backport for Edit check: Simplify "experience" config to "maximumEditcount" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:58 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Edit check: Simplify "experience" config to "maximumEditcount"
13:58 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
13:58 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1064.eqiad.wmnet with OS bullseye
13:50 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
13:45 elukey: restart kube-apiserver on ml-serve-ctrl1002
13:42 elukey: restart kube-apiserver on ml-serve-ctrl1001 as attempt to clear a weird golang/protobuf issue while retrieving secrets
13:40 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
13:40 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
13:39 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
13:39 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
13:38 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
13:38 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
13:37 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
13:37 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
13:37 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 150552
13:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 150552
13:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38628
13:36 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
13:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38628
13:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 40317
13:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 40317
13:34 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
13:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38195
13:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38195
13:28 sukhe: disable puppet on P:bird::anycast: T348041
13:28 sukhe: disable puppet on P:bird::anycast
13:27 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['sretest1003.eqiad.wmnet']
13:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9031
13:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9031
13:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6368
13:26 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6368
13:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2497
13:25 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2497
13:24 urandom: starting decommission of restbase2012-a — T328490
13:24 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
13:23 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
13:16 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003.eqiad.wmnet']
13:16 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sretest1003.eqiad.wmnet']
13:16 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003.eqiad.wmnet']
13:15 jbond@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['sretest1003.eqiad.wmnet']
13:14 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
13:14 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
13:02 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
12:59 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003.eqiad.wmnet']
12:56 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
12:55 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
12:53 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
12:53 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
12:52 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
12:52 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
12:51 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
12:38 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:38 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Cleanup decommissioned services apple-search and graphoid - cgoubert@cumin1001"
12:37 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Cleanup decommissioned services apple-search and graphoid - cgoubert@cumin1001"
12:34 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
12:34 cgoubert@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
12:33 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
12:16 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:16 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ORES svc records - elukey@cumin1001"
12:15 elukey@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ORES svc records - elukey@cumin1001"
12:12 elukey@cumin1001: START - Cookbook sre.dns.netbox
12:00 kart_: Updated cxserver to 2023-10-11-114410-production (T341478, T347939)
12:00 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
11:59 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
11:58 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
11:57 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
11:55 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
11:54 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
11:28 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on druid1011.eqiad.wmnet with reason: Downtime as we setup the host to join the druid and zookeeper cluster
11:27 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on druid1011.eqiad.wmnet with reason: Downtime as we setup the host to join the druid and zookeeper cluster
11:12 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
11:12 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
11:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T343198)', diff saved to https://phabricator.wikimedia.org/P52901 and previous config saved to /var/cache/conftool/dbconfig/20231011-110127-arnaudb.json
11:01 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
11:01 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
11:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T343198)', diff saved to https://phabricator.wikimedia.org/P52900 and previous config saved to /var/cache/conftool/dbconfig/20231011-110105-arnaudb.json
10:52 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
10:52 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
10:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P52899 and previous config saved to /var/cache/conftool/dbconfig/20231011-104558-arnaudb.json
10:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P52898 and previous config saved to /var/cache/conftool/dbconfig/20231011-103052-arnaudb.json
10:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T343198)', diff saved to https://phabricator.wikimedia.org/P52897 and previous config saved to /var/cache/conftool/dbconfig/20231011-101545-arnaudb.json
10:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
09:52 moritzm: rebuilding RAID after disk replacement T348429
09:52 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
09:49 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
09:34 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
09:31 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1001.eqiad.wmnet with OS bullseye
09:23 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:23 jayme@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add VIPs for mw-wikifunction - jayme@cumin1001"
09:23 jayme@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add VIPs for mw-wikifunction - jayme@cumin1001"
09:19 jayme@cumin1001: START - Cookbook sre.dns.netbox
09:15 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
08:53 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
08:44 hashar@deploy2002: Synchronized php: group1 wikis to 1.41.0-wmf.30 refs T347081 (duration: 06m 00s)
08:38 hashar@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.30 refs T347081
08:00 hashar@deploy2002: Synchronized php-1.41.0-wmf.30/skins/Vector: Backports for Vector styling issues T348572 T348530 (duration: 06m 16s)
07:35 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
07:35 sgimeno@deploy2002: Finished scap: Backport for GrowthExperiments: enable AddLink backend 15th round of wikis (T308141) (duration: 07m 45s)
07:29 sgimeno@deploy2002: sgimeno: Continuing with sync
07:28 sgimeno@deploy2002: sgimeno: Backport for GrowthExperiments: enable AddLink backend 15th round of wikis (T308141) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:27 sgimeno@deploy2002: Started scap: Backport for GrowthExperiments: enable AddLink backend 15th round of wikis (T308141)
07:24 sgimeno@deploy2002: Finished scap: Backport for GrowthExperiments: enable AddLink frontend 14th round of wikis (T308139) (duration: 09m 05s)
07:19 sgimeno@deploy2002: sgimeno: Continuing with sync
07:17 sgimeno@deploy2002: sgimeno: Backport for GrowthExperiments: enable AddLink frontend 14th round of wikis (T308139) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:15 sgimeno@deploy2002: Started scap: Backport for GrowthExperiments: enable AddLink frontend 14th round of wikis (T308139)
05:46 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
05:45 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
05:45 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
05:45 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
05:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
05:44 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
05:24 kart_: Updated cxserver to 2023-10-11-045323-production (T341478, T344982, T338432, T347939)
05:21 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
05:21 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
05:19 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
05:18 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
05:11 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
05:10 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
03:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T343198)', diff saved to https://phabricator.wikimedia.org/P52896 and previous config saved to /var/cache/conftool/dbconfig/20231011-030054-arnaudb.json
03:00 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
03:00 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
03:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T343198)', diff saved to https://phabricator.wikimedia.org/P52895 and previous config saved to /var/cache/conftool/dbconfig/20231011-030032-arnaudb.json
02:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P52894 and previous config saved to /var/cache/conftool/dbconfig/20231011-024526-arnaudb.json
02:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P52893 and previous config saved to /var/cache/conftool/dbconfig/20231011-023019-arnaudb.json
02:18 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1104.mgmt.eqiad.wmnet with reboot policy FORCED
02:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T343198)', diff saved to https://phabricator.wikimedia.org/P52892 and previous config saved to /var/cache/conftool/dbconfig/20231011-021513-arnaudb.json
02:03 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1104.mgmt.eqiad.wmnet with reboot policy FORCED
02:02 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1104
02:01 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1104

2023-10-10

22:45 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ncredir5001.eqsin.wmnet with OS bookworm
22:41 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1064.eqiad.wmnet with OS bullseye
22:40 cstone: SmashPig upgraded from a78a91d9 to 211284b9
22:13 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
21:45 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f6-eqiad
21:43 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f6-eqiad
21:34 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir5001.eqsin.wmnet with OS bookworm
21:33 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ncredir5001.eqsin.wmnet with OS bookworm
20:48 taavi@deploy2002: Finished scap: Backport for Set READ_NEW for CA wikis on OATHAuth multiple devices (T242031) (duration: 08m 24s)
20:43 taavi@deploy2002: taavi: Continuing with sync
20:41 taavi@deploy2002: taavi: Backport for Set READ_NEW for CA wikis on OATHAuth multiple devices (T242031) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:40 taavi@deploy2002: Started scap: Backport for Set READ_NEW for CA wikis on OATHAuth multiple devices (T242031)
20:19 hmonroy@deploy2002: Finished scap: Backport for diffs: add line number headings to inline diffs (T346460) (duration: 30m 26s)
20:17 eileen: civicrm upgraded from 4329014b to f2f1e23e
20:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir5001.eqsin.wmnet with OS bookworm
20:13 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host ncredir5001.eqsin.wmnet with OS bookworm
20:07 hmonroy@deploy2002: musikanimal and hmonroy: Continuing with sync
20:07 hmonroy@deploy2002: musikanimal and hmonroy: Backport for diffs: add line number headings to inline diffs (T346460) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
19:49 hmonroy@deploy2002: Started scap: Backport for diffs: add line number headings to inline diffs (T346460)
19:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T343198)', diff saved to https://phabricator.wikimedia.org/P52890 and previous config saved to /var/cache/conftool/dbconfig/20231010-194311-arnaudb.json
19:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
19:42 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
19:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T343198)', diff saved to https://phabricator.wikimedia.org/P52889 and previous config saved to /var/cache/conftool/dbconfig/20231010-194249-arnaudb.json
19:33 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
19:33 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/mathoid: apply
19:33 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
19:32 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
19:32 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
19:31 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/mathoid: apply
19:29 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: changing bgp rr config
19:29 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: changing bgp rr config
19:29 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: changing bgp rr config
19:29 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: changing bgp rr config
19:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P52888 and previous config saved to /var/cache/conftool/dbconfig/20231010-192742-arnaudb.json
19:26 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
19:26 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
19:26 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
19:25 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
19:24 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
19:23 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
19:22 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
19:22 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir5001.eqsin.wmnet with OS bookworm
19:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P52887 and previous config saved to /var/cache/conftool/dbconfig/20231010-191236-arnaudb.json
18:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T343198)', diff saved to https://phabricator.wikimedia.org/P52886 and previous config saved to /var/cache/conftool/dbconfig/20231010-185730-arnaudb.json
18:15 bvibber: brion running TimedMediaHandler requeueTranscodes.php batch jobs on mwmaint2002. expect many deletions & new file stores on swift
18:11 ejegg: fundraising python tools upgraded from 2e19cd39 to 0c17296c
18:10 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: changing bgp rr config
18:09 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: changing bgp rr config
18:07 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: changing bgp rr config
18:06 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: changing bgp rr config
18:01 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
17:59 cmooney@cumin1001: START - Cookbook sre.dns.netbox
17:56 topranks: disable BGP RR_CLIENT peerings on lsw1-e1-eqiad
17:52 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f5-eqiad
17:50 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f5-eqiad
17:46 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e6-eqiad
17:44 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-e6-eqiad
17:41 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e5-eqiad
17:39 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-e5-eqiad
17:23 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f7-eqiad
17:22 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f7-eqiad
17:21 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e7-eqiad
17:21 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-e7-eqiad
17:15 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add eqiad new row switches - cmooney@cumin1001"
17:14 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add eqiad new row switches - cmooney@cumin1001"
17:14 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add eqiad new row switches - cmooney@cumin1001"
17:13 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add eqiad new row switches - cmooney@cumin1001"
16:32 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1064.eqiad.wmnet with OS bullseye
16:21 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
16:21 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
16:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
16:18 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
16:18 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:18 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cp1101 - jclark@cumin1001"
16:17 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cp1101 - jclark@cumin1001"
16:14 jclark@cumin1001: START - Cookbook sre.dns.netbox
16:11 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
16:09 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
16:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
16:06 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
16:05 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
16:05 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
16:03 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
16:03 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
16:02 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
16:02 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
16:00 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
16:00 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
15:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
15:58 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
15:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
15:54 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
15:52 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
15:52 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
15:46 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
15:34 vriley@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp1100']
15:23 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
15:23 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
15:06 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100']
14:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
14:10 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1064.eqiad.wmnet with OS bullseye
14:06 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
14:06 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
14:05 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
14:05 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
14:02 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
13:58 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
13:57 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1063.eqiad.wmnet with OS bullseye
13:57 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
13:54 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
13:54 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
13:52 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
13:52 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
13:50 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
13:49 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
13:48 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
13:44 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
13:44 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
13:42 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
13:40 urbanecm@deploy2002: Finished scap: Backport for Growth: Enable Welcome survey user research for enwiki (T342353) (duration: 13m 19s)
13:39 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
13:37 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
13:36 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
13:35 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
13:33 urbanecm@deploy2002: urbanecm: Continuing with sync
13:32 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
13:32 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
13:29 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
13:28 urbanecm@deploy2002: urbanecm: Backport for Growth: Enable Welcome survey user research for enwiki (T342353) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:27 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bullseye
13:26 urbanecm@deploy2002: Started scap: Backport for Growth: Enable Welcome survey user research for enwiki (T342353)
13:26 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
13:25 urbanecm@deploy2002: Finished scap: Backport for cswiki: Remove engineer group (T348279) (duration: 07m 24s)
13:24 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
13:24 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
13:23 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
13:22 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
13:22 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
13:20 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
13:19 urbanecm@deploy2002: urbanecm: Continuing with sync
13:19 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
13:19 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
13:19 urbanecm@deploy2002: urbanecm: Backport for cswiki: Remove engineer group (T348279) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:18 urbanecm@deploy2002: Started scap: Backport for cswiki: Remove engineer group (T348279)
13:17 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
13:17 urbanecm@deploy2002: Finished scap: Backport for growth: Enable section-image recommendations on 10 new wikis (T345940) (duration: 09m 59s)
13:16 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
13:15 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
13:11 urbanecm@deploy2002: urbanecm: Continuing with sync
13:08 urbanecm@deploy2002: urbanecm: Backport for growth: Enable section-image recommendations on 10 new wikis (T345940) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:07 urbanecm@deploy2002: Started scap: Backport for growth: Enable section-image recommendations on 10 new wikis (T345940)
13:02 fnegri@cumin1001: START - Cookbook sre.dns.netbox
12:19 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1100.eqiad.wmnet']
12:18 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100.eqiad.wmnet']
12:02 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
12:01 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
11:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T343198)', diff saved to https://phabricator.wikimedia.org/P52885 and previous config saved to /var/cache/conftool/dbconfig/20231010-114024-arnaudb.json
11:40 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
11:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
11:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T343198)', diff saved to https://phabricator.wikimedia.org/P52884 and previous config saved to /var/cache/conftool/dbconfig/20231010-114002-arnaudb.json
11:33 volans: installed spicerack 7.4.1 on the cumin hosts
11:33 cgoubert@cumin1001: END (PASS) - Cookbook sre.mediawiki.restart-appservers (exit_code=0)
11:32 cgoubert@cumin1001: START - Cookbook sre.mediawiki.restart-appservers
11:30 cgoubert@cumin1001: END (PASS) - Cookbook sre.mediawiki.restart-appservers (exit_code=0)
11:29 cgoubert@cumin1001: START - Cookbook sre.mediawiki.restart-appservers
11:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P52883 and previous config saved to /var/cache/conftool/dbconfig/20231010-112456-arnaudb.json
11:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P52882 and previous config saved to /var/cache/conftool/dbconfig/20231010-110950-arnaudb.json
10:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T343198)', diff saved to https://phabricator.wikimedia.org/P52880 and previous config saved to /var/cache/conftool/dbconfig/20231010-105443-arnaudb.json
10:52 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1100.eqiad.wmnet']
10:52 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100.eqiad.wmnet']
09:56 ladsgroup@deploy2002: Finished scap: Backport for Set pagelinks migration stage of cebwiki to write both (T345732) (duration: 09m 10s)
09:50 ladsgroup@deploy2002: ladsgroup: Continuing with sync
09:48 ladsgroup@deploy2002: ladsgroup: Backport for Set pagelinks migration stage of cebwiki to write both (T345732) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:47 ladsgroup@deploy2002: Started scap: Backport for Set pagelinks migration stage of cebwiki to write both (T345732)
09:33 volans: uploaded spicerack_7.4.1 to apt.wikimedia.org bullseye-wikimedia
08:35 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.30 refs T347081
08:24 taavi: wikitech-static: cleanup image archive directory: T348503
08:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T343198)', diff saved to https://phabricator.wikimedia.org/P52879 and previous config saved to /var/cache/conftool/dbconfig/20231010-080924-arnaudb.json
08:09 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
08:09 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
08:09 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
08:08 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
08:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T343198)', diff saved to https://phabricator.wikimedia.org/P52878 and previous config saved to /var/cache/conftool/dbconfig/20231010-080847-arnaudb.json
08:00 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
07:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P52877 and previous config saved to /var/cache/conftool/dbconfig/20231010-075340-arnaudb.json
07:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P52876 and previous config saved to /var/cache/conftool/dbconfig/20231010-073834-arnaudb.json
07:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T343198)', diff saved to https://phabricator.wikimedia.org/P52875 and previous config saved to /var/cache/conftool/dbconfig/20231010-072327-arnaudb.json
07:19 kostajh: UTC morning deploys done
07:18 kharlan@deploy2002: Finished scap: Backport for ReportIncident: Set developer mode to false (duration: 10m 17s)
07:12 kharlan@deploy2002: kharlan: Continuing with sync
07:09 kharlan@deploy2002: kharlan: Backport for ReportIncident: Set developer mode to false synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:08 kharlan@deploy2002: Started scap: Backport for ReportIncident: Set developer mode to false
06:42 moritzm: installing qemu security updates on bookworm
03:54 mwpresync@deploy2002: Pruned MediaWiki: 1.41.0-wmf.28 (duration: 02m 08s)
03:52 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.30 refs T347081 (duration: 49m 56s)
03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.30 refs T347081

2023-10-09

22:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T343198)', diff saved to https://phabricator.wikimedia.org/P52873 and previous config saved to /var/cache/conftool/dbconfig/20231009-225429-arnaudb.json
22:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
22:54 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
22:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T343198)', diff saved to https://phabricator.wikimedia.org/P52872 and previous config saved to /var/cache/conftool/dbconfig/20231009-225407-arnaudb.json
22:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P52871 and previous config saved to /var/cache/conftool/dbconfig/20231009-223900-arnaudb.json
22:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P52870 and previous config saved to /var/cache/conftool/dbconfig/20231009-222354-arnaudb.json
22:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T343198)', diff saved to https://phabricator.wikimedia.org/P52869 and previous config saved to /var/cache/conftool/dbconfig/20231009-220848-arnaudb.json
20:42 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1156.eqiad.wmnet
20:34 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1156.eqiad.wmnet
20:34 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1155.eqiad.wmnet
20:26 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1155.eqiad.wmnet
20:26 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1154.eqiad.wmnet
20:17 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1154.eqiad.wmnet
20:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1153.eqiad.wmnet
20:09 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1153.eqiad.wmnet
20:09 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1152.eqiad.wmnet
20:02 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1152.eqiad.wmnet
20:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1151.eqiad.wmnet
19:54 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1151.eqiad.wmnet
19:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1150.eqiad.wmnet
19:47 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1150.eqiad.wmnet
19:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1149.eqiad.wmnet
19:40 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1149.eqiad.wmnet
19:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1148.eqiad.wmnet
19:32 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1148.eqiad.wmnet
19:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1147.eqiad.wmnet
19:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T343198)', diff saved to https://phabricator.wikimedia.org/P52868 and previous config saved to /var/cache/conftool/dbconfig/20231009-193219-arnaudb.json
19:32 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
19:31 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
19:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1147.eqiad.wmnet
19:23 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1146.eqiad.wmnet
19:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1146.eqiad.wmnet
19:16 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1145.eqiad.wmnet
19:08 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1145.eqiad.wmnet
19:08 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1144.eqiad.wmnet
19:01 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1144.eqiad.wmnet
19:01 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1143.eqiad.wmnet
18:55 ladsgroup@deploy2002: Finished scap: Backport for Update interwiki cache (duration: 100m 07s)
18:54 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1143.eqiad.wmnet
18:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1142.eqiad.wmnet
18:49 ladsgroup@deploy2002: ladsgroup: Continuing with sync
18:46 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1142.eqiad.wmnet
18:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1141.eqiad.wmnet
18:39 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1141.eqiad.wmnet
18:39 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1140.eqiad.wmnet
18:36 mforns@deploy2002: Finished deploy [airflow-dags/analytics@c334eaf]: (no justification provided) (duration: 01m 12s)
18:35 mforns@deploy2002: Started deploy [airflow-dags/analytics@c334eaf]: (no justification provided)
18:33 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1140.eqiad.wmnet
18:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1139.eqiad.wmnet
18:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1139.eqiad.wmnet
18:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1138.eqiad.wmnet
18:15 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1138.eqiad.wmnet
18:15 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1137.eqiad.wmnet
18:08 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1137.eqiad.wmnet
18:08 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1136.eqiad.wmnet
17:58 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1136.eqiad.wmnet
17:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1135.eqiad.wmnet
17:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1135.eqiad.wmnet
17:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1134.eqiad.wmnet
17:42 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1134.eqiad.wmnet
17:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1133.eqiad.wmnet
17:35 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1133.eqiad.wmnet
17:35 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1132.eqiad.wmnet
17:27 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1132.eqiad.wmnet
17:27 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1131.eqiad.wmnet
17:24 ladsgroup@deploy2002: ladsgroup: Backport for Update interwiki cache synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:20 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1131.eqiad.wmnet
17:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1130.eqiad.wmnet
17:15 ladsgroup@deploy2002: Started scap: Backport for Update interwiki cache
17:11 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1130.eqiad.wmnet
17:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1129.eqiad.wmnet
17:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1129.eqiad.wmnet
17:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1128.eqiad.wmnet
16:56 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1128.eqiad.wmnet
16:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1127.eqiad.wmnet
16:47 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1127.eqiad.wmnet
16:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1126.eqiad.wmnet
16:40 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1126.eqiad.wmnet
16:39 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1125.eqiad.wmnet
16:32 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1125.eqiad.wmnet
16:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1124.eqiad.wmnet
16:26 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1124.eqiad.wmnet
16:26 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1123.eqiad.wmnet
16:18 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1123.eqiad.wmnet
16:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1122.eqiad.wmnet
16:11 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1122.eqiad.wmnet
16:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1121.eqiad.wmnet
16:11 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:11 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
16:03 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1121.eqiad.wmnet
16:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1120.eqiad.wmnet
15:55 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1120.eqiad.wmnet
15:55 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1119.eqiad.wmnet
15:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1119.eqiad.wmnet
15:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1118.eqiad.wmnet
15:42 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1118.eqiad.wmnet
15:42 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1117.eqiad.wmnet
15:34 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1117.eqiad.wmnet
15:34 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1116.eqiad.wmnet
15:31 moritzm: installing qemu security updates on bookworm
15:27 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1116.eqiad.wmnet
15:27 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1115.eqiad.wmnet
15:20 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1115.eqiad.wmnet
15:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1114.eqiad.wmnet
15:12 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1114.eqiad.wmnet
15:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1113.eqiad.wmnet
15:09 volans: installed spicerack 7.4.0 to cumin2002
15:08 moritzm: installing nftables bugfix updates from Bookworm point release
15:02 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1113.eqiad.wmnet
15:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1112.eqiad.wmnet
14:55 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1112.eqiad.wmnet
14:55 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1111.eqiad.wmnet
14:47 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1111.eqiad.wmnet
14:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1110.eqiad.wmnet
14:40 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1110.eqiad.wmnet
14:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1109.eqiad.wmnet
14:32 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1109.eqiad.wmnet
14:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1108.eqiad.wmnet
14:27 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1108.eqiad.wmnet
14:25 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1107.eqiad.wmnet
14:18 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1107.eqiad.wmnet
14:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1106.eqiad.wmnet
14:10 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1106.eqiad.wmnet
14:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1105.eqiad.wmnet
14:02 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1105.eqiad.wmnet
14:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1104.eqiad.wmnet
13:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
13:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
13:54 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1104.eqiad.wmnet
13:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1103.eqiad.wmnet
13:48 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1001.eqiad.wmnet
13:48 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1001.eqiad.wmnet
13:48 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1001.eqiad.wmnet
13:47 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1001.eqiad.wmnet
13:47 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1001.eqiad.wmnet
13:46 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1001.eqiad.wmnet
13:46 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1103.eqiad.wmnet
13:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1102.eqiad.wmnet
13:46 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1001.eqiad.wmnet
13:46 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1001.eqiad.wmnet
13:46 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1001.eqiad.wmnet
13:43 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1001.eqiad.wmnet
13:40 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1102.eqiad.wmnet
13:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1101.eqiad.wmnet
13:35 volans: uploaded spicerack_7.4.0 to apt.wikimedia.org bullseye-wikimedia
13:32 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1101.eqiad.wmnet
13:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1100.eqiad.wmnet
13:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1100.eqiad.wmnet
13:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1099.eqiad.wmnet
13:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1099.eqiad.wmnet
13:16 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1098.eqiad.wmnet
13:06 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1098.eqiad.wmnet
13:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1097.eqiad.wmnet
12:58 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1097.eqiad.wmnet
12:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1095.eqiad.wmnet
12:52 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1095.eqiad.wmnet
12:52 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1094.eqiad.wmnet
12:46 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1094.eqiad.wmnet
12:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1093.eqiad.wmnet
12:40 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1093.eqiad.wmnet
12:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1092.eqiad.wmnet
12:35 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1092.eqiad.wmnet
12:35 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1091.eqiad.wmnet
12:28 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1091.eqiad.wmnet
12:28 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1090.eqiad.wmnet
12:23 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1090.eqiad.wmnet
12:23 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1089.eqiad.wmnet
12:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1089.eqiad.wmnet
12:16 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1088.eqiad.wmnet
12:10 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1088.eqiad.wmnet
12:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1087.eqiad.wmnet
12:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1087.eqiad.wmnet
12:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1086.eqiad.wmnet
11:51 godog: restart k8s-aux in eqiad to pick up new certs - T343529
11:47 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1086.eqiad.wmnet
11:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1085.eqiad.wmnet
11:32 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1085.eqiad.wmnet
11:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1084.eqiad.wmnet
11:26 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1084.eqiad.wmnet
11:26 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1083.eqiad.wmnet
11:18 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1083.eqiad.wmnet
11:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1082.eqiad.wmnet
11:12 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1082.eqiad.wmnet
11:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1081.eqiad.wmnet
11:06 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1081.eqiad.wmnet
11:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1080.eqiad.wmnet
11:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
11:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
11:00 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1080.eqiad.wmnet
11:00 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1079.eqiad.wmnet
10:59 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
10:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
10:53 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1079.eqiad.wmnet
10:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1078.eqiad.wmnet
10:50 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
10:48 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1078.eqiad.wmnet
10:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics1077.eqiad.wmnet
10:34 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host analytics1077.eqiad.wmnet
10:34 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics1076.eqiad.wmnet
10:29 moritzm: installing Linux 6.1.55 on Bookworm hosts
10:29 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host analytics1076.eqiad.wmnet
10:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics1075.eqiad.wmnet
10:22 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host analytics1075.eqiad.wmnet
10:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics1074.eqiad.wmnet
10:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2031.codfw.wmnet
10:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host analytics1074.eqiad.wmnet
10:13 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics1073.eqiad.wmnet
10:10 ladsgroup@deploy2002: Finished scap: Backport for Set virtual domain mapping for url shortener (T330590) (duration: 15m 35s)
10:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2031.codfw.wmnet
10:05 ladsgroup@deploy2002: ladsgroup: Continuing with sync
10:04 Lucas_WMDE: lucaswerkmeister-wmde@mwdebug1002:~$ sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.40.0-wmf.17/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.40.0-wmf.17/cache/l10n/ /srv/mediawiki/php-1.40.0-wmf.17/cache/ /srv/mediawiki/php-1.40.0-wmf.17/ # clean up old l10n cache'
10:03 ladsgroup@deploy2002: ladsgroup: Backport for Set virtual domain mapping for url shortener (T330590) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:55 ladsgroup@deploy2002: Started scap: Backport for Set virtual domain mapping for url shortener (T330590)
09:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host analytics1073.eqiad.wmnet
09:49 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host analytics1072.eqiad.wmnet
09:07 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host analytics1072.eqiad.wmnet
09:07 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics1071.eqiad.wmnet
09:01 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host analytics1071.eqiad.wmnet
09:01 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics1070.eqiad.wmnet
08:55 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host analytics1070.eqiad.wmnet
08:53 moritzm: rebuilt bookworm d-i image for the Bookworm 12.2 point release T348326
08:23 moritzm: rebuilt bullseye d-i image for the Bullseye 11.8 point release T348327
07:06 taavi: kill stuck updateSpecialPages.php process on mwmaint2002 which was trying to re-connect to an unreachable db host
07:02 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 0:00:00 on db2109.codfw.wmnet with reason: investigating db2109
07:01 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 8 days, 0:00:00 on db2109.codfw.wmnet with reason: investigating db2109

2023-10-08

22:58 ryankemper: [WDQS] Depooled `wdqs1014` while it catches up on a day of lag
22:57 ryankemper: [WDQS] Restarted `wdqs1014`; blazegraph has been deadlocked since `2023-10-07 12:30:00`

2023-10-07

09:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T343198)', diff saved to https://phabricator.wikimedia.org/P52863 and previous config saved to /var/cache/conftool/dbconfig/20231007-092249-arnaudb.json
09:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P52862 and previous config saved to /var/cache/conftool/dbconfig/20231007-090742-arnaudb.json
08:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P52861 and previous config saved to /var/cache/conftool/dbconfig/20231007-085236-arnaudb.json
08:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T343198)', diff saved to https://phabricator.wikimedia.org/P52860 and previous config saved to /var/cache/conftool/dbconfig/20231007-083729-arnaudb.json
02:33 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1030.eqiad.wmnet
02:33 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1030.eqiad.wmnet

2023-10-06

23:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2054.codfw.wmnet with OS bullseye
23:04 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
23:03 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
22:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2054.codfw.wmnet with reason: host reimage
22:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2054.codfw.wmnet with reason: host reimage
22:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T343198)', diff saved to https://phabricator.wikimedia.org/P52859 and previous config saved to /var/cache/conftool/dbconfig/20231006-224306-arnaudb.json
22:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
22:42 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
22:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T343198)', diff saved to https://phabricator.wikimedia.org/P52858 and previous config saved to /var/cache/conftool/dbconfig/20231006-224245-arnaudb.json
22:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P52857 and previous config saved to /var/cache/conftool/dbconfig/20231006-222738-arnaudb.json
22:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2054.codfw.wmnet with OS bullseye
22:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P52856 and previous config saved to /var/cache/conftool/dbconfig/20231006-221232-arnaudb.json
21:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T343198)', diff saved to https://phabricator.wikimedia.org/P52855 and previous config saved to /var/cache/conftool/dbconfig/20231006-215725-arnaudb.json
20:45 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:45 bking@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
20:35 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:34 bking@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
20:29 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:29 bking@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
20:11 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:10 bking@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
19:46 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
19:45 bking@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
19:44 bking@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
19:43 bking@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
19:43 bking@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
19:41 bking@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
19:40 bking@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
19:39 bking@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
18:43 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@3b7df78]: Update rdf-spark-tools to 0.3.135 to fix query mapping job failure (duration: 00m 29s)
18:42 ebernhardson@deploy2002: Started deploy [airflow-dags/search@3b7df78]: Update rdf-spark-tools to 0.3.135 to fix query mapping job failure
18:42 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
18:32 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
18:31 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
18:31 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
18:30 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp1101
18:30 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp1101
17:10 pt1979@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1100']
17:10 pt1979@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100']
17:08 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1100.eqiad.wmnet']
17:08 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100.eqiad.wmnet']
17:05 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1100']
17:05 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100']
17:03 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1100']
17:03 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100']
17:02 vriley@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp1100']
17:02 vriley@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp1100']
16:54 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp1100.mgmt.eqiad.wmnet with reboot policy FORCED
16:41 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1100.mgmt.eqiad.wmnet with reboot policy FORCED
16:37 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1100.mgmt.eqiad.wmnet with reboot policy FORCED
16:28 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1100.mgmt.eqiad.wmnet with reboot policy FORCED
16:27 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1100.mgmt.eqiad.wmnet with reboot policy FORCED
16:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1063.eqiad.wmnet with OS bullseye
16:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1067.eqiad.wmnet with OS bullseye
16:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1064.eqiad.wmnet with OS bullseye
16:13 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1100.mgmt.eqiad.wmnet with reboot policy FORCED
15:31 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
14:59 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
14:59 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
14:59 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bullseye
14:58 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1067.mgmt.eqiad.wmnet with reboot policy FORCED
14:58 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1064.mgmt.eqiad.wmnet with reboot policy FORCED
14:58 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1063.mgmt.eqiad.wmnet with reboot policy FORCED
14:55 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1067.mgmt.eqiad.wmnet with reboot policy FORCED
14:55 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1064.mgmt.eqiad.wmnet with reboot policy FORCED
14:55 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1063.mgmt.eqiad.wmnet with reboot policy FORCED
14:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2054.codfw.wmnet with OS bullseye
14:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-master1003.eqiad.wmnet with OS bullseye
14:44 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
14:42 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
14:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
14:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-master1003.eqiad.wmnet with reason: host reimage
14:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
14:22 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-master1003.eqiad.wmnet with reason: host reimage
14:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
14:02 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:02 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
13:55 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-master1004.eqiad.wmnet with OS bullseye
13:55 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
13:53 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
13:52 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2054.codfw.wmnet with OS bullseye
13:38 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
13:35 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:34 jhancock@cumin2002: START - Cookbook sre.dns.netbox
13:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "ganeti-test2004 - ayounsi@cumin1001"
13:26 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "ganeti-test2004 - ayounsi@cumin1001"
13:21 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
13:21 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-master1004.eqiad.wmnet with reason: host reimage
13:18 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-master1004.eqiad.wmnet with reason: host reimage
13:17 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
13:17 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
13:03 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti-test2004.codfw.wmnet with OS bullseye
13:01 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
12:29 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
12:29 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
12:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T343198)', diff saved to https://phabricator.wikimedia.org/P52852 and previous config saved to /var/cache/conftool/dbconfig/20231006-122022-arnaudb.json
12:20 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
12:20 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
12:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T343198)', diff saved to https://phabricator.wikimedia.org/P52851 and previous config saved to /var/cache/conftool/dbconfig/20231006-122000-arnaudb.json
12:17 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
12:16 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
12:15 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
12:15 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
12:15 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
12:14 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
12:13 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti-test2004.codfw.wmnet with OS bullseye
12:13 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti-test2004.codfw.wmnet with OS bullseye
12:11 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
12:10 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
12:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P52850 and previous config saved to /var/cache/conftool/dbconfig/20231006-120454-arnaudb.json
12:02 moritzm: rebalancing ganeti row D/eqiad
11:55 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti-test2004.codfw.wmnet with OS bullseye
11:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P52848 and previous config saved to /var/cache/conftool/dbconfig/20231006-114947-arnaudb.json
11:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T343198)', diff saved to https://phabricator.wikimedia.org/P52847 and previous config saved to /var/cache/conftool/dbconfig/20231006-113441-arnaudb.json
10:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
10:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
10:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
10:21 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2023.codfw.wmnet to cluster codfw and group A
10:20 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2023.codfw.wmnet to cluster codfw and group A
10:13 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
10:13 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
10:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host apt1002.wikimedia.org
10:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host apt1002.wikimedia.org with OS bookworm
09:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on apt1002.wikimedia.org with reason: host reimage
09:51 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on apt1002.wikimedia.org with reason: host reimage
09:42 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host apt1002.wikimedia.org with OS bookworm
09:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM apt1002.wikimedia.org - jmm@cumin2002"
09:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM apt1002.wikimedia.org - jmm@cumin2002"
09:26 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) apt1002.wikimedia.org on all recursors
09:26 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache apt1002.wikimedia.org on all recursors
09:26 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM apt1002.wikimedia.org - jmm@cumin2002"
09:25 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM apt1002.wikimedia.org - jmm@cumin2002"
09:22 jmm@cumin2002: START - Cookbook sre.dns.netbox
09:22 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host apt1002.wikimedia.org
09:19 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host apt2002.wikimedia.org
09:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) apt2002.wikimedia.org on all recursors
09:19 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache apt2002.wikimedia.org on all recursors
09:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:19 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM apt2002.wikimedia.org - jmm@cumin2002"
09:18 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM apt2002.wikimedia.org - jmm@cumin2002"
09:11 jmm@cumin2002: START - Cookbook sre.dns.netbox
09:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) apt2002.wikimedia.org on all recursors
09:11 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache apt2002.wikimedia.org on all recursors
09:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM apt2002.wikimedia.org - jmm@cumin2002"
09:10 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM apt2002.wikimedia.org - jmm@cumin2002"
09:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
09:05 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host apt2002.wikimedia.org
09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
09:03 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
09:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
09:03 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
08:43 moritzm: installing vim security updates
08:26 elukey@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
08:24 elukey@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
08:22 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2023.codfw.wmnet to cluster codfw and group A
08:22 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2023.codfw.wmnet to cluster codfw and group A
08:18 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2023.codfw.wmnet
08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2023.codfw.wmnet
07:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2023.codfw.wmnet with OS bullseye
07:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2023.codfw.wmnet with reason: host reimage
07:09 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2023.codfw.wmnet with reason: host reimage
06:53 moritzm: installing bind9 security updates (client side libs/tools only)
06:52 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2023.codfw.wmnet with OS bullseye
02:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T343198)', diff saved to https://phabricator.wikimedia.org/P52843 and previous config saved to /var/cache/conftool/dbconfig/20231006-020509-arnaudb.json
02:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
02:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
02:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T343198)', diff saved to https://phabricator.wikimedia.org/P52842 and previous config saved to /var/cache/conftool/dbconfig/20231006-020447-arnaudb.json
01:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P52841 and previous config saved to /var/cache/conftool/dbconfig/20231006-014941-arnaudb.json
01:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P52840 and previous config saved to /var/cache/conftool/dbconfig/20231006-013434-arnaudb.json
01:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T343198)', diff saved to https://phabricator.wikimedia.org/P52839 and previous config saved to /var/cache/conftool/dbconfig/20231006-011928-arnaudb.json
00:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
00:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
00:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox
00:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
00:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
00:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
00:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
00:16 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti-test2004.codfw.wmnet with OS bullseye

2023-10-05

23:22 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
23:22 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-master1003.mgmt.eqiad.wmnet with reboot policy FORCED
23:19 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-master1003.mgmt.eqiad.wmnet with reboot policy FORCED
23:02 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
23:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-master1004.mgmt.eqiad.wmnet with reboot policy FORCED
23:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-master1004.mgmt.eqiad.wmnet with reboot policy FORCED
22:59 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1007.eqiad.wmnet with OS bullseye
22:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-master1004.mgmt.eqiad.wmnet with reboot policy FORCED
22:58 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-master1004.mgmt.eqiad.wmnet with reboot policy FORCED
22:37 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
21:34 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
21:32 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
21:17 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase2020.codfw.wmnet: Maybe cleanup leaked file descriptors(?) - eevans@cumin1001
21:07 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2020.codfw.wmnet: Maybe cleanup leaked file descriptors(?) - eevans@cumin1001
21:03 thcipriani@deploy2002: Finished scap: Backport for [foundationwiki] Add Endowment, Agenda, Committee, and Memory namespaces (T347762 T347822 T348268), [foundationwiki] Provide 'translationadmin' group with 'edit-legal' right (T346187) (duration: 09m 56s)
20:58 thcipriani@deploy2002: thcipriani and varnent: Continuing with sync
20:57 eileen: civicrm upgraded from 05545fbc to 4329014b
20:55 thcipriani@deploy2002: thcipriani and varnent: Backport for [foundationwiki] Add Endowment, Agenda, Committee, and Memory namespaces (T347762 T347822 T348268), [foundationwiki] Provide 'translationadmin' group with 'edit-legal' right (T346187) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:54 thcipriani@deploy2002: Started scap: Backport for [foundationwiki] Add Endowment, Agenda, Committee, and Memory namespaces (T347762 T347822 T348268), [foundationwiki] Provide 'translationadmin' group with 'edit-legal' right (T346187)
20:49 thcipriani@deploy2002: Finished scap: Backport for [Prototype] Add screen resolution to Typography prototype, [Prototype] Edit project link page on reading prototype (duration: 23m 57s)
20:39 thcipriani@deploy2002: jdrewniak and thcipriani: Continuing with sync
20:37 thcipriani@deploy2002: jdrewniak and thcipriani: Backport for [Prototype] Add screen resolution to Typography prototype, [Prototype] Edit project link page on reading prototype synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:25 thcipriani@deploy2002: Started scap: Backport for [Prototype] Add screen resolution to Typography prototype, [Prototype] Edit project link page on reading prototype
20:22 thcipriani@deploy2002: Finished scap: Backport for Enable Minerva site notice for Nepali Wikipedia (newiki) (T347814) (duration: 08m 57s)
20:16 thcipriani@deploy2002: ammarpad and thcipriani: Continuing with sync
20:14 thcipriani@deploy2002: ammarpad and thcipriani: Backport for Enable Minerva site notice for Nepali Wikipedia (newiki) (T347814) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
20:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
20:13 thcipriani@deploy2002: Started scap: Backport for Enable Minerva site notice for Nepali Wikipedia (newiki) (T347814)
18:51 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti-test2004.codfw.wmnet with OS bullseye
18:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti-test2004']
18:47 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti-test2004']
18:45 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox
18:34 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.29 refs T347080
18:17 sukhe: running authdns-update: T347054
18:15 jhuneidi@deploy2002: Synchronized php: group1 wikis to 1.41.0-wmf.29 refs T347080 (duration: 06m 12s)
18:08 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.29 refs T347080
17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti-test2004']
17:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti-test2004']
17:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
17:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
17:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
17:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:25 jhancock@cumin2002: START - Cookbook sre.dns.netbox
16:57 bvibber: scaling back batch jobs for T312153 and T312152, will run these in further chunks as the new config rolls out
16:47 bvibber: brion running requeueTranscodes.php on mwmaint2002 for VP9 transcode cleanup for T312153
16:22 volans: installed 7.3.1 on cumin1001
16:19 jbond@cumin2002: END (PASS) - Cookbook sre.puppetboard.restart-reboot (exit_code=0) rolling reboot on A:puppetboard
16:15 jbond@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet. on all recursors
16:15 jbond@cumin2002: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet. on all recursors
16:12 dcausse: cleaning up rdf-streaming-updater-staging swift bucket
16:11 jbond@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet. on all recursors
16:10 jbond@cumin2002: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet. on all recursors
16:10 jbond@cumin2002: START - Cookbook sre.puppetboard.restart-reboot rolling reboot on A:puppetboard
16:10 jbond@cumin2002: END (ERROR) - Cookbook sre.puppet.renew-cert (exit_code=97) for sretest1001.eqiad.wmnet: Renew puppet certificate - jbond@cumin2002
16:09 jbond@cumin2002: START - Cookbook sre.puppet.renew-cert for sretest1001.eqiad.wmnet: Renew puppet certificate - jbond@cumin2002
16:07 cgoubert@deploy2002: Finished scap: Testing mw-on-k8s deployment for T348228 (duration: 02m 15s)
16:06 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for sretest1001.eqiad.wmnet: Renew puppet certificate - jbond@cumin2002
16:05 cgoubert@deploy2002: Started scap: Testing mw-on-k8s deployment for T348228
16:05 jbond@cumin2002: END (PASS) - Cookbook sre.puppetboard.restart-reboot (exit_code=0) rolling reboot on A:puppetboard
16:05 jbond@cumin2002: START - Cookbook sre.puppet.renew-cert for sretest1001.eqiad.wmnet: Renew puppet certificate - jbond@cumin2002
16:01 jbond@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet. on all recursors
16:01 jbond@cumin2002: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet. on all recursors
16:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T343198)', diff saved to https://phabricator.wikimedia.org/P52837 and previous config saved to /var/cache/conftool/dbconfig/20231005-160030-arnaudb.json
16:00 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
16:00 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
16:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T343198)', diff saved to https://phabricator.wikimedia.org/P52836 and previous config saved to /var/cache/conftool/dbconfig/20231005-160009-arnaudb.json
15:54 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1002.eqiad.wmnet
15:37 volans: installed 7.3.1 on cumin2002
15:36 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1002.eqiad.wmnet
15:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
15:31 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
15:31 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti-test2004.codfw.wmnet with OS bullseye
15:30 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubemaster2001.codfw.wmnet
15:30 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubemaster2001.codfw.wmnet
15:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P52834 and previous config saved to /var/cache/conftool/dbconfig/20231005-152956-arnaudb.json
15:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1067.eqiad.wmnet with OS bullseye
15:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti2023.codfw.wmnet with reason: reimage to bullseye
15:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti2023.codfw.wmnet with reason: reimage to bullseye
15:26 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubemaster2001.codfw.wmnet with reason: Pick up vcpu change
15:25 claime: rebooting kubemaster2001.codfw.wmnet - T348228
15:25 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubemaster2001.codfw.wmnet with reason: Pick up vcpu change
15:25 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubemaster2002.codfw.wmnet
15:24 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubemaster2002.codfw.wmnet
15:20 claime: rebooting kubemaster2002.codfw.wmnet - T348228
15:20 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubemaster2002.codfw.wmnet with reason: Pick up vcpu change
15:19 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubemaster2002.codfw.wmnet with reason: Pick up vcpu change
15:16 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubetcd2004.codfw.wmnet
15:16 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubetcd2004.codfw.wmnet
15:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T343198)', diff saved to https://phabricator.wikimedia.org/P52832 and previous config saved to /var/cache/conftool/dbconfig/20231005-151450-arnaudb.json
15:13 claime: rebooting kubetcd2004.codfw.wmnet - T348228
15:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubetcd2004.codfw.wmnet with reason: Pick up vcpu change
15:12 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti2023.codfw.wmnet
15:12 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubetcd2004.codfw.wmnet with reason: Pick up vcpu change
15:11 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubetcd2005.codfw.wmnet
15:10 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubetcd2005.codfw.wmnet
15:10 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubetcd2005.codfw.wmnet with reason: Pick up vcpu change
15:09 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubetcd2005.codfw.wmnet with reason: Pick up vcpu change
15:09 claime: rebooting kubetcd2005.codfw.wmnet - T348228
15:08 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubetcd2006.codfw.wmnet
15:08 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubetcd2006.codfw.wmnet
15:07 claime: rebooting kubetcd2006.codfw.wmnet - T348228
15:07 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubetcd2006.codfw.wmnet with reason: Pick up vcpu change
15:07 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubetcd2006.codfw.wmnet with reason: Pick up vcpu change
15:06 claime: Bumping kubetcd200[4-6].codfw.wmnet vcpu to 2 - T348228
15:04 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubemaster1001.eqiad.wmnet
15:03 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubemaster1001.eqiad.wmnet
15:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2023.codfw.wmnet
15:03 claime: rebooting kubemaster1001.eqiad.wmnet - T348228
15:03 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubemaster1001.eqiad.wmnet with reason: Pick up vcpu change
14:59 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubemaster1001.eqiad.wmnet with reason: Pick up vcpu change
14:57 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubemaster1002.eqiad.wmnet
14:57 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubemaster1002.eqiad.wmnet
14:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
14:53 claime: rebooting kubemaster1002.eqiad.wmnet - T348228
14:53 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubemaster1002.eqiad.wmnet with reason: Pick up vcpu change
14:53 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubemaster1002.eqiad.wmnet with reason: Pick up vcpu change
14:52 claime: Bumping kubemaster100[1-2].eqiad.wmnet vcpu to 2, ram to 4G - T348228
14:50 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubetcd1004.eqiad.wmnet
14:50 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubetcd1004.eqiad.wmnet
14:47 claime: rebooting kubetcd1004.eqiad.wmnet - T348228
14:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubetcd1004.eqiad.wmnet with reason: Pick up vcpu change
14:47 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubetcd1004.eqiad.wmnet with reason: Pick up vcpu change
14:46 claime: rebooted kubetcd1005.eqiad.wmnet - T348228
14:46 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubetcd1005.eqiad.wmnet
14:46 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubetcd1005.eqiad.wmnet
14:44 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubetcd1005.eqiad.wmnet with reason: Pick up vcpu change
14:44 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubetcd1005.eqiad.wmnet with reason: Pick up vcpu change
14:44 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubetcd1006.eqiad.wmnet
14:44 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubetcd1006.eqiad.wmnet
14:41 claime: rebooting kubetcd1006.eqiad.wmnet - T348228
14:41 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kubetcd1006.eqiad.wmnet with reason: Pick up vcpu change
14:41 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kubetcd1006.eqiad.wmnet with reason: Pick up vcpu change
14:38 claime: Bumping kubetcd100[4-6].eqiad.wmnet vcpu to 2 - T348228
14:33 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
14:33 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
14:29 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
14:29 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
14:25 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
14:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti-test2004.codfw.wmnet with OS bullseye
14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti-test2004']
14:22 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti-test2004']
14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
14:18 Lucas_WMDE: UTC afternoon backport+config window done
14:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
14:17 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Revert "Use HookHandlers for core hooks" (T348181) (duration: 08m 50s)
14:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
14:11 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
14:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
14:09 lucaswerkmeister-wmde@deploy2002: umherirrender and lucaswerkmeister-wmde: Continuing with sync
14:09 lucaswerkmeister-wmde@deploy2002: umherirrender and lucaswerkmeister-wmde: Backport for Revert "Use HookHandlers for core hooks" (T348181) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
14:08 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
14:08 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Revert "Use HookHandlers for core hooks" (T348181)
14:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
14:04 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1067.eqiad.wmnet with OS bullseye
14:04 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
13:53 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
13:50 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
13:49 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Drop old VP8 video transcodes, enable HLS on testwiki (T312152 T309823) (duration: 12m 07s)
13:47 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
13:44 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
13:42 lucaswerkmeister-wmde@deploy2002: brion and lucaswerkmeister-wmde: Continuing with sync
13:41 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
13:38 lucaswerkmeister-wmde@deploy2002: brion and lucaswerkmeister-wmde: Backport for Drop old VP8 video transcodes, enable HLS on testwiki (T312152 T309823) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:36 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Drop old VP8 video transcodes, enable HLS on testwiki (T312152 T309823)
13:36 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
13:36 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:36 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
13:35 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
13:35 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
13:35 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
13:35 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
13:34 jhancock@cumin2002: START - Cookbook sre.dns.netbox
13:32 urandom: starting Cassandra rebuild, restbase1030-c — T346803
13:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1064.eqiad.wmnet with OS bullseye
13:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1063.eqiad.wmnet with OS bullseye
13:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-db1002.eqiad.wmnet
13:21 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1067.eqiad.wmnet with OS bullseye
13:15 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-db1002.eqiad.wmnet
13:14 urbanecm@deploy2002: Finished scap: Backport for [Growth] enwiki: Enable mentorship for 50% of new users (T341399) (duration: 10m 08s)
13:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1002.eqiad.wmnet
13:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
13:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
13:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
13:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host matomo1002.eqiad.wmnet
13:08 claime: respawning two misbehaving thumbor pods in codfw
13:08 urbanecm@deploy2002: urbanecm: Continuing with sync
13:05 urbanecm@deploy2002: urbanecm: Backport for [Growth] enwiki: Enable mentorship for 50% of new users (T341399) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:05 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-master1002.eqiad.wmnet
13:04 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-druid1001.eqiad.wmnet
13:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host matomo1002.eqiad.wmnet
13:04 urbanecm@deploy2002: Started scap: Backport for [Growth] enwiki: Enable mentorship for 50% of new users (T341399)
12:59 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-druid1001.eqiad.wmnet
12:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-presto1001.eqiad.wmnet
12:51 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-presto1001.eqiad.wmnet
12:51 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
12:50 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
12:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lists1001.wikimedia.org
12:42 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
12:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host lists1001.wikimedia.org
12:38 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
12:27 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetdb2002.codfw.wmnet,puppetdb1002.eqiad.wmnet
12:27 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:27 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetdb2002.codfw.wmnet,puppetdb1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
12:27 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts puppetboard2002.codfw.wmnet,puppetboard1002.eqiad.wmnet
12:27 jbond@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
12:26 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetdb2002.codfw.wmnet,puppetdb1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
12:24 jbond@cumin1001: START - Cookbook sre.dns.netbox
12:22 jbond@cumin1001: START - Cookbook sre.dns.netbox
12:13 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetdb2002.codfw.wmnet,puppetdb1002.eqiad.wmnet
12:10 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetboard2002.codfw.wmnet,puppetboard1002.eqiad.wmnet
12:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudvirt1063']
12:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1063']
12:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1063']
12:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1064']
12:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1064']
12:06 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1063']
12:01 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
12:01 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bullseye
12:01 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
11:57 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudnet2005-dev
11:57 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudnet2005-dev
11:46 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.failover (exit_code=0) Failover of gitlab from gitlab1004.wikimedia.org to gitlab2002.wikimedia.org
11:36 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
11:36 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
11:24 jelto@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) https://gitlab.wikimedia.org/ https://gitlab-replica.wikimedia.org/ on all recursors
11:24 jelto@cumin1001: START - Cookbook sre.dns.wipe-cache https://gitlab.wikimedia.org/ https://gitlab-replica.wikimedia.org/ on all recursors
11:23 jelto@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) https://gitlab.wikimedia.org/ https://gitlab-replica.wikimedia.org/ on all recursors
11:23 jelto@cumin1001: START - Cookbook sre.dns.wipe-cache https://gitlab.wikimedia.org/ https://gitlab-replica.wikimedia.org/ on all recursors
11:23 jelto@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) https://gitlab.wikimedia.org/ https://gitlab-replica.wikimedia.org/ on all recursors
11:23 jelto@cumin1001: START - Cookbook sre.dns.wipe-cache https://gitlab.wikimedia.org/ https://gitlab-replica.wikimedia.org/ on all recursors
10:23 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts orespoolcounter[2003-2004].codfw.wmnet,orespoolcounter[1003-1004].eqiad.wmnet
10:23 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:23 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: orespoolcounter[2003-2004].codfw.wmnet,orespoolcounter[1003-1004].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
10:21 klausman@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: orespoolcounter[2003-2004].codfw.wmnet,orespoolcounter[1003-1004].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
10:16 klausman@cumin1001: START - Cookbook sre.dns.netbox
10:09 klausman@cumin1001: START - Cookbook sre.hosts.decommission for hosts orespoolcounter[2003-2004].codfw.wmnet,orespoolcounter[1003-1004].eqiad.wmnet
10:09 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ores1001.eqiad.wmnet
10:09 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:09 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
10:08 klausman@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
10:00 klausman@cumin1001: START - Cookbook sre.dns.netbox
09:59 moritzm: installing python2.7 security updates
09:55 klausman@cumin1001: START - Cookbook sre.hosts.decommission for hosts ores1001.eqiad.wmnet
09:01 jelto@cumin1001: START - Cookbook sre.gitlab.failover Failover of gitlab from gitlab1004.wikimedia.org to gitlab2002.wikimedia.org
07:59 moritzm: installing jetty9 security updates
07:51 godog: bounce vopsbot on alert1001
05:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2165 (T343198)', diff saved to https://phabricator.wikimedia.org/P52831 and previous config saved to /var/cache/conftool/dbconfig/20231005-055637-arnaudb.json
05:56 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
05:56 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
05:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T343198)', diff saved to https://phabricator.wikimedia.org/P52830 and previous config saved to /var/cache/conftool/dbconfig/20231005-055615-arnaudb.json
05:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P52829 and previous config saved to /var/cache/conftool/dbconfig/20231005-054109-arnaudb.json
05:29 denisse: Deleting old Jenkins builds on pcc-worker1002 to free disk space
05:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P52828 and previous config saved to /var/cache/conftool/dbconfig/20231005-052602-arnaudb.json
05:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T343198)', diff saved to https://phabricator.wikimedia.org/P52827 and previous config saved to /var/cache/conftool/dbconfig/20231005-051056-arnaudb.json
02:50 eileen: civicrm upgraded from 44800fc0 to 05545fbc

2023-10-04

23:44 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1067.eqiad.wmnet with OS bullseye
23:38 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1064.eqiad.wmnet with OS bullseye
23:32 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1063.eqiad.wmnet with OS bullseye
22:40 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1065.eqiad.wmnet with OS bullseye
22:40 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
22:39 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
22:24 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1065.eqiad.wmnet with reason: host reimage
22:23 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
22:21 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1065.eqiad.wmnet with reason: host reimage
22:18 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
22:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2054.codfw.wmnet with OS bullseye
22:11 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bullseye
22:06 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1067.eqiad.wmnet with OS bullseye
22:06 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1064.eqiad.wmnet with OS bullseye
22:06 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1063.eqiad.wmnet with OS bullseye
22:05 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1065.eqiad.wmnet with OS bullseye
22:02 urandom: starting Cassandra rebuild, restbase1030-b — T346803
22:02 brennen@deploy2002: Finished scap: Backport for Revert "Deprecate TOC mutation in OutputPageParserOutput hook" (T348134) (duration: 09m 13s)
21:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1065.eqiad.wmnet with OS bullseye
21:58 volans: uploaded spicerack_7.3.1 to apt.wikimedia.org bullseye-wikimedia
21:56 brennen@deploy2002: brennen and ssastry: Continuing with sync
21:54 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1066.eqiad.wmnet with OS bullseye
21:54 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
21:54 brennen@deploy2002: brennen and ssastry: Backport for Revert "Deprecate TOC mutation in OutputPageParserOutput hook" (T348134) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:53 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
21:53 brennen@deploy2002: Started scap: Backport for Revert "Deprecate TOC mutation in OutputPageParserOutput hook" (T348134)
21:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1065.eqiad.wmnet with OS bullseye
21:44 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1065.eqiad.wmnet with OS bullseye
21:40 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1062.eqiad.wmnet with OS bullseye
21:40 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
21:38 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
21:38 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1066.eqiad.wmnet with reason: host reimage
21:34 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1066.eqiad.wmnet with reason: host reimage
21:23 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1062.eqiad.wmnet with reason: host reimage
21:20 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1062.eqiad.wmnet with reason: host reimage
21:02 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1062.eqiad.wmnet with OS bullseye
20:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1062.eqiad.wmnet with OS bullseye
20:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2054.codfw.wmnet with OS bullseye
20:54 urbanecm@deploy2002: Finished scap: Backport for SpecialManageMentors: Skip OOUI initialization when transcluding (T346760), SpecialManageMentors: Skip OOUI initialization when transcluding (T346760), Fix phan for GrowthExperiments (T347571) (duration: 07m 49s)
20:48 urbanecm@deploy2002: urbanecm: Continuing with sync
20:48 urbanecm@deploy2002: urbanecm: Backport for SpecialManageMentors: Skip OOUI initialization when transcluding (T346760), SpecialManageMentors: Skip OOUI initialization when transcluding (T346760), Fix phan for GrowthExperiments (T347571) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:46 urbanecm@deploy2002: Started scap: Backport for SpecialManageMentors: Skip OOUI initialization when transcluding (T346760), SpecialManageMentors: Skip OOUI initialization when transcluding (T346760), Fix phan for GrowthExperiments (T347571)
20:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
20:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1066.eqiad.wmnet with OS bullseye
20:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1065.eqiad.wmnet with OS bullseye
20:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
20:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bullseye
20:45 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1062.eqiad.wmnet with OS bullseye
20:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1065.eqiad.wmnet with OS bullseye
20:21 eileen: re-enable process control (more better hopefully) config revision changed from 89231b1b to d66626f6
19:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T343198)', diff saved to https://phabricator.wikimedia.org/P52826 and previous config saved to /var/cache/conftool/dbconfig/20231004-195023-arnaudb.json
19:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
19:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
19:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
19:49 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
19:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T343198)', diff saved to https://phabricator.wikimedia.org/P52825 and previous config saved to /var/cache/conftool/dbconfig/20231004-194946-arnaudb.json
19:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P52824 and previous config saved to /var/cache/conftool/dbconfig/20231004-193439-arnaudb.json
19:33 eileen: config revision changed from 89231b1b to d66626f6
19:30 eileen: civicrm upgraded from 169c3288 to 44800fc0
19:29 eileen: config revision changed from 4ae7bd71 to 89231b1b
19:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P52823 and previous config saved to /var/cache/conftool/dbconfig/20231004-191933-arnaudb.json
19:19 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
19:12 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1067.eqiad.wmnet with OS bullseye
19:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T343198)', diff saved to https://phabricator.wikimedia.org/P52822 and previous config saved to /var/cache/conftool/dbconfig/20231004-190427-arnaudb.json
18:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1063.eqiad.wmnet with OS bullseye
18:53 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1064.eqiad.wmnet with OS bullseye
18:19 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.29 refs T347080
18:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1066.eqiad.wmnet with OS bullseye
18:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1062.eqiad.wmnet with OS bullseye
18:09 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.29 refs T347080
17:52 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
17:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testreduce1002.eqiad.wmnet
17:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1067.eqiad.wmnet with OS bullseye
17:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testreduce1002.eqiad.wmnet
17:33 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1066.eqiad.wmnet with OS bullseye
17:33 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1065.eqiad.wmnet with OS bullseye
17:33 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
17:33 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bullseye
17:33 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1062.eqiad.wmnet with OS bullseye
17:32 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
17:32 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1065']
17:32 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1064']
17:32 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1063']
17:31 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1062']
17:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1063.eqiad.wmnet with OS bullseye
17:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1064.eqiad.wmnet with OS bullseye
17:22 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
17:22 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
17:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1066.eqiad.wmnet with OS bullseye
17:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1065.eqiad.wmnet with OS bullseye
17:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1062.eqiad.wmnet with OS bullseye
16:59 fabfur: merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/963326 (T347837). `purged` daemon will be restarted by puppet in esams in the next 30m
16:54 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1007.eqiad.wmnet with OS bullseye
16:49 taavi: taavi@mwmaint2002 ~ $ mwscript extensions/OATHAuth/maintenance/UpdateForMultipleDevicesSupport.php metawiki | tee T242031-sul.log # T242031
16:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
16:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
16:34 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1067']
16:34 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1066']
16:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
16:28 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1067']
16:27 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1067']
16:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1066']
16:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1067']
16:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
16:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1066']
16:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1066']
16:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
16:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
16:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1067']
16:24 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1067']
16:24 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
16:24 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1067']
16:24 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1066']
16:24 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1067']
16:24 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
16:23 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1067']
16:23 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1066']
16:23 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1067']
16:23 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
16:23 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bullseye
16:23 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1066.eqiad.wmnet with OS bullseye
16:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1066']
16:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
16:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1066']
16:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
16:21 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1067']
16:21 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1066']
16:21 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1067']
16:21 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1066']
16:21 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1063']
16:21 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1065']
16:21 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1064']
16:21 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1062']
16:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1065']
16:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1064']
16:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1063']
16:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1062']
16:07 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1065.eqiad.wmnet with OS bullseye
16:07 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bullseye
16:07 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bullseye
16:07 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1062.eqiad.wmnet with OS bullseye
15:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
15:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
15:55 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
15:47 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
15:47 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
15:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
15:45 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1062.mgmt.eqiad.wmnet with reboot policy FORCED
15:44 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
15:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1065.mgmt.eqiad.wmnet with reboot policy FORCED
15:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1066.mgmt.eqiad.wmnet with reboot policy FORCED
15:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1067.mgmt.eqiad.wmnet with reboot policy FORCED
15:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1064.mgmt.eqiad.wmnet with reboot policy FORCED
15:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1063.mgmt.eqiad.wmnet with reboot policy FORCED
15:39 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
15:37 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1007.eqiad.wmnet with OS bullseye
15:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
15:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
15:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
15:32 hashar@deploy2002: Finished deploy [integration/docroot@b3b712f]: (no justification provided) (duration: 00m 06s)
15:32 hashar@deploy2002: Started deploy [integration/docroot@b3b712f]: (no justification provided)
15:21 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
15:17 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1007.eqiad.wmnet with OS bullseye
15:13 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1067.mgmt.eqiad.wmnet with reboot policy FORCED
15:12 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1066.mgmt.eqiad.wmnet with reboot policy FORCED
15:12 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1065.mgmt.eqiad.wmnet with reboot policy FORCED
15:12 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1064.mgmt.eqiad.wmnet with reboot policy FORCED
15:12 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1063.mgmt.eqiad.wmnet with reboot policy FORCED
15:12 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirt1062.mgmt.eqiad.wmnet with reboot policy FORCED
15:08 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:08 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudvirt1062-67 - jclark@cumin1001"
15:07 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudvirt1062-67 - jclark@cumin1001"
15:05 jclark@cumin1001: START - Cookbook sre.dns.netbox
15:00 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
14:59 taavi: revoke a bot password, https://phabricator.wikimedia.org/T348132
14:56 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1007.eqiad.wmnet with OS bullseye
14:55 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
14:39 klausman@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ores[2001-2004].codfw.wmnet
14:39 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:39 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores[2001-2004].codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
14:38 klausman@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ores[1002-1009].eqiad.wmnet
14:38 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:38 Lucas_WMDE: spontaneously extended UTC afternoon backport+config window done now
14:37 klausman@cumin1001: START - Cookbook sre.dns.netbox
14:36 klausman@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores[2001-2004].codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
14:34 klausman@cumin1001: START - Cookbook sre.dns.netbox
14:31 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for prod: Enable wgCampaignEventsEnableEmail in meta and officewiki (T347065) (duration: 18m 26s)
14:25 lucaswerkmeister-wmde@deploy2002: daimona and lucaswerkmeister-wmde: Continuing with sync
14:24 fabfur: merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/963321 (T347837). `purged` daemon will be restarted by puppet in drmrs in the next 30m
14:22 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
14:22 klausman@cumin1001: START - Cookbook sre.hosts.decommission for hosts ores[2001-2004].codfw.wmnet
14:21 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ores2009.codfw.wmnet
14:21 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:21 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores2009.codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
14:20 klausman@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores2009.codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
14:18 klausman@cumin1001: START - Cookbook sre.hosts.decommission for hosts ores[1002-1009].eqiad.wmnet
14:18 klausman@cumin1001: START - Cookbook sre.dns.netbox
14:18 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ores2006.codfw.wmnet
14:17 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:17 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores2006.codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
14:17 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ores2005.codfw.wmnet
14:17 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:17 klausman@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores2006.codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
14:16 urandom: starting Cassandra rebuild, restbase1030-a — T346803
14:16 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ores2007.codfw.wmnet
14:16 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:16 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores2007.codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
14:15 klausman@cumin1001: START - Cookbook sre.dns.netbox
14:14 klausman@cumin1001: START - Cookbook sre.dns.netbox
14:14 lucaswerkmeister-wmde@deploy2002: daimona and lucaswerkmeister-wmde: Backport for prod: Enable wgCampaignEventsEnableEmail in meta and officewiki (T347065) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:13 klausman@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores2007.codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
14:12 klausman@cumin1001: START - Cookbook sre.hosts.decommission for hosts ores2009.codfw.wmnet
14:12 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for prod: Enable wgCampaignEventsEnableEmail in meta and officewiki (T347065)
14:10 klausman@cumin1001: START - Cookbook sre.dns.netbox
14:10 klausman@cumin1001: START - Cookbook sre.hosts.decommission for hosts ores2005.codfw.wmnet
14:09 klausman@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ores2008.codfw.wmnet
14:09 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:08 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores2008.codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
14:08 klausman@cumin1001: START - Cookbook sre.hosts.decommission for hosts ores2006.codfw.wmnet
14:07 klausman@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ores2008.codfw.wmnet decommissioned, removing all IPs except the asset tag one - klausman@cumin1001"
14:05 klausman@cumin1001: START - Cookbook sre.hosts.decommission for hosts ores2007.codfw.wmnet
14:04 klausman@cumin1001: START - Cookbook sre.dns.netbox
14:00 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for beta: Explicitly assign campaignevents-email-participants to all users (T336939), metawiki: Restrict campaignevents-email-participants right (T336939) (duration: 10m 40s)
13:57 klausman@cumin1001: START - Cookbook sre.hosts.decommission for hosts ores2008.codfw.wmnet
13:54 lucaswerkmeister-wmde@deploy2002: daimona and lucaswerkmeister-wmde: Continuing with sync
13:53 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
13:51 lucaswerkmeister-wmde@deploy2002: daimona and lucaswerkmeister-wmde: Backport for beta: Explicitly assign campaignevents-email-participants to all users (T336939), metawiki: Restrict campaignevents-email-participants right (T336939) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:49 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for beta: Explicitly assign campaignevents-email-participants to all users (T336939), metawiki: Restrict campaignevents-email-participants right (T336939)
13:47 Lucas_WMDE: mwscript namespaceDupes fonwiki --fix # T347939 – 0 pages to fix, 0 resolvable; 0 links to fix, 0 resolvable, 0 deleted
13:46 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for fonwiki: add wgSiteName, wgMetaNamespace and timezone (T347939) (duration: 13m 46s)
13:34 lucaswerkmeister-wmde@deploy2002: anzx and lucaswerkmeister-wmde: Backport for fonwiki: add wgSiteName, wgMetaNamespace and timezone (T347939) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:33 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for fonwiki: add wgSiteName, wgMetaNamespace and timezone (T347939)
13:27 fabfur: merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/963147 (T347837). `purged` daemon will be restarted by puppet in eqiad in the next 30m
13:25 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
13:25 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
13:24 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
13:24 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
13:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
13:23 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
13:20 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for fonwiki: add logos (T347939) (duration: 11m 43s)
13:19 rook@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2006-dev.codfw.wmnet with OS bullseye
13:14 urandom: Cassandra bootstrap, restbase1030-a (`auto_bootstrap: false`) — T346803
13:14 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and anzx: Continuing with sync
13:10 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and anzx: Backport for fonwiki: add logos (T347939) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:09 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for fonwiki: add logos (T347939)
13:03 rook@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2006-dev.codfw.wmnet with reason: host reimage
13:00 rook@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2006-dev.codfw.wmnet with reason: host reimage
12:56 klausman: powering off orespoolcounter{1004,2003,2004}.{eqiad,codfw}.wmnet (1003 is kept powered-on in case we need access to files from the old install). The machines have a 90d downtime already put in.
12:53 klausman: powering off ores200{2..9}.codfw.wmnet (2001 is kept powered-on in case we need access to files from the old install). The machines have a 90d downtime already put in.
12:51 klausman: powering off ores100{2..9}.eqiad.wmnet (1001 is kept powered-on in case we need access to files from the old install). The machines have a 90d downtime already put in.
12:46 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 90 days, 0:00:00 on 22 hosts with reason: Downtime for graceful shutdown and later decom
12:46 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 90 days, 0:00:00 on 22 hosts with reason: Downtime for graceful shutdown and later decom
12:43 rook@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2006-dev.codfw.wmnet with OS bullseye
11:45 moritzm: installing exim4 security updates
11:33 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe2004.codfw.wmnet with OS bullseye
11:30 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
11:20 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
11:17 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2004.codfw.wmnet with reason: host reimage
11:14 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2004.codfw.wmnet with reason: host reimage
11:14 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
11:03 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Zsoo out of all services on: 2175 hosts
11:02 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Zsoo out of all services on: 2175 hosts
10:58 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe2004.codfw.wmnet with OS bullseye
10:29 filippo@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['thanos-fe2004']
10:29 filippo@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe2004']
10:21 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
10:20 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-fe2004.codfw.wmnet with OS bullseye
10:20 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
10:20 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
10:20 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
10:20 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
10:20 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
10:02 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe2004.codfw.wmnet with OS bullseye
10:02 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe2004.codfw.wmnet with OS bullseye
09:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T343198)', diff saved to https://phabricator.wikimedia.org/P52817 and previous config saved to /var/cache/conftool/dbconfig/20231004-094320-arnaudb.json
09:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
09:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
09:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T343198)', diff saved to https://phabricator.wikimedia.org/P52816 and previous config saved to /var/cache/conftool/dbconfig/20231004-094258-arnaudb.json
09:39 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
09:39 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
09:39 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
09:39 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
09:39 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
09:38 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
09:38 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
09:38 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
09:38 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
09:38 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
09:38 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
09:38 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
09:38 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
09:37 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
09:37 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
09:37 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
09:37 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
09:37 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
09:37 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
09:37 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
09:35 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe2004.codfw.wmnet with OS bullseye
09:33 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe2004.codfw.wmnet with OS bullseye
09:28 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging TsepoThoabala out of all services on: 2175 hosts
09:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P52815 and previous config saved to /var/cache/conftool/dbconfig/20231004-092752-arnaudb.json
09:27 jmm@cumin2002: START - Cookbook sre.idm.logout Logging TsepoThoabala out of all services on: 2175 hosts
09:26 sg912@deploy2002: Finished deploy [airflow-dags/analytics@3b374a9]: (no justification provided) (duration: 00m 45s)
09:25 sg912@deploy2002: Started deploy [airflow-dags/analytics@3b374a9]: (no justification provided)
09:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P52814 and previous config saved to /var/cache/conftool/dbconfig/20231004-091245-arnaudb.json
09:08 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging KMorgan out of all services on: 2175 hosts
09:08 jmm@cumin2002: START - Cookbook sre.idm.logout Logging KMorgan out of all services on: 2175 hosts
09:02 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
09:01 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
08:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T343198)', diff saved to https://phabricator.wikimedia.org/P52813 and previous config saved to /var/cache/conftool/dbconfig/20231004-085739-arnaudb.json
08:44 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging EllenR out of all services on: 2175 hosts
08:43 jmm@cumin2002: START - Cookbook sre.idm.logout Logging EllenR out of all services on: 2175 hosts
08:19 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe2004.codfw.wmnet with OS bullseye
08:14 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe2003.codfw.wmnet with OS bullseye
08:01 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Eigyan out of all services on: 2176 hosts
08:00 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Eigyan out of all services on: 2176 hosts
07:56 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2003.codfw.wmnet with reason: host reimage
07:53 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2003.codfw.wmnet with reason: host reimage
07:34 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe2003.codfw.wmnet with OS bullseye
07:19 XioNoX: Remove static routes for anycast prefixes - T347494
06:30 moritzm: installing glibc security updates
06:19 Surbhi_: Deployed refinery using scap, then deployed onto hdfs
05:54 sg912@deploy2002: Finished deploy [analytics/refinery@e954b12] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@e954b12a] (duration: 03m 00s)
05:51 sg912@deploy2002: Started deploy [analytics/refinery@e954b12] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@e954b12a]
05:50 sg912@deploy2002: Finished deploy [analytics/refinery@e954b12] (thin): Regular analytics weekly train THIN [analytics/refinery@e954b12a] (duration: 00m 06s)
05:50 sg912@deploy2002: Started deploy [analytics/refinery@e954b12] (thin): Regular analytics weekly train THIN [analytics/refinery@e954b12a]
05:49 sg912@deploy2002: Finished deploy [analytics/refinery@e954b12]: Regular analytics weekly train [analytics/refinery@e954b12a] (duration: 06m 02s)
05:43 sg912@deploy2002: Started deploy [analytics/refinery@e954b12]: Regular analytics weekly train [analytics/refinery@e954b12a]
03:56 kart_: Updated cxserver to 2023-09-28-043003-production (T343450, T347389, T338689)
03:56 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
03:55 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
03:51 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
03:51 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
03:48 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
03:48 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply

2023-10-03

23:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T343198)', diff saved to https://phabricator.wikimedia.org/P52812 and previous config saved to /var/cache/conftool/dbconfig/20231003-234343-arnaudb.json
23:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
23:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
23:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T343198)', diff saved to https://phabricator.wikimedia.org/P52811 and previous config saved to /var/cache/conftool/dbconfig/20231003-234322-arnaudb.json
23:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P52810 and previous config saved to /var/cache/conftool/dbconfig/20231003-232815-arnaudb.json
23:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P52809 and previous config saved to /var/cache/conftool/dbconfig/20231003-231309-arnaudb.json
22:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T343198)', diff saved to https://phabricator.wikimedia.org/P52808 and previous config saved to /var/cache/conftool/dbconfig/20231003-225803-arnaudb.json
22:22 jdrewniak@deploy2002: Finished scap: Backport for Web typography prototype survey (T347208), Correct a recently-added message, [Prototype] Change i18n message (T347208) (duration: 39m 08s)
22:11 jdrewniak@deploy2002: jdrewniak: Continuing with sync
22:01 jdrewniak@deploy2002: jdrewniak: Backport for Web typography prototype survey (T347208), Correct a recently-added message, [Prototype] Change i18n message (T347208) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:43 jdrewniak@deploy2002: Started scap: Backport for Web typography prototype survey (T347208), Correct a recently-added message, [Prototype] Change i18n message (T347208)
21:32 jdrewniak@deploy2002: Finished scap: Backport for Promote several Wikipedias to Vector 2022 as default skin (T347321) (duration: 09m 26s)
21:26 jdrewniak@deploy2002: jdlrobson and jdrewniak: Continuing with sync
21:24 jdrewniak@deploy2002: jdlrobson and jdrewniak: Backport for Promote several Wikipedias to Vector 2022 as default skin (T347321) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:23 jdrewniak@deploy2002: Started scap: Backport for Promote several Wikipedias to Vector 2022 as default skin (T347321)
20:56 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
20:56 eileen: tools upgraded from 130ca87e to 2e19cd39
20:50 jdrewniak@deploy2002: Finished scap: Backport for Re-enable Extension:ParserMigration on labs (T333179) (duration: 38m 52s)
20:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
20:35 jdrewniak@deploy2002: jdrewniak and sbailey: Continuing with sync
20:34 jdrewniak@deploy2002: jdrewniak and sbailey: Backport for Re-enable Extension:ParserMigration on labs (T333179) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:16 fabfur: merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/963081 (T347837). `purged` daemon will be restarted by puppet in eqsin in the next 30m
20:11 jdrewniak@deploy2002: Started scap: Backport for Re-enable Extension:ParserMigration on labs (T333179)
19:41 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
19:38 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
19:16 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
19:16 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
19:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
19:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
19:15 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
19:15 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
18:48 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
18:25 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.29 refs T347080
18:13 jhuneidi@deploy2002: Pruned MediaWiki: 1.41.0-wmf.27 (duration: 02m 14s)
18:11 jhuneidi@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.29 refs T347080 (duration: 43m 24s)
17:34 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
17:34 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
17:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
17:33 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
17:33 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
17:28 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1027.eqiad.wmnet
17:28 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1027.eqiad.wmnet
17:27 jhuneidi@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.29 refs T347080
17:17 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1027.eqiad.wmnet with OS bullseye
17:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly racked kubernetes2054 hosts in codfw - jhancock@cumin2002"
17:08 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly racked kubernetes2054 hosts in codfw - jhancock@cumin2002"
17:04 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
17:02 jhancock@cumin2002: START - Cookbook sre.dns.netbox
16:59 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
16:59 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
16:57 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly racked kubernetes2054 hosts in codfw - jhancock@cumin2002"
16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly racked kubernetes2054 hosts in codfw - jhancock@cumin2002"
16:54 jhancock@cumin2002: START - Cookbook sre.dns.netbox
16:52 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1027.eqiad.wmnet with reason: host reimage
16:50 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1027.eqiad.wmnet with reason: host reimage
16:37 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1027.eqiad.wmnet with OS bullseye
16:36 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1027.eqiad.wmnet
16:36 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1027.eqiad.wmnet
16:24 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1027.eqiad.wmnet
16:23 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1027.eqiad.wmnet
16:20 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
16:20 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
16:19 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
16:19 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
16:09 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
16:08 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
16:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1026.eqiad.wmnet with OS bullseye
16:06 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
16:06 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
16:04 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
16:03 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
16:03 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
16:03 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
16:02 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
16:01 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
15:57 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
15:57 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
15:57 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
15:51 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
15:49 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
15:49 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
15:40 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1026.eqiad.wmnet with reason: host reimage
15:37 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
15:37 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1026.eqiad.wmnet with reason: host reimage
15:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:32 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:26 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
15:26 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
15:24 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1026.eqiad.wmnet with OS bullseye
15:24 ottomata: mw-page-content-change-enrich - backfill is done, set replicas to 2 in eqiad and codfw
15:23 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
15:23 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1026.eqiad.wmnet
15:23 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1026.eqiad.wmnet
15:22 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
15:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
15:11 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1026.eqiad.wmnet
15:10 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1026.eqiad.wmnet
15:10 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1033.eqiad.wmnet
15:10 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1033.eqiad.wmnet
15:07 brennen@deploy2002: Finished deploy [phabricator/deployment@6f19600]: deploy to phab1004 for T348007 (duration: 00m 44s)
15:06 brennen@deploy2002: Started deploy [phabricator/deployment@6f19600]: deploy to phab1004 for T348007
15:06 brennen@deploy2002: Finished deploy [phabricator/deployment@6f19600]: test deploy to phab2002 for T348007 (duration: 00m 32s)
15:06 brennen@deploy2002: Started deploy [phabricator/deployment@6f19600]: test deploy to phab2002 for T348007
15:05 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1004.eqiad.wmnet with reason: Phabricator deploys
15:05 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab1004.eqiad.wmnet with reason: Phabricator deploys
14:55 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-master1004']
14:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.4 - ayounsi@cumin1001
14:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
14:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
14:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
14:48 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.4 - ayounsi@cumin1001
14:47 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
14:47 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
14:47 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
14:46 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-master1003']
14:46 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
14:46 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
14:46 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
14:46 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
14:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
14:45 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
14:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
14:44 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
14:44 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
14:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
14:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1003']
14:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003']
14:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
14:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
14:42 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['an-master1003']
14:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003']
14:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003']
14:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1003']
14:39 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003']
14:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1003']
14:39 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003']
14:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
14:39 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
14:38 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004']
14:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004']
14:38 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1003']
14:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003']
14:38 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1003']
14:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003']
14:38 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
14:37 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1003']
14:37 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
14:37 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
14:37 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
14:37 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
14:37 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003']
14:37 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
14:37 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
14:37 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
14:37 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
14:36 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
14:36 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
14:36 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
14:36 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
14:36 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
14:36 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
14:36 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
14:36 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
14:35 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
14:35 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
14:35 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
14:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1033.eqiad.wmnet with OS bullseye
14:07 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe2002.codfw.wmnet with OS bullseye
14:01 fabfur: merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/963020 (T347837). `purged` daemon will be restarted by puppet in codfw in the next 30m
14:01 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1033.eqiad.wmnet with reason: host reimage
13:59 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
13:58 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-master1003
13:57 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host an-master1003
13:57 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1033.eqiad.wmnet with reason: host reimage
13:50 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-master1004
13:50 klausman@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:50 klausman@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Revert allocation of LVS VIPs for recommendation-api-ng - klausman@cumin1001"
13:49 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host an-master1004
13:49 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt-wdqs1001.eqiad.wmnet
13:49 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:49 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2002.codfw.wmnet with reason: host reimage
13:48 klausman@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Revert allocation of LVS VIPs for recommendation-api-ng - klausman@cumin1001"
13:48 taavi@cumin1001: START - Cookbook sre.dns.netbox
13:46 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2002.codfw.wmnet with reason: host reimage
13:44 klausman@cumin1001: START - Cookbook sre.dns.netbox
13:43 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1033.eqiad.wmnet with OS bullseye
13:43 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
13:43 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
13:43 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
13:42 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1025.eqiad.wmnet
13:42 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1025.eqiad.wmnet
13:41 taavi@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1001.eqiad.wmnet
13:38 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
13:38 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
13:38 ottomata: mw-page-content-change-enrich codfw - bump to 1.27.0 and set replicas to 12 while processing backlog - T347676
13:34 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
13:34 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
13:34 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1025.eqiad.wmnet with OS bullseye
13:34 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt-wdqs1003.eqiad.wmnet
13:34 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:34 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - taavi@cumin1001"
13:34 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1026.eqiad.wmnet
13:33 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1026.eqiad.wmnet
13:33 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - taavi@cumin1001"
13:30 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
13:30 otto@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
13:30 taavi@cumin1001: START - Cookbook sre.dns.netbox
13:27 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe2002.codfw.wmnet with OS bullseye
13:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T343198)', diff saved to https://phabricator.wikimedia.org/P52807 and previous config saved to /var/cache/conftool/dbconfig/20231003-132733-arnaudb.json
13:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
13:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
13:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T343198)', diff saved to https://phabricator.wikimedia.org/P52806 and previous config saved to /var/cache/conftool/dbconfig/20231003-132700-arnaudb.json
13:23 taavi@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1003.eqiad.wmnet
13:23 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt-wdqs1002.eqiad.wmnet
13:23 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:23 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - taavi@cumin1001"
13:22 samtar@deploy2002: Finished scap: Backport for arwiki: add importsources (T347563), add throttle rules for Ada Lovelace Day October 10, 2023 and fix throttle rule for UIUC Wikipedia edit-a-thon October 13, 2023 (T347719) (duration: 09m 03s)
13:21 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - taavi@cumin1001"
13:19 taavi@cumin1001: START - Cookbook sre.dns.netbox
13:16 samtar@deploy2002: anzx and samtar: Continuing with sync
13:14 samtar@deploy2002: anzx and samtar: Backport for arwiki: add importsources (T347563), add throttle rules for Ada Lovelace Day October 10, 2023 and fix throttle rule for UIUC Wikipedia edit-a-thon October 13, 2023 (T347719) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:13 samtar@deploy2002: Started scap: Backport for arwiki: add importsources (T347563), add throttle rules for Ada Lovelace Day October 10, 2023 and fix throttle rule for UIUC Wikipedia edit-a-thon October 13, 2023 (T347719)
13:12 taavi@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1002.eqiad.wmnet
13:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P52805 and previous config saved to /var/cache/conftool/dbconfig/20231003-131154-arnaudb.json
13:10 samtar@deploy2002: Finished scap: Backport for New donor experience stream for apps event schema (duration: 08m 26s)
13:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1025.eqiad.wmnet with reason: host reimage
13:04 samtar@deploy2002: sharvaniharan and samtar: Continuing with sync
13:03 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1025.eqiad.wmnet with reason: host reimage
13:03 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe2001.codfw.wmnet with OS bullseye
13:03 samtar@deploy2002: sharvaniharan and samtar: Backport for New donor experience stream for apps event schema synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:01 samtar@deploy2002: Started scap: Backport for New donor experience stream for apps event schema
12:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P52804 and previous config saved to /var/cache/conftool/dbconfig/20231003-125647-arnaudb.json
12:50 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1025.eqiad.wmnet with OS bullseye
12:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: host reimage
12:42 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: host reimage
12:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T343198)', diff saved to https://phabricator.wikimedia.org/P52803 and previous config saved to /var/cache/conftool/dbconfig/20231003-124141-arnaudb.json
12:23 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe2001.codfw.wmnet with OS bullseye
11:54 fabfur: merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/963004 (T347837). `purged` daemon will be restarted by puppet in ulsfo in the next 30m
11:51 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe1004.eqiad.wmnet with OS bullseye
11:33 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1004.eqiad.wmnet with reason: host reimage
11:29 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:29 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack - aborrero@cumin1001"
11:29 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1004.eqiad.wmnet with reason: host reimage
11:29 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack - aborrero@cumin1001"
11:26 aborrero@cumin1001: START - Cookbook sre.dns.netbox
11:11 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
10:54 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe1003.eqiad.wmnet with OS bullseye
10:36 vgutierrez@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:36 vgutierrez@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix katran-test.svc.eqiad.wmnet IP allocation - vgutierrez@cumin1001"
10:35 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: host reimage
10:34 vgutierrez@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix katran-test.svc.eqiad.wmnet IP allocation - vgutierrez@cumin1001"
10:32 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: host reimage
10:32 vgutierrez@cumin1001: START - Cookbook sre.dns.netbox
10:30 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
10:19 vgutierrez@cumin1001: START - Cookbook sre.dns.netbox
10:15 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1003.eqiad.wmnet with OS bullseye
09:50 claime: Uncordoned kubernetes2010.codfw.wmnet
09:50 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes2010.codfw.wmnet
09:49 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes2010.codfw.wmnet
09:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe1002.eqiad.wmnet with OS bullseye
09:42 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
09:42 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
09:38 ladsgroup@deploy2002: Finished scap: Creating fonwiki (T347935) (duration: 07m 34s)
09:30 ladsgroup@deploy2002: Started scap: Creating fonwiki (T347935)
09:28 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kubernetes2010.codfw.wmnet with reason: BIOS setting change
09:28 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kubernetes2010.codfw.wmnet with reason: BIOS setting change
09:27 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: host reimage
09:26 claime: Draining kubernetes2010.codfw.wmnet for reboot to change BIOS setting
09:24 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: host reimage
09:07 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1002.eqiad.wmnet with OS bullseye
09:06 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
09:06 isaranto@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
09:05 isaranto@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
08:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ml-staging2001.codfw.wmnet with reason: Check chassis internals for GPU hosting
08:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ml-staging2001.codfw.wmnet with reason: Check chassis internals for GPU hosting
08:26 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe1001.eqiad.wmnet with OS bullseye
08:17 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:15 cmooney@cumin1001: START - Cookbook sre.dns.netbox
08:14 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:13 cmooney@cumin1001: START - Cookbook sre.dns.netbox
08:12 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
08:09 cmooney@cumin1001: START - Cookbook sre.dns.netbox
08:03 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: host reimage
08:01 taavi: taavi@mwmaint2002 ~ $ mwscript resetAuthenticationThrottle.php --wiki=enwiki --signup --ip=155.232.7.202 # T347874
07:59 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: host reimage
07:57 taavi@deploy2002: Finished scap: T347874 and T347069 (duration: 29m 22s)
07:42 taavi@deploy2002: taavi: Continuing with sync
07:42 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1001.eqiad.wmnet with OS bullseye
07:40 taavi@deploy2002: taavi: T347874 and T347069 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:27 taavi@deploy2002: Started scap: T347874 and T347069
07:03 kart_: Updated MinT to 2023-09-28-043052-production (T343450, T341478)
07:03 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
06:59 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
06:56 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
06:51 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
06:45 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
06:42 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
06:42 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
06:42 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
05:52 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on druid1009.eqiad.wmnet with reason: Downtime as we set up the host to join the druid and zookeeper cluster
05:52 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on druid1009.eqiad.wmnet with reason: Downtime as we set up the host to join the druid and zookeeper cluster
04:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
04:20 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
04:20 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
04:13 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
04:12 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
04:11 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
04:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
04:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
04:09 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
04:08 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
04:08 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
04:07 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
04:05 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
04:05 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
04:05 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
03:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T343198)', diff saved to https://phabricator.wikimedia.org/P52802 and previous config saved to /var/cache/conftool/dbconfig/20231003-034640-arnaudb.json
03:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
03:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
02:41 krinkle@deploy2002: Finished scap: (no justification provided) (duration: 07m 34s)
02:33 krinkle@deploy2002: Started scap: (no justification provided)
02:17 krinkle@deploy2002: Synchronized docroot/noc/: (no justification provided) (duration: 08m 03s)
01:48 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1033.eqiad.wmnet
01:48 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1033.eqiad.wmnet
01:35 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1033.eqiad.wmnet
01:34 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1033.eqiad.wmnet
01:33 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1025.eqiad.wmnet
01:33 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1025.eqiad.wmnet
01:21 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1025.eqiad.wmnet
01:18 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1025.eqiad.wmnet
01:06 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1025.eqiad.wmnet
01:06 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1025.eqiad.wmnet
00:39 ejegg: fundraising civicrm upgraded from c1b28287 to 995a3d5b
00:38 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1033.eqiad.wmnet
00:29 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1033.eqiad.wmnet
00:28 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1033.eqiad.wmnet
00:28 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1033.eqiad.wmnet
00:28 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1025.eqiad.wmnet
00:28 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1025.eqiad.wmnet

2023-10-02

23:09 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1032.eqiad.wmnet with OS bullseye
22:46 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1032.eqiad.wmnet with reason: host reimage
22:43 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1032.eqiad.wmnet with reason: host reimage
22:30 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1032.eqiad.wmnet with OS bullseye
22:30 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1032.eqiad.wmnet with OS bullseye
22:16 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1032.eqiad.wmnet with OS bullseye
22:09 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1032.eqiad.wmnet
22:01 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host restbase1032.eqiad.wmnet
22:01 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1032.eqiad.wmnet
22:00 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1032.eqiad.wmnet
21:53 maryum: Deployed patch for T347704
21:32 kindrobot: end UTC late backport window
21:28 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1029.eqiad.wmnet
21:28 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1029.eqiad.wmnet
21:23 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1032.eqiad.wmnet
21:23 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1032.eqiad.wmnet
21:22 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1032.eqiad.wmnet
21:22 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1032.eqiad.wmnet
21:21 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1032.eqiad.wmnet
21:21 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1032.eqiad.wmnet
21:21 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1029.eqiad.wmnet with OS bullseye
21:17 kindrobot@deploy2002: Finished scap: Backport for Ignore only site notices (T347645), HookUtils: Fix checking page props (T347878), Fix diff title escaping (T347578), Diff: Add missing .mw-diff-inline-moved selector (duration: 10m 06s)
21:11 kindrobot@deploy2002: kindrobot and matmarex: Continuing with sync
21:09 kindrobot@deploy2002: kindrobot and matmarex: Backport for Ignore only site notices (T347645), HookUtils: Fix checking page props (T347878), Fix diff title escaping (T347578), Diff: Add missing .mw-diff-inline-moved selector synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:07 kindrobot@deploy2002: Started scap: Backport for Ignore only site notices (T347645), HookUtils: Fix checking page props (T347878), Fix diff title escaping (T347578), Diff: Add missing .mw-diff-inline-moved selector
20:59 ottomata: mw-page-content-change-enrich - CORRECTION - increase replicas to 20 to process backlog - T347676
20:58 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:58 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
20:57 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1029.eqiad.wmnet with reason: host reimage
20:57 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:57 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
20:56 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:56 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
20:56 ottomata: mw-page-content-change-enrich - increase replicas to 24 to process backlog - T347676
20:54 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1029.eqiad.wmnet with reason: host reimage
20:42 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1029.eqiad.wmnet with OS bullseye
20:40 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1029.eqiad.wmnet with OS bullseye
20:40 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1029.eqiad.wmnet with OS bullseye
20:37 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1029.eqiad.wmnet with OS bullseye
20:37 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1029.eqiad.wmnet with OS bullseye
20:36 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1029.eqiad.wmnet with OS bullseye
20:35 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1029.eqiad.wmnet with OS bullseye
20:32 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1029.eqiad.wmnet with OS bullseye
20:31 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
20:27 ottomata: mw-page-content-change-enrich - increase replicas to 12 to process backlog - T347676
20:27 kindrobot@deploy2002: Finished scap: Backport for Undeploy Reader Demographics 2 pilot survey (T345951), DiscussionTools: Disable timestamp links in production initially (duration: 08m 49s)
20:27 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1029.eqiad.wmnet with OS bullseye
20:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
20:21 kindrobot@deploy2002: esanders and dani and kindrobot: Continuing with sync
20:19 kindrobot@deploy2002: esanders and dani and kindrobot: Backport for Undeploy Reader Demographics 2 pilot survey (T345951), DiscussionTools: Disable timestamp links in production initially synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:18 kindrobot@deploy2002: Started scap: Backport for Undeploy Reader Demographics 2 pilot survey (T345951), DiscussionTools: Disable timestamp links in production initially
20:13 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:13 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
20:12 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:12 eileen: process control revision changed from b370644b to 9760851c
20:12 otto@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
20:12 eileen: revision changed from b370644b to 9760851c
20:11 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1029.eqiad.wmnet with OS bullseye
20:11 kindrobot@deploy2002: Backport cancelled.
20:01 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1029.eqiad.wmnet with OS bullseye
20:01 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1029.eqiad.wmnet with OS bullseye
19:54 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1029.eqiad.wmnet with OS bullseye
19:53 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1029.eqiad.wmnet
19:53 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1029.eqiad.wmnet
19:53 moritzm: installing libvpx security updates
19:41 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1029.eqiad.wmnet
19:40 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1029.eqiad.wmnet
19:40 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1024.eqiad.wmnet with OS bullseye
19:38 eileen: civicrm upgraded from 7406cdf3 to c1b28287
19:19 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1024.eqiad.wmnet with reason: host reimage
19:16 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1024.eqiad.wmnet with reason: host reimage
19:13 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-master1003']
19:13 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003']
19:11 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
19:02 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1024.eqiad.wmnet with OS bullseye
19:02 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1029.eqiad.wmnet
19:02 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1029.eqiad.wmnet
19:01 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
19:00 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-master1003.eqiad.wmnet']
19:00 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003.eqiad.wmnet']
19:00 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1004.eqiad.wmnet with OS bullseye
19:00 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-master1003.eqiad.wmnet with OS bullseye
18:56 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1024.eqiad.wmnet
18:56 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1024.eqiad.wmnet
18:44 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1024.eqiad.wmnet
18:44 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1024.eqiad.wmnet
18:42 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1023.eqiad.wmnet
18:42 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1023.eqiad.wmnet
18:40 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1023.eqiad.wmnet with OS bullseye
18:16 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1023.eqiad.wmnet with reason: host reimage
18:13 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1023.eqiad.wmnet with reason: host reimage
17:59 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1023.eqiad.wmnet with OS bullseye
17:59 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1022.eqiad.wmnet
17:59 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1022.eqiad.wmnet
17:57 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
17:56 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
17:39 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1004.eqiad.wmnet with OS bullseye
17:39 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host an-master1003.eqiad.wmnet with OS bullseye
17:38 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1022.eqiad.wmnet with OS bullseye
17:30 sukhe: A:dns-rec enable puppet and run agent
17:24 sukhe: sudo cumin "A:dns-rec" "disable-puppet 'merging CR 962648'"
17:18 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
17:18 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
17:17 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
17:17 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
17:17 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
17:17 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
17:12 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1022.eqiad.wmnet with reason: host reimage
17:09 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1022.eqiad.wmnet with reason: host reimage
17:00 fabfur: upgrade purged package to version 0.21+deb12u1 cp4052 (bookworm) (T347837)
16:56 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1022.eqiad.wmnet with OS bullseye
16:55 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1031.eqiad.wmnet with OS bullseye
16:39 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T347624, testing new cookbook changes) xfer categories from wdqs2024.codfw.wmnet -> wdqs2025.codfw.wmnet, repooling both afterwards w/ encryption
16:30 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer (T347624, testing new cookbook changes) xfer categories from wdqs2024.codfw.wmnet -> wdqs2025.codfw.wmnet, repooling both afterwards w/ encryption
16:29 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1031.eqiad.wmnet with reason: host reimage
16:26 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1031.eqiad.wmnet with reason: host reimage
16:13 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1031.eqiad.wmnet with OS bullseye
16:08 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1028.eqiad.wmnet with OS bullseye
16:06 fabfur: importing into bookworm-wikimedia package purged_0.21+deb12u1_amd64 (T347837)
15:44 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
15:43 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
15:43 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1028.eqiad.wmnet with reason: host reimage
15:40 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1028.eqiad.wmnet with reason: host reimage
15:29 sukhe: enable puppet on A:dns-rec and force agent run
15:28 joal@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
15:28 joal@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
15:27 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1028.eqiad.wmnet with OS bullseye
15:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1028.eqiad.wmnet
15:27 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1028.eqiad.wmnet
15:24 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1021.eqiad.wmnet
15:24 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1021.eqiad.wmnet
15:23 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.failover (exit_code=0) Failover of gitlab from gitlab1003.wikimedia.org to gitlab2002.wikimedia.org
15:20 jelto@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) https://gitlab-replica.wikimedia.org/ https://gitlab-replica-old.wikimedia.org/ on all recursors
15:20 jelto@cumin1001: START - Cookbook sre.dns.wipe-cache https://gitlab-replica.wikimedia.org/ https://gitlab-replica-old.wikimedia.org/ on all recursors
15:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1021.eqiad.wmnet with OS bullseye
15:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
14:55 elukey: restart kubelet on ml-serve1001 (high latencies registered)
14:51 fabfur: upgrade purged package to version 0.21+deb11u1 on all cp hosts (T347837)
14:48 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
14:48 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
14:47 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:47 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding new host ganeti-test2004 - jhancock@cumin2002"
14:46 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding new host ganeti-test2004 - jhancock@cumin2002"
14:44 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:40 stevemunene@cumin1001: END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99) for Druid public cluster: Roll restart of Druid jvm daemons.
14:37 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1021.eqiad.wmnet with reason: host reimage
14:34 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1021.eqiad.wmnet with reason: host reimage
14:23 fabfur: importing into bullseye-wikimedia package purged_0.21+deb11u1_amd64 (T347837)
14:20 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1021.eqiad.wmnet with OS bullseye
14:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
14:18 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1020.eqiad.wmnet with OS bullseye
14:17 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
14:17 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
14:15 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
14:09 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
14:09 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
14:03 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
14:01 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
13:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
13:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1228.eqiad.wmnet with OS bullseye
13:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
13:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS bullseye
13:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
13:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
13:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1020.eqiad.wmnet with reason: host reimage
13:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
13:52 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1020.eqiad.wmnet with reason: host reimage
13:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
13:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1228.eqiad.wmnet with reason: host reimage
13:40 taavi@deploy2002: Finished scap: Backport for Add 'testwikis' DB list to MWMultiVersion::DB_LISTS (T341110) (duration: 11m 15s)
13:39 stevemunene@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
13:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage
13:39 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
13:38 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
13:38 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1020.eqiad.wmnet with OS bullseye
13:38 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
13:38 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
13:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
13:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1228.eqiad.wmnet with reason: host reimage
13:36 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
13:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage
13:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
13:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1228.eqiad.wmnet with OS bullseye
13:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS bullseye
13:34 taavi@deploy2002: taavi and dreamyjazz: Continuing with sync
13:30 taavi@deploy2002: taavi and dreamyjazz: Backport for Add 'testwikis' DB list to MWMultiVersion::DB_LISTS (T341110) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:29 taavi@deploy2002: Started scap: Backport for Add 'testwikis' DB list to MWMultiVersion::DB_LISTS (T341110)
13:27 taavi@deploy2002: Sync cancelled.
13:19 taavi@deploy2002: taavi and dreamyjazz: Backport for clienthints: Enable display on testwikis and four production wikis (T341110) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:15 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
13:13 sukhe: disable puppet on A:dns-rec to merge CR 961818
13:11 taavi@deploy2002: Started scap: Backport for clienthints: Enable display on testwikis and four production wikis (T341110)
13:01 jelto@cumin1001: START - Cookbook sre.gitlab.failover Failover of gitlab from gitlab1003.wikimedia.org to gitlab2002.wikimedia.org
12:39 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bastion.bastioninfra-codfw1dev.codfw1dev.wmcloud.org on all recursors
12:39 aborrero@cumin1001: START - Cookbook sre.dns.wipe-cache bastion.bastioninfra-codfw1dev.codfw1dev.wmcloud.org on all recursors
12:34 aikochou@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
12:31 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bastion.bastioninfra-codfw1dev.codfw1dev.wmcloud.org on all recursors
12:31 aborrero@cumin1001: START - Cookbook sre.dns.wipe-cache bastion.bastioninfra-codfw1dev.codfw1dev.wmcloud.org on all recursors
12:29 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:29 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack codfw1dev - aborrero@cumin1001"
12:25 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack codfw1dev - aborrero@cumin1001"
12:22 aborrero@cumin1001: START - Cookbook sre.dns.netbox
12:18 aikochou@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
12:12 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
12:12 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/page-analytics: apply
12:09 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/page-analytics: apply
12:04 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/page-analytics: apply
11:56 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/page-analytics: apply
11:55 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
11:55 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
11:55 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
11:55 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
11:54 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
11:51 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/media-analytics: apply
11:49 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
11:49 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
11:47 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bastion.bastioninfra-codfw1dev.codfw1dev.wmcloud.org on all recursors
11:47 aborrero@cumin1001: START - Cookbook sre.dns.wipe-cache bastion.bastioninfra-codfw1dev.codfw1dev.wmcloud.org on all recursors
11:46 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
11:45 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
11:42 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
11:42 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
11:40 hnowlan@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
11:38 hnowlan@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
11:35 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
10:58 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
10:58 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
10:55 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
10:54 jiji@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
10:49 fabfur: swap purged on cp4040 to use UDS instead of TCP for Varnish (T347837)
10:43 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
10:43 jiji@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
10:34 fabfur: depool cp4040 to test new purged version (T347837)
09:48 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add codfw new switches - cmooney@cumin1001"
09:47 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add codfw new switches - cmooney@cumin1001"
09:06 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1006.eqiad.wmnet with OS bullseye
08:34 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1006.eqiad.wmnet with reason: host reimage
08:31 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1006.eqiad.wmnet with reason: host reimage
08:24 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
08:21 taavi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcontrol1006
08:21 taavi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcontrol1006
08:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
08:17 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
07:49 godog: +150G to prometheus@k8s in codfw
07:47 taavi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1006.eqiad.wmnet with OS bullseye
07:46 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudcontrol1006 - taavi@cumin1001"
07:45 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudcontrol1006 - taavi@cumin1001"
07:37 joal@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
07:37 joal@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
07:36 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:36 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudcontrol1006 - taavi@cumin1001"
07:35 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudcontrol1006 - taavi@cumin1001"
07:32 taavi@cumin1001: START - Cookbook sre.dns.netbox
07:31 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:31 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudcontrol1006 - taavi@cumin1001"
07:30 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudcontrol1006 - taavi@cumin1001"
07:28 taavi@cumin1001: START - Cookbook sre.dns.netbox
05:11 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
05:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .

2023-10-01

01:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T343198)', diff saved to https://phabricator.wikimedia.org/P52799 and previous config saved to /var/cache/conftool/dbconfig/20231001-013851-arnaudb.json
01:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P52798 and previous config saved to /var/cache/conftool/dbconfig/20231001-012344-arnaudb.json
01:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P52797 and previous config saved to /var/cache/conftool/dbconfig/20231001-010838-arnaudb.json
00:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T343198)', diff saved to https://phabricator.wikimedia.org/P52796 and previous config saved to /var/cache/conftool/dbconfig/20231001-005332-arnaudb.json

Other archives

2000s

Archive 1: 2004 Jun - 2004 Sep
Archive 2: 2004 Oct - 2004 Nov
Archive 3: 2004 Dec - 2005 Mar
Archive 4: 2005 Apr - 2005 Jul
Archive 5: 2005 Aug - 2005 Oct, with revision history 2004-06-23 to 2005-11-25
Archive 6: 2005 Nov - 2006 Feb
Archive 7: 2006 Mar - 2006 Jun
Archive 8: 2006 Jul - 2006 Sep
Archive 9: 2006 Oct - 2007 Jan, with revision history 2005-11-25 to 2007-02-21
Archive 10: 2007 Feb - 2007 Jun
Archive 11: 2007 Jul - 2007 Dec
Archive 12: 2008 Jan - 2008 Jul
Archive 12a: 2008 Aug
Archive 12b: 2008 Sep
Archive 13: 2008 Oct - 2009 Jun
Archive 14: 2009 Jun - 2009 Dec

