Server Admin Log/Archive 69

2023-08-15

23:26 hmonroy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Set wikidiff2 maxSplitSize = 10 on group0 wikis T341754 (duration: 07m 39s)
22:27 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1012.eqiad.wmnet with OS bullseye
22:22 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1013.eqiad.wmnet with OS bullseye
21:53 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1013.eqiad.wmnet with reason: host reimage
21:50 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1012.eqiad.wmnet with reason: host reimage
21:47 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1013.eqiad.wmnet with reason: host reimage
21:47 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1012.eqiad.wmnet with reason: host reimage
21:32 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1013.eqiad.wmnet with OS bullseye
21:32 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1012.eqiad.wmnet with OS bullseye
21:16 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:16 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pdus - robh@cumin1001"
21:15 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pdus - robh@cumin1001"
21:09 robh@cumin1001: START - Cookbook sre.dns.netbox
21:07 robh@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
21:07 robh@cumin1001: START - Cookbook sre.dns.netbox
20:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs3010.esams.wmnet with OS bullseye
20:55 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
20:54 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
20:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs3010.esams.wmnet with reason: host reimage
20:36 ebernhardson: T342444 start cirrussearch reindex of all wikis to enable new text analysis components from mwmaint1002
20:33 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs3010.esams.wmnet with reason: host reimage
20:20 ryankemper@deploy1002: Finished scap: Backport for elastic: allow only 1 enwiki_content per host (T343820) (duration: 09m 25s)
20:20 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs3008.esams.wmnet with OS bullseye
20:20 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
20:19 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
20:19 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3080.esams.wmnet with OS bullseye
20:19 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
20:18 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
20:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3072.esams.wmnet with OS bullseye
20:17 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
20:16 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
20:14 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs3010.esams.wmnet with OS bullseye
20:13 ryankemper@deploy1002: ryankemper: Continuing with sync
20:12 ryankemper@deploy1002: ryankemper: Backport for elastic: allow only 1 enwiki_content per host (T343820) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:11 ryankemper@deploy1002: Started scap: Backport for elastic: allow only 1 enwiki_content per host (T343820)
20:01 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs3008.esams.wmnet with reason: host reimage
20:01 sukhe: running dummy authdns-update
19:57 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs3008.esams.wmnet with reason: host reimage
19:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3080.esams.wmnet with reason: host reimage
19:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3072.esams.wmnet with reason: host reimage
19:51 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3080.esams.wmnet with reason: host reimage
19:49 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3072.esams.wmnet with reason: host reimage
19:45 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns3004.wikimedia.org with OS bullseye
19:45 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
19:44 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
19:38 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs3008.esams.wmnet with OS bullseye
19:32 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "manual trigger - cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002 - brett@cumin2002"
19:32 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "manual trigger - cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002 - brett@cumin2002"
19:31 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:31 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge flink-zk2002 DNS changes - sukhe@cumin2002"
19:31 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3078.esams.wmnet with OS bullseye
19:31 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
19:30 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3070.esams.wmnet with OS bullseye
19:30 brett@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
19:30 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge flink-zk2002 DNS changes - sukhe@cumin2002"
19:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3080.esams.wmnet with OS bullseye
19:28 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3072.esams.wmnet with OS bullseye
19:28 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
19:26 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
19:26 sukhe@cumin2002: START - Cookbook sre.dns.netbox
19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3078.esams.wmnet with reason: host reimage
19:03 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns3004.wikimedia.org with reason: host reimage
19:01 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3070.esams.wmnet with reason: host reimage
18:58 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dns3004.wikimedia.org with reason: host reimage
18:57 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3078.esams.wmnet with reason: host reimage
18:57 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3070.esams.wmnet with reason: host reimage
18:37 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host dns3004.wikimedia.org with OS bullseye
18:36 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns3004.wikimedia.org with OS bullseye
18:36 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3078.esams.wmnet with OS bullseye
18:36 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3070.esams.wmnet with OS bullseye
18:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3076.esams.wmnet with OS bullseye
18:32 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
18:30 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3068.esams.wmnet with OS bullseye
18:30 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
18:30 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
18:28 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
18:26 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host dns3004.wikimedia.org with OS bullseye
18:17 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns3003.wikimedia.org with OS bullseye
18:17 fabfur@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
18:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3074.esams.wmnet with OS bullseye
18:16 sukhe@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
18:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3066.esams.wmnet with OS bullseye
18:16 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
18:11 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
18:10 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
18:09 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
18:08 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3076.esams.wmnet with reason: host reimage
18:05 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3068.esams.wmnet with reason: host reimage
18:05 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3076.esams.wmnet with reason: host reimage
18:01 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3068.esams.wmnet with reason: host reimage
17:54 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
17:53 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
17:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3074.esams.wmnet with reason: host reimage
17:45 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3066.esams.wmnet with reason: host reimage
17:44 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3074.esams.wmnet with reason: host reimage
17:42 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3066.esams.wmnet with reason: host reimage
17:42 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns3003.wikimedia.org with reason: host reimage
17:42 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3076.esams.wmnet with OS bullseye
17:40 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3068.esams.wmnet with OS bullseye
17:39 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dns3003.wikimedia.org with reason: host reimage
17:22 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3074.esams.wmnet with OS bullseye
17:21 brett: Upload libvmod-netmapper 1.9-4 (bookworm) to archive - T342154
17:20 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3066.esams.wmnet with OS bullseye
17:15 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host dns3003.wikimedia.org with OS bullseye
17:02 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3070']
17:01 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3066']
17:01 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3068']
17:00 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3072']
16:57 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk2001.codfw.wmnet
16:57 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk2001.codfw.wmnet with OS bookworm
16:56 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3074']
16:55 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3066']
16:55 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3068']
16:55 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3070']
16:54 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3072']
16:54 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3076']
16:53 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs3010']
16:52 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3078']
16:51 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3080']
16:50 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3074']
16:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3066.mgmt.esams.wmnet with reboot policy FORCED
16:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1098.eqiad.wmnet with OS bullseye
16:48 robh@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp3074']
16:48 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3074']
16:47 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk2002.codfw.wmnet
16:47 bking@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
16:46 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3076']
16:46 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3078']
16:45 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3080']
16:45 bking@cumin1001: START - Cookbook sre.dns.netbox
16:45 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk2002.codfw.wmnet
16:45 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs3010']
16:44 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti3006']
16:43 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns3004']
16:43 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs3010']
16:42 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti3008']
16:42 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs3008']
16:37 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns3004']
16:37 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti3006']
16:37 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti3008']
16:36 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs3008']
16:33 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs3010']
16:32 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3066.mgmt.esams.wmnet with reboot policy FORCED
16:30 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3070.mgmt.esams.wmnet with reboot policy FORCED
16:29 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3068.mgmt.esams.wmnet with reboot policy FORCED
16:29 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3072.mgmt.esams.wmnet with reboot policy FORCED
16:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3074.mgmt.esams.wmnet with reboot policy FORCED
16:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3078.mgmt.esams.wmnet with reboot policy FORCED
16:27 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3076.mgmt.esams.wmnet with reboot policy FORCED
16:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1098.eqiad.wmnet with reason: host reimage
16:24 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
16:21 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1098.eqiad.wmnet with reason: host reimage
16:20 cmooney@cumin1001: START - Cookbook sre.dns.netbox
16:20 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
16:18 cmooney@cumin1001: START - Cookbook sre.dns.netbox
16:11 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3068.mgmt.esams.wmnet with reboot policy FORCED
16:11 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3070.mgmt.esams.wmnet with reboot policy FORCED
16:11 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3072.mgmt.esams.wmnet with reboot policy FORCED
16:10 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3074.mgmt.esams.wmnet with reboot policy FORCED
16:10 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3076.mgmt.esams.wmnet with reboot policy FORCED
16:09 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3080.mgmt.esams.wmnet with reboot policy FORCED
16:09 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3078.mgmt.esams.wmnet with reboot policy FORCED
16:09 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs3010.mgmt.esams.wmnet with reboot policy FORCED
16:09 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns3004.mgmt.esams.wmnet with reboot policy FORCED
16:08 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti3008.mgmt.esams.wmnet with reboot policy FORCED
16:05 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1098.eqiad.wmnet with OS bullseye
16:02 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti3006.mgmt.esams.wmnet with reboot policy FORCED
16:02 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs3008.mgmt.esams.wmnet with reboot policy FORCED
16:00 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
15:59 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
15:58 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124 (duration: 00m 15s)
15:58 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124
15:56 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124 (duration: 00m 14s)
15:56 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124
15:51 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3080.mgmt.esams.wmnet with reboot policy FORCED
15:51 robh@cumin1001: START - Cookbook sre.hosts.provision for host dns3004.mgmt.esams.wmnet with reboot policy FORCED
15:50 robh@cumin1001: START - Cookbook sre.hosts.provision for host ganeti3006.mgmt.esams.wmnet with reboot policy FORCED
15:50 robh@cumin1001: START - Cookbook sre.hosts.provision for host ganeti3008.mgmt.esams.wmnet with reboot policy FORCED
15:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs3009.esams.wmnet with OS bullseye
15:50 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
15:50 robh@cumin1001: START - Cookbook sre.hosts.provision for host lvs3008.mgmt.esams.wmnet with reboot policy FORCED
15:49 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
15:49 robh@cumin1001: START - Cookbook sre.hosts.provision for host lvs3010.mgmt.esams.wmnet with reboot policy FORCED
15:44 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:44 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns3004 - robh@cumin1001"
15:44 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1015.eqiad.wmnet with OS bullseye
15:44 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns3004 - robh@cumin1001"
15:42 robh@cumin1001: START - Cookbook sre.dns.netbox
15:41 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk2001.codfw.wmnet with OS bookworm
15:41 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2001.codfw.wmnet - bking@cumin1001"
15:40 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2001.codfw.wmnet - bking@cumin1001"
15:40 bking@cumin1001: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) flink-zk2001.codfw.wmnet on all recursors
15:40 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk2001.codfw.wmnet on all recursors
15:40 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:40 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
15:40 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp3066
15:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3007.esams.wmnet
15:39 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp3066
15:39 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp3068
15:39 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp3068
15:39 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
15:39 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp3070
15:39 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp3070
15:38 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp3072
15:38 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp3072
15:38 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp3074
15:38 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp3074
15:38 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp3076
15:38 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp3076
15:38 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp3078
15:37 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp3078
15:37 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns3004
15:37 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dns3004
15:37 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp3080
15:37 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp3080
15:36 bking@cumin1001: START - Cookbook sre.dns.netbox
15:36 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk2001.codfw.wmnet
15:35 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs3008
15:35 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host lvs3008
15:35 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs3010
15:35 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host lvs3010
15:34 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk2001.codfw.wmnet
15:33 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:33 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rack bw27 hosts - robh@cumin1001"
15:32 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rack bw27 hosts - robh@cumin1001"
15:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs3009.esams.wmnet with reason: host reimage
15:30 bking@cumin1001: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) flink-zk2001.codfw.wmnet on all recursors
15:30 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk2001.codfw.wmnet on all recursors
15:30 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:30 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
15:29 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
15:29 robh@cumin1001: START - Cookbook sre.dns.netbox
15:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3007.esams.wmnet
15:27 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1014.eqiad.wmnet with OS bullseye
15:27 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs3009.esams.wmnet with reason: host reimage
15:07 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1097.eqiad.wmnet with OS bullseye
15:06 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs3009.esams.wmnet with OS bullseye
15:05 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs3009.mgmt.esams.wmnet with reboot policy FORCED
14:56 robh@cumin1001: START - Cookbook sre.hosts.provision for host lvs3009.mgmt.esams.wmnet with reboot policy FORCED
14:55 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs3009.mgmt.esams.wmnet with reboot policy FORCED
14:54 robh@cumin1001: START - Cookbook sre.hosts.provision for host lvs3009.mgmt.esams.wmnet with reboot policy FORCED
14:54 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1014.eqiad.wmnet with reason: host reimage
14:54 bking@cumin1001: START - Cookbook sre.dns.netbox
14:52 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1015.eqiad.wmnet with reason: host reimage
14:52 bking@cumin1001: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) flink-zk2001.codfw.wmnet on all recursors
14:52 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk2001.codfw.wmnet on all recursors
14:52 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:52 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
14:51 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs3009.esams.wmnet with OS bullseye
14:51 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
14:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti3005.mgmt.esams.wmnet with reboot policy GRACEFUL
14:49 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1014.eqiad.wmnet with reason: host reimage
14:49 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1015.eqiad.wmnet with reason: host reimage
14:45 bking@cumin1001: START - Cookbook sre.dns.netbox
14:45 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk2001.codfw.wmnet
14:43 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ganeti3005.esams.wmnet
14:39 robh@cumin1001: START - Cookbook sre.hosts.provision for host ganeti3005.mgmt.esams.wmnet with reboot policy GRACEFUL
14:38 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1097.eqiad.wmnet with reason: host reimage
14:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti3007.esams.wmnet with OS bullseye
14:37 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
14:34 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1097.eqiad.wmnet with reason: host reimage
14:33 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
14:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
14:26 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1014.eqiad.wmnet with OS bullseye
14:26 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1015.eqiad.wmnet with OS bullseye
14:26 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3081.esams.wmnet with OS bullseye
14:23 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti3005.esams.wmnet
14:22 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs3009.esams.wmnet with OS bullseye
14:17 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1097.eqiad.wmnet with OS bullseye
14:14 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns3003.wikimedia.org with OS bullseye
14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
14:07 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ganeti3005.esams.wmnet
14:03 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3081.esams.wmnet with reason: host reimage
14:00 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3081.esams.wmnet with reason: host reimage
13:51 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host dns3003.wikimedia.org with OS bullseye
13:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti3007.esams.wmnet with reason: host reimage
13:47 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
13:46 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
13:45 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
13:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti3007.esams.wmnet with reason: host reimage
13:45 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
13:45 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
13:44 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns3003.wikimedia.org with OS bullseye
13:44 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
13:38 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp3081.esams.wmnet with OS bullseye
13:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
13:29 urbanecm@deploy1002: Finished scap: Backport for Remove knwiktionary tagline (T343662) (duration: 10m 20s)
13:25 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti3007.esams.wmnet with OS bullseye
13:24 filippo@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:23 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti3007']
13:23 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host dns3003.wikimedia.org with OS bullseye
13:23 filippo@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
13:22 urbanecm@deploy1002: urbanecm and anzx: Continuing with sync
13:20 urbanecm@deploy1002: urbanecm and anzx: Backport for Remove knwiktionary tagline (T343662) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:19 urbanecm@deploy1002: Started scap: Backport for Remove knwiktionary tagline (T343662)
13:18 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti3007']
13:17 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti3007']
13:17 urbanecm@deploy1002: Finished scap: Backport for GrowthExperiments: enable AddLink backend 13th round of wikis (T308138) (duration: 10m 47s)
13:10 urbanecm@deploy1002: sgimeno and urbanecm: Continuing with sync
13:07 urbanecm@deploy1002: sgimeno and urbanecm: Backport for GrowthExperiments: enable AddLink backend 13th round of wikis (T308138) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:06 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti3007']
13:06 urbanecm@deploy1002: Started scap: Backport for GrowthExperiments: enable AddLink backend 13th round of wikis (T308138)
13:05 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti3005']
12:56 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti3005']
12:47 filippo@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:36 filippo@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
12:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sw - ayounsi@cumin1001"
12:12 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sw - ayounsi@cumin1001"
12:04 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti3007.esams.wmnet with OS bullseye
12:02 sukhe: sukhe@contint2002:~$ sudo systemctl restart zuul: T344238
12:02 sukhe: sukhe@contint2002:~$ sudo systemctl restart zuul
12:02 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3077.esams.wmnet with OS bullseye
12:02 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
11:54 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
11:54 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3069.esams.wmnet with OS bullseye
11:54 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
11:53 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3067.esams.wmnet with OS bullseye
11:53 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
11:52 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
11:51 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
11:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3075.esams.wmnet with OS bullseye
11:50 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
11:49 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
11:31 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp3077.esams.wmnet with reason: host reimage
11:29 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp3069.esams.wmnet with reason: host reimage
11:27 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp3075.esams.wmnet with reason: host reimage
11:26 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3077.esams.wmnet with reason: host reimage
11:26 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp3067.esams.wmnet with reason: host reimage
11:24 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3069.esams.wmnet with reason: host reimage
11:24 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti3005.esams.wmnet
11:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
11:23 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3075.esams.wmnet with reason: host reimage
11:22 sukhe: sukhe@contint2002:~$ sudo systemctl restart zuul
11:22 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti3005.esams.wmnet
11:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
11:22 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3067.esams.wmnet with reason: host reimage
11:07 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host an-db1001.eqiad.wmnet
11:04 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp3077.esams.wmnet with OS bullseye
11:03 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3069.esams.wmnet with OS bullseye
11:01 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3075.esams.wmnet with OS bullseye
11:00 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp3067.esams.wmnet with OS bullseye
10:56 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-db1001.eqiad.wmnet
10:56 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1096.eqiad.wmnet with OS bullseye
10:54 sukhe: zuul@contint1002:/srv/zuul/git/operations/puppet$ git fetch --force --tags -v origin
10:45 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti3007.esams.wmnet with OS bullseye
10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti3005.esams.wmnet with OS bullseye
10:44 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
10:43 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
10:42 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3079.esams.wmnet with OS bullseye
10:42 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
10:41 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
10:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3071.esams.wmnet with OS bullseye
10:37 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
10:36 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
10:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3073.esams.wmnet with OS bullseye
10:32 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
10:31 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
10:25 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti3005.esams.wmnet with reason: host reimage
10:21 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti3005.esams.wmnet with reason: host reimage
10:20 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp3079.esams.wmnet with reason: host reimage
10:16 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3079.esams.wmnet with reason: host reimage
10:11 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp3071.esams.wmnet with reason: host reimage
10:09 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp3073.esams.wmnet with reason: host reimage
10:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3071.esams.wmnet with reason: host reimage
10:04 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3073.esams.wmnet with reason: host reimage
10:01 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti3005.esams.wmnet with OS bullseye
09:54 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp3079.esams.wmnet with OS bullseye
09:52 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp3079.esams.wmnet with OS bullseye
09:52 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp3079.esams.wmnet with OS bullseye
09:51 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti3005.esams.wmnet with OS bullseye
09:50 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp3079.esams.wmnet with OS bullseye
09:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3071.esams.wmnet with OS bullseye
09:41 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp3079.esams.wmnet with reason: host reimage
09:37 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3079.esams.wmnet with reason: host reimage
09:34 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1096.eqiad.wmnet with OS bullseye
09:30 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3073.esams.wmnet with OS bullseye
09:30 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti3005.esams.wmnet with OS bullseye
09:17 filippo@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
09:15 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp3079.esams.wmnet with OS bullseye
09:15 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3079.esams.wmnet with OS bullseye
09:11 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti3005.esams.wmnet with OS bullseye
09:11 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti3005.esams.wmnet with OS bullseye
09:08 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp3079.esams.wmnet with OS bullseye
09:05 filippo@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
08:55 klausman: Draining ml2003 for kubelet partition resize
08:46 klausman: Draining ml2002 for kubelet partition resize
08:42 zabe@deploy1002: Finished scap: Backport for Add messages for Pa'O Wiktionary (blkwiktionary) (T343540), Add messages for Sundanese Wikisource (suwikisource) (T343539) (duration: 33m 26s)
08:37 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
08:36 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
08:31 zabe@deploy1002: zabe: Continuing with sync
08:30 zabe@deploy1002: zabe: Backport for Add messages for Pa'O Wiktionary (blkwiktionary) (T343540), Add messages for Sundanese Wikisource (suwikisource) (T343539) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
08:28 filippo@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
08:16 filippo@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
08:09 zabe@deploy1002: Started scap: Backport for Add messages for Pa'O Wiktionary (blkwiktionary) (T343540), Add messages for Sundanese Wikisource (suwikisource) (T343539)
07:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cr2-esams mgmt - ayounsi@cumin1001"
07:55 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cr2-esams mgmt - ayounsi@cumin1001"
07:49 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
07:49 ayounsi@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
07:49 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
07:33 taavi@deploy1002: Finished scap: Backport for Enable EditInSequence on all wikisources (T308098) (duration: 13m 29s)
07:29 gehel: restarting wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph on wdqs2012
07:27 taavi@deploy1002: soda and taavi: Continuing with sync
07:21 taavi@deploy1002: soda and taavi: Backport for Enable EditInSequence on all wikisources (T308098) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan2002.codfw.wmnet
07:20 taavi@deploy1002: Started scap: Backport for Enable EditInSequence on all wikisources (T308098)
07:18 taavi@deploy1002: Finished scap: Backport for jawiki: reassign the changetags user right (T344150) (duration: 11m 05s)
07:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host titan2002.codfw.wmnet
07:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cp3081 - ayounsi@cumin1001"
07:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan2001.codfw.wmnet
07:15 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cp3081 - ayounsi@cumin1001"
07:12 taavi@deploy1002: anzx and taavi: Continuing with sync
07:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host titan2001.codfw.wmnet
07:08 taavi@deploy1002: anzx and taavi: Backport for jawiki: reassign the changetags user right (T344150) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
07:07 taavi@deploy1002: Started scap: Backport for jawiki: reassign the changetags user right (T344150)
07:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lists2001.codfw.wmnet
07:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:05 taavi@deploy1002: Finished scap: Backport for clienthints: Collect Client Hints data on group0 wikis (T341110) (duration: 15m 23s)
07:04 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
07:03 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
07:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host lists2001.codfw.wmnet
06:59 taavi@deploy1002: taavi and dreamyjazz: Continuing with sync
06:57 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
06:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast2003.wikimedia.org
06:52 taavi@deploy1002: taavi and dreamyjazz: Backport for clienthints: Collect Client Hints data on group0 wikis (T341110) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
06:50 taavi@deploy1002: Started scap: Backport for clienthints: Collect Client Hints data on group0 wikis (T341110)
04:34 taavi@deploy1002: Finished scap: Backport for Add a comment why PdfHandler does not use Shellbox (duration: 08m 24s)
04:28 taavi@deploy1002: taavi: Continuing with sync
04:28 taavi@deploy1002: taavi: Backport for Add a comment why PdfHandler does not use Shellbox synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
04:26 taavi@deploy1002: Started scap: Backport for Add a comment why PdfHandler does not use Shellbox
03:58 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.19 (duration: 02m 13s)
03:56 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.22 refs T343724 (duration: 53m 42s)
03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.22 refs T343724
01:54 eileen: config revision changed from a61171bc to a05a2a82
01:51 eileen: civicrm upgraded from 16c2e58a to 5e631101
01:39 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3081.esams.wmnet with OS bullseye
01:39 eileen: config revision changed from 2d598716 to a61171bc
01:18 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3081.esams.wmnet with OS bullseye
01:01 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3081.esams.wmnet with OS bullseye
00:26 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3081.esams.wmnet with OS bullseye

2023-08-14

23:40 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1030.eqiad.wmnet
23:31 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1030.eqiad.wmnet
22:56 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3081.esams.wmnet with OS bullseye
22:41 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs3009']
22:35 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs3009']
22:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs3009.mgmt.esams.wmnet with reboot policy FORCED
22:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3081.esams.wmnet with OS bullseye
22:23 robh@cumin1001: START - Cookbook sre.hosts.provision for host lvs3009.mgmt.esams.wmnet with reboot policy FORCED
22:19 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:19 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs3009 - robh@cumin1001"
22:18 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs3009 - robh@cumin1001"
22:09 robh@cumin1001: START - Cookbook sre.dns.netbox
22:02 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti3005']
22:01 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti3007.mgmt.esams.wmnet with reboot policy FORCED
21:59 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3075']
21:56 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti3005']
21:56 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3081.esams.wmnet with OS bullseye
21:55 robh@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti3005']
21:55 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti3005']
21:54 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3067']
21:54 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3073']
21:54 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3077']
21:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti3005.mgmt.esams.wmnet with reboot policy FORCED
21:54 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3069']
21:54 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns3003']
21:53 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3071']
21:48 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns3003']
21:47 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3067']
21:47 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3069']
21:47 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3071']
21:46 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3073']
21:46 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3075']
21:46 urandom: upgrading Cassandra to 4.1.1, restbase10[18,25-27,30,33]-{a,b,c} (eqiad/row D) — T339298
21:46 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3077']
21:43 robh@cumin1001: START - Cookbook sre.hosts.provision for host ganeti3007.mgmt.esams.wmnet with reboot policy FORCED
21:43 robh@cumin1001: START - Cookbook sre.hosts.provision for host ganeti3005.mgmt.esams.wmnet with reboot policy FORCED
21:42 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3067.mgmt.esams.wmnet with reboot policy FORCED
21:40 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns3003.mgmt.esams.wmnet with reboot policy FORCED
21:39 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3069.mgmt.esams.wmnet with reboot policy FORCED
21:39 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3071.mgmt.esams.wmnet with reboot policy FORCED
21:38 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3073.mgmt.esams.wmnet with reboot policy FORCED
21:37 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3075.mgmt.esams.wmnet with reboot policy FORCED
21:37 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3077.mgmt.esams.wmnet with reboot policy FORCED
21:35 maryum: security deploy for T341529
21:27 urandom: upgrading Cassandra to 4.1.1, restbase10[17,22-24,29,32]-{a,b,c} (eqiad/row B) — T339298
21:22 robh@cumin1001: START - Cookbook sre.hosts.provision for host dns3003.mgmt.esams.wmnet with reboot policy FORCED
21:21 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3067.mgmt.esams.wmnet with reboot policy FORCED
21:21 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3069.mgmt.esams.wmnet with reboot policy FORCED
21:21 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3071.mgmt.esams.wmnet with reboot policy FORCED
21:19 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3073.mgmt.esams.wmnet with reboot policy FORCED
21:19 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3075.mgmt.esams.wmnet with reboot policy FORCED
21:18 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3077.mgmt.esams.wmnet with reboot policy FORCED
21:11 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:11 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new hosts in by27 - robh@cumin1001"
21:10 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new hosts in by27 - robh@cumin1001"
21:09 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3081.esams.wmnet with OS bullseye
21:08 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3081.esams.wmnet with OS bullseye
21:06 robh@cumin1001: START - Cookbook sre.dns.netbox
20:42 urbanecm@deploy1002: Finished scap: Backport for Config changes for new Android schema (duration: 13m 36s)
20:35 urbanecm@deploy1002: urbanecm and sharvaniharan: Continuing with sync
20:33 urandom: upgrading Cassandra to 4.1.1, restbase10[19-21,28,31]-{a,b,c} (eqiad/row A) — T339298
20:30 urbanecm@deploy1002: urbanecm and sharvaniharan: Backport for Config changes for new Android schema synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:28 urbanecm@deploy1002: Started scap: Backport for Config changes for new Android schema
20:25 urbanecm@deploy1002: Finished scap: Backport for NewcomerTasksLogFactory: Use getName(), not getDbKey() (T344163) (duration: 09m 08s)
20:18 urbanecm@deploy1002: urbanecm: Continuing with sync
20:18 urbanecm@deploy1002: urbanecm: Backport for NewcomerTasksLogFactory: Use getName(), not getDbKey() (T344163) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:17 urandom: upgrading Cassandra to 4.1.1, restbase20[12,17-18,23,26-27]-{a,b,c} (codfw/row C) — T339298
20:16 urbanecm@deploy1002: Started scap: Backport for NewcomerTasksLogFactory: Use getName(), not getDbKey() (T344163)
19:57 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3081.esams.wmnet with OS bullseye
19:57 urandom: upgrading Cassandra to 4.1.1, restbase20[15,16,20,22,25]-{a,b,c} (codfw/row C) — T339298
19:52 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3079']
19:45 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3079']
19:45 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3079.mgmt.esams.wmnet with reboot policy FORCED
19:44 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3081']
19:43 urandom: upgrading Cassandra to 4.1.1, restbase2024-{a,b,c} — T339298
19:38 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3081']
19:38 robh@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp3081']
19:37 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3081']
19:34 urandom: upgrading Cassandra to 4.1.1, restbase2021-{a,b,c} — T339298
19:34 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3081.mgmt.esams.wmnet with reboot policy FORCED
19:31 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3079.mgmt.esams.wmnet with reboot policy FORCED
19:24 urandom: upgrading Cassandra to 4.1.1, restbase2019-{a,b,c} — T339298
19:16 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3081.mgmt.esams.wmnet with reboot policy FORCED
19:11 urandom: upgrading Cassandra to 4.1.1, restbase2014-{a,b,c} — T339298
18:45 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp3081.mgmt.esams.wmnet with reboot policy FORCED
18:45 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3081.mgmt.esams.wmnet with reboot policy FORCED
18:43 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp3081.mgmt.esams.wmnet with reboot policy FORCED
18:43 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3081.mgmt.esams.wmnet with reboot policy FORCED
18:38 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:38 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge cp3081 and cp3079 - sukhe@cumin2002"
18:37 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge cp3081 and cp3079 - sukhe@cumin2002"
18:23 sukhe@cumin2002: START - Cookbook sre.dns.netbox
17:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1095.eqiad.wmnet with OS bullseye
17:41 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
17:39 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
17:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1095.eqiad.wmnet with reason: host reimage
17:15 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1095.eqiad.wmnet with reason: host reimage
17:02 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1095.eqiad.wmnet with OS bullseye
16:58 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:57 cmooney@cumin1001: START - Cookbook sre.dns.netbox
16:47 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 100%: Repooling after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P50580 and previous config saved to /var/cache/conftool/dbconfig/20230814-164727-root.json
16:42 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
16:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1094.eqiad.wmnet with OS bullseye
16:32 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 75%: Repooling after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P50579 and previous config saved to /var/cache/conftool/dbconfig/20230814-163222-root.json
16:28 cmooney@cumin1001: START - Cookbook sre.dns.netbox
16:28 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:28 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename cr3-knams to cr2-esams - cmooney@cumin1001"
16:17 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 50%: Repooling after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P50578 and previous config saved to /var/cache/conftool/dbconfig/20230814-161718-root.json
16:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1094.eqiad.wmnet with reason: host reimage
16:11 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1094.eqiad.wmnet with reason: host reimage
16:02 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 25%: Repooling after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P50577 and previous config saved to /var/cache/conftool/dbconfig/20230814-160213-root.json
16:01 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cr2-esams.wikimedia.org on all recursors
16:00 sukhe@cumin2002: START - Cookbook sre.dns.wipe-cache cr2-esams.wikimedia.org on all recursors
15:58 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1094.eqiad.wmnet with OS bullseye
15:55 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1093.eqiad.wmnet with OS bullseye
15:53 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename cr3-knams to cr2-esams - cmooney@cumin1001"
15:50 cmooney@cumin1001: START - Cookbook sre.dns.netbox
15:47 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 10%: Repooling after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P50576 and previous config saved to /var/cache/conftool/dbconfig/20230814-154708-root.json
15:47 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
15:46 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124 (duration: 00m 15s)
15:45 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124
15:38 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
15:36 urandom: upgrading Cassandra to 4.1.1, restbase1016-{a,b,c} — T339298
15:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1093.eqiad.wmnet with reason: host reimage
15:32 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 5%: Repooling after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P50575 and previous config saved to /var/cache/conftool/dbconfig/20230814-153203-root.json
15:30 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124 (duration: 00m 43s)
15:29 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124
15:29 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1093.eqiad.wmnet with reason: host reimage
15:16 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 3%: Repooling after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P50574 and previous config saved to /var/cache/conftool/dbconfig/20230814-151659-root.json
15:16 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1093.eqiad.wmnet with OS bullseye
15:01 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 1%: Repooling after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P50572 and previous config saved to /var/cache/conftool/dbconfig/20230814-150154-root.json
14:57 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2012.codfw.wmnet with OS bullseye
14:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum1002.eqiad.wmnet with OS bookworm
14:42 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1016.eqiad.wmnet with OS bullseye
14:34 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@ee544cb] (eqiad): (no justification provided) (duration: 00m 00s)
14:34 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@ee544cb] (eqiad): (no justification provided)
14:33 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@ee544cb] (eqiad): (no justification provided) (duration: 00m 03s)
14:33 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@ee544cb] (eqiad): (no justification provided)
14:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum1002.eqiad.wmnet with reason: host reimage
14:30 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@ee544cb] (eqiad): (no justification provided) (duration: 00m 00s)
14:30 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@ee544cb] (eqiad): (no justification provided)
14:27 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@ee544cb]: (no justification provided) (duration: 00m 01s)
14:27 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@ee544cb]: (no justification provided)
14:26 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum1002.eqiad.wmnet with reason: host reimage
14:26 sukhe: running authdns-update for CR 948195: T344073
14:26 sukhe: running authdns-update for CR 948195
14:25 jgiannelos@deploy1002: deploy aborted: (no justification provided) (duration: 00m 10s)
14:25 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@ee544cb]: (no justification provided)
14:19 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
14:16 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
14:13 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum1002.eqiad.wmnet with OS bookworm
14:05 urandom: upgrading Cassandra to 4.1.1, restbase2013-{a,b,c} — T339298
14:04 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2012.codfw.wmnet with reason: host reimage
14:01 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2012.codfw.wmnet with reason: host reimage
13:53 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS bullseye
13:40 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2012.codfw.wmnet with OS bullseye
13:27 derick@deploy1002: Finished scap: Backport for wmf-config: Remove wgContentTranslationDefaultParsoidClient cleanup (duration: 16m 56s)
13:20 derick@deploy1002: d3r1ck01 and derick: Continuing with sync
13:19 derick@deploy1002: d3r1ck01 and derick: Backport for wmf-config: Remove wgContentTranslationDefaultParsoidClient cleanup synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:10 derick@deploy1002: Started scap: Backport for wmf-config: Remove wgContentTranslationDefaultParsoidClient cleanup
13:08 derick@deploy1002: Backport cancelled.
11:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mr1-esams oob - ayounsi@cumin1001"
11:23 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mr1-esams oob - ayounsi@cumin1001"
11:21 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
11:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mr1-esams oob - ayounsi@cumin1001"
11:16 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mr1-esams oob - ayounsi@cumin1001"
11:13 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
11:09 stevemunene@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-airflow1007.eqiad.wmnet
11:09 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-airflow1007.eqiad.wmnet with OS buster
10:54 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-airflow1007.eqiad.wmnet with reason: host reimage
10:51 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-airflow1007.eqiad.wmnet with reason: host reimage
10:40 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-airflow1007.eqiad.wmnet with OS buster
10:39 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM an-airflow1007.eqiad.wmnet - stevemunene@cumin1001"
10:39 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:39 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mr1-esams-new to mr1-esams in dns. - cmooney@cumin1001"
10:38 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM an-airflow1007.eqiad.wmnet - stevemunene@cumin1001"
10:38 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mr1-esams-new to mr1-esams in dns. - cmooney@cumin1001"
10:38 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-airflow1007.eqiad.wmnet on all recursors
10:38 stevemunene@cumin1001: START - Cookbook sre.dns.wipe-cache an-airflow1007.eqiad.wmnet on all recursors
10:36 cmooney@cumin1001: START - Cookbook sre.dns.netbox
10:34 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:34 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mr1-esams-new to mr1-esams in dns. - cmooney@cumin1001"
10:33 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mr1-esams-new to mr1-esams in dns. - cmooney@cumin1001"
10:30 cmooney@cumin1001: START - Cookbook sre.dns.netbox
10:26 stevemunene@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
10:25 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:25 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mr1-esams-new to mr1-esams in dns. - cmooney@cumin1001"
10:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1008.eqiad.wmnet
10:24 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mr1-esams-new to mr1-esams in dns. - cmooney@cumin1001"
10:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1009.eqiad.wmnet
10:13 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
10:13 stevemunene@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-airflow1007.eqiad.wmnet
10:12 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1009.eqiad.wmnet
09:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1124.eqiad.wmnet
09:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1008.eqiad.wmnet
09:48 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1007.eqiad.wmnet
09:41 cmooney@cumin1001: START - Cookbook sre.dns.netbox
09:41 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
09:39 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1007.eqiad.wmnet
09:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1006.eqiad.wmnet
09:32 cmooney@cumin1001: START - Cookbook sre.dns.netbox
09:32 cmooney@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
09:32 cmooney@cumin1001: START - Cookbook sre.dns.netbox
09:28 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1006.eqiad.wmnet
09:27 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1005.eqiad.wmnet
09:26 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1124.eqiad.wmnet
09:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1005.eqiad.wmnet
09:13 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1004.eqiad.wmnet
09:11 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:11 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mr1-esams dns to mr1-eams-old. - cmooney@cumin1001"
09:10 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mr1-esams dns to mr1-eams-old. - cmooney@cumin1001"
09:08 cmooney@cumin1001: START - Cookbook sre.dns.netbox
09:02 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1004.eqiad.wmnet

2023-08-13

16:07 topranks: powering down cr3-esams
16:05 topranks: powering down cr2-esams
15:54 topranks: Disabling esams peering at AMS-IX prior to removing router
15:45 topranks: Disable transport cct cr2-esams to cr2-eqiad prior to disconnect T329219
15:26 topranks: disable transit and peering links on cr2-esams & cr3-esams before decom T329219

2023-08-12

08:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
08:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T342617)', diff saved to https://phabricator.wikimedia.org/P50569 and previous config saved to /var/cache/conftool/dbconfig/20230812-082511-ladsgroup.json
08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P50568 and previous config saved to /var/cache/conftool/dbconfig/20230812-081005-ladsgroup.json
07:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P50567 and previous config saved to /var/cache/conftool/dbconfig/20230812-075459-ladsgroup.json
07:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T342617)', diff saved to https://phabricator.wikimedia.org/P50566 and previous config saved to /var/cache/conftool/dbconfig/20230812-073953-ladsgroup.json
05:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1219 (T342617)', diff saved to https://phabricator.wikimedia.org/P50565 and previous config saved to /var/cache/conftool/dbconfig/20230812-055651-ladsgroup.json
05:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
05:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
05:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T342617)', diff saved to https://phabricator.wikimedia.org/P50564 and previous config saved to /var/cache/conftool/dbconfig/20230812-050127-ladsgroup.json
04:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P50563 and previous config saved to /var/cache/conftool/dbconfig/20230812-044621-ladsgroup.json
04:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
04:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
04:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T342617)', diff saved to https://phabricator.wikimedia.org/P50562 and previous config saved to /var/cache/conftool/dbconfig/20230812-043724-ladsgroup.json
04:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P50561 and previous config saved to /var/cache/conftool/dbconfig/20230812-043115-ladsgroup.json
04:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P50560 and previous config saved to /var/cache/conftool/dbconfig/20230812-042217-ladsgroup.json
04:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T342617)', diff saved to https://phabricator.wikimedia.org/P50559 and previous config saved to /var/cache/conftool/dbconfig/20230812-041608-ladsgroup.json
04:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P50558 and previous config saved to /var/cache/conftool/dbconfig/20230812-040711-ladsgroup.json
03:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T342617)', diff saved to https://phabricator.wikimedia.org/P50557 and previous config saved to /var/cache/conftool/dbconfig/20230812-035205-ladsgroup.json
02:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T342617)', diff saved to https://phabricator.wikimedia.org/P50556 and previous config saved to /var/cache/conftool/dbconfig/20230812-023441-ladsgroup.json
02:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
02:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
02:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T342617)', diff saved to https://phabricator.wikimedia.org/P50555 and previous config saved to /var/cache/conftool/dbconfig/20230812-023419-ladsgroup.json
02:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P50554 and previous config saved to /var/cache/conftool/dbconfig/20230812-021913-ladsgroup.json
02:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P50553 and previous config saved to /var/cache/conftool/dbconfig/20230812-020407-ladsgroup.json
01:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1207 (T342617)', diff saved to https://phabricator.wikimedia.org/P50552 and previous config saved to /var/cache/conftool/dbconfig/20230812-015910-ladsgroup.json
01:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
01:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
01:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T342617)', diff saved to https://phabricator.wikimedia.org/P50551 and previous config saved to /var/cache/conftool/dbconfig/20230812-015849-ladsgroup.json
01:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T342617)', diff saved to https://phabricator.wikimedia.org/P50550 and previous config saved to /var/cache/conftool/dbconfig/20230812-014901-ladsgroup.json
01:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P50549 and previous config saved to /var/cache/conftool/dbconfig/20230812-014342-ladsgroup.json
01:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P50548 and previous config saved to /var/cache/conftool/dbconfig/20230812-012836-ladsgroup.json
01:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T342617)', diff saved to https://phabricator.wikimedia.org/P50547 and previous config saved to /var/cache/conftool/dbconfig/20230812-011330-ladsgroup.json
00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T342617)', diff saved to https://phabricator.wikimedia.org/P50546 and previous config saved to /var/cache/conftool/dbconfig/20230812-000623-ladsgroup.json
00:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
00:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T342617)', diff saved to https://phabricator.wikimedia.org/P50545 and previous config saved to /var/cache/conftool/dbconfig/20230812-000602-ladsgroup.json

2023-08-11

23:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P50544 and previous config saved to /var/cache/conftool/dbconfig/20230811-235056-ladsgroup.json
23:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P50543 and previous config saved to /var/cache/conftool/dbconfig/20230811-233549-ladsgroup.json
23:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1206 (T342617)', diff saved to https://phabricator.wikimedia.org/P50542 and previous config saved to /var/cache/conftool/dbconfig/20230811-233320-ladsgroup.json
23:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
23:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
23:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T342617)', diff saved to https://phabricator.wikimedia.org/P50541 and previous config saved to /var/cache/conftool/dbconfig/20230811-233259-ladsgroup.json
23:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T342617)', diff saved to https://phabricator.wikimedia.org/P50540 and previous config saved to /var/cache/conftool/dbconfig/20230811-232043-ladsgroup.json
23:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P50539 and previous config saved to /var/cache/conftool/dbconfig/20230811-231753-ladsgroup.json
23:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P50538 and previous config saved to /var/cache/conftool/dbconfig/20230811-230247-ladsgroup.json
22:49 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
22:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T342617)', diff saved to https://phabricator.wikimedia.org/P50537 and previous config saved to /var/cache/conftool/dbconfig/20230811-224741-ladsgroup.json
22:06 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:04 cmooney@cumin1001: START - Cookbook sre.dns.netbox
22:04 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:04 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
22:03 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
22:02 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
22:01 cmooney@cumin1001: START - Cookbook sre.dns.netbox
22:00 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
21:57 cmooney@cumin1001: START - Cookbook sre.dns.netbox
21:49 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:48 cmooney@cumin1001: START - Cookbook sre.dns.netbox
21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T342617)', diff saved to https://phabricator.wikimedia.org/P50536 and previous config saved to /var/cache/conftool/dbconfig/20230811-214142-ladsgroup.json
21:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
21:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
21:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
21:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T342617)', diff saved to https://phabricator.wikimedia.org/P50535 and previous config saved to /var/cache/conftool/dbconfig/20230811-214105-ladsgroup.json
21:28 andrewbogott: rebooting wikitech-static-ord via rackspace UI
21:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P50534 and previous config saved to /var/cache/conftool/dbconfig/20230811-212559-ladsgroup.json
21:17 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:15 cmooney@cumin1001: START - Cookbook sre.dns.netbox
21:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P50533 and previous config saved to /var/cache/conftool/dbconfig/20230811-211053-ladsgroup.json
21:10 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:10 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
21:08 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
21:06 cmooney@cumin1001: START - Cookbook sre.dns.netbox
21:06 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
21:01 cmooney@cumin1001: START - Cookbook sre.dns.netbox
21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T342617)', diff saved to https://phabricator.wikimedia.org/P50532 and previous config saved to /var/cache/conftool/dbconfig/20230811-210102-ladsgroup.json
21:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
21:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
21:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
21:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
21:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T342617)', diff saved to https://phabricator.wikimedia.org/P50531 and previous config saved to /var/cache/conftool/dbconfig/20230811-210024-ladsgroup.json
20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T342617)', diff saved to https://phabricator.wikimedia.org/P50530 and previous config saved to /var/cache/conftool/dbconfig/20230811-205546-ladsgroup.json
20:48 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
20:46 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124 (duration: 00m 12s)
20:46 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124
20:46 bking@deploy1002: deploy aborted: deploying WDQS on newly-reimaged Bullseye hosts T343124 (duration: 02m 44s)
20:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P50529 and previous config saved to /var/cache/conftool/dbconfig/20230811-204517-ladsgroup.json
20:43 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124
20:31 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs2011.codfw.wmnet with OS bullseye
20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P50528 and previous config saved to /var/cache/conftool/dbconfig/20230811-203011-ladsgroup.json
20:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T342617)', diff saved to https://phabricator.wikimedia.org/P50527 and previous config saved to /var/cache/conftool/dbconfig/20230811-201505-ladsgroup.json
20:08 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2011.codfw.wmnet with reason: host reimage
20:05 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2011.codfw.wmnet with reason: host reimage
20:02 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
20:02 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
20:02 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124 (duration: 00m 41s)
20:02 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
20:01 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124
19:44 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2011.codfw.wmnet with OS bullseye
19:38 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:38 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
19:37 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
19:34 cmooney@cumin1001: START - Cookbook sre.dns.netbox
19:33 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2010.codfw.wmnet with OS bullseye
19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T342617)', diff saved to https://phabricator.wikimedia.org/P50526 and previous config saved to /var/cache/conftool/dbconfig/20230811-191548-ladsgroup.json
19:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
19:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T342617)', diff saved to https://phabricator.wikimedia.org/P50525 and previous config saved to /var/cache/conftool/dbconfig/20230811-191527-ladsgroup.json
19:06 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2010.codfw.wmnet with reason: host reimage
19:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum1002.eqiad.wmnet with OS bookworm
19:03 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2010.codfw.wmnet with reason: host reimage
19:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
19:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
19:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T342617)', diff saved to https://phabricator.wikimedia.org/P50524 and previous config saved to /var/cache/conftool/dbconfig/20230811-190208-ladsgroup.json
19:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P50523 and previous config saved to /var/cache/conftool/dbconfig/20230811-190021-ladsgroup.json
18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P50522 and previous config saved to /var/cache/conftool/dbconfig/20230811-184701-ladsgroup.json
18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P50521 and previous config saved to /var/cache/conftool/dbconfig/20230811-184514-ladsgroup.json
18:42 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2010.codfw.wmnet with OS bullseye
18:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum1002.eqiad.wmnet with reason: host reimage
18:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T342617)', diff saved to https://phabricator.wikimedia.org/P50520 and previous config saved to /var/cache/conftool/dbconfig/20230811-183431-ladsgroup.json
18:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
18:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
18:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T342617)', diff saved to https://phabricator.wikimedia.org/P50519 and previous config saved to /var/cache/conftool/dbconfig/20230811-183410-ladsgroup.json
18:31 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum1002.eqiad.wmnet with reason: host reimage
18:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P50518 and previous config saved to /var/cache/conftool/dbconfig/20230811-183155-ladsgroup.json
18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T342617)', diff saved to https://phabricator.wikimedia.org/P50517 and previous config saved to /var/cache/conftool/dbconfig/20230811-183008-ladsgroup.json
18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P50516 and previous config saved to /var/cache/conftool/dbconfig/20230811-181904-ladsgroup.json
18:17 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum1002.eqiad.wmnet with OS bookworm
18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T342617)', diff saved to https://phabricator.wikimedia.org/P50515 and previous config saved to /var/cache/conftool/dbconfig/20230811-181649-ladsgroup.json
18:14 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:14 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
18:12 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
18:09 cmooney@cumin1001: START - Cookbook sre.dns.netbox
18:08 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
18:05 cmooney@cumin1001: START - Cookbook sre.dns.netbox
18:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P50514 and previous config saved to /var/cache/conftool/dbconfig/20230811-180358-ladsgroup.json
18:02 sukhe: reload icinga on alert1001
17:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T342617)', diff saved to https://phabricator.wikimedia.org/P50513 and previous config saved to /var/cache/conftool/dbconfig/20230811-174851-ladsgroup.json
17:43 topranks: removing routing for former ns2.wikimedia.org IP 91.198.174.239 from esams CRs T343942
17:33 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124 (duration: 00m 44s)
17:32 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124
17:20 sukhe@cumin2002: END (ERROR) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=97) rolling restart_daemons on A:wikidough and A:wikidough
17:17 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
17:13 sukhe@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough and A:wikidough
17:07 sukhe: running agent on dns-rec to remove old ns2 IP
16:52 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T342617)', diff saved to https://phabricator.wikimedia.org/P50512 and previous config saved to /var/cache/conftool/dbconfig/20230811-165033-ladsgroup.json
16:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
16:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T342617)', diff saved to https://phabricator.wikimedia.org/P50511 and previous config saved to /var/cache/conftool/dbconfig/20230811-165013-ladsgroup.json
16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P50510 and previous config saved to /var/cache/conftool/dbconfig/20230811-163506-ladsgroup.json
16:32 sukhe: running dummy authdns-update
16:27 sukhe: running agent on A:dns-rec to remove ns2-v4 IP: T329219
16:23 sukhe: running dummy authdns-update
16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P50508 and previous config saved to /var/cache/conftool/dbconfig/20230811-161959-ladsgroup.json
16:17 sukhe: running agent on A:cumin or A:dns-rec or A:netbox to remove dns300x from authdns_servers: T329219
16:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum1001.eqiad.wmnet with OS bookworm
16:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T342617)', diff saved to https://phabricator.wikimedia.org/P50507 and previous config saved to /var/cache/conftool/dbconfig/20230811-161025-ladsgroup.json
16:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
16:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
16:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T342617)', diff saved to https://phabricator.wikimedia.org/P50506 and previous config saved to /var/cache/conftool/dbconfig/20230811-160953-ladsgroup.json
16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T342617)', diff saved to https://phabricator.wikimedia.org/P50505 and previous config saved to /var/cache/conftool/dbconfig/20230811-160453-ladsgroup.json
15:54 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
15:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P50504 and previous config saved to /var/cache/conftool/dbconfig/20230811-155447-ladsgroup.json
15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P50503 and previous config saved to /var/cache/conftool/dbconfig/20230811-153941-ladsgroup.json
15:37 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124 (duration: 00m 22s)
15:37 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124
15:27 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:27 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
15:26 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T342617)', diff saved to https://phabricator.wikimedia.org/P50502 and previous config saved to /var/cache/conftool/dbconfig/20230811-152433-ladsgroup.json
15:24 cmooney@cumin1001: START - Cookbook sre.dns.netbox
15:23 inflatador: bking@deploy1002 'deploying WDQS on newly-reimaged Bullseye hosts T343124'
15:18 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
15:18 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: f1a6177 (duration: 00m 42s)
15:17 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: f1a6177
15:09 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:08 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
15:08 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
15:07 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs2009.codfw.wmnet with OS bullseye
15:05 cmooney@cumin1001: START - Cookbook sre.dns.netbox
15:05 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
15:03 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs2008.codfw.wmnet with OS bullseye
15:02 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: f1a6177 (duration: 00m 50s)
15:01 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: f1a6177
15:01 bking@deploy1002: deploy aborted: f1a6177 (duration: 00m 05s)
15:01 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: f1a6177
14:53 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs[2008-2009].codfw.wmnet with reason: T343124
14:53 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs[2008-2009].codfw.wmnet with reason: T343124
14:49 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2009.codfw.wmnet with reason: host reimage
14:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum1001.eqiad.wmnet with reason: host reimage
14:44 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2008.codfw.wmnet with reason: host reimage
14:42 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum1001.eqiad.wmnet with reason: host reimage
14:41 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2009.codfw.wmnet with reason: host reimage
14:40 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2008.codfw.wmnet with reason: host reimage
14:31 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
14:29 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
14:29 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
14:28 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum1001.eqiad.wmnet with OS bookworm
14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T342617)', diff saved to https://phabricator.wikimedia.org/P50501 and previous config saved to /var/cache/conftool/dbconfig/20230811-142611-ladsgroup.json
14:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
14:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T342617)', diff saved to https://phabricator.wikimedia.org/P50500 and previous config saved to /var/cache/conftool/dbconfig/20230811-142550-ladsgroup.json
14:21 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2008.codfw.wmnet with OS bullseye
14:21 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2009.codfw.wmnet with OS bullseye
14:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P50496 and previous config saved to /var/cache/conftool/dbconfig/20230811-141043-ladsgroup.json
13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P50494 and previous config saved to /var/cache/conftool/dbconfig/20230811-135537-ladsgroup.json
13:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T342617)', diff saved to https://phabricator.wikimedia.org/P50493 and previous config saved to /var/cache/conftool/dbconfig/20230811-134804-ladsgroup.json
13:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
13:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
13:42 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
13:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T342617)', diff saved to https://phabricator.wikimedia.org/P50492 and previous config saved to /var/cache/conftool/dbconfig/20230811-134030-ladsgroup.json
13:22 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
13:01 fabfur@cumin1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet
12:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
12:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
12:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
12:05 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4045.ulsfo.wmnet with OS bullseye
12:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
12:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T342617)', diff saved to https://phabricator.wikimedia.org/P50490 and previous config saved to /var/cache/conftool/dbconfig/20230811-120211-ladsgroup.json
12:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
12:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
12:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T342617)', diff saved to https://phabricator.wikimedia.org/P50489 and previous config saved to /var/cache/conftool/dbconfig/20230811-120150-ladsgroup.json
11:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P50486 and previous config saved to /var/cache/conftool/dbconfig/20230811-114644-ladsgroup.json
11:44 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
11:41 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
11:36 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on 29 hosts with reason: Downtime esams hosts prior to migration week.
11:35 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on 29 hosts with reason: Downtime esams hosts prior to migration week.
11:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P50485 and previous config saved to /var/cache/conftool/dbconfig/20230811-113138-ladsgroup.json
11:26 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on 16 hosts with reason: Downtime esams network kit prior to migration week.
11:26 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on 16 hosts with reason: Downtime esams network kit prior to migration week.
11:21 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
11:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T342617)', diff saved to https://phabricator.wikimedia.org/P50484 and previous config saved to /var/cache/conftool/dbconfig/20230811-111631-ladsgroup.json
11:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2005.codfw.wmnet
10:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2005.codfw.wmnet
10:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
10:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
10:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T342617)', diff saved to https://phabricator.wikimedia.org/P50482 and previous config saved to /var/cache/conftool/dbconfig/20230811-104210-ladsgroup.json
10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P50481 and previous config saved to /var/cache/conftool/dbconfig/20230811-102704-ladsgroup.json
10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1221 (T342617)', diff saved to https://phabricator.wikimedia.org/P50480 and previous config saved to /var/cache/conftool/dbconfig/20230811-102009-ladsgroup.json
10:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
10:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
10:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
10:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T342617)', diff saved to https://phabricator.wikimedia.org/P50479 and previous config saved to /var/cache/conftool/dbconfig/20230811-101930-ladsgroup.json
10:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P50478 and previous config saved to /var/cache/conftool/dbconfig/20230811-101157-ladsgroup.json
10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P50477 and previous config saved to /var/cache/conftool/dbconfig/20230811-100424-ladsgroup.json
09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T342617)', diff saved to https://phabricator.wikimedia.org/P50476 and previous config saved to /var/cache/conftool/dbconfig/20230811-095651-ladsgroup.json
09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P50475 and previous config saved to /var/cache/conftool/dbconfig/20230811-094918-ladsgroup.json
09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T342617)', diff saved to https://phabricator.wikimedia.org/P50474 and previous config saved to /var/cache/conftool/dbconfig/20230811-094118-ladsgroup.json
09:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
09:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
09:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T342617)', diff saved to https://phabricator.wikimedia.org/P50473 and previous config saved to /var/cache/conftool/dbconfig/20230811-093412-ladsgroup.json
09:31 topranks: Withdrawing anycast prefixes 198.35.27.0/24 (authdns), 185.71.138.0/24 & 2001:67c:930::/48 (wikidough) from esams/knams in BGP
09:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
09:05 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
09:00 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
09:00 topranks: depool esams site until next week for knams POP migration / rebuild
09:00 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
08:59 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
08:59 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
08:34 moritzm: installing intel-microcode security updates on bookworm/bullseye
08:32 elukey: expand kubelet partition on ml-serve2001 - T339231
08:31 elukey: restart kubelet on ml-serve1001 - T343900
08:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
08:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
08:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T342617)', diff saved to https://phabricator.wikimedia.org/P50472 and previous config saved to /var/cache/conftool/dbconfig/20230811-081815-ladsgroup.json
08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T342617)', diff saved to https://phabricator.wikimedia.org/P50471 and previous config saved to /var/cache/conftool/dbconfig/20230811-081139-ladsgroup.json
08:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
08:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T342617)', diff saved to https://phabricator.wikimedia.org/P50470 and previous config saved to /var/cache/conftool/dbconfig/20230811-081118-ladsgroup.json
08:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve2001.codfw.wmnet with reason: Expand the kubelet disk partition
08:04 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve2001.codfw.wmnet with reason: Expand the kubelet disk partition
08:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P50469 and previous config saved to /var/cache/conftool/dbconfig/20230811-080309-ladsgroup.json
07:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM rpki1001.eqiad.wmnet
07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P50468 and previous config saved to /var/cache/conftool/dbconfig/20230811-075612-ladsgroup.json
07:54 ayounsi@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM rpki1001.eqiad.wmnet
07:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM rpki2002.codfw.wmnet
07:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P50467 and previous config saved to /var/cache/conftool/dbconfig/20230811-074803-ladsgroup.json
07:47 ayounsi@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM rpki2002.codfw.wmnet
07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P50466 and previous config saved to /var/cache/conftool/dbconfig/20230811-074105-ladsgroup.json
07:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T342617)', diff saved to https://phabricator.wikimedia.org/P50465 and previous config saved to /var/cache/conftool/dbconfig/20230811-073257-ladsgroup.json
07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T342617)', diff saved to https://phabricator.wikimedia.org/P50464 and previous config saved to /var/cache/conftool/dbconfig/20230811-072559-ladsgroup.json
06:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T342617)', diff saved to https://phabricator.wikimedia.org/P50463 and previous config saved to /var/cache/conftool/dbconfig/20230811-061250-ladsgroup.json
05:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P50462 and previous config saved to /var/cache/conftool/dbconfig/20230811-055744-ladsgroup.json
05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T342617)', diff saved to https://phabricator.wikimedia.org/P50461 and previous config saved to /var/cache/conftool/dbconfig/20230811-054649-ladsgroup.json
05:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
05:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T342617)', diff saved to https://phabricator.wikimedia.org/P50460 and previous config saved to /var/cache/conftool/dbconfig/20230811-054628-ladsgroup.json
05:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P50459 and previous config saved to /var/cache/conftool/dbconfig/20230811-054238-ladsgroup.json
05:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T342617)', diff saved to https://phabricator.wikimedia.org/P50458 and previous config saved to /var/cache/conftool/dbconfig/20230811-053847-ladsgroup.json
05:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
05:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
05:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T342617)', diff saved to https://phabricator.wikimedia.org/P50457 and previous config saved to /var/cache/conftool/dbconfig/20230811-053826-ladsgroup.json
05:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P50456 and previous config saved to /var/cache/conftool/dbconfig/20230811-053122-ladsgroup.json
05:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T342617)', diff saved to https://phabricator.wikimedia.org/P50455 and previous config saved to /var/cache/conftool/dbconfig/20230811-052731-ladsgroup.json
05:23 oblivian@deploy1002: Synchronized private/PrivateSettings.php: Adding proxy vendors (duration: 07m 33s)
05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P50454 and previous config saved to /var/cache/conftool/dbconfig/20230811-052320-ladsgroup.json
05:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P50453 and previous config saved to /var/cache/conftool/dbconfig/20230811-051616-ladsgroup.json
05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P50452 and previous config saved to /var/cache/conftool/dbconfig/20230811-050814-ladsgroup.json
05:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T342617)', diff saved to https://phabricator.wikimedia.org/P50451 and previous config saved to /var/cache/conftool/dbconfig/20230811-050110-ladsgroup.json
04:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T342617)', diff saved to https://phabricator.wikimedia.org/P50450 and previous config saved to /var/cache/conftool/dbconfig/20230811-045307-ladsgroup.json
03:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T342617)', diff saved to https://phabricator.wikimedia.org/P50449 and previous config saved to /var/cache/conftool/dbconfig/20230811-031400-ladsgroup.json
03:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
03:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
03:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112 (T342617)', diff saved to https://phabricator.wikimedia.org/P50448 and previous config saved to /var/cache/conftool/dbconfig/20230811-031339-ladsgroup.json
03:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1132 (T342617)', diff saved to https://phabricator.wikimedia.org/P50447 and previous config saved to /var/cache/conftool/dbconfig/20230811-030454-ladsgroup.json
03:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
03:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
03:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T342617)', diff saved to https://phabricator.wikimedia.org/P50446 and previous config saved to /var/cache/conftool/dbconfig/20230811-030433-ladsgroup.json
02:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112', diff saved to https://phabricator.wikimedia.org/P50445 and previous config saved to /var/cache/conftool/dbconfig/20230811-025833-ladsgroup.json
02:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P50444 and previous config saved to /var/cache/conftool/dbconfig/20230811-024927-ladsgroup.json
02:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112', diff saved to https://phabricator.wikimedia.org/P50443 and previous config saved to /var/cache/conftool/dbconfig/20230811-024327-ladsgroup.json
02:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P50442 and previous config saved to /var/cache/conftool/dbconfig/20230811-023420-ladsgroup.json
02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112 (T342617)', diff saved to https://phabricator.wikimedia.org/P50441 and previous config saved to /var/cache/conftool/dbconfig/20230811-022820-ladsgroup.json
02:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T342617)', diff saved to https://phabricator.wikimedia.org/P50440 and previous config saved to /var/cache/conftool/dbconfig/20230811-021914-ladsgroup.json
02:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1199 (T342617)', diff saved to https://phabricator.wikimedia.org/P50439 and previous config saved to /var/cache/conftool/dbconfig/20230811-020724-ladsgroup.json
02:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
02:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
02:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T342617)', diff saved to https://phabricator.wikimedia.org/P50438 and previous config saved to /var/cache/conftool/dbconfig/20230811-020703-ladsgroup.json
01:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P50437 and previous config saved to /var/cache/conftool/dbconfig/20230811-015156-ladsgroup.json
01:43 ryankemper: [WDQS] `ryankemper@wdqs2007:~$ sudo pool` (Caught up on lag)
01:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P50436 and previous config saved to /var/cache/conftool/dbconfig/20230811-013650-ladsgroup.json
01:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T342617)', diff saved to https://phabricator.wikimedia.org/P50435 and previous config saved to /var/cache/conftool/dbconfig/20230811-012144-ladsgroup.json
00:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2112 (T342617)', diff saved to https://phabricator.wikimedia.org/P50434 and previous config saved to /var/cache/conftool/dbconfig/20230811-004036-ladsgroup.json
00:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
00:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
00:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T342617)', diff saved to https://phabricator.wikimedia.org/P50433 and previous config saved to /var/cache/conftool/dbconfig/20230811-003243-ladsgroup.json
00:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
00:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance

2023-08-10

22:55 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@ff0a21b]: (no justification provided) (duration: 00m 20s)
22:55 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@ff0a21b]: (no justification provided)
22:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
22:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
22:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
22:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
22:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
22:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
22:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
22:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
22:12 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
22:10 urbanecm@deploy1002: Finished scap: Backport for GlobalRenameUser: Ensure old username is in canonical form (T343958) (duration: 09m 48s)
22:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T342617)', diff saved to https://phabricator.wikimedia.org/P50432 and previous config saved to /var/cache/conftool/dbconfig/20230810-220820-ladsgroup.json
22:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
22:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
22:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T342617)', diff saved to https://phabricator.wikimedia.org/P50431 and previous config saved to /var/cache/conftool/dbconfig/20230810-220759-ladsgroup.json
22:03 urbanecm@deploy1002: urbanecm: Continuing with sync
22:02 urbanecm@deploy1002: urbanecm: Backport for GlobalRenameUser: Ensure old username is in canonical form (T343958) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
22:00 urbanecm@deploy1002: Started scap: Backport for GlobalRenameUser: Ensure old username is in canonical form (T343958)
21:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P50430 and previous config saved to /var/cache/conftool/dbconfig/20230810-215253-ladsgroup.json
21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P50429 and previous config saved to /var/cache/conftool/dbconfig/20230810-213747-ladsgroup.json
21:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T342617)', diff saved to https://phabricator.wikimedia.org/P50428 and previous config saved to /var/cache/conftool/dbconfig/20230810-212241-ladsgroup.json
21:21 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs2007.codfw.wmnet with OS bullseye
20:40 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
20:38 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: f1a6177 (duration: 00m 42s)
20:37 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: f1a6177
20:34 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: f1a6177 (duration: 00m 16s)
20:34 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: f1a6177
19:24 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@b5a1d04]: (no justification provided) (duration: 00m 09s)
19:24 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@b5a1d04]: (no justification provided)
19:18 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:18 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge ganeti changes - sukhe@cumin2002"
19:16 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge ganeti changes - sukhe@cumin2002"
19:14 sukhe@cumin2002: START - Cookbook sre.dns.netbox
18:55 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@4312d99]: (no justification provided) (duration: 00m 20s)
18:55 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@4312d99]: (no justification provided)
18:43 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: host reimage
18:40 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: host reimage
18:25 urbanecm@deploy1002: Finished scap: Backport for ltwiki: Disable Growth features (duration: 10m 05s)
18:21 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2007.codfw.wmnet with OS bullseye
18:18 urbanecm@deploy1002: urbanecm: Continuing with sync
18:17 urbanecm@deploy1002: urbanecm: Backport for ltwiki: Disable Growth features synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
18:15 urbanecm@deploy1002: Started scap: Backport for ltwiki: Disable Growth features
18:12 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum6002.drmrs.wmnet with OS bookworm
18:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1190 (T342617)', diff saved to https://phabricator.wikimedia.org/P50426 and previous config saved to /var/cache/conftool/dbconfig/20230810-180656-ladsgroup.json
18:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
18:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
17:46 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
17:43 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
17:21 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum6002.drmrs.wmnet with OS bookworm
17:06 cstone: payments-wiki upgraded from 5b250aed to e094ea1f
16:15 sukhe: running authdns-update to update ns2 and point it to nsa.wikimedia.org
15:30 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe
15:20 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe
14:35 jforrester@deploy1002: Finished scap: Backport for wikifunctions: Allow transwiki import from Wikidata (T343365) (duration: 09m 22s)
14:28 jforrester@deploy1002: stang and jforrester: Continuing with sync
14:27 jforrester@deploy1002: stang and jforrester: Backport for wikifunctions: Allow transwiki import from Wikidata (T343365) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
14:25 jforrester@deploy1002: Started scap: Backport for wikifunctions: Allow transwiki import from Wikidata (T343365)
14:22 jforrester@deploy1002: Finished scap: Backport for Wikifunctions: Tell WikiLambda to stash results in our bespoke cache (T342753) (duration: 08m 15s)
14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T342617)', diff saved to https://phabricator.wikimedia.org/P50423 and previous config saved to /var/cache/conftool/dbconfig/20230810-142117-ladsgroup.json
14:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
14:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T342617)', diff saved to https://phabricator.wikimedia.org/P50422 and previous config saved to /var/cache/conftool/dbconfig/20230810-142053-ladsgroup.json
14:16 jforrester@deploy1002: jforrester: Continuing with sync
14:16 jforrester@deploy1002: jforrester: Backport for Wikifunctions: Tell WikiLambda to stash results in our bespoke cache (T342753) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
14:14 jforrester@deploy1002: Started scap: Backport for Wikifunctions: Tell WikiLambda to stash results in our bespoke cache (T342753)
14:12 jforrester@deploy1002: Finished scap: Backport for Add wikifunctions-staff to wmgPrivilegedGroups (T342868) (duration: 08m 35s)
14:06 jforrester@deploy1002: jforrester: Continuing with sync
14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P50421 and previous config saved to /var/cache/conftool/dbconfig/20230810-140546-ladsgroup.json
14:05 jforrester@deploy1002: jforrester: Backport for Add wikifunctions-staff to wmgPrivilegedGroups (T342868) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
14:04 jforrester@deploy1002: Started scap: Backport for Add wikifunctions-staff to wmgPrivilegedGroups (T342868)
14:01 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-test-coord1001.eqiad.wmnet with OS bullseye
13:57 Lucas_WMDE: UTC afternoon backport+config window done
13:52 Emperor: restart puppet and repool ms-fe2009 after testing T211661
13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P50420 and previous config saved to /var/cache/conftool/dbconfig/20230810-135040-ladsgroup.json
13:47 Emperor: depool and stop puppet on ms-fe2009 to test updated rewrite.py T211661
13:45 oblivian@deploy1002: Finished scap: Backport for Add wikifunctions object cache (T297815) (duration: 09m 09s)
13:38 oblivian@deploy1002: oblivian: Continuing with sync
13:37 oblivian@deploy1002: oblivian: Backport for Add wikifunctions object cache (T297815) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:36 oblivian@deploy1002: Started scap: Backport for Add wikifunctions object cache (T297815)
13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T342617)', diff saved to https://phabricator.wikimedia.org/P50419 and previous config saved to /var/cache/conftool/dbconfig/20230810-133534-ladsgroup.json
13:33 samtar@deploy1002: Finished scap: Backport for IS: Enable Phonos on medium projects (T336763) (duration: 10m 58s)
13:26 samtar@deploy1002: samtar: Continuing with sync
13:24 samtar@deploy1002: samtar: Backport for IS: Enable Phonos on medium projects (T336763) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:22 samtar@deploy1002: Started scap: Backport for IS: Enable Phonos on medium projects (T336763)
13:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1092.eqiad.wmnet with OS bullseye
13:14 TheresNoTime: `[samtar@mwmaint1002 ~]$ foreachwiki sql.php /srv/mediawiki-staging/php-1.41.0-wmf.20/extensions/CheckUser/schema/mysql/cu_useragent_clienthints_map.sql` for T258105
13:09 TheresNoTime: `[samtar@mwmaint1002 ~]$ foreachwiki sql.php /srv/mediawiki-staging/php-1.41.0-wmf.20/extensions/CheckUser/schema/mysql/cu_useragent_clienthints.sql` for T258105
12:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1092.eqiad.wmnet with reason: host reimage
12:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1092.eqiad.wmnet with reason: host reimage
12:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
12:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
12:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
12:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
12:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T342617)', diff saved to https://phabricator.wikimedia.org/P50418 and previous config saved to /var/cache/conftool/dbconfig/20230810-122626-ladsgroup.json
12:22 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1092.eqiad.wmnet with OS bullseye
12:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
12:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P50417 and previous config saved to /var/cache/conftool/dbconfig/20230810-121120-ladsgroup.json
12:08 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1091.eqiad.wmnet with OS bullseye
11:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast3007.wikimedia.org
11:58 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:58 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3007.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
11:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
11:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
11:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P50416 and previous config saved to /var/cache/conftool/dbconfig/20230810-115614-ladsgroup.json
11:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
11:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
11:48 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-coord1001.eqiad.wmnet with reason: host reimage
11:45 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1091.eqiad.wmnet with reason: host reimage
11:45 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-coord1001.eqiad.wmnet with reason: host reimage
11:45 taavi@deploy1002: Finished scap: Backport for GlobalRename: Ensure status database rows use the normalized name (T343956) (duration: 10m 17s)
11:44 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3007.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
11:42 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1091.eqiad.wmnet with reason: host reimage
11:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T342617)', diff saved to https://phabricator.wikimedia.org/P50415 and previous config saved to /var/cache/conftool/dbconfig/20230810-114108-ladsgroup.json
11:40 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add manufacture to network devices - jbond@cumin1001 - T329669"
11:39 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add manufacture to network devices - jbond@cumin1001 - T329669"
11:39 taavi@deploy1002: taavi and urbanecm: Continuing with sync
11:36 taavi@deploy1002: taavi and urbanecm: Backport for GlobalRename: Ensure status database rows use the normalized name (T343956) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
11:35 taavi@deploy1002: Started scap: Backport for GlobalRename: Ensure status database rows use the normalized name (T343956)
11:34 taavi@deploy1002: Finished scap: Backport for throttle: remove expired rules, throttle: add rules for Wikimania 2023 (T343595) (duration: 11m 30s)
11:32 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-coord1001.eqiad.wmnet with OS bullseye
11:27 taavi@deploy1002: taavi: Continuing with sync
11:24 taavi@deploy1002: taavi: Backport for throttle: remove expired rules, throttle: add rules for Wikimania 2023 (T343595) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
11:23 taavi@deploy1002: Started scap: Backport for throttle: remove expired rules, throttle: add rules for Wikimania 2023 (T343595)
11:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum6002.drmrs.wmnet with OS bookworm
11:14 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1091.eqiad.wmnet with OS bullseye
10:55 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1090.eqiad.wmnet with OS bullseye
10:46 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
10:45 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
10:36 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
10:36 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
10:34 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
10:33 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
10:32 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
10:32 jiji@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
10:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1090.eqiad.wmnet with reason: host reimage
10:29 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1090.eqiad.wmnet with reason: host reimage
10:23 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1010.eqiad.wmnet
10:17 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-tool1010.eqiad.wmnet
10:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1007.eqiad.wmnet
10:16 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1090.eqiad.wmnet with OS bullseye
10:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-tool1007.eqiad.wmnet
10:10 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum6002.drmrs.wmnet with OS bookworm
09:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host karapace1001.eqiad.wmnet
09:09 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-client1002.eqiad.wmnet
09:09 urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=arwiki --logwiki=metawiki 'Qwertyoruiop' '3h6 1'
09:08 urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki 'Mittzy' 'Mittzy (usurped)'
09:08 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host karapace1001.eqiad.wmnet
09:07 urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=amwiki --logwiki=metawiki 'Jean-Mahmood' 'User92259453'
09:07 urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'Garciajaysonpinolkwani98' 'Ne_Shokot_Pinolkwane'
09:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1004.eqiad.wmnet
09:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1002.eqiad.wmnet
09:06 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-airflow1004.eqiad.wmnet
09:05 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-airflow1002.eqiad.wmnet
09:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-client1002.eqiad.wmnet
09:04 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-db1001.eqiad.wmnet
09:03 urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'CHUniZH' 'Musik CH' # T343867
08:57 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-db1001.eqiad.wmnet
08:46 jmm@cumin2002: START - Cookbook sre.dns.netbox
08:42 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast3007.wikimedia.org
08:36 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
08:26 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
08:21 godog: put back business hours americas for sre business hours escalation - T343812
08:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast5004.wikimedia.org
08:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast5004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
07:59 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast5004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
07:52 jmm@cumin2002: START - Cookbook sre.dns.netbox
07:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast5004.wikimedia.org
07:19 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host bast5004.wikimedia.org
07:19 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host bast5004.wikimedia.org with OS bookworm
06:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T342617)', diff saved to https://phabricator.wikimedia.org/P50414 and previous config saved to /var/cache/conftool/dbconfig/20230810-063611-ladsgroup.json
06:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
06:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
06:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
06:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
06:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T342617)', diff saved to https://phabricator.wikimedia.org/P50413 and previous config saved to /var/cache/conftool/dbconfig/20230810-063523-ladsgroup.json
06:23 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
06:20 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-replica-eqiad
06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P50412 and previous config saved to /var/cache/conftool/dbconfig/20230810-062017-ladsgroup.json
06:08 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart (exit_code=0) rolling restart_daemons on A:maps-replica-codfw
06:05 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-replica-codfw
06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P50411 and previous config saved to /var/cache/conftool/dbconfig/20230810-060511-ladsgroup.json
05:59 moritzm: installing tiff security updates
05:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T342617)', diff saved to https://phabricator.wikimedia.org/P50410 and previous config saved to /var/cache/conftool/dbconfig/20230810-055005-ladsgroup.json
05:32 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast5004.wikimedia.org with OS bookworm
05:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast5004.wikimedia.org - jmm@cumin2002"
05:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast5004.wikimedia.org - jmm@cumin2002"
05:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bast5004.wikimedia.org on all recursors
05:30 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast5004.wikimedia.org on all recursors
05:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
05:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast5004.wikimedia.org - jmm@cumin2002"
05:29 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast5004.wikimedia.org - jmm@cumin2002"
05:27 jmm@cumin2002: START - Cookbook sre.dns.netbox
05:27 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast5004.wikimedia.org
05:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1015.eqiad.wmnet
04:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T342617)', diff saved to https://phabricator.wikimedia.org/P50409 and previous config saved to /var/cache/conftool/dbconfig/20230810-044643-ladsgroup.json
04:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
04:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
04:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T342617)', diff saved to https://phabricator.wikimedia.org/P50408 and previous config saved to /var/cache/conftool/dbconfig/20230810-044622-ladsgroup.json
04:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P50407 and previous config saved to /var/cache/conftool/dbconfig/20230810-043116-ladsgroup.json
04:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P50406 and previous config saved to /var/cache/conftool/dbconfig/20230810-041610-ladsgroup.json
04:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T342617)', diff saved to https://phabricator.wikimedia.org/P50405 and previous config saved to /var/cache/conftool/dbconfig/20230810-040104-ladsgroup.json
03:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
03:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
02:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
02:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
02:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T342617)', diff saved to https://phabricator.wikimedia.org/P50404 and previous config saved to /var/cache/conftool/dbconfig/20230810-024531-ladsgroup.json
02:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P50403 and previous config saved to /var/cache/conftool/dbconfig/20230810-023025-ladsgroup.json
02:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P50402 and previous config saved to /var/cache/conftool/dbconfig/20230810-021518-ladsgroup.json
02:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T342617)', diff saved to https://phabricator.wikimedia.org/P50401 and previous config saved to /var/cache/conftool/dbconfig/20230810-020012-ladsgroup.json
01:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T342617)', diff saved to https://phabricator.wikimedia.org/P50400 and previous config saved to /var/cache/conftool/dbconfig/20230810-014731-ladsgroup.json
01:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P50399 and previous config saved to /var/cache/conftool/dbconfig/20230810-013225-ladsgroup.json
01:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P50398 and previous config saved to /var/cache/conftool/dbconfig/20230810-011718-ladsgroup.json
01:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1214 (T342617)', diff saved to https://phabricator.wikimedia.org/P50397 and previous config saved to /var/cache/conftool/dbconfig/20230810-011228-ladsgroup.json
01:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
01:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
01:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T342617)', diff saved to https://phabricator.wikimedia.org/P50396 and previous config saved to /var/cache/conftool/dbconfig/20230810-011207-ladsgroup.json
01:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T342617)', diff saved to https://phabricator.wikimedia.org/P50395 and previous config saved to /var/cache/conftool/dbconfig/20230810-010212-ladsgroup.json
00:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P50394 and previous config saved to /var/cache/conftool/dbconfig/20230810-005701-ladsgroup.json
00:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P50393 and previous config saved to /var/cache/conftool/dbconfig/20230810-004154-ladsgroup.json
00:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T342617)', diff saved to https://phabricator.wikimedia.org/P50392 and previous config saved to /var/cache/conftool/dbconfig/20230810-002648-ladsgroup.json
00:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T342617)', diff saved to https://phabricator.wikimedia.org/P50391 and previous config saved to /var/cache/conftool/dbconfig/20230810-001437-ladsgroup.json
00:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
00:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
00:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T342617)', diff saved to https://phabricator.wikimedia.org/P50390 and previous config saved to /var/cache/conftool/dbconfig/20230810-001414-ladsgroup.json

2023-08-09

23:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P50389 and previous config saved to /var/cache/conftool/dbconfig/20230809-235908-ladsgroup.json
23:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P50388 and previous config saved to /var/cache/conftool/dbconfig/20230809-234402-ladsgroup.json
23:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1211 (T342617)', diff saved to https://phabricator.wikimedia.org/P50387 and previous config saved to /var/cache/conftool/dbconfig/20230809-234146-ladsgroup.json
23:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
23:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
23:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T342617)', diff saved to https://phabricator.wikimedia.org/P50386 and previous config saved to /var/cache/conftool/dbconfig/20230809-234125-ladsgroup.json
23:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T342617)', diff saved to https://phabricator.wikimedia.org/P50385 and previous config saved to /var/cache/conftool/dbconfig/20230809-232855-ladsgroup.json
23:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P50384 and previous config saved to /var/cache/conftool/dbconfig/20230809-232619-ladsgroup.json
23:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P50383 and previous config saved to /var/cache/conftool/dbconfig/20230809-231112-ladsgroup.json
23:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T342617)', diff saved to https://phabricator.wikimedia.org/P50382 and previous config saved to /var/cache/conftool/dbconfig/20230809-230339-ladsgroup.json
23:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
23:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
22:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T342617)', diff saved to https://phabricator.wikimedia.org/P50381 and previous config saved to /var/cache/conftool/dbconfig/20230809-225605-ladsgroup.json
22:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T342617)', diff saved to https://phabricator.wikimedia.org/P50380 and previous config saved to /var/cache/conftool/dbconfig/20230809-224114-ladsgroup.json
22:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
22:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
22:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T342617)', diff saved to https://phabricator.wikimedia.org/P50379 and previous config saved to /var/cache/conftool/dbconfig/20230809-224053-ladsgroup.json
22:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P50378 and previous config saved to /var/cache/conftool/dbconfig/20230809-222547-ladsgroup.json
22:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P50377 and previous config saved to /var/cache/conftool/dbconfig/20230809-221041-ladsgroup.json
22:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1209 (T342617)', diff saved to https://phabricator.wikimedia.org/P50376 and previous config saved to /var/cache/conftool/dbconfig/20230809-220433-ladsgroup.json
22:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
22:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
22:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T342617)', diff saved to https://phabricator.wikimedia.org/P50375 and previous config saved to /var/cache/conftool/dbconfig/20230809-220412-ladsgroup.json
21:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T342617)', diff saved to https://phabricator.wikimedia.org/P50373 and previous config saved to /var/cache/conftool/dbconfig/20230809-215535-ladsgroup.json
21:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P50372 and previous config saved to /var/cache/conftool/dbconfig/20230809-214905-ladsgroup.json
21:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P50371 and previous config saved to /var/cache/conftool/dbconfig/20230809-213359-ladsgroup.json
21:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T342617)', diff saved to https://phabricator.wikimedia.org/P50369 and previous config saved to /var/cache/conftool/dbconfig/20230809-212042-ladsgroup.json
21:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
21:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
21:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T342617)', diff saved to https://phabricator.wikimedia.org/P50368 and previous config saved to /var/cache/conftool/dbconfig/20230809-212021-ladsgroup.json
21:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T342617)', diff saved to https://phabricator.wikimedia.org/P50367 and previous config saved to /var/cache/conftool/dbconfig/20230809-211853-ladsgroup.json
21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T342617)', diff saved to https://phabricator.wikimedia.org/P50366 and previous config saved to /var/cache/conftool/dbconfig/20230809-210856-ladsgroup.json
21:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
21:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T342617)', diff saved to https://phabricator.wikimedia.org/P50365 and previous config saved to /var/cache/conftool/dbconfig/20230809-210835-ladsgroup.json
21:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P50364 and previous config saved to /var/cache/conftool/dbconfig/20230809-210514-ladsgroup.json
20:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P50363 and previous config saved to /var/cache/conftool/dbconfig/20230809-205329-ladsgroup.json
20:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P50362 and previous config saved to /var/cache/conftool/dbconfig/20230809-205008-ladsgroup.json
20:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P50361 and previous config saved to /var/cache/conftool/dbconfig/20230809-203822-ladsgroup.json
20:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T343718)', diff saved to https://phabricator.wikimedia.org/P50360 and previous config saved to /var/cache/conftool/dbconfig/20230809-203731-ladsgroup.json
20:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T342617)', diff saved to https://phabricator.wikimedia.org/P50359 and previous config saved to /var/cache/conftool/dbconfig/20230809-203502-ladsgroup.json
20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1203 (T342617)', diff saved to https://phabricator.wikimedia.org/P50358 and previous config saved to /var/cache/conftool/dbconfig/20230809-203041-ladsgroup.json
20:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
20:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T342617)', diff saved to https://phabricator.wikimedia.org/P50357 and previous config saved to /var/cache/conftool/dbconfig/20230809-203020-ladsgroup.json
20:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T342617)', diff saved to https://phabricator.wikimedia.org/P50356 and previous config saved to /var/cache/conftool/dbconfig/20230809-202316-ladsgroup.json
20:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P50355 and previous config saved to /var/cache/conftool/dbconfig/20230809-202225-ladsgroup.json
20:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P50354 and previous config saved to /var/cache/conftool/dbconfig/20230809-201514-ladsgroup.json
20:09 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts contint2001.wikimedia.org
20:09 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:09 aokoth@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: contint2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - aokoth@cumin1001"
20:08 aokoth@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: contint2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - aokoth@cumin1001"
20:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P50353 and previous config saved to /var/cache/conftool/dbconfig/20230809-200718-ladsgroup.json
20:05 aokoth@cumin1001: START - Cookbook sre.dns.netbox
20:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P50352 and previous config saved to /var/cache/conftool/dbconfig/20230809-200007-ladsgroup.json
19:59 aokoth@cumin1001: START - Cookbook sre.hosts.decommission for hosts contint2001.wikimedia.org
19:58 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on contint2001.wikimedia.org with reason: Decommissioning
19:58 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on contint2001.wikimedia.org with reason: Decommissioning
19:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T343718)', diff saved to https://phabricator.wikimedia.org/P50351 and previous config saved to /var/cache/conftool/dbconfig/20230809-195212-ladsgroup.json
19:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T342617)', diff saved to https://phabricator.wikimedia.org/P50350 and previous config saved to /var/cache/conftool/dbconfig/20230809-194501-ladsgroup.json
19:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T342617)', diff saved to https://phabricator.wikimedia.org/P50349 and previous config saved to /var/cache/conftool/dbconfig/20230809-193623-ladsgroup.json
19:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
19:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
19:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T342617)', diff saved to https://phabricator.wikimedia.org/P50348 and previous config saved to /var/cache/conftool/dbconfig/20230809-193559-ladsgroup.json
19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T343718)', diff saved to https://phabricator.wikimedia.org/P50347 and previous config saved to /var/cache/conftool/dbconfig/20230809-192818-ladsgroup.json
19:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
19:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T343718)', diff saved to https://phabricator.wikimedia.org/P50346 and previous config saved to /var/cache/conftool/dbconfig/20230809-192746-ladsgroup.json
19:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P50345 and previous config saved to /var/cache/conftool/dbconfig/20230809-192053-ladsgroup.json
19:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P50344 and previous config saved to /var/cache/conftool/dbconfig/20230809-191240-ladsgroup.json
19:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P50343 and previous config saved to /var/cache/conftool/dbconfig/20230809-190547-ladsgroup.json
18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1193 (T342617)', diff saved to https://phabricator.wikimedia.org/P50342 and previous config saved to /var/cache/conftool/dbconfig/20230809-185805-ladsgroup.json
18:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
18:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T342617)', diff saved to https://phabricator.wikimedia.org/P50341 and previous config saved to /var/cache/conftool/dbconfig/20230809-185745-ladsgroup.json
18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P50340 and previous config saved to /var/cache/conftool/dbconfig/20230809-185734-ladsgroup.json
18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T342617)', diff saved to https://phabricator.wikimedia.org/P50339 and previous config saved to /var/cache/conftool/dbconfig/20230809-185040-ladsgroup.json
18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P50338 and previous config saved to /var/cache/conftool/dbconfig/20230809-184238-ladsgroup.json
18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T343718)', diff saved to https://phabricator.wikimedia.org/P50337 and previous config saved to /var/cache/conftool/dbconfig/20230809-184228-ladsgroup.json
18:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T343718)', diff saved to https://phabricator.wikimedia.org/P50336 and previous config saved to /var/cache/conftool/dbconfig/20230809-184018-ladsgroup.json
18:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
18:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
18:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
18:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
18:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T343718)', diff saved to https://phabricator.wikimedia.org/P50335 and previous config saved to /var/cache/conftool/dbconfig/20230809-183952-ladsgroup.json
18:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P50334 and previous config saved to /var/cache/conftool/dbconfig/20230809-182726-ladsgroup.json
18:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P50333 and previous config saved to /var/cache/conftool/dbconfig/20230809-182446-ladsgroup.json
18:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T342617)', diff saved to https://phabricator.wikimedia.org/P50332 and previous config saved to /var/cache/conftool/dbconfig/20230809-181219-ladsgroup.json
18:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P50331 and previous config saved to /var/cache/conftool/dbconfig/20230809-180940-ladsgroup.json
18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2165 (T342617)', diff saved to https://phabricator.wikimedia.org/P50330 and previous config saved to /var/cache/conftool/dbconfig/20230809-180143-ladsgroup.json
18:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
18:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T342617)', diff saved to https://phabricator.wikimedia.org/P50329 and previous config saved to /var/cache/conftool/dbconfig/20230809-180122-ladsgroup.json
17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T343718)', diff saved to https://phabricator.wikimedia.org/P50328 and previous config saved to /var/cache/conftool/dbconfig/20230809-175434-ladsgroup.json
17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P50327 and previous config saved to /var/cache/conftool/dbconfig/20230809-174616-ladsgroup.json
17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P50326 and previous config saved to /var/cache/conftool/dbconfig/20230809-173110-ladsgroup.json
17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T343718)', diff saved to https://phabricator.wikimedia.org/P50325 and previous config saved to /var/cache/conftool/dbconfig/20230809-172803-ladsgroup.json
17:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
17:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
17:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1192 (T342617)', diff saved to https://phabricator.wikimedia.org/P50324 and previous config saved to /var/cache/conftool/dbconfig/20230809-172507-ladsgroup.json
17:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
17:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
17:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T342617)', diff saved to https://phabricator.wikimedia.org/P50323 and previous config saved to /var/cache/conftool/dbconfig/20230809-172447-ladsgroup.json
17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T342617)', diff saved to https://phabricator.wikimedia.org/P50322 and previous config saved to /var/cache/conftool/dbconfig/20230809-171604-ladsgroup.json
17:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
17:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
17:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50321 and previous config saved to /var/cache/conftool/dbconfig/20230809-171533-ladsgroup.json
17:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P50320 and previous config saved to /var/cache/conftool/dbconfig/20230809-170940-ladsgroup.json
17:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
17:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
17:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T343718)', diff saved to https://phabricator.wikimedia.org/P50319 and previous config saved to /var/cache/conftool/dbconfig/20230809-170351-ladsgroup.json
17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P50318 and previous config saved to /var/cache/conftool/dbconfig/20230809-170027-ladsgroup.json
16:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P50317 and previous config saved to /var/cache/conftool/dbconfig/20230809-165434-ladsgroup.json
16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P50316 and previous config saved to /var/cache/conftool/dbconfig/20230809-164844-ladsgroup.json
16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P50315 and previous config saved to /var/cache/conftool/dbconfig/20230809-164520-ladsgroup.json
16:44 elukey: temporarily bump miscweb bugzilla pods from 4 to 8 in k8s wikikube codfw
16:42 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
16:41 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
16:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T342617)', diff saved to https://phabricator.wikimedia.org/P50314 and previous config saved to /var/cache/conftool/dbconfig/20230809-163928-ladsgroup.json
16:38 elukey: temporarily bump miscweb bugzilla pods from 2 to 4 in k8s wikikube codfw
16:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P50313 and previous config saved to /var/cache/conftool/dbconfig/20230809-163338-ladsgroup.json
16:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50312 and previous config saved to /var/cache/conftool/dbconfig/20230809-163014-ladsgroup.json
16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T342617)', diff saved to https://phabricator.wikimedia.org/P50311 and previous config saved to /var/cache/conftool/dbconfig/20230809-162913-ladsgroup.json
16:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
16:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
16:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
16:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T342617)', diff saved to https://phabricator.wikimedia.org/P50310 and previous config saved to /var/cache/conftool/dbconfig/20230809-162836-ladsgroup.json
16:22 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
16:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T343718)', diff saved to https://phabricator.wikimedia.org/P50308 and previous config saved to /var/cache/conftool/dbconfig/20230809-161832-ladsgroup.json
16:17 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
16:15 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-test-master1001.eqiad.wmnet with OS bullseye
16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P50307 and previous config saved to /var/cache/conftool/dbconfig/20230809-161330-ladsgroup.json
15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P50306 and previous config saved to /var/cache/conftool/dbconfig/20230809-155824-ladsgroup.json
15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T343718)', diff saved to https://phabricator.wikimedia.org/P50305 and previous config saved to /var/cache/conftool/dbconfig/20230809-155137-ladsgroup.json
15:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T342617)', diff saved to https://phabricator.wikimedia.org/P50304 and previous config saved to /var/cache/conftool/dbconfig/20230809-155127-ladsgroup.json
15:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
15:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P50303 and previous config saved to /var/cache/conftool/dbconfig/20230809-155116-ladsgroup.json
15:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T342617)', diff saved to https://phabricator.wikimedia.org/P50302 and previous config saved to /var/cache/conftool/dbconfig/20230809-155106-ladsgroup.json
15:50 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1001.eqiad.wmnet with reason: host reimage
15:49 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
15:48 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
15:47 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
15:47 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
15:47 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1001.eqiad.wmnet with reason: host reimage
15:45 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
15:44 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
15:43 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1089.eqiad.wmnet with OS bullseye
15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T342617)', diff saved to https://phabricator.wikimedia.org/P50301 and previous config saved to /var/cache/conftool/dbconfig/20230809-154317-ladsgroup.json
15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P50300 and previous config saved to /var/cache/conftool/dbconfig/20230809-153610-ladsgroup.json
15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P50299 and previous config saved to /var/cache/conftool/dbconfig/20230809-153600-ladsgroup.json
15:29 hnowlan: disabling puppet on A:cp to test r/947372
15:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P50298 and previous config saved to /var/cache/conftool/dbconfig/20230809-152103-ladsgroup.json
15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P50297 and previous config saved to /var/cache/conftool/dbconfig/20230809-152053-ladsgroup.json
15:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1089.eqiad.wmnet with reason: host reimage
15:17 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1089.eqiad.wmnet with reason: host reimage
15:06 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-master1001.eqiad.wmnet with OS bullseye
15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P50295 and previous config saved to /var/cache/conftool/dbconfig/20230809-150557-ladsgroup.json
15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T342617)', diff saved to https://phabricator.wikimedia.org/P50294 and previous config saved to /var/cache/conftool/dbconfig/20230809-150547-ladsgroup.json
15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P50293 and previous config saved to /var/cache/conftool/dbconfig/20230809-150443-ladsgroup.json
15:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
15:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
14:58 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum6002.drmrs.wmnet with OS bookworm
14:57 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1089.eqiad.wmnet with OS bullseye
14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T342617)', diff saved to https://phabricator.wikimedia.org/P50292 and previous config saved to /var/cache/conftool/dbconfig/20230809-145714-ladsgroup.json
14:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1088.eqiad.wmnet with OS bullseye
14:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
14:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T342617)', diff saved to https://phabricator.wikimedia.org/P50291 and previous config saved to /var/cache/conftool/dbconfig/20230809-145653-ladsgroup.json
14:49 TheresNoTime: `[samtar@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki wikifunctionswiki --fix` for T342964
14:48 samtar@deploy1002: Finished scap: Backport for core-namespaces: Remove dupe wikifunctions alias (T342964) (duration: 14m 21s)
14:42 samtar@deploy1002: samtar: Continuing with sync
14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P50290 and previous config saved to /var/cache/conftool/dbconfig/20230809-144147-ladsgroup.json
14:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
14:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T343718)', diff saved to https://phabricator.wikimedia.org/P50289 and previous config saved to /var/cache/conftool/dbconfig/20230809-144022-ladsgroup.json
14:36 samtar@deploy1002: samtar: Backport for core-namespaces: Remove dupe wikifunctions alias (T342964) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
14:34 samtar@deploy1002: Started scap: Backport for core-namespaces: Remove dupe wikifunctions alias (T342964)
14:34 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1088.eqiad.wmnet with reason: host reimage
14:31 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1088.eqiad.wmnet with reason: host reimage
14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P50288 and previous config saved to /var/cache/conftool/dbconfig/20230809-142640-ladsgroup.json
14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P50287 and previous config saved to /var/cache/conftool/dbconfig/20230809-142515-ladsgroup.json
14:24 moritzm: installing sudo bugfix updates from Bookworm 12.1 point release
14:21 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
14:18 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1088.eqiad.wmnet with OS bullseye
14:17 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
14:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T342617)', diff saved to https://phabricator.wikimedia.org/P50285 and previous config saved to /var/cache/conftool/dbconfig/20230809-141134-ladsgroup.json
14:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P50284 and previous config saved to /var/cache/conftool/dbconfig/20230809-141009-ladsgroup.json
14:09 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-test-master1002.eqiad.wmnet with OS bullseye
14:07 moritzm: restarting FPM on mediawiki canaries to pick up tiff update
14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T342617)', diff saved to https://phabricator.wikimedia.org/P50283 and previous config saved to /var/cache/conftool/dbconfig/20230809-140551-ladsgroup.json
14:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
14:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50282 and previous config saved to /var/cache/conftool/dbconfig/20230809-140531-ladsgroup.json
13:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1087.eqiad.wmnet with OS bullseye
13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T343718)', diff saved to https://phabricator.wikimedia.org/P50281 and previous config saved to /var/cache/conftool/dbconfig/20230809-135503-ladsgroup.json
13:54 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
13:54 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum6002.drmrs.wmnet with OS bookworm
13:54 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1223 (T343718)', diff saved to https://phabricator.wikimedia.org/P50280 and previous config saved to /var/cache/conftool/dbconfig/20230809-135356-ladsgroup.json
13:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
13:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T343718)', diff saved to https://phabricator.wikimedia.org/P50279 and previous config saved to /var/cache/conftool/dbconfig/20230809-135324-ladsgroup.json
13:52 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
13:52 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
13:52 moritzm: installing tiff security updates
13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P50278 and previous config saved to /var/cache/conftool/dbconfig/20230809-135024-ladsgroup.json
13:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1002.eqiad.wmnet with reason: host reimage
13:47 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1002.eqiad.wmnet with reason: host reimage
13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T342617)', diff saved to https://phabricator.wikimedia.org/P50277 and previous config saved to /var/cache/conftool/dbconfig/20230809-134136-ladsgroup.json
13:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
13:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T342617)', diff saved to https://phabricator.wikimedia.org/P50276 and previous config saved to /var/cache/conftool/dbconfig/20230809-134115-ladsgroup.json
13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P50275 and previous config saved to /var/cache/conftool/dbconfig/20230809-133818-ladsgroup.json
13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P50274 and previous config saved to /var/cache/conftool/dbconfig/20230809-133518-ladsgroup.json
13:33 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-master1002.eqiad.wmnet with OS bullseye
13:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1087.eqiad.wmnet with reason: host reimage
13:29 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1087.eqiad.wmnet with reason: host reimage
13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P50273 and previous config saved to /var/cache/conftool/dbconfig/20230809-132609-ladsgroup.json
13:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T342617)', diff saved to https://phabricator.wikimedia.org/P50272 and previous config saved to /var/cache/conftool/dbconfig/20230809-132446-ladsgroup.json
13:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
13:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
13:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T342617)', diff saved to https://phabricator.wikimedia.org/P50271 and previous config saved to /var/cache/conftool/dbconfig/20230809-132424-ladsgroup.json
13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P50270 and previous config saved to /var/cache/conftool/dbconfig/20230809-132312-ladsgroup.json
13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50269 and previous config saved to /var/cache/conftool/dbconfig/20230809-132012-ladsgroup.json
13:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
13:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
13:12 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1087.eqiad.wmnet with OS bullseye
13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P50268 and previous config saved to /var/cache/conftool/dbconfig/20230809-131103-ladsgroup.json
13:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P50267 and previous config saved to /var/cache/conftool/dbconfig/20230809-130918-ladsgroup.json
13:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T343718)', diff saved to https://phabricator.wikimedia.org/P50266 and previous config saved to /var/cache/conftool/dbconfig/20230809-130805-ladsgroup.json
13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1212 (T343718)', diff saved to https://phabricator.wikimedia.org/P50265 and previous config saved to /var/cache/conftool/dbconfig/20230809-130557-ladsgroup.json
13:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
13:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
13:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
13:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T343718)', diff saved to https://phabricator.wikimedia.org/P50264 and previous config saved to /var/cache/conftool/dbconfig/20230809-130518-ladsgroup.json
12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T342617)', diff saved to https://phabricator.wikimedia.org/P50263 and previous config saved to /var/cache/conftool/dbconfig/20230809-125555-ladsgroup.json
12:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P50262 and previous config saved to /var/cache/conftool/dbconfig/20230809-125412-ladsgroup.json
12:53 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/apertium: apply
12:53 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/apertium: apply
12:52 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
12:51 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/apertium: apply
12:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P50261 and previous config saved to /var/cache/conftool/dbconfig/20230809-125012-ladsgroup.json
12:49 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
12:49 dcausse: restarting blazegraph on wdqs1007 (BlazegraphFreeAllocatorsDecreasingRapidly)
12:48 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
12:48 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
12:48 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
12:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
12:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
12:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1086.eqiad.wmnet with OS bullseye
12:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T342617)', diff saved to https://phabricator.wikimedia.org/P50260 and previous config saved to /var/cache/conftool/dbconfig/20230809-123906-ladsgroup.json
12:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P50259 and previous config saved to /var/cache/conftool/dbconfig/20230809-123506-ladsgroup.json
12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T343718)', diff saved to https://phabricator.wikimedia.org/P50258 and previous config saved to /var/cache/conftool/dbconfig/20230809-122000-ladsgroup.json
12:19 jayme@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
12:19 jayme@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
12:18 jayme@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
12:18 jayme@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
12:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T343718)', diff saved to https://phabricator.wikimedia.org/P50257 and previous config saved to /var/cache/conftool/dbconfig/20230809-121852-ladsgroup.json
12:18 jayme@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
12:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
12:18 jayme@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
12:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
12:18 jayme@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
12:18 jayme@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
12:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T343718)', diff saved to https://phabricator.wikimedia.org/P50256 and previous config saved to /var/cache/conftool/dbconfig/20230809-121831-ladsgroup.json
12:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1086.eqiad.wmnet with reason: host reimage
12:14 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1086.eqiad.wmnet with reason: host reimage
12:13 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
12:12 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
12:12 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
12:11 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
12:11 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
12:11 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
12:11 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
12:11 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
12:10 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
12:09 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
12:08 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
12:08 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
12:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P50255 and previous config saved to /var/cache/conftool/dbconfig/20230809-120325-ladsgroup.json
12:01 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1086.eqiad.wmnet with OS bullseye
12:01 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1085.eqiad.wmnet with OS bullseye
11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T342617)', diff saved to https://phabricator.wikimedia.org/P50254 and previous config saved to /var/cache/conftool/dbconfig/20230809-115534-ladsgroup.json
11:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
11:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T342617)', diff saved to https://phabricator.wikimedia.org/P50253 and previous config saved to /var/cache/conftool/dbconfig/20230809-115227-ladsgroup.json
11:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
11:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T342617)', diff saved to https://phabricator.wikimedia.org/P50252 and previous config saved to /var/cache/conftool/dbconfig/20230809-115206-ladsgroup.json
11:49 ladsgroup@deploy1002: Finished scap: Backport for sdwiki: set 'wgTranslateNumerals' to false (T268203) (duration: 09m 22s)
11:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P50251 and previous config saved to /var/cache/conftool/dbconfig/20230809-114819-ladsgroup.json
11:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
11:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
11:41 ladsgroup@deploy1002: kaleembhatti and ladsgroup: Continuing with sync
11:41 ladsgroup@deploy1002: kaleembhatti and ladsgroup: Backport for sdwiki: set 'wgTranslateNumerals' to false (T268203) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
11:39 ladsgroup@deploy1002: Started scap: Backport for sdwiki: set 'wgTranslateNumerals' to false (T268203)
11:38 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1085.eqiad.wmnet with reason: host reimage
11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P50250 and previous config saved to /var/cache/conftool/dbconfig/20230809-113659-ladsgroup.json
11:35 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1085.eqiad.wmnet with reason: host reimage
11:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T343718)', diff saved to https://phabricator.wikimedia.org/P50249 and previous config saved to /var/cache/conftool/dbconfig/20230809-113312-ladsgroup.json
11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T343718)', diff saved to https://phabricator.wikimedia.org/P50248 and previous config saved to /var/cache/conftool/dbconfig/20230809-113205-ladsgroup.json
11:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
11:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
11:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T343718)', diff saved to https://phabricator.wikimedia.org/P50247 and previous config saved to /var/cache/conftool/dbconfig/20230809-113144-ladsgroup.json
11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P50246 and previous config saved to /var/cache/conftool/dbconfig/20230809-112153-ladsgroup.json
11:20 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1085.eqiad.wmnet with OS bullseye
11:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P50245 and previous config saved to /var/cache/conftool/dbconfig/20230809-111638-ladsgroup.json
11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T342617)', diff saved to https://phabricator.wikimedia.org/P50244 and previous config saved to /var/cache/conftool/dbconfig/20230809-111141-ladsgroup.json
11:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T342617)', diff saved to https://phabricator.wikimedia.org/P50243 and previous config saved to /var/cache/conftool/dbconfig/20230809-110647-ladsgroup.json
11:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
11:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
11:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P50242 and previous config saved to /var/cache/conftool/dbconfig/20230809-110132-ladsgroup.json
10:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P50241 and previous config saved to /var/cache/conftool/dbconfig/20230809-105635-ladsgroup.json
10:56 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
10:55 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
10:55 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
10:55 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
10:54 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
10:54 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
10:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T343718)', diff saved to https://phabricator.wikimedia.org/P50240 and previous config saved to /var/cache/conftool/dbconfig/20230809-104625-ladsgroup.json
10:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T343718)', diff saved to https://phabricator.wikimedia.org/P50239 and previous config saved to /var/cache/conftool/dbconfig/20230809-104518-ladsgroup.json
10:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
10:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T343718)', diff saved to https://phabricator.wikimedia.org/P50238 and previous config saved to /var/cache/conftool/dbconfig/20230809-104457-ladsgroup.json
10:44 _joe_: ran requestctl commit, which removed the comma removal from the requestctl output as per T305582
10:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P50237 and previous config saved to /var/cache/conftool/dbconfig/20230809-104128-ladsgroup.json
10:36 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1084.eqiad.wmnet with OS bullseye
10:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P50236 and previous config saved to /var/cache/conftool/dbconfig/20230809-102951-ladsgroup.json
10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T342617)', diff saved to https://phabricator.wikimedia.org/P50235 and previous config saved to /var/cache/conftool/dbconfig/20230809-102622-ladsgroup.json
10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T342617)', diff saved to https://phabricator.wikimedia.org/P50234 and previous config saved to /var/cache/conftool/dbconfig/20230809-101946-ladsgroup.json
10:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
10:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P50233 and previous config saved to /var/cache/conftool/dbconfig/20230809-101444-ladsgroup.json
10:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1002.eqiad.wmnet
10:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1084.eqiad.wmnet with reason: host reimage
10:09 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1084.eqiad.wmnet with reason: host reimage
10:08 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-coord1002.eqiad.wmnet
10:07 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
10:07 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
10:05 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-master1002.eqiad.wmnet
09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T343718)', diff saved to https://phabricator.wikimedia.org/P50232 and previous config saved to /var/cache/conftool/dbconfig/20230809-095938-ladsgroup.json
09:58 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-master1002.eqiad.wmnet
09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T343718)', diff saved to https://phabricator.wikimedia.org/P50231 and previous config saved to /var/cache/conftool/dbconfig/20230809-095730-ladsgroup.json
09:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
09:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
09:55 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
09:55 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
09:55 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
09:55 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
09:54 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1084.eqiad.wmnet with OS bullseye
09:48 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/apertium: apply
09:48 jayme@deploy1002: helmfile [staging] START helmfile.d/services/apertium: apply
09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T342617)', diff saved to https://phabricator.wikimedia.org/P50230 and previous config saved to /var/cache/conftool/dbconfig/20230809-093715-ladsgroup.json
09:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
09:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
09:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
09:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T342617)', diff saved to https://phabricator.wikimedia.org/P50229 and previous config saved to /var/cache/conftool/dbconfig/20230809-093341-ladsgroup.json
09:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
09:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
09:31 hnowlan: disabling puppet on A:cp to test 945558
09:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
09:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
09:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
09:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50228 and previous config saved to /var/cache/conftool/dbconfig/20230809-092319-ladsgroup.json
09:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
09:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50227 and previous config saved to /var/cache/conftool/dbconfig/20230809-092258-ladsgroup.json
09:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P50226 and previous config saved to /var/cache/conftool/dbconfig/20230809-090750-ladsgroup.json
09:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
09:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
09:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
09:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
09:02 ladsgroup@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
09:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
08:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P50225 and previous config saved to /var/cache/conftool/dbconfig/20230809-085244-ladsgroup.json
08:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50224 and previous config saved to /var/cache/conftool/dbconfig/20230809-083738-ladsgroup.json
08:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
08:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T342617)', diff saved to https://phabricator.wikimedia.org/P50223 and previous config saved to /var/cache/conftool/dbconfig/20230809-083319-ladsgroup.json
08:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
08:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
08:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
08:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
08:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
08:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
07:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1003.eqiad.wmnet
07:52 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
07:12 kartik@deploy1002: Finished scap: Backport for testwiki: Enable Section Translation for 7 Wikipedias (T343211) (duration: 09m 58s)
07:05 kartik@deploy1002: kartik: Continuing with sync
07:03 kartik@deploy1002: kartik: Backport for testwiki: Enable Section Translation for 7 Wikipedias (T343211) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
07:02 kartik@deploy1002: Started scap: Backport for testwiki: Enable Section Translation for 7 Wikipedias (T343211)
06:52 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jkieserman out of all services on: 33 hosts
06:51 root@cumin2002: START - Cookbook sre.idm.logout Logging Jkieserman out of all services on: 33 hosts
06:51 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jkieserman out of all services on: 716 hosts
06:51 root@cumin2002: START - Cookbook sre.idm.logout Logging Jkieserman out of all services on: 716 hosts
06:47 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jkieserman out of all services on: 1309 hosts
06:46 root@cumin2002: START - Cookbook sre.idm.logout Logging Jkieserman out of all services on: 1309 hosts
06:46 root@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Jmads out of all services on: 1309 hosts
06:46 root@cumin2002: START - Cookbook sre.idm.logout Logging Jmads out of all services on: 1309 hosts
06:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50222 and previous config saved to /var/cache/conftool/dbconfig/20230809-061826-ladsgroup.json
06:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
06:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
01:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50219 and previous config saved to /var/cache/conftool/dbconfig/20230809-013145-ladsgroup.json
01:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
01:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
01:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T342617)', diff saved to https://phabricator.wikimedia.org/P50218 and previous config saved to /var/cache/conftool/dbconfig/20230809-013124-ladsgroup.json
01:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P50217 and previous config saved to /var/cache/conftool/dbconfig/20230809-011618-ladsgroup.json
01:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P50216 and previous config saved to /var/cache/conftool/dbconfig/20230809-010112-ladsgroup.json
00:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T342617)', diff saved to https://phabricator.wikimedia.org/P50215 and previous config saved to /var/cache/conftool/dbconfig/20230809-004605-ladsgroup.json
00:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
00:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50214 and previous config saved to /var/cache/conftool/dbconfig/20230809-003817-ladsgroup.json
00:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P50213 and previous config saved to /var/cache/conftool/dbconfig/20230809-002310-ladsgroup.json
00:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P50212 and previous config saved to /var/cache/conftool/dbconfig/20230809-000804-ladsgroup.json

2023-08-08

23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50211 and previous config saved to /var/cache/conftool/dbconfig/20230808-235258-ladsgroup.json
22:33 urbanecm: mwmaint1002: stop persistRevisionThreadItems.php frwiki instance because of T343859 (cc T315510)
22:04 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177] (wcqs): f1a6177 (duration: 00m 17s)
22:03 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177] (wcqs): f1a6177
21:57 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
21:46 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
21:46 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wcqs1003.eqiad.wmnet with OS bullseye
21:22 brett: Exported varnish-modules 0.15.0-4 for bookworm-wikimedia (T342154)
21:18 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage
21:15 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage
21:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 108
21:06 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 108
21:04 bking@cumin1001: conftool action : set/pooled=no; selector: name=wcqs1003.eqiad.wmnet,service=wcqs
21:02 bking@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wcqs,name=eqiad
21:02 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wcqs1003.eqiad.wmnet with OS bullseye
20:58 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177] (wcqs): f1a6177 (duration: 00m 17s)
20:58 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177] (wcqs): f1a6177
20:57 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wcqs1002.eqiad.wmnet with OS bullseye
20:52 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177] (wcqs): f1a6177 (duration: 00m 18s)
20:52 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177] (wcqs): f1a6177
20:43 urbanecm@deploy1002: Finished scap: Backport for Deploy to CN language wikis (T335886) (duration: 09m 08s)
20:41 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: whitelist new qlever endpoints take 4 (forgot git pull) T339347 (duration: 10m 44s)
20:37 urbanecm@deploy1002: ksarabia and urbanecm: Continuing with sync
20:36 urbanecm@deploy1002: ksarabia and urbanecm: Backport for Deploy to CN language wikis (T335886) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:34 urbanecm@deploy1002: Started scap: Backport for Deploy to CN language wikis (T335886)
20:31 urbanecm: mwmaint1002: `foreachwikiindblist 'group2 & s6' extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --current --all --touched-after=20230615000000` (T315510)
20:30 urbanecm: mwmaint1002: `foreachwikiindblist 'group2 & s5' extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --current --all --touched-after=20230615000000` (T315353)
20:30 ryankemper@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: whitelist new qlever endpoints take 4 (forgot git pull) T339347
20:30 urbanecm: mwmaint1002: `foreachwikiindblist 'group2 & s3' extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --current --all --touched-after=20230615000000` (T315353)
20:29 urbanecm: mwmaint1002: `foreachwikiindblist 'group2 & s2' extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --current --all --touched-after=20230615000000` (T315353)
20:24 urbanecm@deploy1002: Finished scap: Backport for Enable wgDiscussionToolsEnablePermalinksBackend on s2/s3/s5/s6 group2 (T315353) (duration: 10m 55s)
20:17 urbanecm@deploy1002: urbanecm and matmarex: Continuing with sync
20:16 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@aa5f5b7]: whitelist new qlever endpoints take 3 T339347 (duration: 02m 54s)
20:14 urbanecm@deploy1002: urbanecm and matmarex: Backport for Enable wgDiscussionToolsEnablePermalinksBackend on s2/s3/s5/s6 group2 (T315353) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:14 ryankemper: [WDQS] Lag caught up on `wdqs1006`; repooled -> `ryankemper@wdqs1006:~$ sudo pool`
20:13 urbanecm@deploy1002: Started scap: Backport for Enable wgDiscussionToolsEnablePermalinksBackend on s2/s3/s5/s6 group2 (T315353)
20:13 ryankemper@deploy1002: Started deploy [wdqs/wdqs@aa5f5b7]: whitelist new qlever endpoints take 3 T339347
19:28 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wcqs[1001-1003].eqiad.wmnet with reason: T331300
19:28 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wcqs[1001-1003].eqiad.wmnet with reason: T331300
19:23 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
19:06 ryankemper: [WDQS] Depooled `wdqs1006` while it catches up on 7 hours of lag
19:05 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@aa5f5b7]: whitelist new qlever endpoints take 2 (duration: 11m 34s)
18:54 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum4001.ulsfo.wmnet with OS bullseye
18:54 ryankemper@deploy1002: Started deploy [wdqs/wdqs@aa5f5b7]: whitelist new qlever endpoints take 2
18:49 bking@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wcqs,name=eqiad
18:48 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: whitelist new qlever endpoints (duration: 03m 08s)
18:45 ryankemper@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: whitelist new qlever endpoints
18:45 ryankemper@deploy1002: deploy aborted: 0.3.124 (duration: 01m 50s)
18:43 ryankemper@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
18:38 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
18:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum4001.ulsfo.wmnet with reason: host reimage
18:27 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum4001.ulsfo.wmnet with reason: host reimage
18:12 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum4001.ulsfo.wmnet with OS bullseye
18:12 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host durum4001.ulsfo.wmnet with OS bookworm
17:56 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1001.eqiad.wmnet with OS bullseye
17:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum4001.ulsfo.wmnet with reason: host reimage
17:52 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum4001.ulsfo.wmnet with reason: host reimage
17:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T342617)', diff saved to https://phabricator.wikimedia.org/P50209 and previous config saved to /var/cache/conftool/dbconfig/20230808-175101-ladsgroup.json
17:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
17:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
17:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T342617)', diff saved to https://phabricator.wikimedia.org/P50208 and previous config saved to /var/cache/conftool/dbconfig/20230808-175040-ladsgroup.json
17:41 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1002.eqiad.wmnet with reason: host reimage
17:38 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1001.eqiad.wmnet with reason: host reimage
17:37 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1002.eqiad.wmnet with reason: host reimage
17:35 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1001.eqiad.wmnet with reason: host reimage
17:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P50207 and previous config saved to /var/cache/conftool/dbconfig/20230808-173534-ladsgroup.json
17:31 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum4001.ulsfo.wmnet with OS bookworm
17:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1083.eqiad.wmnet with OS bullseye
17:24 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wcqs1002.eqiad.wmnet with OS bullseye
17:22 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wcqs1001.eqiad.wmnet with OS bullseye
17:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P50206 and previous config saved to /var/cache/conftool/dbconfig/20230808-172027-ladsgroup.json
17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T342617)', diff saved to https://phabricator.wikimedia.org/P50205 and previous config saved to /var/cache/conftool/dbconfig/20230808-170521-ladsgroup.json
17:01 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1083.eqiad.wmnet with reason: host reimage
16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50204 and previous config saved to /var/cache/conftool/dbconfig/20230808-165824-ladsgroup.json
16:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
16:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T342617)', diff saved to https://phabricator.wikimedia.org/P50203 and previous config saved to /var/cache/conftool/dbconfig/20230808-165803-ladsgroup.json
16:58 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1083.eqiad.wmnet with reason: host reimage
16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P50202 and previous config saved to /var/cache/conftool/dbconfig/20230808-164256-ladsgroup.json
16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P50201 and previous config saved to /var/cache/conftool/dbconfig/20230808-162750-ladsgroup.json
16:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum6002.drmrs.wmnet with OS bookworm
16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T342617)', diff saved to https://phabricator.wikimedia.org/P50200 and previous config saved to /var/cache/conftool/dbconfig/20230808-161244-ladsgroup.json
15:53 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1083.eqiad.wmnet with OS bullseye
15:44 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1082.eqiad.wmnet with OS bullseye
15:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
15:37 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
15:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1082.eqiad.wmnet with reason: host reimage
15:19 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1082.eqiad.wmnet with reason: host reimage
15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P50197 and previous config saved to /var/cache/conftool/dbconfig/20230808-151637-ladsgroup.json
15:14 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum6002.drmrs.wmnet with OS bookworm
15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P50196 and previous config saved to /var/cache/conftool/dbconfig/20230808-150131-ladsgroup.json
14:54 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum6001.drmrs.wmnet with OS bookworm
14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P50195 and previous config saved to /var/cache/conftool/dbconfig/20230808-144625-ladsgroup.json
14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P50194 and previous config saved to /var/cache/conftool/dbconfig/20230808-143119-ladsgroup.json
14:10 _joe_: updated conftool, requestctl on puppetmasters to 2.3.1 to fix bugs with requestctl log
14:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P50192 and previous config saved to /var/cache/conftool/dbconfig/20230808-140331-ladsgroup.json
14:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
14:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
14:03 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1082.eqiad.wmnet with OS bullseye
13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P50190 and previous config saved to /var/cache/conftool/dbconfig/20230808-135847-ladsgroup.json
13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P50189 and previous config saved to /var/cache/conftool/dbconfig/20230808-135636-ladsgroup.json
13:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
13:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
13:47 ladsgroup@deploy1002: Finished scap: Backport for Stop writing to old columns of externallinks in ruwikinews (T342683) (duration: 10m 00s)
13:46 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
13:43 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
13:41 ladsgroup@deploy1002: ladsgroup: Continuing with sync
13:39 ladsgroup@deploy1002: ladsgroup: Backport for Stop writing to old columns of externallinks in ruwikinews (T342683) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:37 ladsgroup@deploy1002: Started scap: Backport for Stop writing to old columns of externallinks in ruwikinews (T342683)
13:36 taavi@deploy1002: Finished scap: Backport for newiki: Fix templateeditor config (T343257) (duration: 09m 49s)
13:36 volans: set platform to null on all devices and VMs in Netbox - T336623
13:29 taavi@deploy1002: taavi and stang: Continuing with sync
13:27 taavi@deploy1002: taavi and stang: Backport for newiki: Fix templateeditor config (T343257) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:26 taavi@deploy1002: Started scap: Backport for newiki: Fix templateeditor config (T343257)
13:21 sukhe: reprepro -C main include bookworm-wikimedia gdnsd_3.99.0~alpha2-2_amd64.changes: T342154
13:19 taavi@deploy1002: Finished scap: Backport for Update piwiki legacy vector logo (T305950), Update idwiktionary old vector logo (T341175) (duration: 10m 48s)
13:18 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
13:18 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
13:18 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum6001.drmrs.wmnet with OS bookworm
13:17 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host durum6001.drmrs.wmnet with OS bookworm
13:12 taavi@deploy1002: anzx and taavi: Continuing with sync
13:09 taavi@deploy1002: anzx and taavi: Backport for Update piwiki legacy vector logo (T305950), Update idwiktionary old vector logo (T341175) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:08 taavi@deploy1002: Started scap: Backport for Update piwiki legacy vector logo (T305950), Update idwiktionary old vector logo (T341175)
13:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum6001.drmrs.wmnet with OS bookworm
13:02 sukhe: reprepro -C main include bookworm-wikimedia anycast-healthchecker_0.9.1-1+wmf12u1_amd64.changes: T342154
12:57 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124 (duration: 00m 46s)
12:57 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124
12:40 samtar@deploy1002: Finished scap: Backport for IS: Ensure edit recovery is disabled (T342858) (duration: 08m 18s)
12:34 samtar@deploy1002: samtar: Continuing with sync
12:34 samtar@deploy1002: samtar: Backport for IS: Ensure edit recovery is disabled (T342858) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
12:32 samtar@deploy1002: Started scap: Backport for IS: Ensure edit recovery is disabled (T342858)
12:28 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
12:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
12:26 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
12:25 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
12:25 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
12:24 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
10:36 claime: deploying mw-on-k8s - https://gerrit.wikimedia.org/r/945798
10:21 taavi: update T343294 mitigations
10:00 volans: restart ferm on mirror1001 to pick new IP address for debian syncproxy2
09:52 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
09:52 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
09:44 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
09:43 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T342617)', diff saved to https://phabricator.wikimedia.org/P50188 and previous config saved to /var/cache/conftool/dbconfig/20230808-093835-ladsgroup.json
09:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
09:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T342617)', diff saved to https://phabricator.wikimedia.org/P50187 and previous config saved to /var/cache/conftool/dbconfig/20230808-093814-ladsgroup.json
09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P50186 and previous config saved to /var/cache/conftool/dbconfig/20230808-092308-ladsgroup.json
09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T342617)', diff saved to https://phabricator.wikimedia.org/P50185 and previous config saved to /var/cache/conftool/dbconfig/20230808-091119-ladsgroup.json
09:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
09:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T342617)', diff saved to https://phabricator.wikimedia.org/P50184 and previous config saved to /var/cache/conftool/dbconfig/20230808-091058-ladsgroup.json
09:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P50183 and previous config saved to /var/cache/conftool/dbconfig/20230808-090801-ladsgroup.json
08:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P50182 and previous config saved to /var/cache/conftool/dbconfig/20230808-085551-ladsgroup.json
08:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T342617)', diff saved to https://phabricator.wikimedia.org/P50181 and previous config saved to /var/cache/conftool/dbconfig/20230808-085255-ladsgroup.json
08:45 jynus: restart debmonitor2003 services
08:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P50180 and previous config saved to /var/cache/conftool/dbconfig/20230808-084045-ladsgroup.json
08:33 elukey: powercycle ml-serve2004 - mgmt console without tty available, DIMM errors in getsel
08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T342617)', diff saved to https://phabricator.wikimedia.org/P50179 and previous config saved to /var/cache/conftool/dbconfig/20230808-082539-ladsgroup.json
07:07 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
07:07 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
07:07 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
07:07 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
07:06 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
07:06 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
02:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T342617)', diff saved to https://phabricator.wikimedia.org/P50178 and previous config saved to /var/cache/conftool/dbconfig/20230808-022547-ladsgroup.json
02:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
02:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
02:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T342617)', diff saved to https://phabricator.wikimedia.org/P50177 and previous config saved to /var/cache/conftool/dbconfig/20230808-022526-ladsgroup.json
02:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P50176 and previous config saved to /var/cache/conftool/dbconfig/20230808-021020-ladsgroup.json
01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P50175 and previous config saved to /var/cache/conftool/dbconfig/20230808-015513-ladsgroup.json
01:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T342617)', diff saved to https://phabricator.wikimedia.org/P50174 and previous config saved to /var/cache/conftool/dbconfig/20230808-014007-ladsgroup.json
00:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T342617)', diff saved to https://phabricator.wikimedia.org/P50173 and previous config saved to /var/cache/conftool/dbconfig/20230808-005439-ladsgroup.json
00:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
00:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
00:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T342617)', diff saved to https://phabricator.wikimedia.org/P50172 and previous config saved to /var/cache/conftool/dbconfig/20230808-005418-ladsgroup.json
00:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P50171 and previous config saved to /var/cache/conftool/dbconfig/20230808-003911-ladsgroup.json
00:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P50170 and previous config saved to /var/cache/conftool/dbconfig/20230808-002405-ladsgroup.json
00:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T342617)', diff saved to https://phabricator.wikimedia.org/P50169 and previous config saved to /var/cache/conftool/dbconfig/20230808-000859-ladsgroup.json

2023-08-07

23:28 krinkle@deploy1002: Finished scap: Backport for api: Fix broken /api/index.html rendering (T113114) (duration: 09m 00s)
23:23 krinkle@deploy1002: krinkle: Continuing with sync
23:21 krinkle@deploy1002: krinkle: Backport for api: Fix broken /api/index.html rendering (T113114) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
23:19 krinkle@deploy1002: Started scap: Backport for api: Fix broken /api/index.html rendering (T113114)
22:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-jumbo1015.eqiad.wmnet
22:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-jumbo1015.eqiad.wmnet
22:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-jumbo1014.eqiad.wmnet
22:43 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-jumbo1014.eqiad.wmnet
22:43 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-jumbo1013.eqiad.wmnet
22:38 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-jumbo1013.eqiad.wmnet
22:38 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-jumbo1012.eqiad.wmnet
22:30 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-jumbo1012.eqiad.wmnet
22:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-jumbo1011.eqiad.wmnet
22:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-jumbo1011.eqiad.wmnet
22:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-jumbo1010.eqiad.wmnet
22:17 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-jumbo1010.eqiad.wmnet
22:04 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=no; selector: name=wcqs2003.codfw.wmnet
21:50 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
21:43 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1081.eqiad.wmnet with OS bullseye
21:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1081.eqiad.wmnet with reason: host reimage
21:17 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1081.eqiad.wmnet with reason: host reimage
21:05 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
21:03 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1081.eqiad.wmnet with OS bullseye
21:03 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wcqs2003.codfw.wmnet with OS bullseye
21:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1080.eqiad.wmnet with OS bullseye
20:53 urbanecm@deploy1002: Finished scap: Backport for unset orwikisource logo and resize pawikisource logo (T341255) (duration: 08m 09s)
20:47 urbanecm@deploy1002: jdlrobson and urbanecm: Continuing with sync
20:46 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for unset orwikisource logo and resize pawikisource logo (T341255) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:45 urbanecm@deploy1002: Started scap: Backport for unset orwikisource logo and resize pawikisource logo (T341255)
20:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1080.eqiad.wmnet with reason: host reimage
20:38 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1080.eqiad.wmnet with reason: host reimage
20:24 urbanecm: mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki=enwiki --current --all --start '["18618299"]' # T315510
20:24 urbanecm@deploy1002: Finished scap: Backport for ThreadItemStore: Ignore duplicates caused by duplicate executions (T323080 T341811), Update wikisource wordmarks and taglines (T341255), update idwiktionary legacy vector logo (T341175) (duration: 10m 22s)
20:21 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1080.eqiad.wmnet with OS bullseye
20:19 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs2003.codfw.wmnet with reason: host reimage
20:18 urbanecm@deploy1002: urbanecm and jdlrobson and anzx and matmarex: Continuing with sync
20:16 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs2003.codfw.wmnet with reason: host reimage
20:15 urbanecm@deploy1002: urbanecm and jdlrobson and anzx and matmarex: Backport for ThreadItemStore: Ignore duplicates caused by duplicate executions (T323080 T341811), Update wikisource wordmarks and taglines (T341255), update idwiktionary legacy vector logo (T341175) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet,
20:14 urbanecm@deploy1002: Started scap: Backport for ThreadItemStore: Ignore duplicates caused by duplicate executions (T323080 T341811), Update wikisource wordmarks and taglines (T341255), update idwiktionary legacy vector logo (T341175)
20:13 urbanecm@deploy1002: Finished scap: Backport for Fix finnish projects, remove unused SVG/PNGs, resize wikiversity (T343278), Wikivoyage logos should always be on a single line (T343279) (duration: 11m 18s)
20:08 urbanecm@deploy1002: jdlrobson and urbanecm: Continuing with sync
20:04 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for Fix finnish projects, remove unused SVG/PNGs, resize wikiversity (T343278), Wikivoyage logos should always be on a single line (T343279) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimen
20:02 urbanecm@deploy1002: Started scap: Backport for Fix finnish projects, remove unused SVG/PNGs, resize wikiversity (T343278), Wikivoyage logos should always be on a single line (T343279)
20:00 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wcqs2003.codfw.wmnet with OS bullseye
19:18 cstone: payments-wiki upgraded from 32fe72a9 to 5b250aed
19:15 jgreen@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:15 jgreen@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frbast2001.frack.codfw.wmnet from DNS for decommissioning - jgreen@cumin1001"
19:14 jgreen@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frbast2001.frack.codfw.wmnet from DNS for decommissioning - jgreen@cumin1001"
19:12 jgreen@cumin1001: START - Cookbook sre.dns.netbox
19:12 jgreen@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:12 jgreen@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frbast1001.frack.eqiad.wmnet from DNS for decommissioning - jgreen@cumin1001"
19:11 jgreen@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frbast1001.frack.eqiad.wmnet from DNS for decommissioning - jgreen@cumin1001"
19:09 jgreen@cumin1001: START - Cookbook sre.dns.netbox
18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T342617)', diff saved to https://phabricator.wikimedia.org/P50168 and previous config saved to /var/cache/conftool/dbconfig/20230807-185732-ladsgroup.json
18:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
18:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T342617)', diff saved to https://phabricator.wikimedia.org/P50167 and previous config saved to /var/cache/conftool/dbconfig/20230807-185710-ladsgroup.json
18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P50166 and previous config saved to /var/cache/conftool/dbconfig/20230807-184204-ladsgroup.json
18:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P50165 and previous config saved to /var/cache/conftool/dbconfig/20230807-182657-ladsgroup.json
18:21 krinkle@deploy1002: Finished scap: Backport for mc: Remove mcrouter-with-onhost-tier from ParserCache (T264604) (duration: 09m 07s)
18:16 krinkle@deploy1002: krinkle: Continuing with sync
18:14 krinkle@deploy1002: krinkle: Backport for mc: Remove mcrouter-with-onhost-tier from ParserCache (T264604) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
18:12 krinkle@deploy1002: Started scap: Backport for mc: Remove mcrouter-with-onhost-tier from ParserCache (T264604)
18:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T342617)', diff saved to https://phabricator.wikimedia.org/P50164 and previous config saved to /var/cache/conftool/dbconfig/20230807-181151-ladsgroup.json
17:59 jgreen@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:59 jgreen@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frmon2001.frack.codfw.wmnet from DNS for decommissioning - jgreen@cumin1001"
17:58 jgreen@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frmon2001.frack.codfw.wmnet from DNS for decommissioning - jgreen@cumin1001"
17:56 jgreen@cumin1001: START - Cookbook sre.dns.netbox
17:55 jgreen@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
17:54 jgreen@cumin1001: START - Cookbook sre.dns.netbox
17:47 jgreen@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:46 jgreen@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frmon1001.frack.eqiad.wmnet from DNS for decommissioning - jgreen@cumin1001"
17:46 jgreen@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frmon1001.frack.eqiad.wmnet from DNS for decommissioning - jgreen@cumin1001"
17:42 jgreen@cumin1001: START - Cookbook sre.dns.netbox
17:36 jgreen@cumin1001: START - Cookbook sre.dns.netbox
17:34 jgreen@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:34 jgreen@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frdev1001 from DNS for decommissioning - jgreen@cumin1001"
17:33 jgreen@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frdev1001 from DNS for decommissioning - jgreen@cumin1001"
17:31 jgreen@cumin1001: START - Cookbook sre.dns.netbox
17:22 jgreen@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:22 jgreen@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: civi1001.frack.eqiad.wmnet - jgreen@cumin1001"
17:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1079.eqiad.wmnet with OS bullseye
17:22 jgreen@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: civi1001.frack.eqiad.wmnet - jgreen@cumin1001"
17:19 jgreen@cumin1001: START - Cookbook sre.dns.netbox
17:02 inflatador: bking@puppetmaster1001 removing unused(?) puppet cert search.svc.eqiad.wmnet T343319
16:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1079.eqiad.wmnet with reason: host reimage
16:56 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1079.eqiad.wmnet with reason: host reimage
16:47 inflatador: bking@puppetmaster1001 removing unused(?) puppet cert search.svc.codfw.wmnet T343319
16:40 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1079.eqiad.wmnet with OS bullseye
16:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T342617)', diff saved to https://phabricator.wikimedia.org/P50163 and previous config saved to /var/cache/conftool/dbconfig/20230807-163421-ladsgroup.json
16:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
16:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
16:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1078.eqiad.wmnet with OS bullseye
16:18 jforrester@deploy1002: Finished scap: Backport for Wikifunctions: Allow logged-in users to edit object labels, aliases, and descriptions (T343400) (duration: 07m 11s)
16:13 jforrester@deploy1002: jforrester: Continuing with sync
16:13 jforrester@deploy1002: jforrester: Backport for Wikifunctions: Allow logged-in users to edit object labels, aliases, and descriptions (T343400) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
16:11 jforrester@deploy1002: Started scap: Backport for Wikifunctions: Allow logged-in users to edit object labels, aliases, and descriptions (T343400)
15:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1078.eqiad.wmnet with reason: host reimage
15:55 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1078.eqiad.wmnet with reason: host reimage
15:53 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1078.eqiad.wmnet with OS bullseye
15:50 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1078.eqiad.wmnet with OS bullseye
15:42 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
15:41 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
15:35 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
15:35 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1078.eqiad.wmnet with OS bullseye
15:35 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
15:35 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1078.eqiad.wmnet with OS bullseye
15:34 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
15:34 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
14:36 zabe@deploy1002: Finished scap: T343294 (duration: 07m 13s)
14:29 zabe@deploy1002: Started scap: T343294
14:14 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1078.eqiad.wmnet with OS bullseye
14:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host an-worker1078.eqiad.wmnet
14:10 btullis@cumin1001: START - Cookbook sre.hosts.dhcp for host an-worker1078.eqiad.wmnet
14:08 btullis@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1078.eqiad.wmnet']
14:08 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1078.eqiad.wmnet']
14:07 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1078.eqiad.wmnet with OS bullseye
14:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-ctrl1002.eqiad.wmnet
13:59 elukey@deploy1002: Finished scap: Backport for ext-ORES: revert all wikis to use ORES instead of Lift Wing (T343308) (duration: 06m 49s)
13:58 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1078.eqiad.wmnet with OS bullseye
13:56 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl1002.eqiad.wmnet
13:53 elukey@deploy1002: elukey: Continuing with sync
13:53 elukey@deploy1002: elukey: Backport for ext-ORES: revert all wikis to use ORES instead of Lift Wing (T343308) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:52 elukey@deploy1002: Started scap: Backport for ext-ORES: revert all wikis to use ORES instead of Lift Wing (T343308)
13:51 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php idwiktionary --fix --add-prefix=BROKEN # T341175
13:51 urbanecm@deploy1002: Finished scap: Backport for idwiktionary change wgSiteName, wgMetaNamespace and add project namespace alias (T341175) (duration: 09m 12s)
13:45 urbanecm@deploy1002: urbanecm and anzx: Continuing with sync
13:43 urbanecm@deploy1002: urbanecm and anzx: Backport for idwiktionary change wgSiteName, wgMetaNamespace and add project namespace alias (T341175) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:41 urbanecm@deploy1002: Started scap: Backport for idwiktionary change wgSiteName, wgMetaNamespace and add project namespace alias (T341175)
13:26 urbanecm@deploy1002: Finished scap: Backport for Revert "enwiki: temp enable emergencyCaptcha" (duration: 06m 59s)
13:19 urbanecm@deploy1002: Started scap: Backport for Revert "enwiki: temp enable emergencyCaptcha"
13:19 urbanecm@deploy1002: Finished scap: Backport for Update knwiktionary logos (T343662), Write new for event table migration on all wikis (T330158), zhwiki: Grant "suppressredirect"to autoreviewer (T343711) (duration: 13m 54s)
13:13 urbanecm@deploy1002: anzx and dreamyjazz and stang and urbanecm: Continuing with sync
13:06 urbanecm@deploy1002: anzx and dreamyjazz and stang and urbanecm: Backport for Update knwiktionary logos (T343662), Write new for event table migration on all wikis (T330158), zhwiki: Grant "suppressredirect"to autoreviewer (T343711) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-d
13:05 urbanecm@deploy1002: Started scap: Backport for Update knwiktionary logos (T343662), Write new for event table migration on all wikis (T330158), zhwiki: Grant "suppressredirect"to autoreviewer (T343711)
12:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
12:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
12:17 dcausse: repooling wdqs1004
11:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
11:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
10:53 ladsgroup@deploy1002: Finished scap: Backport for Stop writing to the old externallinks columns in testwiki (T342683) (duration: 08m 06s)
10:48 ladsgroup@deploy1002: ladsgroup: Continuing with sync
10:47 ladsgroup@deploy1002: ladsgroup: Backport for Stop writing to the old externallinks columns in testwiki (T342683) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
10:45 ladsgroup@deploy1002: Started scap: Backport for Stop writing to the old externallinks columns in testwiki (T342683)
10:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
10:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
10:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
10:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
10:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
10:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
10:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
10:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
10:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
10:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
10:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
10:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1138 (T342617)', diff saved to https://phabricator.wikimedia.org/P50158 and previous config saved to /var/cache/conftool/dbconfig/20230807-100805-ladsgroup.json
10:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
10:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
10:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
10:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
09:23 dcausse: restarting blazegraph on wdqs1004
08:31 elukey@deploy1002: Finished scap: Backport for ext-ORES: force cswiki to use the ORES settings/backend (T343308) (duration: 14m 50s)
08:25 elukey@deploy1002: elukey: Continuing with sync
08:24 elukey@deploy1002: elukey: Backport for ext-ORES: force cswiki to use the ORES settings/backend (T343308) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 100%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50157 and previous config saved to /var/cache/conftool/dbconfig/20230807-081639-root.json
08:16 elukey@deploy1002: Started scap: Backport for ext-ORES: force cswiki to use the ORES settings/backend (T343308)
08:08 godog: start docker-image-prune-old on alert hosts - T329939
08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 75%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50156 and previous config saved to /var/cache/conftool/dbconfig/20230807-080133-root.json
07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 50%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50155 and previous config saved to /var/cache/conftool/dbconfig/20230807-074628-root.json
07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 25%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50154 and previous config saved to /var/cache/conftool/dbconfig/20230807-073123-root.json
07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 10%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50153 and previous config saved to /var/cache/conftool/dbconfig/20230807-071618-root.json
07:11 marostegui: Depool clouddb1015 T334650
07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 5%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50152 and previous config saved to /var/cache/conftool/dbconfig/20230807-070113-root.json
06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 3%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50151 and previous config saved to /var/cache/conftool/dbconfig/20230807-064608-root.json
06:33 kart_: Updated cxserver to 2023-08-03-132800-production (T338602, T333969, T343211)
06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 1%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50150 and previous config saved to /var/cache/conftool/dbconfig/20230807-063104-root.json
06:28 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
06:28 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
06:26 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
06:25 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
06:22 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
06:22 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1224 upgrade to mariadb 10.6', diff saved to https://phabricator.wikimedia.org/P50149 and previous config saved to /var/cache/conftool/dbconfig/20230807-061653-root.json
06:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Update wheels for Aerleon 1.6.0 upgrade - ayounsi@cumin1001
06:09 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Update wheels for Aerleon 1.6.0 upgrade - ayounsi@cumin1001

2023-08-05

05:57 _joe_: mounting the volume under /srv/dataimport on both puppetmaster frontends
05:53 _joe_: creating logical volume "dataimport" on the puppetmaster frontends
02:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
02:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
01:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
01:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
01:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T342617)', diff saved to https://phabricator.wikimedia.org/P50148 and previous config saved to /var/cache/conftool/dbconfig/20230805-013831-ladsgroup.json
01:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P50147 and previous config saved to /var/cache/conftool/dbconfig/20230805-012325-ladsgroup.json
01:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P50146 and previous config saved to /var/cache/conftool/dbconfig/20230805-010819-ladsgroup.json
00:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T342617)', diff saved to https://phabricator.wikimedia.org/P50145 and previous config saved to /var/cache/conftool/dbconfig/20230805-005312-ladsgroup.json
00:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T342617)', diff saved to https://phabricator.wikimedia.org/P50144 and previous config saved to /var/cache/conftool/dbconfig/20230805-003155-ladsgroup.json
00:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P50143 and previous config saved to /var/cache/conftool/dbconfig/20230805-001649-ladsgroup.json
00:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P50142 and previous config saved to /var/cache/conftool/dbconfig/20230805-000143-ladsgroup.json

2023-08-04

23:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T342617)', diff saved to https://phabricator.wikimedia.org/P50141 and previous config saved to /var/cache/conftool/dbconfig/20230804-234637-ladsgroup.json
23:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1222 (T342617)', diff saved to https://phabricator.wikimedia.org/P50140 and previous config saved to /var/cache/conftool/dbconfig/20230804-234121-ladsgroup.json
23:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
23:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
23:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T342617)', diff saved to https://phabricator.wikimedia.org/P50139 and previous config saved to /var/cache/conftool/dbconfig/20230804-234101-ladsgroup.json
23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P50138 and previous config saved to /var/cache/conftool/dbconfig/20230804-232555-ladsgroup.json
23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P50137 and previous config saved to /var/cache/conftool/dbconfig/20230804-231048-ladsgroup.json
23:00 tzatziki: removing 1 file for legal compliance
22:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T342617)', diff saved to https://phabricator.wikimedia.org/P50136 and previous config saved to /var/cache/conftool/dbconfig/20230804-225542-ladsgroup.json
22:33 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124 (duration: 00m 54s)
22:32 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124
22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T342617)', diff saved to https://phabricator.wikimedia.org/P50135 and previous config saved to /var/cache/conftool/dbconfig/20230804-222905-ladsgroup.json
22:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
22:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
22:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50134 and previous config saved to /var/cache/conftool/dbconfig/20230804-222845-ladsgroup.json
22:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T342617)', diff saved to https://phabricator.wikimedia.org/P50133 and previous config saved to /var/cache/conftool/dbconfig/20230804-221915-ladsgroup.json
22:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
22:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T342617)', diff saved to https://phabricator.wikimedia.org/P50132 and previous config saved to /var/cache/conftool/dbconfig/20230804-221855-ladsgroup.json
22:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P50131 and previous config saved to /var/cache/conftool/dbconfig/20230804-221338-ladsgroup.json
22:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P50130 and previous config saved to /var/cache/conftool/dbconfig/20230804-220348-ladsgroup.json
21:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P50129 and previous config saved to /var/cache/conftool/dbconfig/20230804-215832-ladsgroup.json
21:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P50128 and previous config saved to /var/cache/conftool/dbconfig/20230804-214842-ladsgroup.json
21:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50127 and previous config saved to /var/cache/conftool/dbconfig/20230804-214326-ladsgroup.json
21:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T342617)', diff saved to https://phabricator.wikimedia.org/P50126 and previous config saved to /var/cache/conftool/dbconfig/20230804-213336-ladsgroup.json
21:20 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124 (duration: 00m 44s)
21:19 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124
21:16 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124 (duration: 00m 09s)
21:16 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124
21:16 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124 (duration: 00m 15s)
21:15 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124
20:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T342617)', diff saved to https://phabricator.wikimedia.org/P50125 and previous config saved to /var/cache/conftool/dbconfig/20230804-205647-ladsgroup.json
20:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
20:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
20:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T342617)', diff saved to https://phabricator.wikimedia.org/P50124 and previous config saved to /var/cache/conftool/dbconfig/20230804-205626-ladsgroup.json
20:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P50123 and previous config saved to /var/cache/conftool/dbconfig/20230804-204120-ladsgroup.json
20:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50122 and previous config saved to /var/cache/conftool/dbconfig/20230804-203351-ladsgroup.json
20:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
20:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
20:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T342617)', diff saved to https://phabricator.wikimedia.org/P50121 and previous config saved to /var/cache/conftool/dbconfig/20230804-203330-ladsgroup.json
20:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P50120 and previous config saved to /var/cache/conftool/dbconfig/20230804-202613-ladsgroup.json
20:21 brett: imported libvmod-querysort package in bookworm-wikimedia (T342154)
20:18 jforrester@deploy1002: Finished scap: Backport for ApiFunctionCall: Check calls for Z16K2 and deny those too (duration: 34m 04s)
20:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P50119 and previous config saved to /var/cache/conftool/dbconfig/20230804-201824-ladsgroup.json
20:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T342617)', diff saved to https://phabricator.wikimedia.org/P50118 and previous config saved to /var/cache/conftool/dbconfig/20230804-201107-ladsgroup.json
20:08 jforrester@deploy1002: jforrester: Continuing with sync
20:04 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on wcqs2002.codfw.wmnet with reason: T323921
20:04 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on wcqs2002.codfw.wmnet with reason: T323921
20:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P50116 and previous config saved to /var/cache/conftool/dbconfig/20230804-200317-ladsgroup.json
20:02 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
19:58 jforrester@deploy1002: jforrester: Backport for ApiFunctionCall: Check calls for Z16K2 and deny those too synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
19:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T342617)', diff saved to https://phabricator.wikimedia.org/P50115 and previous config saved to /var/cache/conftool/dbconfig/20230804-194811-ladsgroup.json
19:44 jforrester@deploy1002: Started scap: Backport for ApiFunctionCall: Check calls for Z16K2 and deny those too
19:17 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
19:12 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on wcqs2002.codfw.wmnet with reason: T323921
19:12 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on wcqs2002.codfw.wmnet with reason: T323921
19:11 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wcqs2002.codfw.wmnet with OS bullseye
19:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T342617)', diff saved to https://phabricator.wikimedia.org/P50114 and previous config saved to /var/cache/conftool/dbconfig/20230804-190152-ladsgroup.json
19:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
19:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
19:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50113 and previous config saved to /var/cache/conftool/dbconfig/20230804-190131-ladsgroup.json
18:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P50112 and previous config saved to /var/cache/conftool/dbconfig/20230804-184625-ladsgroup.json
18:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T342617)', diff saved to https://phabricator.wikimedia.org/P50111 and previous config saved to /var/cache/conftool/dbconfig/20230804-183927-ladsgroup.json
18:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
18:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
18:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50110 and previous config saved to /var/cache/conftool/dbconfig/20230804-183906-ladsgroup.json
18:34 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs2002.codfw.wmnet with reason: host reimage
18:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P50109 and previous config saved to /var/cache/conftool/dbconfig/20230804-183118-ladsgroup.json
18:31 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs2002.codfw.wmnet with reason: host reimage
18:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P50108 and previous config saved to /var/cache/conftool/dbconfig/20230804-182400-ladsgroup.json
18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50107 and previous config saved to /var/cache/conftool/dbconfig/20230804-181612-ladsgroup.json
18:15 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wcqs2002.codfw.wmnet with OS bullseye
18:14 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on wcqs2001.codfw.wmnet with reason: T323921
18:13 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on wcqs2001.codfw.wmnet with reason: T323921
18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P50106 and previous config saved to /var/cache/conftool/dbconfig/20230804-180854-ladsgroup.json
18:08 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50105 and previous config saved to /var/cache/conftool/dbconfig/20230804-175348-ladsgroup.json
17:27 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
17:24 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wcqs2001.codfw.wmnet with OS bullseye
16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50104 and previous config saved to /var/cache/conftool/dbconfig/20230804-165753-ladsgroup.json
16:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
16:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T342617)', diff saved to https://phabricator.wikimedia.org/P50103 and previous config saved to /var/cache/conftool/dbconfig/20230804-165731-ladsgroup.json
16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50102 and previous config saved to /var/cache/conftool/dbconfig/20230804-164356-ladsgroup.json
16:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T342617)', diff saved to https://phabricator.wikimedia.org/P50101 and previous config saved to /var/cache/conftool/dbconfig/20230804-164335-ladsgroup.json
16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P50100 and previous config saved to /var/cache/conftool/dbconfig/20230804-164225-ladsgroup.json
16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P50099 and previous config saved to /var/cache/conftool/dbconfig/20230804-162829-ladsgroup.json
16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P50098 and previous config saved to /var/cache/conftool/dbconfig/20230804-162719-ladsgroup.json
16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P50097 and previous config saved to /var/cache/conftool/dbconfig/20230804-161322-ladsgroup.json
16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T342617)', diff saved to https://phabricator.wikimedia.org/P50096 and previous config saved to /var/cache/conftool/dbconfig/20230804-161212-ladsgroup.json
15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T342617)', diff saved to https://phabricator.wikimedia.org/P50095 and previous config saved to /var/cache/conftool/dbconfig/20230804-155816-ladsgroup.json
15:18 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs2001.codfw.wmnet with reason: host reimage
15:16 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs2001.codfw.wmnet with reason: host reimage
15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T342617)', diff saved to https://phabricator.wikimedia.org/P50094 and previous config saved to /var/cache/conftool/dbconfig/20230804-151435-ladsgroup.json
15:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
15:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
15:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
15:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T342617)', diff saved to https://phabricator.wikimedia.org/P50093 and previous config saved to /var/cache/conftool/dbconfig/20230804-151409-ladsgroup.json
15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T342617)', diff saved to https://phabricator.wikimedia.org/P50092 and previous config saved to /var/cache/conftool/dbconfig/20230804-150310-ladsgroup.json
15:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
15:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50091 and previous config saved to /var/cache/conftool/dbconfig/20230804-150232-ladsgroup.json
15:00 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wcqs2001.codfw.wmnet with OS bullseye
14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P50090 and previous config saved to /var/cache/conftool/dbconfig/20230804-145903-ladsgroup.json
14:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2195.codfw.wmnet with OS bullseye
14:54 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
14:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P50089 and previous config saved to /var/cache/conftool/dbconfig/20230804-144726-ladsgroup.json
14:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P50088 and previous config saved to /var/cache/conftool/dbconfig/20230804-144357-ladsgroup.json
14:40 sbassett: Deployed updated mitigation for T336027
14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P50087 and previous config saved to /var/cache/conftool/dbconfig/20230804-143219-ladsgroup.json
14:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2190.codfw.wmnet with OS bullseye
14:31 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
14:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2195.codfw.wmnet with reason: host reimage
14:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T342617)', diff saved to https://phabricator.wikimedia.org/P50086 and previous config saved to /var/cache/conftool/dbconfig/20230804-142851-ladsgroup.json
14:27 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
14:27 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
14:26 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
14:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4005.wikimedia.org
14:25 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync Hiera after adding bast4005 - jmm@cumin2002"
14:25 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2195.codfw.wmnet with reason: host reimage
14:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2193.codfw.wmnet with OS bullseye
14:25 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
14:23 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
14:23 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync Hiera after adding bast4005 - jmm@cumin2002"
14:22 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4005.wikimedia.org
14:20 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
14:20 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
14:18 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
14:17 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50085 and previous config saved to /var/cache/conftool/dbconfig/20230804-141713-ladsgroup.json
14:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast4005.wikimedia.org
14:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4005.wikimedia.org with OS bookworm
14:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2190.codfw.wmnet with reason: host reimage
14:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2193.codfw.wmnet with reason: host reimage
14:08 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2190.codfw.wmnet with reason: host reimage
14:07 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
14:07 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
14:05 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2193.codfw.wmnet with reason: host reimage
14:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2195.codfw.wmnet with OS bullseye
14:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4005.wikimedia.org with reason: host reimage
14:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2195']
13:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4005.wikimedia.org with reason: host reimage
13:50 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2195']
13:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2195.mgmt.codfw.wmnet with reboot policy FORCED
13:48 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2190.codfw.wmnet with OS bullseye
13:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2190.codfw.wmnet with OS bullseye
13:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2190.codfw.wmnet with OS bullseye
13:45 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2193.codfw.wmnet with OS bullseye
13:39 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4005.wikimedia.org with OS bookworm
13:30 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2195.mgmt.codfw.wmnet with reboot policy FORCED
13:13 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast4005.wikimedia.org - jmm@cumin2002"
13:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast4005.wikimedia.org - jmm@cumin2002"
13:12 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bast4005.wikimedia.org on all recursors
13:12 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast4005.wikimedia.org on all recursors
13:12 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:12 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast4005.wikimedia.org - jmm@cumin2002"
13:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast4005.wikimedia.org - jmm@cumin2002"
13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T342617)', diff saved to https://phabricator.wikimedia.org/P50084 and previous config saved to /var/cache/conftool/dbconfig/20230804-130622-ladsgroup.json
13:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
13:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T342617)', diff saved to https://phabricator.wikimedia.org/P50083 and previous config saved to /var/cache/conftool/dbconfig/20230804-130601-ladsgroup.json
13:02 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
13:01 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50082 and previous config saved to /var/cache/conftool/dbconfig/20230804-130142-ladsgroup.json
13:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
13:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
13:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast4005.wikimedia.org
13:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
13:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast3007.wikimedia.org
12:59 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
12:59 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
12:58 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
12:57 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
12:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P50081 and previous config saved to /var/cache/conftool/dbconfig/20230804-125055-ladsgroup.json
12:41 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast3007.wikimedia.org
12:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3007.wikimedia.org
12:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P50080 and previous config saved to /var/cache/conftool/dbconfig/20230804-123548-ladsgroup.json
12:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast3007.wikimedia.org
12:32 godog: bounce prometheus@k8s on prometheus100[56] to test failure to reload certs
12:25 jforrester@deploy1002: Synchronized php-1.41.0-wmf.20/extensions/WikiLambda: T343380 and T343400 (duration: 10m 12s)
12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T342617)', diff saved to https://phabricator.wikimedia.org/P50079 and previous config saved to /var/cache/conftool/dbconfig/20230804-122042-ladsgroup.json
12:16 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
12:14 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
12:14 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
12:13 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
12:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast3007.wikimedia.org
12:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast3007.wikimedia.org with OS bookworm
12:05 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
12:04 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
11:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
11:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T342617)', diff saved to https://phabricator.wikimedia.org/P50077 and previous config saved to /var/cache/conftool/dbconfig/20230804-115224-ladsgroup.json
11:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast3007.wikimedia.org with reason: host reimage
11:48 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast3007.wikimedia.org with reason: host reimage
11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T342617)', diff saved to https://phabricator.wikimedia.org/P50076 and previous config saved to /var/cache/conftool/dbconfig/20230804-113848-ladsgroup.json
11:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
11:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P50075 and previous config saved to /var/cache/conftool/dbconfig/20230804-113718-ladsgroup.json
11:30 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on contint2001.wikimedia.org with reason: Decommissioning
11:30 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on contint2001.wikimedia.org with reason: Decommissioning
11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P50074 and previous config saved to /var/cache/conftool/dbconfig/20230804-112212-ladsgroup.json
11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T342617)', diff saved to https://phabricator.wikimedia.org/P50073 and previous config saved to /var/cache/conftool/dbconfig/20230804-110705-ladsgroup.json
11:02 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast3007.wikimedia.org with OS bookworm
10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast3007.wikimedia.org - jmm@cumin2002"
10:38 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast3007.wikimedia.org - jmm@cumin2002"
10:38 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bast3007.wikimedia.org on all recursors
10:38 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast3007.wikimedia.org on all recursors
10:38 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:38 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast3007.wikimedia.org - jmm@cumin2002"
10:37 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast3007.wikimedia.org - jmm@cumin2002"
10:33 jmm@cumin2002: START - Cookbook sre.dns.netbox
10:33 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast3007.wikimedia.org
10:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
10:27 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
10:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T342617)', diff saved to https://phabricator.wikimedia.org/P50072 and previous config saved to /var/cache/conftool/dbconfig/20230804-102347-ladsgroup.json
10:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
10:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
10:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
10:15 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin1001.eqiad.wmnet
10:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cumin1001.eqiad.wmnet
08:00 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1026.eqiad.wmnet with OS bullseye
07:51 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 398203
07:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 398203
07:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 139901
07:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 139901
07:37 moritzm: installing Django security updates
07:34 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1026.eqiad.wmnet with reason: host reimage
07:31 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1026.eqiad.wmnet with reason: host reimage
07:19 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1026.eqiad.wmnet with OS bullseye
03:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2194.codfw.wmnet with OS bullseye
03:20 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
03:12 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
03:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2192.codfw.wmnet with OS bullseye
03:03 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
03:00 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
02:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2194.codfw.wmnet with reason: host reimage
02:53 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2194.codfw.wmnet with reason: host reimage
02:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2192.codfw.wmnet with reason: host reimage
02:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2192.codfw.wmnet with reason: host reimage
02:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2194.codfw.wmnet with OS bullseye
02:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2194']
02:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2194']
02:30 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2194']
02:26 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host db2193.codfw.wmnet with OS bullseye
02:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2193.codfw.wmnet with OS bullseye
02:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2192.codfw.wmnet with OS bullseye
02:19 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2194']
02:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2193']
01:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2194.mgmt.codfw.wmnet with reboot policy FORCED
00:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2192']
00:51 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2193']
00:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2193.mgmt.codfw.wmnet with reboot policy FORCED
00:45 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2192']
00:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2192']
00:45 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2192']
00:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2192.mgmt.codfw.wmnet with reboot policy FORCED
00:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2190.codfw.wmnet with OS bullseye
00:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2190.codfw.wmnet with OS bullseye
00:38 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2194.mgmt.codfw.wmnet with reboot policy FORCED
00:33 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2190.codfw.wmnet with OS bullseye
00:27 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2193.mgmt.codfw.wmnet with reboot policy FORCED
00:26 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2192.mgmt.codfw.wmnet with reboot policy FORCED
00:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2191.codfw.wmnet with OS bullseye
00:25 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:24 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2189.codfw.wmnet with OS bullseye
00:18 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:15 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2188.codfw.wmnet with OS bullseye
00:09 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2191.codfw.wmnet with reason: host reimage
00:07 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:06 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2191.codfw.wmnet with reason: host reimage

2023-08-03

23:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2189.codfw.wmnet with reason: host reimage
23:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2189.codfw.wmnet with reason: host reimage
23:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2188.codfw.wmnet with reason: host reimage
23:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2188.codfw.wmnet with reason: host reimage
23:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2191.codfw.wmnet with OS bullseye
23:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2190.codfw.wmnet with OS bullseye
23:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2190']
23:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2191']
23:36 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2189.codfw.wmnet with OS bullseye
23:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2191']
23:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2190']
23:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2188.codfw.wmnet with OS bullseye
23:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2190.mgmt.codfw.wmnet with reboot policy FORCED
23:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2191.mgmt.codfw.wmnet with reboot policy FORCED
23:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2189']
23:19 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2189']
23:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2188']
23:18 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2188']
23:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2188']
23:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2189']
23:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2189']
23:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2188']
22:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2188.mgmt.codfw.wmnet with reboot policy FORCED
22:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2189.mgmt.codfw.wmnet with reboot policy FORCED
22:39 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2191.mgmt.codfw.wmnet with reboot policy FORCED
22:38 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2190.mgmt.codfw.wmnet with reboot policy FORCED
22:22 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2189.mgmt.codfw.wmnet with reboot policy FORCED
22:22 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2188.mgmt.codfw.wmnet with reboot policy FORCED
22:19 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:19 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: setup switch port and DNS for db2188-db2195 - pt1979@cumin2002"
22:18 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: setup switch port and DNS for db2188-db2195 - pt1979@cumin2002"
22:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox
20:59 jforrester@deploy1002: Synchronized php-1.41.0-wmf.20/extensions/WikiLambda/: T343402 and T343380 (duration: 07m 50s)
20:56 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
20:55 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
20:55 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
20:54 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
20:52 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
20:51 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
20:49 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
20:49 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
20:49 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
20:49 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
20:39 thcipriani: end UTC late backport
20:36 thcipriani@deploy1002: Finished scap: Backport for pawikisource: add audiobook namespace alias (T343410) (duration: 10m 39s)
20:30 thcipriani@deploy1002: anzx and thcipriani: Continuing with sync
20:27 thcipriani@deploy1002: anzx and thcipriani: Backport for pawikisource: add audiobook namespace alias (T343410) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:26 thcipriani@deploy1002: Started scap: Backport for pawikisource: add audiobook namespace alias (T343410)
20:23 thcipriani@deploy1002: Finished scap: Backport for Write new on group1 except wikidatawiki for event table migration (T330158) (duration: 15m 54s)
20:17 thcipriani@deploy1002: dreamyjazz and thcipriani: Continuing with sync
20:09 thcipriani@deploy1002: dreamyjazz and thcipriani: Backport for Write new on group1 except wikidatawiki for event table migration (T330158) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:07 thcipriani@deploy1002: Started scap: Backport for Write new on group1 except wikidatawiki for event table migration (T330158)
20:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lists2001.codfw.wmnet with OS bookworm
20:04 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:54 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:53 dancy: dancy@deploy1002 rebuilt and synchronized wikiversions files group2 wikis to 1.41.0-wmf.20 refs T340248
19:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lists2001.codfw.wmnet with reason: host reimage
19:35 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lists2001.codfw.wmnet with reason: host reimage
19:31 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
19:30 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
19:26 dancy@deploy1002: Finished scap: Backport for Fix mobile search text overlapping (T343397) (duration: 09m 33s)
19:20 dancy@deploy1002: jdlrobson and dancy: Continuing with sync
19:20 dancy@deploy1002: jdlrobson and dancy: Backport for Fix mobile search text overlapping (T343397) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
19:16 dancy@deploy1002: Started scap: Backport for Fix mobile search text overlapping (T343397)
19:12 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
19:12 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
19:11 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lists2001.codfw.wmnet with OS bookworm
17:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host titan2002.codfw.wmnet with OS bookworm
17:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
17:17 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
17:17 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
17:17 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
17:14 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
17:12 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
17:11 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
16:56 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1025.eqiad.wmnet with OS bullseye
16:56 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cgoubert@cumin1001"
16:40 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cgoubert@cumin1001"
16:28 jforrester@deploy1002: Finished scap: Backport for Fix unsafe validator to not reach into undefined keys (T343393) (duration: 10m 57s)
16:26 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
16:22 jforrester@deploy1002: jforrester: Continuing with sync
16:19 jforrester@deploy1002: jforrester: Backport for Fix unsafe validator to not reach into undefined keys (T343393) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
16:19 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1025.eqiad.wmnet with reason: host reimage
16:18 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
16:18 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
16:17 jforrester@deploy1002: Started scap: Backport for Fix unsafe validator to not reach into undefined keys (T343393)
16:15 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1025.eqiad.wmnet with reason: host reimage
16:14 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Rename kubernetes10[25-26] - cgoubert@cumin1001 - T343306"
16:13 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Rename kubernetes10[25-26] - cgoubert@cumin1001 - T343306"
16:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on titan2002.codfw.wmnet with reason: host reimage
16:04 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on titan2002.codfw.wmnet with reason: host reimage
16:02 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
16:01 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
15:47 moritzm: installing pandoc security updates
15:43 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host titan2002.codfw.wmnet with OS bookworm
15:40 fabfur: imported `varnishkafka` package in bookworm-wikimedia (T342154)
15:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['titan2002']
15:30 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan2002']
15:24 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
15:24 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
15:23 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
15:23 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
15:23 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
15:22 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
15:22 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
15:22 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
15:22 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
15:21 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
15:21 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
15:20 moritzm: installing glibc security updates on bookworm
15:20 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
15:20 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
15:19 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
15:19 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
15:13 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
15:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
15:13 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
15:13 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
15:12 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
15:11 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
15:11 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
15:11 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
15:10 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
15:10 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
15:10 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
15:09 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
15:09 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
15:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['titan2002']
15:07 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan2002']
15:05 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1025.eqiad.wmnet with OS bullseye
15:02 claime: Run homer on lsw1-f3-eqiad for kubernetes102[5-6] imaging - T343306
14:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host titan2001.codfw.wmnet with OS bookworm
14:46 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
14:22 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
14:22 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
14:21 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
14:21 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
14:19 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
14:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on titan2001.codfw.wmnet with reason: host reimage
14:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on titan2001.codfw.wmnet with reason: host reimage
13:58 jforrester@deploy1002: Finished scap: Backport for [Wikifunctions] Allow logged-in users to make function calls again (duration: 08m 24s)
13:51 jforrester@deploy1002: jforrester: Continuing with sync
13:51 jforrester@deploy1002: jforrester: Backport for [Wikifunctions] Allow logged-in users to make function calls again synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:49 jforrester@deploy1002: Started scap: Backport for [Wikifunctions] Allow logged-in users to make function calls again
13:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['titan2002']
13:46 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan2002']
13:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['titan2002']
13:45 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan2002']
13:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['titan2001']
13:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan2001']
13:30 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host titan2001.codfw.wmnet with OS bookworm
13:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['titan2002']
13:26 taavi: taavi@mwmaint1002 ~ $ mwscript namespaceDupes.php pawikisource --fix --add-prefix "BROKEN " # T343410
13:23 taavi@deploy1002: Finished scap: Backport for pawikisource: create audiobook namespace (T343410) (duration: 13m 01s)
13:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['titan2001']
13:17 taavi@deploy1002: taavi and anzx: Continuing with sync
13:13 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan2002']
13:12 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan2001']
13:12 taavi@deploy1002: taavi and anzx: Backport for pawikisource: create audiobook namespace (T343410) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:12 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
13:12 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
13:10 taavi@deploy1002: Started scap: Backport for pawikisource: create audiobook namespace (T343410)
12:41 jforrester@deploy1002: Finished scap: Backport for WikiLambda: Add PHP code for Z2K5/'short descriptions' (T343396) (duration: 09m 41s)
12:34 jforrester@deploy1002: jforrester: Continuing with sync
12:33 jforrester@deploy1002: jforrester: Backport for WikiLambda: Add PHP code for Z2K5/'short descriptions' (T343396) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
12:31 taavi: updated T343294 mitigations
12:31 jforrester@deploy1002: Started scap: Backport for WikiLambda: Add PHP code for Z2K5/'short descriptions' (T343396)
12:15 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@54c0898] (releasing): (no justification provided) (duration: 00m 42s)
12:15 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@54c0898] (releasing): (no justification provided)
12:02 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
12:02 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
12:02 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
12:02 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
12:02 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
12:02 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
12:02 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
12:02 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
12:02 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
12:02 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
12:01 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
12:01 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
12:01 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
12:01 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
12:01 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
12:01 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
12:01 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
12:01 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
12:01 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
12:00 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
11:49 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
11:48 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
11:48 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
11:48 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
11:48 cgoubert@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
11:48 cgoubert@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
11:48 cgoubert@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
11:48 cgoubert@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
11:48 cgoubert@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
11:48 cgoubert@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
11:48 cgoubert@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
11:48 cgoubert@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
11:47 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
11:47 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
11:47 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
11:47 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
11:47 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
11:47 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
11:46 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
11:46 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
11:46 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
11:46 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
11:45 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
11:45 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
11:45 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
11:45 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
11:44 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
11:44 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
11:44 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
11:44 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
11:23 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
11:23 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
11:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T342617)', diff saved to https://phabricator.wikimedia.org/P50070 and previous config saved to /var/cache/conftool/dbconfig/20230803-110028-ladsgroup.json
11:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T342617)', diff saved to https://phabricator.wikimedia.org/P50069 and previous config saved to /var/cache/conftool/dbconfig/20230803-110000-ladsgroup.json
10:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P50068 and previous config saved to /var/cache/conftool/dbconfig/20230803-104521-ladsgroup.json
10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P50067 and previous config saved to /var/cache/conftool/dbconfig/20230803-104454-ladsgroup.json
10:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P50066 and previous config saved to /var/cache/conftool/dbconfig/20230803-103015-ladsgroup.json
10:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P50065 and previous config saved to /var/cache/conftool/dbconfig/20230803-102948-ladsgroup.json
10:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T342617)', diff saved to https://phabricator.wikimedia.org/P50062 and previous config saved to /var/cache/conftool/dbconfig/20230803-101509-ladsgroup.json
10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T342617)', diff saved to https://phabricator.wikimedia.org/P50061 and previous config saved to /var/cache/conftool/dbconfig/20230803-101441-ladsgroup.json
10:13 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
10:11 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
10:11 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
10:11 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
10:11 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
10:10 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
10:10 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
10:09 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
10:01 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
09:59 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2129 (T342617)', diff saved to https://phabricator.wikimedia.org/P50059 and previous config saved to /var/cache/conftool/dbconfig/20230803-092338-ladsgroup.json
09:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
09:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
09:21 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
09:17 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
09:16 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
09:03 claime: Deploying rename changes for mw149[7-8] to kubernetes102[5-6] - T343306
09:03 moritzm: installing systemd bugfix updates from Bookworm 12.1 point release
08:55 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
08:55 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
08:53 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
08:53 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
08:44 moritzm: installing yajl security updates
08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 100%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50058 and previous config saved to /var/cache/conftool/dbconfig/20230803-084103-root.json
08:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T342617)', diff saved to https://phabricator.wikimedia.org/P50057 and previous config saved to /var/cache/conftool/dbconfig/20230803-083845-ladsgroup.json
08:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
08:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
08:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T342617)', diff saved to https://phabricator.wikimedia.org/P50056 and previous config saved to /var/cache/conftool/dbconfig/20230803-083824-ladsgroup.json
08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 75%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50055 and previous config saved to /var/cache/conftool/dbconfig/20230803-082558-root.json
08:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P50054 and previous config saved to /var/cache/conftool/dbconfig/20230803-082318-ladsgroup.json
08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 50%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50053 and previous config saved to /var/cache/conftool/dbconfig/20230803-081053-root.json
08:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P50052 and previous config saved to /var/cache/conftool/dbconfig/20230803-080812-ladsgroup.json
07:59 moritzm: installing Linux 5.10.179 on Buster hosts with Linux 5.10
07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 25%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50051 and previous config saved to /var/cache/conftool/dbconfig/20230803-075548-root.json
07:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T342617)', diff saved to https://phabricator.wikimedia.org/P50050 and previous config saved to /var/cache/conftool/dbconfig/20230803-075305-ladsgroup.json
07:51 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: GitLab 16 major version upgrade
07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 10%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50049 and previous config saved to /var/cache/conftool/dbconfig/20230803-074044-root.json
07:39 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
07:38 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
07:36 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
07:36 elukey@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 5%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50048 and previous config saved to /var/cache/conftool/dbconfig/20230803-072539-root.json
07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 3%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50047 and previous config saved to /var/cache/conftool/dbconfig/20230803-071034-root.json
06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 1%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50046 and previous config saved to /var/cache/conftool/dbconfig/20230803-065529-root.json
06:35 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: GitLab 16 major version upgrade
06:33 oblivian@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
06:33 oblivian@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
06:33 oblivian@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
06:33 oblivian@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
06:31 kart_: Updated MinT to 2023-08-02-142037-production (T338292)
06:30 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
06:25 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
06:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T342617)', diff saved to https://phabricator.wikimedia.org/P50045 and previous config saved to /var/cache/conftool/dbconfig/20230803-061827-ladsgroup.json
06:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
06:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
06:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T342617)', diff saved to https://phabricator.wikimedia.org/P50044 and previous config saved to /var/cache/conftool/dbconfig/20230803-061817-ladsgroup.json
06:17 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
06:11 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
06:07 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
06:05 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
06:04 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
06:03 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
06:03 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
06:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P50043 and previous config saved to /var/cache/conftool/dbconfig/20230803-060311-ladsgroup.json
06:02 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2129 T343296', diff saved to https://phabricator.wikimedia.org/P50042 and previous config saved to /var/cache/conftool/dbconfig/20230803-060241-marostegui.json
06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2114 to s6 primary T343296', diff saved to https://phabricator.wikimedia.org/P50041 and previous config saved to /var/cache/conftool/dbconfig/20230803-060055-marostegui.json
06:00 marostegui: Starting s6 codfw failover from db2129 to db2114 - T343296
05:52 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
05:52 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
05:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P50040 and previous config saved to /var/cache/conftool/dbconfig/20230803-054805-ladsgroup.json
05:46 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
05:46 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2114 with weight 0 T343296', diff saved to https://phabricator.wikimedia.org/P50039 and previous config saved to /var/cache/conftool/dbconfig/20230803-054418-marostegui.json
05:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s6 T343296
05:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s6 T343296
05:34 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
05:34 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
05:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T342617)', diff saved to https://phabricator.wikimedia.org/P50038 and previous config saved to /var/cache/conftool/dbconfig/20230803-053259-ladsgroup.json
03:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T342617)', diff saved to https://phabricator.wikimedia.org/P50037 and previous config saved to /var/cache/conftool/dbconfig/20230803-035940-ladsgroup.json
03:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
03:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
03:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T342617)', diff saved to https://phabricator.wikimedia.org/P50036 and previous config saved to /var/cache/conftool/dbconfig/20230803-035917-ladsgroup.json
03:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P50035 and previous config saved to /var/cache/conftool/dbconfig/20230803-034411-ladsgroup.json
03:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P50034 and previous config saved to /var/cache/conftool/dbconfig/20230803-032905-ladsgroup.json
03:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T342617)', diff saved to https://phabricator.wikimedia.org/P50033 and previous config saved to /var/cache/conftool/dbconfig/20230803-031359-ladsgroup.json
02:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host titan2002.mgmt.codfw.wmnet with reboot policy FORCED
02:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
02:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
02:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T342617)', diff saved to https://phabricator.wikimedia.org/P50032 and previous config saved to /var/cache/conftool/dbconfig/20230803-021643-ladsgroup.json
02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P50031 and previous config saved to /var/cache/conftool/dbconfig/20230803-020137-ladsgroup.json
01:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P50030 and previous config saved to /var/cache/conftool/dbconfig/20230803-014629-ladsgroup.json
01:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T342617)', diff saved to https://phabricator.wikimedia.org/P50029 and previous config saved to /var/cache/conftool/dbconfig/20230803-014503-ladsgroup.json
01:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
01:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
01:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
01:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
01:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T342617)', diff saved to https://phabricator.wikimedia.org/P50028 and previous config saved to /var/cache/conftool/dbconfig/20230803-014426-ladsgroup.json
01:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T342617)', diff saved to https://phabricator.wikimedia.org/P50027 and previous config saved to /var/cache/conftool/dbconfig/20230803-013123-ladsgroup.json
01:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P50026 and previous config saved to /var/cache/conftool/dbconfig/20230803-012920-ladsgroup.json
01:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P50025 and previous config saved to /var/cache/conftool/dbconfig/20230803-011414-ladsgroup.json
00:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T342617)', diff saved to https://phabricator.wikimedia.org/P50024 and previous config saved to /var/cache/conftool/dbconfig/20230803-005908-ladsgroup.json
00:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T342617)', diff saved to https://phabricator.wikimedia.org/P50023 and previous config saved to /var/cache/conftool/dbconfig/20230803-003939-ladsgroup.json
00:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
00:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
00:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T342617)', diff saved to https://phabricator.wikimedia.org/P50022 and previous config saved to /var/cache/conftool/dbconfig/20230803-003916-ladsgroup.json
00:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P50021 and previous config saved to /var/cache/conftool/dbconfig/20230803-002410-ladsgroup.json
00:13 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host titan2002.mgmt.codfw.wmnet with reboot policy FORCED
00:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host titan2001.mgmt.codfw.wmnet with reboot policy FORCED
00:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P50020 and previous config saved to /var/cache/conftool/dbconfig/20230803-000904-ladsgroup.json

2023-08-02

23:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T342617)', diff saved to https://phabricator.wikimedia.org/P50019 and previous config saved to /var/cache/conftool/dbconfig/20230802-235358-ladsgroup.json
23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T342617)', diff saved to https://phabricator.wikimedia.org/P50018 and previous config saved to /var/cache/conftool/dbconfig/20230802-232528-ladsgroup.json
23:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
23:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T342617)', diff saved to https://phabricator.wikimedia.org/P50017 and previous config saved to /var/cache/conftool/dbconfig/20230802-232507-ladsgroup.json
23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P50016 and previous config saved to /var/cache/conftool/dbconfig/20230802-231001-ladsgroup.json
23:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T342617)', diff saved to https://phabricator.wikimedia.org/P50015 and previous config saved to /var/cache/conftool/dbconfig/20230802-230127-ladsgroup.json
23:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
23:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
23:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T342617)', diff saved to https://phabricator.wikimedia.org/P50014 and previous config saved to /var/cache/conftool/dbconfig/20230802-230106-ladsgroup.json
22:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P50013 and previous config saved to /var/cache/conftool/dbconfig/20230802-225454-ladsgroup.json
22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P50012 and previous config saved to /var/cache/conftool/dbconfig/20230802-224559-ladsgroup.json
22:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lists2001']
22:45 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lists2001']
22:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['lists2001']
22:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T342617)', diff saved to https://phabricator.wikimedia.org/P50011 and previous config saved to /var/cache/conftool/dbconfig/20230802-223948-ladsgroup.json
22:39 krinkle@deploy1002: Finished scap: Backport for noc: Remove ?blame=1 from highlight.php URLs (duration: 08m 07s)
22:36 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host titan2001.mgmt.codfw.wmnet with reboot policy FORCED
22:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lists2001']
22:34 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:34 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: setup switch port and DNS for titan200[1-2] - pt1979@cumin2002"
22:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lists2001.mgmt.codfw.wmnet with reboot policy FORCED
22:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: setup switch port and DNS for titan200[1-2] - pt1979@cumin2002"
22:32 krinkle@deploy1002: reedy and krinkle: Continuing with sync
22:32 krinkle@deploy1002: reedy and krinkle: Backport for noc: Remove ?blame=1 from highlight.php URLs synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
22:31 krinkle@deploy1002: Started scap: Backport for noc: Remove ?blame=1 from highlight.php URLs
22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P50010 and previous config saved to /var/cache/conftool/dbconfig/20230802-223053-ladsgroup.json
22:30 pt1979@cumin2002: START - Cookbook sre.dns.netbox
22:21 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host lists2001.mgmt.codfw.wmnet with reboot policy FORCED
22:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:20 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add switch interface and DNS for lists2001 - pt1979@cumin2002"
22:19 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add switch interface and DNS for lists2001 - pt1979@cumin2002"
22:18 krinkle@deploy1002: Finished scap: Backport for Profiler: Sync minor changes with arc-lamp.git package (T337873) (duration: 11m 02s)
22:17 pt1979@cumin2002: START - Cookbook sre.dns.netbox
22:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T342617)', diff saved to https://phabricator.wikimedia.org/P50009 and previous config saved to /var/cache/conftool/dbconfig/20230802-221547-ladsgroup.json
22:12 krinkle@deploy1002: krinkle: Continuing with sync
22:09 krinkle@deploy1002: krinkle: Backport for Profiler: Sync minor changes with arc-lamp.git package (T337873) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
22:07 krinkle@deploy1002: Started scap: Backport for Profiler: Sync minor changes with arc-lamp.git package (T337873)
21:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T342617)', diff saved to https://phabricator.wikimedia.org/P50008 and previous config saved to /var/cache/conftool/dbconfig/20230802-212412-ladsgroup.json
21:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
21:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T342617)', diff saved to https://phabricator.wikimedia.org/P50007 and previous config saved to /var/cache/conftool/dbconfig/20230802-212352-ladsgroup.json
21:10 dancy@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.20 refs T340248 (duration: 06m 21s)
21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P50006 and previous config saved to /var/cache/conftool/dbconfig/20230802-210846-ladsgroup.json
21:04 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.20 refs T340248
20:55 dancy@deploy1002: Finished scap: Backport for Revert "LocalisationCache: Load only core data if possible" (T342418 T343375) (duration: 08m 47s)
20:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P50005 and previous config saved to /var/cache/conftool/dbconfig/20230802-205339-ladsgroup.json
20:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T342617)', diff saved to https://phabricator.wikimedia.org/P50004 and previous config saved to /var/cache/conftool/dbconfig/20230802-204941-ladsgroup.json
20:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
20:49 dancy@deploy1002: dancy: Continuing with sync
20:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
20:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T342617)', diff saved to https://phabricator.wikimedia.org/P50003 and previous config saved to /var/cache/conftool/dbconfig/20230802-204919-ladsgroup.json
20:48 dancy@deploy1002: dancy: Backport for Revert "LocalisationCache: Load only core data if possible" (T342418 T343375) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:46 dancy@deploy1002: Started scap: Backport for Revert "LocalisationCache: Load only core data if possible" (T342418 T343375)
20:41 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@cbce175]: Deploy latest for Airflow analytics instance. (duration: 00m 20s)
20:41 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@cbce175]: Deploy latest for Airflow analytics instance.
20:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T342617)', diff saved to https://phabricator.wikimedia.org/P50002 and previous config saved to /var/cache/conftool/dbconfig/20230802-203833-ladsgroup.json
20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P50001 and previous config saved to /var/cache/conftool/dbconfig/20230802-203413-ladsgroup.json
20:29 dancy@deploy1002: Finished scap: Backport for Add validator userright for pawikisource (T341428) (duration: 20m 49s)
20:23 dancy@deploy1002: dancy and soda: Continuing with sync
20:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P50000 and previous config saved to /var/cache/conftool/dbconfig/20230802-201907-ladsgroup.json
20:10 dancy@deploy1002: dancy and soda: Backport for Add validator userright for pawikisource (T341428) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:08 dancy@deploy1002: Started scap: Backport for Add validator userright for pawikisource (T341428)
20:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T342617)', diff saved to https://phabricator.wikimedia.org/P49999 and previous config saved to /var/cache/conftool/dbconfig/20230802-200401-ladsgroup.json
19:46 xcollazo@deploy1002: Finished deploy [analytics/refinery@27def33] (hadoop-test): Special refinery deploy to fix mediwiki_history_denormalize TEST [analytics/refinery@27def33] (duration: 01m 59s)
19:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T342617)', diff saved to https://phabricator.wikimedia.org/P49998 and previous config saved to /var/cache/conftool/dbconfig/20230802-194518-ladsgroup.json
19:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
19:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
19:44 xcollazo@deploy1002: Started deploy [analytics/refinery@27def33] (hadoop-test): Special refinery deploy to fix mediwiki_history_denormalize TEST [analytics/refinery@27def33]
19:43 xcollazo@deploy1002: Finished deploy [analytics/refinery@27def33] (thin): Special refinery deploy to fix mediwiki_history_denormalize THIN [analytics/refinery@27def33] (duration: 00m 04s)
19:43 xcollazo@deploy1002: Started deploy [analytics/refinery@27def33] (thin): Special refinery deploy to fix mediwiki_history_denormalize THIN [analytics/refinery@27def33]
19:41 xcollazo@deploy1002: Finished deploy [analytics/refinery@27def33]: Special refinery deploy to fix mediwiki_history_denormalize [analytics/refinery@27def33] (duration: 07m 48s)
19:39 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
19:39 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
19:34 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
19:34 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
19:34 xcollazo@deploy1002: Started deploy [analytics/refinery@27def33]: Special refinery deploy to fix mediwiki_history_denormalize [analytics/refinery@27def33]
18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['es2025']
18:32 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.20 refs T340248
18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T342617)', diff saved to https://phabricator.wikimedia.org/P49997 and previous config saved to /var/cache/conftool/dbconfig/20230802-182059-ladsgroup.json
18:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
18:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
18:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T342617)', diff saved to https://phabricator.wikimedia.org/P49996 and previous config saved to /var/cache/conftool/dbconfig/20230802-182038-ladsgroup.json
18:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
18:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
18:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T342617)', diff saved to https://phabricator.wikimedia.org/P49995 and previous config saved to /var/cache/conftool/dbconfig/20230802-181724-ladsgroup.json
18:16 dancy@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.20 refs T340248 (duration: 06m 38s)
18:10 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.20 refs T340248
18:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P49994 and previous config saved to /var/cache/conftool/dbconfig/20230802-180532-ladsgroup.json
18:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P49993 and previous config saved to /var/cache/conftool/dbconfig/20230802-180218-ladsgroup.json
17:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P49991 and previous config saved to /var/cache/conftool/dbconfig/20230802-175026-ladsgroup.json
17:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P49990 and previous config saved to /var/cache/conftool/dbconfig/20230802-174712-ladsgroup.json
17:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T342617)', diff saved to https://phabricator.wikimedia.org/P49989 and previous config saved to /var/cache/conftool/dbconfig/20230802-173520-ladsgroup.json
17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T342617)', diff saved to https://phabricator.wikimedia.org/P49988 and previous config saved to /var/cache/conftool/dbconfig/20230802-173206-ladsgroup.json
16:58 samtar@deploy1002: Finished scap: Backport for enwiki: temp enable emergencyCaptcha (duration: 07m 48s)
16:52 samtar@deploy1002: samtar: Continuing with sync
16:52 samtar@deploy1002: samtar: Backport for enwiki: temp enable emergencyCaptcha synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
16:51 samtar@deploy1002: Started scap: Backport for enwiki: temp enable emergencyCaptcha
16:46 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
16:46 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
16:46 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
16:46 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
16:41 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2025']
16:02 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudservices1006.eqiad.wmnet']
16:02 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudservices1006.eqiad.wmnet']
15:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2016.codfw.wmnet with OS bullseye
15:59 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
15:58 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
15:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T342617)', diff saved to https://phabricator.wikimedia.org/P49985 and previous config saved to /var/cache/conftool/dbconfig/20230802-155618-ladsgroup.json
15:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
15:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T342617)', diff saved to https://phabricator.wikimedia.org/P49984 and previous config saved to /var/cache/conftool/dbconfig/20230802-155558-ladsgroup.json
15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T342617)', diff saved to https://phabricator.wikimedia.org/P49983 and previous config saved to /var/cache/conftool/dbconfig/20230802-155319-ladsgroup.json
15:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
15:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
15:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T342617)', diff saved to https://phabricator.wikimedia.org/P49982 and previous config saved to /var/cache/conftool/dbconfig/20230802-155258-ladsgroup.json
15:51 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: GitLab minor version upgrade
15:45 cgoubert@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1026
15:45 cgoubert@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1026
15:45 cgoubert@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1025
15:45 cgoubert@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1025
15:43 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:43 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix kubernetes10[25-26] main interfaces - cgoubert@cumin1001"
15:43 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix kubernetes10[25-26] main interfaces - cgoubert@cumin1001"
15:42 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns3002.wikimedia.org
15:41 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P49981 and previous config saved to /var/cache/conftool/dbconfig/20230802-154051-ladsgroup.json
15:40 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
15:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2016.codfw.wmnet with reason: host reimage
15:38 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host dns3002.wikimedia.org
15:37 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
15:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P49980 and previous config saved to /var/cache/conftool/dbconfig/20230802-153751-ladsgroup.json
15:36 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2016.codfw.wmnet with reason: host reimage
15:30 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['es2025']
15:30 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2025']
15:28 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['es2025']
15:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2025']
15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P49979 and previous config saved to /var/cache/conftool/dbconfig/20230802-152545-ladsgroup.json
15:25 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
15:24 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
15:24 brett: Remove dns3002 from cr2-esams and cr3-esams routes in prep for reboot - T335835
15:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P49978 and previous config saved to /var/cache/conftool/dbconfig/20230802-152245-ladsgroup.json
15:16 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host pc2016.codfw.wmnet with OS bullseye
15:16 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 0:00:00 on config-master2001.codfw.wmnet,config-master1001.eqiad.wmnet with reason: WIP hosts to be setup
15:15 volans@cumin1001: START - Cookbook sre.hosts.downtime for 8 days, 0:00:00 on config-master2001.codfw.wmnet,config-master1001.eqiad.wmnet with reason: WIP hosts to be setup
15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T342617)', diff saved to https://phabricator.wikimedia.org/P49977 and previous config saved to /var/cache/conftool/dbconfig/20230802-151038-ladsgroup.json
15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T342617)', diff saved to https://phabricator.wikimedia.org/P49976 and previous config saved to /var/cache/conftool/dbconfig/20230802-150739-ladsgroup.json
15:07 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
15:04 moritzm: installing gst-plugins-base1.0 security updates
14:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['pc2016']
14:58 elukey@deploy1002: Finished scap: Backport for ext-ORES: avoid Lift Wing calls for fiwiki (T343308) (duration: 09m 08s)
14:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudservices1006.eqiad.wmnet']
14:52 elukey@deploy1002: elukey: Continuing with sync
14:52 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1026.mgmt.eqiad.wmnet with reboot policy FORCED
14:51 elukey@deploy1002: elukey: Backport for ext-ORES: avoid Lift Wing calls for fiwiki (T343308) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
14:50 volans@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1026.mgmt.eqiad.wmnet with reboot policy FORCED
14:49 elukey@deploy1002: Started scap: Backport for ext-ORES: avoid Lift Wing calls for fiwiki (T343308)
14:48 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1025.mgmt.eqiad.wmnet with reboot policy FORCED
14:46 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pc2016']
14:44 moritzm: installing iperf3 security updates
14:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudservices1006.eqiad.wmnet']
14:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudservices1006.eqiad.wmnet']
14:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudservices1006.eqiad.wmnet']
14:43 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudservices1006.mgmt.eqiad.wmnet with reboot policy FORCED
14:42 volans@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1025.mgmt.eqiad.wmnet with reboot policy FORCED
14:41 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host kubernetes1025.mgmt.eqiad.wmnet with reboot policy FORCED
14:41 volans@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1025.mgmt.eqiad.wmnet with reboot policy FORCED
14:39 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:39 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mw[1497-1498] to kubernetes[1025-1026] - cgoubert@cumin1001"
14:38 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: GitLab minor version upgrade
14:38 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mw[1497-1498] to kubernetes[1025-1026] - cgoubert@cumin1001"
14:35 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: GitLab minor version upgrade
14:35 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
14:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc2016.mgmt.codfw.wmnet with reboot policy FORCED
14:26 sbassett: Deployed updated mitigation for T336027
14:19 fabfur: importing python-logstash in bookworm-wikimedia (T342154)
14:19 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw[1497-1498].eqiad.wmnet
14:19 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:19 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1497-1498].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1001"
14:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['es2025']
14:19 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2025']
14:18 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1497-1498].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1001"
14:17 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on people1004.eqiad.wmnet with reason: Resizing disk
14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T342617)', diff saved to https://phabricator.wikimedia.org/P49975 and previous config saved to /var/cache/conftool/dbconfig/20230802-141719-ladsgroup.json
14:17 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on people1004.eqiad.wmnet with reason: Resizing disk
14:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
14:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
14:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
14:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
14:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T342617)', diff saved to https://phabricator.wikimedia.org/P49974 and previous config saved to /var/cache/conftool/dbconfig/20230802-141640-ladsgroup.json
14:15 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
14:15 fabfur: importing varnish and libvarnishapi2 in bookworm-wikimedia (T342154)
14:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2015.codfw.wmnet with OS bullseye
14:13 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
14:08 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
14:06 cgoubert@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1497-1498].eqiad.wmnet
14:05 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw[1497-1498].eqiad.wment
14:05 cgoubert@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
14:03 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
14:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2014']
14:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P49973 and previous config saved to /var/cache/conftool/dbconfig/20230802-140134-ladsgroup.json
13:57 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['es2025']
13:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2025']
13:56 claime: Decommissioning mw1497 and mw1498 - T343306
13:55 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudservices1006.mgmt.eqiad.wmnet with reboot policy FORCED
13:54 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudservices1006
13:54 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudservices1006
13:52 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:52 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudservices1006 - jclark@cumin1001"
13:51 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudservices1006 - jclark@cumin1001"
13:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2015.codfw.wmnet with reason: host reimage
13:49 jclark@cumin1001: START - Cookbook sre.dns.netbox
13:46 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2015.codfw.wmnet with reason: host reimage
13:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P49971 and previous config saved to /var/cache/conftool/dbconfig/20230802-134628-ladsgroup.json
13:36 Lucas_WMDE: UTC afternoon backport+config window done
13:35 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Inject LanguageNameLookupFactory into WikibaseValueFormatterBuilders (T281726) (duration: 08m 39s)
13:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T342617)', diff saved to https://phabricator.wikimedia.org/P49970 and previous config saved to /var/cache/conftool/dbconfig/20230802-133122-ladsgroup.json
13:29 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Continuing with sync
13:28 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for Inject LanguageNameLookupFactory into WikibaseValueFormatterBuilders (T281726) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P49969 and previous config saved to /var/cache/conftool/dbconfig/20230802-132819-ladsgroup.json
13:26 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host pc2016.mgmt.codfw.wmnet with reboot policy FORCED
13:26 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Inject LanguageNameLookupFactory into WikibaseValueFormatterBuilders (T281726)
13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T342617)', diff saved to https://phabricator.wikimedia.org/P49968 and previous config saved to /var/cache/conftool/dbconfig/20230802-132632-ladsgroup.json
13:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
13:26 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for uzwiki: Install WikiLove (T343270) (duration: 09m 58s)
13:26 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: GitLab minor version upgrade
13:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
13:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host pc2015.codfw.wmnet with OS bullseye
13:21 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2014']
13:20 lucaswerkmeister-wmde@deploy1002: stang and lucaswerkmeister-wmde: Continuing with sync
13:17 lucaswerkmeister-wmde@deploy1002: stang and lucaswerkmeister-wmde: Backport for uzwiki: Install WikiLove (T343270) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:16 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for uzwiki: Install WikiLove (T343270)
13:13 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php uzwiki wikilove # Create extension tables for Wikilove on uzwiki (T343270)
13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P49967 and previous config saved to /var/cache/conftool/dbconfig/20230802-131314-ladsgroup.json
13:12 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf 'https://en.wikipedia.org/static/images/project-logos/simplewiktionary.png\n' | mwscript purgeList.php # T343084
13:11 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for simplewiktionary: Update project logo (T343084) (duration: 08m 13s)
13:06 lucaswerkmeister-wmde@deploy1002: stang and lucaswerkmeister-wmde: Continuing with sync
13:05 lucaswerkmeister-wmde@deploy1002: stang and lucaswerkmeister-wmde: Backport for simplewiktionary: Update project logo (T343084) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:03 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for simplewiktionary: Update project logo (T343084)
12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P49965 and previous config saved to /var/cache/conftool/dbconfig/20230802-125810-ladsgroup.json
12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P49964 and previous config saved to /var/cache/conftool/dbconfig/20230802-124305-ladsgroup.json
12:42 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics_product@8bba01c]: Redeploy of analytics_product Airflow instance (duration: 00m 08s)
12:42 xcollazo@deploy1002: Started deploy [airflow-dags/analytics_product@8bba01c]: Redeploy of analytics_product Airflow instance
12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1184 T342284', diff saved to https://phabricator.wikimedia.org/P49963 and previous config saved to /var/cache/conftool/dbconfig/20230802-123228-ladsgroup.json
12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T342617)', diff saved to https://phabricator.wikimedia.org/P49962 and previous config saved to /var/cache/conftool/dbconfig/20230802-122816-ladsgroup.json
12:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
12:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T342617)', diff saved to https://phabricator.wikimedia.org/P49961 and previous config saved to /var/cache/conftool/dbconfig/20230802-122756-ladsgroup.json
12:21 dcausse@deploy1002: Finished deploy [airflow-dags/search@8bba01c]: search: do not use hive partitions to wait for wmf_raw.mediawiki_page (duration: 00m 11s)
12:21 dcausse@deploy1002: Started deploy [airflow-dags/search@8bba01c]: search: do not use hive partitions to wait for wmf_raw.mediawiki_page
12:19 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=mw1498.eqiad.wmnet
12:19 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=mw1497.eqiad.wmnet
12:19 claime: Depool mw1497 and mw1498 for reimage as wikikube nodes - T343306
12:18 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on people2003.codfw.wmnet with reason: Resizing disk
12:17 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on people2003.codfw.wmnet with reason: Resizing disk
12:13 claime: Repool mw1451 and mw1452, more recent servers will be used - T343306
12:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P49960 and previous config saved to /var/cache/conftool/dbconfig/20230802-121249-ladsgroup.json
12:11 jelto: update gitlab-ce package to 16.0.8-ce.0
12:09 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=mw1452.eqiad.wmnet
12:09 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=mw1451.eqiad.wmnet
12:09 claime: Depool mw1451 and mw1452 for reimage as wikikube nodes - T343306
11:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P49959 and previous config saved to /var/cache/conftool/dbconfig/20230802-115743-ladsgroup.json
11:57 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: GitLab minor version upgrade
11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T342617)', diff saved to https://phabricator.wikimedia.org/P49958 and previous config saved to /var/cache/conftool/dbconfig/20230802-114237-ladsgroup.json
11:41 moritzm: installing libxml2 security updates
11:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
11:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
11:40 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest2002.codfw.wmnet
11:40 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest2002.codfw.wmnet
11:18 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
11:18 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
11:17 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
11:17 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
10:41 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
10:40 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: GitLab minor version upgrade
10:40 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
10:37 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: GitLab minor version upgrade
10:30 samtar@deploy1002: Finished scap: Backport for Revert "enwiki: temp enable emergencyCaptcha" (duration: 07m 33s)
10:24 samtar@deploy1002: samtar: Continuing with sync
10:24 samtar@deploy1002: samtar: Backport for Revert "enwiki: temp enable emergencyCaptcha" synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
10:22 samtar@deploy1002: Started scap: Backport for Revert "enwiki: temp enable emergencyCaptcha"
10:02 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: GitLab minor version upgrade
09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T342617)', diff saved to https://phabricator.wikimedia.org/P49954 and previous config saved to /var/cache/conftool/dbconfig/20230802-095428-ladsgroup.json
09:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
09:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
09:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
09:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
09:39 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:37 cmooney@cumin1001: START - Cookbook sre.dns.netbox
09:24 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: GitLab minor version upgrade
09:20 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
09:18 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
09:17 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
09:15 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
09:13 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
09:12 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
09:02 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
09:02 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
09:01 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
09:01 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
08:53 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: GitLab minor version upgrade
08:39 jelto: downgrade gitlab-ce package to 15.11.13-ce.0
08:15 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
08:07 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
07:28 taavi: mwscript namespaceDupes.php idwikisource --fix --add-prefix "BROKEN " # T341173
07:19 taavi@deploy1002: Finished scap: Backport for idwikisource change wgSiteName, wgMetaNamespace and add project namespace alias (T341173), Change idwikisource logos (T341173) (duration: 11m 43s)
07:18 moritzm: installing Linux 5.10.179-3 on bullseye hosts
07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repooling after replacing its memory', diff saved to https://phabricator.wikimedia.org/P49951 and previous config saved to /var/cache/conftool/dbconfig/20230802-071441-root.json
07:13 taavi@deploy1002: anzx and taavi: Continuing with sync
07:09 taavi@deploy1002: anzx and taavi: Backport for idwikisource change wgSiteName, wgMetaNamespace and add project namespace alias (T341173), Change idwikisource logos (T341173) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
07:07 taavi@deploy1002: Started scap: Backport for idwikisource change wgSiteName, wgMetaNamespace and add project namespace alias (T341173), Change idwikisource logos (T341173)
06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repooling after replacing its memory', diff saved to https://phabricator.wikimedia.org/P49950 and previous config saved to /var/cache/conftool/dbconfig/20230802-065936-root.json
06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repooling after replacing its memory', diff saved to https://phabricator.wikimedia.org/P49949 and previous config saved to /var/cache/conftool/dbconfig/20230802-064431-root.json
06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repooling after replacing its memory', diff saved to https://phabricator.wikimedia.org/P49948 and previous config saved to /var/cache/conftool/dbconfig/20230802-062925-root.json
06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 10%: Repooling after replacing its memory', diff saved to https://phabricator.wikimedia.org/P49947 and previous config saved to /var/cache/conftool/dbconfig/20230802-061420-root.json
06:13 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
06:12 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
06:12 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
06:12 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
06:11 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
06:10 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 5%: Repooling after replacing its memory', diff saved to https://phabricator.wikimedia.org/P49946 and previous config saved to /var/cache/conftool/dbconfig/20230802-055916-root.json
05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 3%: Repooling after replacing its memory', diff saved to https://phabricator.wikimedia.org/P49945 and previous config saved to /var/cache/conftool/dbconfig/20230802-054411-root.json
05:33 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: enabling emergency captcha on enwiki - T343294 (take 2) (duration: 06m 40s)
05:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 1%: Repooling after replacing its memory', diff saved to https://phabricator.wikimedia.org/P49944 and previous config saved to /var/cache/conftool/dbconfig/20230802-052906-root.json
05:23 marostegui: Stop mariadb on es2025 for onsite maintenance dbmaint codfw T343254
05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2025 T343254', diff saved to https://phabricator.wikimedia.org/P49943 and previous config saved to /var/cache/conftool/dbconfig/20230802-052021-root.json
05:11 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: enabling emergency captcha on enwiki - T343294 (duration: 06m 36s)
04:49 _joe_: running scap pull on mwmaint1002 to pick up the noc.w.o changes
01:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['pc2015']
01:24 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pc2015']
01:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc2015.mgmt.codfw.wmnet with reboot policy FORCED
01:14 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc2016.mgmt.codfw.wmnet with reboot policy FORCED
01:04 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host pc2016.mgmt.codfw.wmnet with reboot policy FORCED
01:02 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:02 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: setup switch port and DNS for pc2016 - pt1979@cumin2002"
01:02 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: setup switch port and DNS for pc2016 - pt1979@cumin2002"
00:59 pt1979@cumin2002: START - Cookbook sre.dns.netbox
00:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2005-dev.codfw.wmnet with OS bullseye
00:58 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:52 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host pc2015.mgmt.codfw.wmnet with reboot policy FORCED
00:41 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:41 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: setup switch interfaces and DNS for pc201[5-6] - pt1979@cumin2002"
00:40 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: setup switch interfaces and DNS for pc201[5-6] - pt1979@cumin2002"
00:38 pt1979@cumin2002: START - Cookbook sre.dns.netbox
00:37 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
00:37 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:36 pt1979@cumin2002: START - Cookbook sre.dns.netbox
00:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2004-dev.codfw.wmnet with OS bullseye
00:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:29 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2006-dev.codfw.wmnet with OS bullseye
00:29 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:25 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2005-dev.codfw.wmnet with reason: host reimage
00:17 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2005-dev.codfw.wmnet with reason: host reimage
00:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
00:11 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
00:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2006-dev.codfw.wmnet with reason: host reimage
00:06 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2006-dev.codfw.wmnet with reason: host reimage

2023-08-01

23:13 eileen: config revision changed from 8b3a46c3 to f5e6425b - updated process control (added segmentation_aging job - rollback if it doesn't work)
22:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt2006-dev.codfw.wmnet with OS bullseye
22:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt2005-dev.codfw.wmnet with OS bullseye
22:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt2004-dev.codfw.wmnet with OS bullseye
22:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2006-dev']
22:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2005-dev']
22:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2004-dev']
22:19 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2006-dev']
22:18 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2005-dev']
22:17 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2004-dev']
22:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2006-dev']
22:14 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt2004-dev']
22:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2005-dev']
22:11 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2005-dev']
22:11 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt2005-dev']
22:10 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2004-dev']
22:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2004-dev']
22:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2006-dev']
22:01 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2005-dev']
21:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2004-dev']
21:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2008-dev.codfw.wmnet with OS bullseye
21:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
21:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2007-dev.codfw.wmnet with OS bullseye
21:46 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
21:29 jforrester@deploy1002: Finished scap: Backport for Wikifunctions: Log the 'WikiLambda' warnings and above logs (duration: 10m 22s)
21:23 jforrester@deploy1002: jforrester: Continuing with sync
21:20 jforrester@deploy1002: jforrester: Backport for Wikifunctions: Log the 'WikiLambda' warnings and above logs synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
21:19 jforrester@deploy1002: Started scap: Backport for Wikifunctions: Log the 'WikiLambda' warnings and above logs
21:16 jforrester@deploy1002: Finished scap: Backport for Wikifunctions: Restrict wikilambda-execute to functioneers for now (duration: 09m 03s)
21:10 jforrester@deploy1002: jforrester: Continuing with sync
21:09 jforrester@deploy1002: jforrester: Backport for Wikifunctions: Restrict wikilambda-execute to functioneers for now synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
21:08 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
21:07 jforrester@deploy1002: Started scap: Backport for Wikifunctions: Restrict wikilambda-execute to functioneers for now
21:05 jforrester@deploy1002: Synchronized ./php-1.41.0-wmf.20/extensions/WikiLambda/: T343253 T343256 (duration: 07m 23s)
20:59 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
20:55 jforrester@deploy1002: Synchronized ./php-1.41.0-wmf.19/extensions/WikiLambda/: T343253 T343256 (duration: 06m 58s)
20:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2008-dev.codfw.wmnet with reason: host reimage
20:49 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2008-dev.codfw.wmnet with reason: host reimage
20:44 urbanecm@deploy1002: Finished scap: Backport for Write new on group0 for event table migration (T330158) (duration: 21m 46s)
20:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2007-dev.codfw.wmnet with reason: host reimage
20:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2008-dev.codfw.wmnet with OS bullseye
20:42 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
20:39 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2007-dev.codfw.wmnet with reason: host reimage
20:38 urbanecm@deploy1002: urbanecm and dreamyjazz: Continuing with sync
20:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
20:28 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudnet2008-dev.codfw.wmnet with OS bullseye
20:23 urbanecm@deploy1002: urbanecm and dreamyjazz: Backport for Write new on group0 for event table migration (T330158) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:22 urbanecm@deploy1002: Started scap: Backport for Write new on group0 for event table migration (T330158)
20:19 urbanecm@deploy1002: Finished scap: Backport for Design: Provide wordmarks/taglines for Wikiversity projects (T341256), Provide wordmarks for Wikivoyage projects (T341259) (duration: 09m 41s)
20:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudnet2007-dev.codfw.wmnet with OS bullseye
20:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2007-dev.codfw.wmnet with OS bullseye
20:17 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
20:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2008-dev.codfw.wmnet with reason: host reimage
20:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2008-dev.codfw.wmnet with reason: host reimage
20:13 urbanecm@deploy1002: urbanecm and jdlrobson: Continuing with sync
20:11 urbanecm@deploy1002: urbanecm and jdlrobson: Backport for Design: Provide wordmarks/taglines for Wikiversity projects (T341256), Provide wordmarks for Wikivoyage projects (T341259) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:10 urbanecm@deploy1002: Started scap: Backport for Design: Provide wordmarks/taglines for Wikiversity projects (T341256), Provide wordmarks for Wikivoyage projects (T341259)
20:08 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
20:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T342617)', diff saved to https://phabricator.wikimedia.org/P49941 and previous config saved to /var/cache/conftool/dbconfig/20230801-200444-ladsgroup.json
19:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
19:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
19:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudnet2008-dev']
19:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2008-dev.codfw.wmnet with OS bullseye
19:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2006-dev.codfw.wmnet with OS bullseye
19:52 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2007-dev.codfw.wmnet with reason: host reimage
19:51 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet2008-dev']
19:50 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P49940 and previous config saved to /var/cache/conftool/dbconfig/20230801-194938-ladsgroup.json
19:48 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2007-dev.codfw.wmnet with reason: host reimage
19:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2008-dev']
19:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P49939 and previous config saved to /var/cache/conftool/dbconfig/20230801-193432-ladsgroup.json
19:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2006-dev.codfw.wmnet with reason: host reimage
19:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2006-dev.codfw.wmnet with reason: host reimage
19:28 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2008-dev']
19:28 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2007-dev.codfw.wmnet with OS bullseye
19:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2007-dev']
19:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T342617)', diff saved to https://phabricator.wikimedia.org/P49938 and previous config saved to /var/cache/conftool/dbconfig/20230801-191925-ladsgroup.json
19:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
19:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
19:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49937 and previous config saved to /var/cache/conftool/dbconfig/20230801-191709-ladsgroup.json
19:11 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2007-dev']
19:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2006-dev']
19:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudnet2007-dev']
19:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2006-dev']
19:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P49936 and previous config saved to /var/cache/conftool/dbconfig/20230801-190203-ladsgroup.json
19:01 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet2007-dev']
18:56 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2006-dev.codfw.wmnet with OS bullseye
18:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P49935 and previous config saved to /var/cache/conftool/dbconfig/20230801-184657-ladsgroup.json
18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T342617)', diff saved to https://phabricator.wikimedia.org/P49934 and previous config saved to /var/cache/conftool/dbconfig/20230801-184220-ladsgroup.json
18:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
18:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49933 and previous config saved to /var/cache/conftool/dbconfig/20230801-184159-ladsgroup.json
18:39 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
18:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudnet2007-dev']
18:37 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
18:37 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
18:36 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
18:36 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
18:35 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
18:33 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
18:33 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
18:33 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
18:33 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
18:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49932 and previous config saved to /var/cache/conftool/dbconfig/20230801-183151-ladsgroup.json
18:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudnet2008-dev']
18:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P49931 and previous config saved to /var/cache/conftool/dbconfig/20230801-182653-ladsgroup.json
18:21 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet2007-dev']
18:17 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet2008-dev']
18:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2007-dev']
18:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2006-dev']
18:15 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.20 refs T340248
18:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2007-dev']
18:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2006-dev']
18:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P49930 and previous config saved to /var/cache/conftool/dbconfig/20230801-181147-ladsgroup.json
18:05 fabfur: adding dns3001 on cr2-esams and cr3-esams routing for ns2 (T335835)
17:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt2006-dev.mgmt.codfw.wmnet with reboot policy FORCED
17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49929 and previous config saved to /var/cache/conftool/dbconfig/20230801-175641-ladsgroup.json
17:55 fabfur: running authdns-update on dns1004 to revert ntp.esams to dns3001 (T335835)
17:48 fabfur: running puppet on 'A:cumin or A:dns-rec or A:netbox' (https://gerrit.wikimedia.org/r/c/operations/puppet/+/944286) (T335835)
17:42 fabfur: started bird and enabled puppet on dns3001 (T335835)
17:41 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns3001.wikimedia.org
17:37 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns3001.wikimedia.org
17:36 fabfur: stopped bird and disabled puppet on dns3001 (T335835)
17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49928 and previous config saved to /var/cache/conftool/dbconfig/20230801-173130-ladsgroup.json
17:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
17:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T342617)', diff saved to https://phabricator.wikimedia.org/P49927 and previous config saved to /var/cache/conftool/dbconfig/20230801-173109-ladsgroup.json
17:26 fabfur: running puppet on 'A:cumin or A:dns-rec or A:netbox' (https://gerrit.wikimedia.org/r/c/operations/puppet/+/944286) (T335835)
17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P49926 and previous config saved to /var/cache/conftool/dbconfig/20230801-171603-ladsgroup.json
17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49925 and previous config saved to /var/cache/conftool/dbconfig/20230801-171120-ladsgroup.json
17:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
17:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T342617)', diff saved to https://phabricator.wikimedia.org/P49924 and previous config saved to /var/cache/conftool/dbconfig/20230801-171059-ladsgroup.json
17:09 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@ee544cb]: Update kartotherian to e28ea7ef (T334668 T332985 T332664 T329924) (duration: 04m 25s)
17:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@ee544cb]: Update kartotherian to e28ea7ef (T334668 T332985 T332664 T329924)
17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P49923 and previous config saved to /var/cache/conftool/dbconfig/20230801-170057-ladsgroup.json
16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P49922 and previous config saved to /var/cache/conftool/dbconfig/20230801-165553-ladsgroup.json
16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T342617)', diff saved to https://phabricator.wikimedia.org/P49921 and previous config saved to /var/cache/conftool/dbconfig/20230801-164550-ladsgroup.json
16:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P49920 and previous config saved to /var/cache/conftool/dbconfig/20230801-164047-ladsgroup.json
16:38 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudvirt2006-dev.mgmt.codfw.wmnet with reboot policy FORCED
16:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt2005-dev.mgmt.codfw.wmnet with reboot policy FORCED
16:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt2004-dev.mgmt.codfw.wmnet with reboot policy FORCED
16:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T342617)', diff saved to https://phabricator.wikimedia.org/P49919 and previous config saved to /var/cache/conftool/dbconfig/20230801-162541-ladsgroup.json
16:23 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
16:23 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
16:22 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
16:22 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
16:21 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
16:20 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
16:07 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
16:06 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
16:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1210 (T342617)', diff saved to https://phabricator.wikimedia.org/P49918 and previous config saved to /var/cache/conftool/dbconfig/20230801-160006-ladsgroup.json
16:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
15:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
15:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T342617)', diff saved to https://phabricator.wikimedia.org/P49917 and previous config saved to /var/cache/conftool/dbconfig/20230801-155945-ladsgroup.json
15:52 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudvirt2005-dev.mgmt.codfw.wmnet with reboot policy FORCED
15:49 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudvirt2004-dev.mgmt.codfw.wmnet with reboot policy FORCED
15:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudnet2008-dev.mgmt.codfw.wmnet with reboot policy FORCED
15:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P49916 and previous config saved to /var/cache/conftool/dbconfig/20230801-154439-ladsgroup.json
15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T342617)', diff saved to https://phabricator.wikimedia.org/P49915 and previous config saved to /var/cache/conftool/dbconfig/20230801-154242-ladsgroup.json
15:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
15:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49914 and previous config saved to /var/cache/conftool/dbconfig/20230801-154220-ladsgroup.json
15:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudnet2007-dev.mgmt.codfw.wmnet with reboot policy FORCED
15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P49913 and previous config saved to /var/cache/conftool/dbconfig/20230801-153155-ladsgroup.json
15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P49912 and previous config saved to /var/cache/conftool/dbconfig/20230801-152933-ladsgroup.json
15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P49911 and previous config saved to /var/cache/conftool/dbconfig/20230801-152714-ladsgroup.json
15:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudnet2008-dev.mgmt.codfw.wmnet with reboot policy FORCED
15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P49910 and previous config saved to /var/cache/conftool/dbconfig/20230801-151650-ladsgroup.json
15:15 moritzm: bounce ferm on dse-k8s-ctrl1001
15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T342617)', diff saved to https://phabricator.wikimedia.org/P49909 and previous config saved to /var/cache/conftool/dbconfig/20230801-151427-ladsgroup.json
15:14 apine@deploy1002: Finished scap: Backport for Move wikifunctions.org from locked-down to limited deployment (T342820) (duration: 07m 45s)
15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P49908 and previous config saved to /var/cache/conftool/dbconfig/20230801-151208-ladsgroup.json
15:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol2008-dev.mgmt.codfw.wmnet with reboot policy FORCED
15:08 apine@deploy1002: jforrester and apine: Continuing with sync
15:07 apine@deploy1002: jforrester and apine: Backport for Move wikifunctions.org from locked-down to limited deployment (T342820) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
15:06 apine@deploy1002: Started scap: Backport for Move wikifunctions.org from locked-down to limited deployment (T342820)
15:05 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudnet2007-dev.mgmt.codfw.wmnet with reboot policy FORCED
15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P49907 and previous config saved to /var/cache/conftool/dbconfig/20230801-150146-ladsgroup.json
14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49906 and previous config saved to /var/cache/conftool/dbconfig/20230801-145702-ladsgroup.json
14:47 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add config-master[12]001 - jbond@cumin1001 - T341717"
14:46 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add config-master[12]001 - jbond@cumin1001 - T341717"
14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P49905 and previous config saved to /var/cache/conftool/dbconfig/20230801-144641-ladsgroup.json
14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T342617)', diff saved to https://phabricator.wikimedia.org/P49904 and previous config saved to /var/cache/conftool/dbconfig/20230801-143930-ladsgroup.json
14:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
14:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T342617)', diff saved to https://phabricator.wikimedia.org/P49903 and previous config saved to /var/cache/conftool/dbconfig/20230801-143909-ladsgroup.json
14:38 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host dse-k8s-ctrl1001.eqiad.wmnet
14:34 Lucas_WMDE: UTC afternoon backport+config window done (one change, then some k8s issues, which are resolved for now)
14:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudcontrol2008-dev.mgmt.codfw.wmnet with reboot policy FORCED
14:27 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl1001.eqiad.wmnet
14:26 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1011.eqiad.wmnet
14:25 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
14:25 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
14:25 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
14:24 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
14:24 cgoubert@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
14:24 cgoubert@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
14:24 cgoubert@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
14:24 cgoubert@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P49902 and previous config saved to /var/cache/conftool/dbconfig/20230801-142403-ladsgroup.json
14:22 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-tool1011.eqiad.wmnet
14:22 cgoubert@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
14:22 cgoubert@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
14:21 cgoubert@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
14:21 cgoubert@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
14:21 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
14:20 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
14:20 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
14:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1005.eqiad.wmnet
14:19 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
14:19 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
14:18 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
14:18 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
14:17 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
14:16 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
14:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-tool1005.eqiad.wmnet
14:15 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
14:15 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
14:14 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
14:14 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
14:13 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
14:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
14:13 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
14:13 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
14:13 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
14:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49901 and previous config saved to /var/cache/conftool/dbconfig/20230801-141144-ladsgroup.json
14:11 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
14:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
14:11 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
14:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
14:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T342617)', diff saved to https://phabricator.wikimedia.org/P49900 and previous config saved to /var/cache/conftool/dbconfig/20230801-141123-ladsgroup.json
14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P49899 and previous config saved to /var/cache/conftool/dbconfig/20230801-140856-ladsgroup.json
14:07 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bookworm
14:05 fabfur: running authdns-update on dns1004 to move ntp.esams to dns3002 (https://gerrit.wikimedia.org/r/c/operations/dns/+/944232) (T335835)
13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P49897 and previous config saved to /var/cache/conftool/dbconfig/20230801-135617-ladsgroup.json
13:54 fabfur: removing dns3001 from cr2-esams and cr3-esams routing for reboot (T335835)
13:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-worker1001.eqiad.wmnet
13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T342617)', diff saved to https://phabricator.wikimedia.org/P49896 and previous config saved to /var/cache/conftool/dbconfig/20230801-135350-ladsgroup.json
13:50 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
13:50 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
13:49 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
13:49 cgoubert@deploy1002: Started scap: (no justification provided)
13:47 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
13:46 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-worker1001.eqiad.wmnet
13:46 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
13:46 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
13:45 jbond@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host config-master2001.codfw.wmnet
13:45 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host config-master2001.codfw.wmnet with OS bookworm
13:45 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-druid1001.eqiad.wmnet
13:43 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for btmwiktionary: Add project logo (T343004) (duration: 32m 32s)
13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P49895 and previous config saved to /var/cache/conftool/dbconfig/20230801-134111-ladsgroup.json
13:39 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-druid1001.eqiad.wmnet
13:35 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-presto1001.eqiad.wmnet
13:33 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on config-master2001.codfw.wmnet with reason: host reimage
13:33 jbond@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host config-master1001.eqiad.wmnet
13:33 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host config-master1001.eqiad.wmnet with OS bookworm
13:32 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
13:31 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-presto1001.eqiad.wmnet
13:30 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on config-master2001.codfw.wmnet with reason: host reimage
13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T342617)', diff saved to https://phabricator.wikimedia.org/P49891 and previous config saved to /var/cache/conftool/dbconfig/20230801-132604-ladsgroup.json
13:24 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Continuing with sync
13:22 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for btmwiktionary: Add project logo (T343004) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:19 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on config-master1001.eqiad.wmnet with reason: host reimage
13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T342617)', diff saved to https://phabricator.wikimedia.org/P49890 and previous config saved to /var/cache/conftool/dbconfig/20230801-131946-ladsgroup.json
13:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
13:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T342617)', diff saved to https://phabricator.wikimedia.org/P49889 and previous config saved to /var/cache/conftool/dbconfig/20230801-131925-ladsgroup.json
13:16 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on config-master1001.eqiad.wmnet with reason: host reimage
13:12 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host config-master2001.codfw.wmnet with OS bookworm
13:11 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM config-master2001.codfw.wmnet - jbond@cumin2002"
13:11 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM config-master2001.codfw.wmnet - jbond@cumin2002"
13:11 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for btmwiktionary: Add project logo (T343004)
13:10 jbond@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master2001.codfw.wmnet on all recursors
13:10 jbond@cumin2002: START - Cookbook sre.dns.wipe-cache config-master2001.codfw.wmnet on all recursors
13:10 jbond@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:10 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM config-master2001.codfw.wmnet - jbond@cumin2002"
13:09 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM config-master2001.codfw.wmnet - jbond@cumin2002"
13:06 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host config-master1001.eqiad.wmnet with OS bookworm
13:06 jbond@cumin2002: START - Cookbook sre.dns.netbox
13:06 jbond@cumin2002: START - Cookbook sre.ganeti.makevm for new host config-master2001.codfw.wmnet
13:05 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM config-master1001.eqiad.wmnet - jbond@cumin1001"
13:05 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM config-master1001.eqiad.wmnet - jbond@cumin1001"
13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P49888 and previous config saved to /var/cache/conftool/dbconfig/20230801-130419-ladsgroup.json
13:02 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master1001.eqiad.wmnet on all recursors
13:02 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache config-master1001.eqiad.wmnet on all recursors
13:02 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:01 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM config-master1001.eqiad.wmnet - jbond@cumin1001"
13:00 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM config-master1001.eqiad.wmnet - jbond@cumin1001"
12:58 jbond@cumin1001: START - Cookbook sre.dns.netbox
12:58 jbond@cumin1001: START - Cookbook sre.ganeti.makevm for new host config-master1001.eqiad.wmnet
12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P49887 and previous config saved to /var/cache/conftool/dbconfig/20230801-124912-ladsgroup.json
12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T342617)', diff saved to https://phabricator.wikimedia.org/P49886 and previous config saved to /var/cache/conftool/dbconfig/20230801-124508-ladsgroup.json
12:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
12:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
12:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
12:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T342617)', diff saved to https://phabricator.wikimedia.org/P49885 and previous config saved to /var/cache/conftool/dbconfig/20230801-124442-ladsgroup.json
12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T342617)', diff saved to https://phabricator.wikimedia.org/P49883 and previous config saved to /var/cache/conftool/dbconfig/20230801-123406-ladsgroup.json
12:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1003.eqiad.wmnet
12:30 jbond@cumin1001: END (PASS) - Cookbook sre.ganeti.resource_report (exit_code=0)
12:30 jbond@cumin1001: START - Cookbook sre.ganeti.resource_report
12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P49882 and previous config saved to /var/cache/conftool/dbconfig/20230801-122936-ladsgroup.json
12:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1003.eqiad.wmnet
12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P49881 and previous config saved to /var/cache/conftool/dbconfig/20230801-121430-ladsgroup.json
12:11 fabfur: imported purged package into bookworm-wikimedia (https://gerrit.wikimedia.org/r/c/operations/software/purged/+/944177) T342154
12:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host analytics1076.eqiad.wmnet with OS bullseye
12:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host analytics1077.eqiad.wmnet with OS bullseye
12:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1002.eqiad.wmnet
11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T342617)', diff saved to https://phabricator.wikimedia.org/P49880 and previous config saved to /var/cache/conftool/dbconfig/20230801-115924-ladsgroup.json
11:57 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1002.eqiad.wmnet
11:55 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1001.eqiad.wmnet
11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T342617)', diff saved to https://phabricator.wikimedia.org/P49879 and previous config saved to /var/cache/conftool/dbconfig/20230801-115110-ladsgroup.json
11:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
11:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
11:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
11:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
11:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1001.eqiad.wmnet
11:38 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1077.eqiad.wmnet with reason: host reimage
11:36 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1076.eqiad.wmnet with reason: host reimage
11:33 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1077.eqiad.wmnet with reason: host reimage
11:33 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1076.eqiad.wmnet with reason: host reimage
11:22 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1077.eqiad.wmnet with OS bullseye
11:21 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1076.eqiad.wmnet with OS bullseye
11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T342617)', diff saved to https://phabricator.wikimedia.org/P49878 and previous config saved to /var/cache/conftool/dbconfig/20230801-111829-ladsgroup.json
11:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
11:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T342617)', diff saved to https://phabricator.wikimedia.org/P49877 and previous config saved to /var/cache/conftool/dbconfig/20230801-111808-ladsgroup.json
11:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
11:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49876 and previous config saved to /var/cache/conftool/dbconfig/20230801-110858-ladsgroup.json
11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P49875 and previous config saved to /var/cache/conftool/dbconfig/20230801-110302-ladsgroup.json
10:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P49874 and previous config saved to /var/cache/conftool/dbconfig/20230801-105352-ladsgroup.json
10:51 hnowlan@deploy1002: Finished deploy [restbase/deploy@8eb62f2]: Add gpewiki and btmwiktionary (T335988, T336116) (duration: 20m 29s)
10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P49873 and previous config saved to /var/cache/conftool/dbconfig/20230801-104755-ladsgroup.json
10:45 moritzm: update d-i images to bookworm 12.1 T343121
10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P49872 and previous config saved to /var/cache/conftool/dbconfig/20230801-103846-ladsgroup.json
10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T342617)', diff saved to https://phabricator.wikimedia.org/P49871 and previous config saved to /var/cache/conftool/dbconfig/20230801-103249-ladsgroup.json
10:31 hnowlan@deploy1002: Started deploy [restbase/deploy@8eb62f2]: Add gpewiki and btmwiktionary (T335988, T336116)
10:28 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host analytics1076.eqiad.wmnet with OS bullseye
10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49870 and previous config saved to /var/cache/conftool/dbconfig/20230801-102340-ladsgroup.json
10:21 fabfur: imported prometheus-varnishkafka-exporter package into bookworm-wikimedia (https://gerrit.wikimedia.org/r/c/operations/debs/prometheus-varnishkafka-exporter/+/944169) T342154
10:18 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host analytics1077.eqiad.wmnet with OS bullseye
09:47 urbanecm@deploy1002: Finished scap: Backport for Revert "Fixes: Echo notification count disappears on load in mobile skin" (T335273 T343192) (duration: 11m 35s)
09:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T342617)', diff saved to https://phabricator.wikimedia.org/P49869 and previous config saved to /var/cache/conftool/dbconfig/20230801-094538-ladsgroup.json
09:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
09:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
09:40 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
09:39 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
09:38 urbanecm@deploy1002: urbanecm: Continuing with sync
09:37 urbanecm@deploy1002: urbanecm: Backport for Revert "Fixes: Echo notification count disappears on load in mobile skin" (T335273 T343192) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49868 and previous config saved to /var/cache/conftool/dbconfig/20230801-093717-ladsgroup.json
09:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
09:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
09:35 urbanecm@deploy1002: Started scap: Backport for Revert "Fixes: Echo notification count disappears on load in mobile skin" (T335273 T343192)
09:33 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1076.eqiad.wmnet with OS bullseye
09:33 btullis@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host analytics1076.eqiad.wmnet with OS bullseye
09:32 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
09:21 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
09:15 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
09:12 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
09:11 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
09:03 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1077.eqiad.wmnet with OS bullseye
09:03 btullis@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host analytics1077.eqiad.wmnet with OS bullseye
09:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
09:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
08:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
08:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
08:40 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1077.eqiad.wmnet with OS bullseye
08:38 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1076.eqiad.wmnet with OS bullseye
08:33 urbanecm@deploy1002: Finished scap: Backport for GrowthExperiments: enable AddLink task frontend in 10th round of wikis (T308135) (duration: 10m 52s)
08:30 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
08:29 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
08:27 urbanecm@deploy1002: sgimeno and urbanecm: Continuing with sync
08:24 urbanecm@deploy1002: sgimeno and urbanecm: Backport for GrowthExperiments: enable AddLink task frontend in 10th round of wikis (T308135) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
08:22 urbanecm@deploy1002: Started scap: Backport for GrowthExperiments: enable AddLink task frontend in 10th round of wikis (T308135)
08:22 moritzm: installing Linux 4.19.289 on Buster hosts
08:17 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
08:17 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
07:49 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
07:49 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
07:44 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
07:44 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
07:41 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
07:41 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
07:37 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nmaphophe out of all services on: 732 hosts
07:37 root@cumin2002: START - Cookbook sre.idm.logout Logging Nmaphophe out of all services on: 732 hosts
07:37 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nmaphophe out of all services on: 24 hosts
07:37 root@cumin2002: START - Cookbook sre.idm.logout Logging Nmaphophe out of all services on: 24 hosts
07:37 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nmaphophe out of all services on: 1277 hosts
07:36 root@cumin2002: START - Cookbook sre.idm.logout Logging Nmaphophe out of all services on: 1277 hosts
07:07 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
07:07 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
06:54 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
06:54 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
06:24 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
05:48 kart_: cxserver: Remove Youdao MT service (T329137)
05:46 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
05:45 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
05:41 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
05:41 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
05:36 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
05:36 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
05:26 kart_: Updated cxserver to 2023-07-13-063245-production (T340953)
05:24 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
05:23 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
05:18 marostegui: dbmaint s4 testcommonswiki eqiad T343174
05:16 marostegui: dbmaint s4 labswiki (wikitech) eqiad T343175
05:15 marostegui: dbmaint s4 testcommonswiki eqiad T343175
05:12 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
05:12 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
05:07 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
05:06 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
03:57 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.18 (duration: 02m 09s)
03:54 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.20 refs T340248 (duration: 52m 06s)
03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.20 refs T340248
02:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T342617)', diff saved to https://phabricator.wikimedia.org/P49867 and previous config saved to /var/cache/conftool/dbconfig/20230801-023010-ladsgroup.json
02:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P49866 and previous config saved to /var/cache/conftool/dbconfig/20230801-021504-ladsgroup.json
01:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P49865 and previous config saved to /var/cache/conftool/dbconfig/20230801-015958-ladsgroup.json
01:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T342617)', diff saved to https://phabricator.wikimedia.org/P49864 and previous config saved to /var/cache/conftool/dbconfig/20230801-014452-ladsgroup.json
00:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol2007-dev.mgmt.codfw.wmnet with reboot policy FORCED
00:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol2006-dev.mgmt.codfw.wmnet with reboot policy FORCED
00:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
00:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
00:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T342617)', diff saved to https://phabricator.wikimedia.org/P49863 and previous config saved to /var/cache/conftool/dbconfig/20230801-004000-ladsgroup.json
00:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P49862 and previous config saved to /var/cache/conftool/dbconfig/20230801-002454-ladsgroup.json
00:21 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudcontrol2007-dev.mgmt.codfw.wmnet with reboot policy FORCED
00:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudcontrol2006-dev.mgmt.codfw.wmnet with reboot policy FORCED
00:15 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:15 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new cloud nodes DNS and switch config - pt1979@cumin2002"
00:14 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new cloud nodes DNS and switch config - pt1979@cumin2002"
00:11 pt1979@cumin2002: START - Cookbook sre.dns.netbox
00:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P49861 and previous config saved to /var/cache/conftool/dbconfig/20230801-000948-ladsgroup.json

Other archives

2000s

Archive 1: 2004 Jun - 2004 Sep
Archive 2: 2004 Oct - 2004 Nov
Archive 3: 2004 Dec - 2005 Mar
Archive 4: 2005 Apr - 2005 Jul
Archive 5: 2005 Aug - 2005 Oct, with revision history 2004-06-23 to 2005-11-25
Archive 6: 2005 Nov - 2006 Feb
Archive 7: 2006 Mar - 2006 Jun
Archive 8: 2006 Jul - 2006 Sep
Archive 9: 2006 Oct - 2007 Jan, with revision history 2005-11-25 to 2007-02-21
Archive 10: 2007 Feb - 2007 Jun
Archive 11: 2007 Jul - 2007 Dec
Archive 12: 2008 Jan - 2008 Jul
Archive 12a: 2008 Aug
Archive 12b: 2008 Sep
Archive 13: 2008 Oct - 2009 Jun
Archive 14: 2009 Jun - 2009 Dec