Server Admin Log/Archive 69

From Wikitech

2023-08-15

  • 23:26 hmonroy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Set wikidiff2 maxSplitSize = 10 on group0 wikis T341754 (duration: 07m 39s)
  • 22:27 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1012.eqiad.wmnet with OS bullseye
  • 22:22 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1013.eqiad.wmnet with OS bullseye
  • 21:53 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1013.eqiad.wmnet with reason: host reimage
  • 21:50 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1012.eqiad.wmnet with reason: host reimage
  • 21:47 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1013.eqiad.wmnet with reason: host reimage
  • 21:47 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1012.eqiad.wmnet with reason: host reimage
  • 21:32 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1013.eqiad.wmnet with OS bullseye
  • 21:32 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1012.eqiad.wmnet with OS bullseye
  • 21:16 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:16 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pdus - robh@cumin1001"
  • 21:15 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pdus - robh@cumin1001"
  • 21:09 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 21:07 robh@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 21:07 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 20:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs3010.esams.wmnet with OS bullseye
  • 20:55 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 20:54 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 20:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs3010.esams.wmnet with reason: host reimage
  • 20:36 ebernhardson: T342444 start cirrussearch reindex of all wikis to enable new text analysis components from mwmaint1002
  • 20:33 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs3010.esams.wmnet with reason: host reimage
  • 20:20 ryankemper@deploy1002: Finished scap: Backport for elastic: allow only 1 enwiki_content per host (T343820) (duration: 09m 25s)
  • 20:20 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs3008.esams.wmnet with OS bullseye
  • 20:20 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 20:19 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 20:19 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3080.esams.wmnet with OS bullseye
  • 20:19 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 20:18 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 20:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3072.esams.wmnet with OS bullseye
  • 20:17 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 20:16 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 20:14 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs3010.esams.wmnet with OS bullseye
  • 20:13 ryankemper@deploy1002: ryankemper: Continuing with sync
  • 20:12 ryankemper@deploy1002: ryankemper: Backport for elastic: allow only 1 enwiki_content per host (T343820) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:11 ryankemper@deploy1002: Started scap: Backport for elastic: allow only 1 enwiki_content per host (T343820)
  • 20:01 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs3008.esams.wmnet with reason: host reimage
  • 20:01 sukhe: running dummy authdns-update
  • 19:57 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs3008.esams.wmnet with reason: host reimage
  • 19:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3080.esams.wmnet with reason: host reimage
  • 19:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3072.esams.wmnet with reason: host reimage
  • 19:51 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3080.esams.wmnet with reason: host reimage
  • 19:49 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3072.esams.wmnet with reason: host reimage
  • 19:45 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns3004.wikimedia.org with OS bullseye
  • 19:45 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 19:44 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 19:38 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs3008.esams.wmnet with OS bullseye
  • 19:32 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "manual trigger - cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002 - brett@cumin2002"
  • 19:32 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "manual trigger - cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002 - brett@cumin2002"
  • 19:31 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:31 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge flink-zk2002 DNS changes - sukhe@cumin2002"
  • 19:31 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3078.esams.wmnet with OS bullseye
  • 19:31 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
  • 19:30 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3070.esams.wmnet with OS bullseye
  • 19:30 brett@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
  • 19:30 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge flink-zk2002 DNS changes - sukhe@cumin2002"
  • 19:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3080.esams.wmnet with OS bullseye
  • 19:28 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3072.esams.wmnet with OS bullseye
  • 19:28 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
  • 19:26 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
  • 19:26 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3078.esams.wmnet with reason: host reimage
  • 19:03 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns3004.wikimedia.org with reason: host reimage
  • 19:01 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3070.esams.wmnet with reason: host reimage
  • 18:58 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dns3004.wikimedia.org with reason: host reimage
  • 18:57 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3078.esams.wmnet with reason: host reimage
  • 18:57 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3070.esams.wmnet with reason: host reimage
  • 18:37 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host dns3004.wikimedia.org with OS bullseye
  • 18:36 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns3004.wikimedia.org with OS bullseye
  • 18:36 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3078.esams.wmnet with OS bullseye
  • 18:36 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3070.esams.wmnet with OS bullseye
  • 18:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3076.esams.wmnet with OS bullseye
  • 18:32 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
  • 18:30 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3068.esams.wmnet with OS bullseye
  • 18:30 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
  • 18:30 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
  • 18:28 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
  • 18:26 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host dns3004.wikimedia.org with OS bullseye
  • 18:17 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns3003.wikimedia.org with OS bullseye
  • 18:17 fabfur@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 18:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3074.esams.wmnet with OS bullseye
  • 18:16 sukhe@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 18:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3066.esams.wmnet with OS bullseye
  • 18:16 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 18:11 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 18:10 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 18:09 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 18:08 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3076.esams.wmnet with reason: host reimage
  • 18:05 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3068.esams.wmnet with reason: host reimage
  • 18:05 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3076.esams.wmnet with reason: host reimage
  • 18:01 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3068.esams.wmnet with reason: host reimage
  • 17:54 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:53 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:48 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3074.esams.wmnet with reason: host reimage
  • 17:45 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3066.esams.wmnet with reason: host reimage
  • 17:44 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3074.esams.wmnet with reason: host reimage
  • 17:42 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3066.esams.wmnet with reason: host reimage
  • 17:42 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns3003.wikimedia.org with reason: host reimage
  • 17:42 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3076.esams.wmnet with OS bullseye
  • 17:40 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3068.esams.wmnet with OS bullseye
  • 17:39 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dns3003.wikimedia.org with reason: host reimage
  • 17:22 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3074.esams.wmnet with OS bullseye
  • 17:21 brett: Upload libvmod-netmapper 1.9-4 (bookworm) to archive - T342154
  • 17:20 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3066.esams.wmnet with OS bullseye
  • 17:15 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host dns3003.wikimedia.org with OS bullseye
  • 17:02 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3070']
  • 17:01 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3066']
  • 17:01 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3068']
  • 17:00 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3072']
  • 16:57 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk2001.codfw.wmnet
  • 16:57 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk2001.codfw.wmnet with OS bookworm
  • 16:56 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3074']
  • 16:55 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3066']
  • 16:55 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3068']
  • 16:55 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3070']
  • 16:54 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3072']
  • 16:54 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3076']
  • 16:53 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs3010']
  • 16:52 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3078']
  • 16:51 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3080']
  • 16:50 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3074']
  • 16:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3066.mgmt.esams.wmnet with reboot policy FORCED
  • 16:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1098.eqiad.wmnet with OS bullseye
  • 16:48 robh@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp3074']
  • 16:48 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3074']
  • 16:47 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk2002.codfw.wmnet
  • 16:47 bking@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:46 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3076']
  • 16:46 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3078']
  • 16:45 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3080']
  • 16:45 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 16:45 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk2002.codfw.wmnet
  • 16:45 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs3010']
  • 16:44 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti3006']
  • 16:43 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns3004']
  • 16:43 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs3010']
  • 16:42 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti3008']
  • 16:42 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs3008']
  • 16:37 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns3004']
  • 16:37 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti3006']
  • 16:37 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti3008']
  • 16:36 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs3008']
  • 16:33 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs3010']
  • 16:32 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3066.mgmt.esams.wmnet with reboot policy FORCED
  • 16:30 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3070.mgmt.esams.wmnet with reboot policy FORCED
  • 16:29 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3068.mgmt.esams.wmnet with reboot policy FORCED
  • 16:29 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3072.mgmt.esams.wmnet with reboot policy FORCED
  • 16:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3074.mgmt.esams.wmnet with reboot policy FORCED
  • 16:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3078.mgmt.esams.wmnet with reboot policy FORCED
  • 16:27 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3076.mgmt.esams.wmnet with reboot policy FORCED
  • 16:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1098.eqiad.wmnet with reason: host reimage
  • 16:24 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:21 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1098.eqiad.wmnet with reason: host reimage
  • 16:20 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:20 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:18 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:11 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3068.mgmt.esams.wmnet with reboot policy FORCED
  • 16:11 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3070.mgmt.esams.wmnet with reboot policy FORCED
  • 16:11 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3072.mgmt.esams.wmnet with reboot policy FORCED
  • 16:10 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3074.mgmt.esams.wmnet with reboot policy FORCED
  • 16:10 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3076.mgmt.esams.wmnet with reboot policy FORCED
  • 16:09 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3080.mgmt.esams.wmnet with reboot policy FORCED
  • 16:09 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3078.mgmt.esams.wmnet with reboot policy FORCED
  • 16:09 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs3010.mgmt.esams.wmnet with reboot policy FORCED
  • 16:09 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns3004.mgmt.esams.wmnet with reboot policy FORCED
  • 16:08 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti3008.mgmt.esams.wmnet with reboot policy FORCED
  • 16:05 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1098.eqiad.wmnet with OS bullseye
  • 16:02 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti3006.mgmt.esams.wmnet with reboot policy FORCED
  • 16:02 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs3008.mgmt.esams.wmnet with reboot policy FORCED
  • 16:00 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 15:59 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 15:58 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124 (duration: 00m 15s)
  • 15:58 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124
  • 15:56 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124 (duration: 00m 14s)
  • 15:56 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124
  • 15:51 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3080.mgmt.esams.wmnet with reboot policy FORCED
  • 15:51 robh@cumin1001: START - Cookbook sre.hosts.provision for host dns3004.mgmt.esams.wmnet with reboot policy FORCED
  • 15:50 robh@cumin1001: START - Cookbook sre.hosts.provision for host ganeti3006.mgmt.esams.wmnet with reboot policy FORCED
  • 15:50 robh@cumin1001: START - Cookbook sre.hosts.provision for host ganeti3008.mgmt.esams.wmnet with reboot policy FORCED
  • 15:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs3009.esams.wmnet with OS bullseye
  • 15:50 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 15:50 robh@cumin1001: START - Cookbook sre.hosts.provision for host lvs3008.mgmt.esams.wmnet with reboot policy FORCED
  • 15:49 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 15:49 robh@cumin1001: START - Cookbook sre.hosts.provision for host lvs3010.mgmt.esams.wmnet with reboot policy FORCED
  • 15:44 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:44 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns3004 - robh@cumin1001"
  • 15:44 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1015.eqiad.wmnet with OS bullseye
  • 15:44 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns3004 - robh@cumin1001"
  • 15:42 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 15:41 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk2001.codfw.wmnet with OS bookworm
  • 15:41 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2001.codfw.wmnet - bking@cumin1001"
  • 15:40 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2001.codfw.wmnet - bking@cumin1001"
  • 15:40 bking@cumin1001: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) flink-zk2001.codfw.wmnet on all recursors
  • 15:40 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk2001.codfw.wmnet on all recursors
  • 15:40 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:40 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
  • 15:40 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp3066
  • 15:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3007.esams.wmnet
  • 15:39 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp3066
  • 15:39 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp3068
  • 15:39 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp3068
  • 15:39 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
  • 15:39 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp3070
  • 15:39 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp3070
  • 15:38 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp3072
  • 15:38 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp3072
  • 15:38 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp3074
  • 15:38 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp3074
  • 15:38 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp3076
  • 15:38 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp3076
  • 15:38 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp3078
  • 15:37 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp3078
  • 15:37 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns3004
  • 15:37 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dns3004
  • 15:37 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp3080
  • 15:37 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp3080
  • 15:36 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 15:36 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk2001.codfw.wmnet
  • 15:35 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs3008
  • 15:35 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host lvs3008
  • 15:35 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs3010
  • 15:35 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host lvs3010
  • 15:34 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk2001.codfw.wmnet
  • 15:33 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:33 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rack bw27 hosts - robh@cumin1001"
  • 15:32 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rack bw27 hosts - robh@cumin1001"
  • 15:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs3009.esams.wmnet with reason: host reimage
  • 15:30 bking@cumin1001: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) flink-zk2001.codfw.wmnet on all recursors
  • 15:30 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk2001.codfw.wmnet on all recursors
  • 15:30 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:30 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
  • 15:29 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
  • 15:29 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 15:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3007.esams.wmnet
  • 15:27 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1014.eqiad.wmnet with OS bullseye
  • 15:27 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs3009.esams.wmnet with reason: host reimage
  • 15:07 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1097.eqiad.wmnet with OS bullseye
  • 15:06 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs3009.esams.wmnet with OS bullseye
  • 15:05 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs3009.mgmt.esams.wmnet with reboot policy FORCED
  • 14:56 robh@cumin1001: START - Cookbook sre.hosts.provision for host lvs3009.mgmt.esams.wmnet with reboot policy FORCED
  • 14:55 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs3009.mgmt.esams.wmnet with reboot policy FORCED
  • 14:54 robh@cumin1001: START - Cookbook sre.hosts.provision for host lvs3009.mgmt.esams.wmnet with reboot policy FORCED
  • 14:54 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1014.eqiad.wmnet with reason: host reimage
  • 14:54 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 14:52 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1015.eqiad.wmnet with reason: host reimage
  • 14:52 bking@cumin1001: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) flink-zk2001.codfw.wmnet on all recursors
  • 14:52 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk2001.codfw.wmnet on all recursors
  • 14:52 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:52 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
  • 14:51 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs3009.esams.wmnet with OS bullseye
  • 14:51 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
  • 14:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti3005.mgmt.esams.wmnet with reboot policy GRACEFUL
  • 14:49 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1014.eqiad.wmnet with reason: host reimage
  • 14:49 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1015.eqiad.wmnet with reason: host reimage
  • 14:45 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 14:45 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk2001.codfw.wmnet
  • 14:43 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ganeti3005.esams.wmnet
  • 14:39 robh@cumin1001: START - Cookbook sre.hosts.provision for host ganeti3005.mgmt.esams.wmnet with reboot policy GRACEFUL
  • 14:38 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1097.eqiad.wmnet with reason: host reimage
  • 14:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti3007.esams.wmnet with OS bullseye
  • 14:37 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 14:34 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1097.eqiad.wmnet with reason: host reimage
  • 14:33 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 14:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
  • 14:26 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1014.eqiad.wmnet with OS bullseye
  • 14:26 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1015.eqiad.wmnet with OS bullseye
  • 14:26 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3081.esams.wmnet with OS bullseye
  • 14:23 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti3005.esams.wmnet
  • 14:22 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs3009.esams.wmnet with OS bullseye
  • 14:17 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1097.eqiad.wmnet with OS bullseye
  • 14:14 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns3003.wikimedia.org with OS bullseye
  • 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
  • 14:07 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ganeti3005.esams.wmnet
  • 14:03 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3081.esams.wmnet with reason: host reimage
  • 14:00 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3081.esams.wmnet with reason: host reimage
  • 13:51 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host dns3003.wikimedia.org with OS bullseye
  • 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti3007.esams.wmnet with reason: host reimage
  • 13:47 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 13:46 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 13:45 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 13:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti3007.esams.wmnet with reason: host reimage
  • 13:45 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 13:45 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 13:44 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns3003.wikimedia.org with OS bullseye
  • 13:44 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 13:38 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp3081.esams.wmnet with OS bullseye
  • 13:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
  • 13:29 urbanecm@deploy1002: Finished scap: Backport for Remove knwiktionary tagline (T343662) (duration: 10m 20s)
  • 13:25 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti3007.esams.wmnet with OS bullseye
  • 13:24 filippo@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:23 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti3007']
  • 13:23 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host dns3003.wikimedia.org with OS bullseye
  • 13:23 filippo@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:22 urbanecm@deploy1002: urbanecm and anzx: Continuing with sync
  • 13:20 urbanecm@deploy1002: urbanecm and anzx: Backport for Remove knwiktionary tagline (T343662) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:19 urbanecm@deploy1002: Started scap: Backport for Remove knwiktionary tagline (T343662)
  • 13:18 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti3007']
  • 13:17 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti3007']
  • 13:17 urbanecm@deploy1002: Finished scap: Backport for GrowthExperiments: enable AddLink backend 13th round of wikis (T308138) (duration: 10m 47s)
  • 13:10 urbanecm@deploy1002: sgimeno and urbanecm: Continuing with sync
  • 13:07 urbanecm@deploy1002: sgimeno and urbanecm: Backport for GrowthExperiments: enable AddLink backend 13th round of wikis (T308138) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:06 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti3007']
  • 13:06 urbanecm@deploy1002: Started scap: Backport for GrowthExperiments: enable AddLink backend 13th round of wikis (T308138)
  • 13:05 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti3005']
  • 12:56 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti3005']
  • 12:47 filippo@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:36 filippo@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sw - ayounsi@cumin1001"
  • 12:12 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sw - ayounsi@cumin1001"
  • 12:04 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti3007.esams.wmnet with OS bullseye
  • 12:02 sukhe: sukhe@contint2002:~$ sudo systemctl restart zuul: T344238
  • 12:02 sukhe: sukhe@contint2002:~$ sudo systemctl restart zuul
  • 12:02 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3077.esams.wmnet with OS bullseye
  • 12:02 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 11:54 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 11:54 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3069.esams.wmnet with OS bullseye
  • 11:54 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 11:53 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3067.esams.wmnet with OS bullseye
  • 11:53 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 11:52 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 11:51 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 11:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3075.esams.wmnet with OS bullseye
  • 11:50 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 11:49 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 11:31 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp3077.esams.wmnet with reason: host reimage
  • 11:29 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp3069.esams.wmnet with reason: host reimage
  • 11:27 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp3075.esams.wmnet with reason: host reimage
  • 11:26 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3077.esams.wmnet with reason: host reimage
  • 11:26 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp3067.esams.wmnet with reason: host reimage
  • 11:24 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3069.esams.wmnet with reason: host reimage
  • 11:24 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti3005.esams.wmnet
  • 11:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
  • 11:23 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3075.esams.wmnet with reason: host reimage
  • 11:22 sukhe: sukhe@contint2002:~$ sudo systemctl restart zuul
  • 11:22 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti3005.esams.wmnet
  • 11:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
  • 11:22 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3067.esams.wmnet with reason: host reimage
  • 11:07 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host an-db1001.eqiad.wmnet
  • 11:04 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp3077.esams.wmnet with OS bullseye
  • 11:03 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3069.esams.wmnet with OS bullseye
  • 11:01 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3075.esams.wmnet with OS bullseye
  • 11:00 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp3067.esams.wmnet with OS bullseye
  • 10:56 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-db1001.eqiad.wmnet
  • 10:56 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1096.eqiad.wmnet with OS bullseye
  • 10:54 sukhe: zuul@contint1002:/srv/zuul/git/operations/puppet$ git fetch --force --tags -v origin
  • 10:45 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti3007.esams.wmnet with OS bullseye
  • 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti3005.esams.wmnet with OS bullseye
  • 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 10:43 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 10:42 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3079.esams.wmnet with OS bullseye
  • 10:42 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 10:41 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 10:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3071.esams.wmnet with OS bullseye
  • 10:37 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 10:36 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 10:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3073.esams.wmnet with OS bullseye
  • 10:32 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 10:31 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 10:25 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti3005.esams.wmnet with reason: host reimage
  • 10:21 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti3005.esams.wmnet with reason: host reimage
  • 10:20 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp3079.esams.wmnet with reason: host reimage
  • 10:16 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3079.esams.wmnet with reason: host reimage
  • 10:11 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp3071.esams.wmnet with reason: host reimage
  • 10:09 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp3073.esams.wmnet with reason: host reimage
  • 10:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3071.esams.wmnet with reason: host reimage
  • 10:04 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3073.esams.wmnet with reason: host reimage
  • 10:01 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti3005.esams.wmnet with OS bullseye
  • 09:54 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp3079.esams.wmnet with OS bullseye
  • 09:52 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp3079.esams.wmnet with OS bullseye
  • 09:52 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp3079.esams.wmnet with OS bullseye
  • 09:51 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti3005.esams.wmnet with OS bullseye
  • 09:50 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp3079.esams.wmnet with OS bullseye
  • 09:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3071.esams.wmnet with OS bullseye
  • 09:41 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp3079.esams.wmnet with reason: host reimage
  • 09:37 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3079.esams.wmnet with reason: host reimage
  • 09:34 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1096.eqiad.wmnet with OS bullseye
  • 09:30 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3073.esams.wmnet with OS bullseye
  • 09:30 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti3005.esams.wmnet with OS bullseye
  • 09:17 filippo@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:15 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp3079.esams.wmnet with OS bullseye
  • 09:15 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3079.esams.wmnet with OS bullseye
  • 09:11 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti3005.esams.wmnet with OS bullseye
  • 09:11 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti3005.esams.wmnet with OS bullseye
  • 09:08 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp3079.esams.wmnet with OS bullseye
  • 09:05 filippo@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:55 klausman: Draining ml2003 for kubelet partition resize
  • 08:46 klausman: Draining ml2002 for kubelet partition resize
  • 08:42 zabe@deploy1002: Finished scap: Backport for Add messages for Pa'O Wiktionary (blkwiktionary) (T343540), Add messages for Sundanese Wikisource (suwikisource) (T343539) (duration: 33m 26s)
  • 08:37 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 08:36 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 08:31 zabe@deploy1002: zabe: Continuing with sync
  • 08:30 zabe@deploy1002: zabe: Backport for Add messages for Pa'O Wiktionary (blkwiktionary) (T343540), Add messages for Sundanese Wikisource (suwikisource) (T343539) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 08:28 filippo@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:16 filippo@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:09 zabe@deploy1002: Started scap: Backport for Add messages for Pa'O Wiktionary (blkwiktionary) (T343540), Add messages for Sundanese Wikisource (suwikisource) (T343539)
  • 07:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cr2-esams mgmt - ayounsi@cumin1001"
  • 07:55 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cr2-esams mgmt - ayounsi@cumin1001"
  • 07:49 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 07:49 ayounsi@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 07:49 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 07:33 taavi@deploy1002: Finished scap: Backport for Enable EditInSequence on all wikisources (T308098) (duration: 13m 29s)
  • 07:29 gehel: restarting wmf_auto_restart_prometheus-blazegraph-exporter-wdqs-blazegraph on wdqs2012
  • 07:27 taavi@deploy1002: soda and taavi: Continuing with sync
  • 07:21 taavi@deploy1002: soda and taavi: Backport for Enable EditInSequence on all wikisources (T308098) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan2002.codfw.wmnet
  • 07:20 taavi@deploy1002: Started scap: Backport for Enable EditInSequence on all wikisources (T308098)
  • 07:18 taavi@deploy1002: Finished scap: Backport for jawiki: reassign the changetags user right (T344150) (duration: 11m 05s)
  • 07:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host titan2002.codfw.wmnet
  • 07:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cp3081 - ayounsi@cumin1001"
  • 07:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan2001.codfw.wmnet
  • 07:15 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cp3081 - ayounsi@cumin1001"
  • 07:12 taavi@deploy1002: anzx and taavi: Continuing with sync
  • 07:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host titan2001.codfw.wmnet
  • 07:08 taavi@deploy1002: anzx and taavi: Backport for jawiki: reassign the changetags user right (T344150) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:07 taavi@deploy1002: Started scap: Backport for jawiki: reassign the changetags user right (T344150)
  • 07:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lists2001.codfw.wmnet
  • 07:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:05 taavi@deploy1002: Finished scap: Backport for clienthints: Collect Client Hints data on group0 wikis (T341110) (duration: 15m 23s)
  • 07:04 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 07:03 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 07:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host lists2001.codfw.wmnet
  • 06:59 taavi@deploy1002: taavi and dreamyjazz: Continuing with sync
  • 06:57 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 06:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast2003.wikimedia.org
  • 06:52 taavi@deploy1002: taavi and dreamyjazz: Backport for clienthints: Collect Client Hints data on group0 wikis (T341110) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 06:50 taavi@deploy1002: Started scap: Backport for clienthints: Collect Client Hints data on group0 wikis (T341110)
  • 04:34 taavi@deploy1002: Finished scap: Backport for Add a comment why PdfHandler does not use Shellbox (duration: 08m 24s)
  • 04:28 taavi@deploy1002: taavi: Continuing with sync
  • 04:28 taavi@deploy1002: taavi: Backport for Add a comment why PdfHandler does not use Shellbox synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 04:26 taavi@deploy1002: Started scap: Backport for Add a comment why PdfHandler does not use Shellbox
  • 03:58 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.19 (duration: 02m 13s)
  • 03:56 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.22 refs T343724 (duration: 53m 42s)
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.22 refs T343724
  • 01:54 eileen: config revision changed from a61171bc to a05a2a82
  • 01:51 eileen: civicrm upgraded from 16c2e58a to 5e631101
  • 01:39 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3081.esams.wmnet with OS bullseye
  • 01:39 eileen: config revision changed from 2d598716 to a61171bc
  • 01:18 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3081.esams.wmnet with OS bullseye
  • 01:01 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3081.esams.wmnet with OS bullseye
  • 00:26 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3081.esams.wmnet with OS bullseye

2023-08-14

  • 23:40 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1030.eqiad.wmnet
  • 23:31 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1030.eqiad.wmnet
  • 22:56 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3081.esams.wmnet with OS bullseye
  • 22:41 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs3009']
  • 22:35 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs3009']
  • 22:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs3009.mgmt.esams.wmnet with reboot policy FORCED
  • 22:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3081.esams.wmnet with OS bullseye
  • 22:23 robh@cumin1001: START - Cookbook sre.hosts.provision for host lvs3009.mgmt.esams.wmnet with reboot policy FORCED
  • 22:19 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:19 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs3009 - robh@cumin1001"
  • 22:18 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs3009 - robh@cumin1001"
  • 22:09 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 22:02 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti3005']
  • 22:01 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti3007.mgmt.esams.wmnet with reboot policy FORCED
  • 21:59 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3075']
  • 21:56 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti3005']
  • 21:56 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3081.esams.wmnet with OS bullseye
  • 21:55 robh@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti3005']
  • 21:55 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti3005']
  • 21:54 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3067']
  • 21:54 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3073']
  • 21:54 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3077']
  • 21:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti3005.mgmt.esams.wmnet with reboot policy FORCED
  • 21:54 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3069']
  • 21:54 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns3003']
  • 21:53 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3071']
  • 21:48 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns3003']
  • 21:47 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3067']
  • 21:47 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3069']
  • 21:47 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3071']
  • 21:46 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3073']
  • 21:46 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3075']
  • 21:46 urandom: upgrading Cassandra to 4.1.1, restbase10[18,25-27,30,33]-{a,b,c} (eqiad/row D) — T339298
  • 21:46 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3077']
  • 21:43 robh@cumin1001: START - Cookbook sre.hosts.provision for host ganeti3007.mgmt.esams.wmnet with reboot policy FORCED
  • 21:43 robh@cumin1001: START - Cookbook sre.hosts.provision for host ganeti3005.mgmt.esams.wmnet with reboot policy FORCED
  • 21:42 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3067.mgmt.esams.wmnet with reboot policy FORCED
  • 21:40 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns3003.mgmt.esams.wmnet with reboot policy FORCED
  • 21:39 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3069.mgmt.esams.wmnet with reboot policy FORCED
  • 21:39 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3071.mgmt.esams.wmnet with reboot policy FORCED
  • 21:38 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3073.mgmt.esams.wmnet with reboot policy FORCED
  • 21:37 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3075.mgmt.esams.wmnet with reboot policy FORCED
  • 21:37 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3077.mgmt.esams.wmnet with reboot policy FORCED
  • 21:35 maryum: security deploy for T341529
  • 21:27 urandom: upgrading Cassandra to 4.1.1, restbase10[17,22-24,29,32]-{a,b,c} (eqiad/row B) — T339298
  • 21:22 robh@cumin1001: START - Cookbook sre.hosts.provision for host dns3003.mgmt.esams.wmnet with reboot policy FORCED
  • 21:21 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3067.mgmt.esams.wmnet with reboot policy FORCED
  • 21:21 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3069.mgmt.esams.wmnet with reboot policy FORCED
  • 21:21 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3071.mgmt.esams.wmnet with reboot policy FORCED
  • 21:19 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3073.mgmt.esams.wmnet with reboot policy FORCED
  • 21:19 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3075.mgmt.esams.wmnet with reboot policy FORCED
  • 21:18 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3077.mgmt.esams.wmnet with reboot policy FORCED
  • 21:11 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:11 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new hosts in by27 - robh@cumin1001"
  • 21:10 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new hosts in by27 - robh@cumin1001"
  • 21:09 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3081.esams.wmnet with OS bullseye
  • 21:08 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3081.esams.wmnet with OS bullseye
  • 21:06 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 20:42 urbanecm@deploy1002: Finished scap: Backport for Config changes for new Android schema (duration: 13m 36s)
  • 20:35 urbanecm@deploy1002: urbanecm and sharvaniharan: Continuing with sync
  • 20:33 urandom: upgrading Cassandra to 4.1.1, restbase10[19-21,28,31]-{a,b,c} (eqiad/row A) — T339298
  • 20:30 urbanecm@deploy1002: urbanecm and sharvaniharan: Backport for Config changes for new Android schema synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:28 urbanecm@deploy1002: Started scap: Backport for Config changes for new Android schema
  • 20:25 urbanecm@deploy1002: Finished scap: Backport for NewcomerTasksLogFactory: Use getName(), not getDbKey() (T344163) (duration: 09m 08s)
  • 20:18 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 20:18 urbanecm@deploy1002: urbanecm: Backport for NewcomerTasksLogFactory: Use getName(), not getDbKey() (T344163) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:17 urandom: upgrading Cassandra to 4.1.1, restbase20[12,17-18,23,26-27]-{a,b,c} (codfw/row C) — T339298
  • 20:16 urbanecm@deploy1002: Started scap: Backport for NewcomerTasksLogFactory: Use getName(), not getDbKey() (T344163)
  • 19:57 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp3081.esams.wmnet with OS bullseye
  • 19:57 urandom: upgrading Cassandra to 4.1.1, restbase20[15,16,20,22,25]-{a,b,c} (codfw/row C) — T339298
  • 19:52 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3079']
  • 19:45 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3079']
  • 19:45 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3079.mgmt.esams.wmnet with reboot policy FORCED
  • 19:44 robh@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp3081']
  • 19:43 urandom: upgrading Cassandra to 4.1.1, restbase2024-{a,b,c} — T339298
  • 19:38 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3081']
  • 19:38 robh@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp3081']
  • 19:37 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp3081']
  • 19:34 urandom: upgrading Cassandra to 4.1.1, restbase2021-{a,b,c} — T339298
  • 19:34 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp3081.mgmt.esams.wmnet with reboot policy FORCED
  • 19:31 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3079.mgmt.esams.wmnet with reboot policy FORCED
  • 19:24 urandom: upgrading Cassandra to 4.1.1, restbase2019-{a,b,c} — T339298
  • 19:16 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3081.mgmt.esams.wmnet with reboot policy FORCED
  • 19:11 urandom: upgrading Cassandra to 4.1.1, restbase2014-{a,b,c} — T339298
  • 18:45 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp3081.mgmt.esams.wmnet with reboot policy FORCED
  • 18:45 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3081.mgmt.esams.wmnet with reboot policy FORCED
  • 18:43 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp3081.mgmt.esams.wmnet with reboot policy FORCED
  • 18:43 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp3081.mgmt.esams.wmnet with reboot policy FORCED
  • 18:38 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:38 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge cp3081 and cp3079 - sukhe@cumin2002"
  • 18:37 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge cp3081 and cp3079 - sukhe@cumin2002"
  • 18:23 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 17:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1095.eqiad.wmnet with OS bullseye
  • 17:41 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:39 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1095.eqiad.wmnet with reason: host reimage
  • 17:15 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1095.eqiad.wmnet with reason: host reimage
  • 17:02 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1095.eqiad.wmnet with OS bullseye
  • 16:58 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:57 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:47 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 100%: Repooling after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P50580 and previous config saved to /var/cache/conftool/dbconfig/20230814-164727-root.json
  • 16:42 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1094.eqiad.wmnet with OS bullseye
  • 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 75%: Repooling after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P50579 and previous config saved to /var/cache/conftool/dbconfig/20230814-163222-root.json
  • 16:28 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:28 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:28 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename cr3-knams to cr2-esams - cmooney@cumin1001"
  • 16:17 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 50%: Repooling after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P50578 and previous config saved to /var/cache/conftool/dbconfig/20230814-161718-root.json
  • 16:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1094.eqiad.wmnet with reason: host reimage
  • 16:11 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1094.eqiad.wmnet with reason: host reimage
  • 16:02 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 25%: Repooling after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P50577 and previous config saved to /var/cache/conftool/dbconfig/20230814-160213-root.json
  • 16:01 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cr2-esams.wikimedia.org on all recursors
  • 16:00 sukhe@cumin2002: START - Cookbook sre.dns.wipe-cache cr2-esams.wikimedia.org on all recursors
  • 15:58 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1094.eqiad.wmnet with OS bullseye
  • 15:55 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1093.eqiad.wmnet with OS bullseye
  • 15:53 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename cr3-knams to cr2-esams - cmooney@cumin1001"
  • 15:50 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 15:47 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 10%: Repooling after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P50576 and previous config saved to /var/cache/conftool/dbconfig/20230814-154708-root.json
  • 15:47 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 15:46 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124 (duration: 00m 15s)
  • 15:45 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124
  • 15:38 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 15:36 urandom: upgrading Cassandra to 4.1.1, restbase1016-{a,b,c} — T339298
  • 15:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1093.eqiad.wmnet with reason: host reimage
  • 15:32 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 5%: Repooling after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P50575 and previous config saved to /var/cache/conftool/dbconfig/20230814-153203-root.json
  • 15:30 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124 (duration: 00m 43s)
  • 15:29 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124
  • 15:29 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1093.eqiad.wmnet with reason: host reimage
  • 15:16 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 3%: Repooling after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P50574 and previous config saved to /var/cache/conftool/dbconfig/20230814-151659-root.json
  • 15:16 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1093.eqiad.wmnet with OS bullseye
  • 15:01 marostegui@cumin1001: dbctl commit (dc=all): 'es2025 (re)pooling @ 1%: Repooling after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P50572 and previous config saved to /var/cache/conftool/dbconfig/20230814-150154-root.json
  • 14:57 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2012.codfw.wmnet with OS bullseye
  • 14:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum1002.eqiad.wmnet with OS bookworm
  • 14:42 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1016.eqiad.wmnet with OS bullseye
  • 14:34 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@ee544cb] (eqiad): (no justification provided) (duration: 00m 00s)
  • 14:34 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@ee544cb] (eqiad): (no justification provided)
  • 14:33 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@ee544cb] (eqiad): (no justification provided) (duration: 00m 03s)
  • 14:33 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@ee544cb] (eqiad): (no justification provided)
  • 14:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum1002.eqiad.wmnet with reason: host reimage
  • 14:30 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@ee544cb] (eqiad): (no justification provided) (duration: 00m 00s)
  • 14:30 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@ee544cb] (eqiad): (no justification provided)
  • 14:27 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@ee544cb]: (no justification provided) (duration: 00m 01s)
  • 14:27 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@ee544cb]: (no justification provided)
  • 14:26 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum1002.eqiad.wmnet with reason: host reimage
  • 14:26 sukhe: running authdns-update for CR 948195: T344073
  • 14:26 sukhe: running authdns-update for CR 948195
  • 14:25 jgiannelos@deploy1002: deploy aborted: (no justification provided) (duration: 00m 10s)
  • 14:25 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@ee544cb]: (no justification provided)
  • 14:19 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
  • 14:16 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
  • 14:13 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum1002.eqiad.wmnet with OS bookworm
  • 14:05 urandom: upgrading Cassandra to 4.1.1, restbase2013-{a,b,c} — T339298
  • 14:04 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2012.codfw.wmnet with reason: host reimage
  • 14:01 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2012.codfw.wmnet with reason: host reimage
  • 13:53 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS bullseye
  • 13:40 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2012.codfw.wmnet with OS bullseye
  • 13:27 derick@deploy1002: Finished scap: Backport for wmf-config: Remove wgContentTranslationDefaultParsoidClient cleanup (duration: 16m 56s)
  • 13:20 derick@deploy1002: d3r1ck01 and derick: Continuing with sync
  • 13:19 derick@deploy1002: d3r1ck01 and derick: Backport for wmf-config: Remove wgContentTranslationDefaultParsoidClient cleanup synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:10 derick@deploy1002: Started scap: Backport for wmf-config: Remove wgContentTranslationDefaultParsoidClient cleanup
  • 13:08 derick@deploy1002: Backport cancelled.
  • 11:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mr1-esams oob - ayounsi@cumin1001"
  • 11:23 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mr1-esams oob - ayounsi@cumin1001"
  • 11:21 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 11:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mr1-esams oob - ayounsi@cumin1001"
  • 11:16 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mr1-esams oob - ayounsi@cumin1001"
  • 11:13 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 11:09 stevemunene@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-airflow1007.eqiad.wmnet
  • 11:09 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-airflow1007.eqiad.wmnet with OS buster
  • 10:54 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-airflow1007.eqiad.wmnet with reason: host reimage
  • 10:51 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-airflow1007.eqiad.wmnet with reason: host reimage
  • 10:40 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-airflow1007.eqiad.wmnet with OS buster
  • 10:39 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM an-airflow1007.eqiad.wmnet - stevemunene@cumin1001"
  • 10:39 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:39 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mr1-esams-new to mr1-esams in dns. - cmooney@cumin1001"
  • 10:38 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM an-airflow1007.eqiad.wmnet - stevemunene@cumin1001"
  • 10:38 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mr1-esams-new to mr1-esams in dns. - cmooney@cumin1001"
  • 10:38 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-airflow1007.eqiad.wmnet on all recursors
  • 10:38 stevemunene@cumin1001: START - Cookbook sre.dns.wipe-cache an-airflow1007.eqiad.wmnet on all recursors
  • 10:36 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 10:34 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:34 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mr1-esams-new to mr1-esams in dns. - cmooney@cumin1001"
  • 10:33 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mr1-esams-new to mr1-esams in dns. - cmooney@cumin1001"
  • 10:30 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 10:26 stevemunene@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 10:25 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:25 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mr1-esams-new to mr1-esams in dns. - cmooney@cumin1001"
  • 10:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1008.eqiad.wmnet
  • 10:24 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mr1-esams-new to mr1-esams in dns. - cmooney@cumin1001"
  • 10:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1009.eqiad.wmnet
  • 10:13 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 10:13 stevemunene@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-airflow1007.eqiad.wmnet
  • 10:12 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1009.eqiad.wmnet
  • 09:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1124.eqiad.wmnet
  • 09:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1008.eqiad.wmnet
  • 09:48 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1007.eqiad.wmnet
  • 09:41 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 09:41 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 09:39 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1007.eqiad.wmnet
  • 09:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1006.eqiad.wmnet
  • 09:32 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 09:32 cmooney@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 09:32 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 09:28 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1006.eqiad.wmnet
  • 09:27 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1005.eqiad.wmnet
  • 09:26 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1124.eqiad.wmnet
  • 09:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1005.eqiad.wmnet
  • 09:13 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1004.eqiad.wmnet
  • 09:11 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:11 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mr1-esams dns to mr1-eams-old. - cmooney@cumin1001"
  • 09:10 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mr1-esams dns to mr1-eams-old. - cmooney@cumin1001"
  • 09:08 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 09:02 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1004.eqiad.wmnet

2023-08-13

  • 16:07 topranks: powering down cr3-esams
  • 16:05 topranks: powering down cr2-esams
  • 15:54 topranks: Disabling esams peering at AMS-IX prior to removing router
  • 15:45 topranks: Disable transport cct cr2-esams to cr2-eqiad prior to disconnect T329219
  • 15:26 topranks: disable transit and peering links on cr2-esams & cr3-esams before decom T329219

2023-08-12

  • 08:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 08:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T342617)', diff saved to https://phabricator.wikimedia.org/P50569 and previous config saved to /var/cache/conftool/dbconfig/20230812-082511-ladsgroup.json
  • 08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P50568 and previous config saved to /var/cache/conftool/dbconfig/20230812-081005-ladsgroup.json
  • 07:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P50567 and previous config saved to /var/cache/conftool/dbconfig/20230812-075459-ladsgroup.json
  • 07:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T342617)', diff saved to https://phabricator.wikimedia.org/P50566 and previous config saved to /var/cache/conftool/dbconfig/20230812-073953-ladsgroup.json
  • 05:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1219 (T342617)', diff saved to https://phabricator.wikimedia.org/P50565 and previous config saved to /var/cache/conftool/dbconfig/20230812-055651-ladsgroup.json
  • 05:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 05:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 05:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T342617)', diff saved to https://phabricator.wikimedia.org/P50564 and previous config saved to /var/cache/conftool/dbconfig/20230812-050127-ladsgroup.json
  • 04:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P50563 and previous config saved to /var/cache/conftool/dbconfig/20230812-044621-ladsgroup.json
  • 04:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 04:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 04:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T342617)', diff saved to https://phabricator.wikimedia.org/P50562 and previous config saved to /var/cache/conftool/dbconfig/20230812-043724-ladsgroup.json
  • 04:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P50561 and previous config saved to /var/cache/conftool/dbconfig/20230812-043115-ladsgroup.json
  • 04:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P50560 and previous config saved to /var/cache/conftool/dbconfig/20230812-042217-ladsgroup.json
  • 04:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T342617)', diff saved to https://phabricator.wikimedia.org/P50559 and previous config saved to /var/cache/conftool/dbconfig/20230812-041608-ladsgroup.json
  • 04:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P50558 and previous config saved to /var/cache/conftool/dbconfig/20230812-040711-ladsgroup.json
  • 03:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T342617)', diff saved to https://phabricator.wikimedia.org/P50557 and previous config saved to /var/cache/conftool/dbconfig/20230812-035205-ladsgroup.json
  • 02:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T342617)', diff saved to https://phabricator.wikimedia.org/P50556 and previous config saved to /var/cache/conftool/dbconfig/20230812-023441-ladsgroup.json
  • 02:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 02:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 02:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T342617)', diff saved to https://phabricator.wikimedia.org/P50555 and previous config saved to /var/cache/conftool/dbconfig/20230812-023419-ladsgroup.json
  • 02:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P50554 and previous config saved to /var/cache/conftool/dbconfig/20230812-021913-ladsgroup.json
  • 02:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P50553 and previous config saved to /var/cache/conftool/dbconfig/20230812-020407-ladsgroup.json
  • 01:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1207 (T342617)', diff saved to https://phabricator.wikimedia.org/P50552 and previous config saved to /var/cache/conftool/dbconfig/20230812-015910-ladsgroup.json
  • 01:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 01:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 01:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T342617)', diff saved to https://phabricator.wikimedia.org/P50551 and previous config saved to /var/cache/conftool/dbconfig/20230812-015849-ladsgroup.json
  • 01:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T342617)', diff saved to https://phabricator.wikimedia.org/P50550 and previous config saved to /var/cache/conftool/dbconfig/20230812-014901-ladsgroup.json
  • 01:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P50549 and previous config saved to /var/cache/conftool/dbconfig/20230812-014342-ladsgroup.json
  • 01:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P50548 and previous config saved to /var/cache/conftool/dbconfig/20230812-012836-ladsgroup.json
  • 01:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T342617)', diff saved to https://phabricator.wikimedia.org/P50547 and previous config saved to /var/cache/conftool/dbconfig/20230812-011330-ladsgroup.json
  • 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T342617)', diff saved to https://phabricator.wikimedia.org/P50546 and previous config saved to /var/cache/conftool/dbconfig/20230812-000623-ladsgroup.json
  • 00:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 00:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T342617)', diff saved to https://phabricator.wikimedia.org/P50545 and previous config saved to /var/cache/conftool/dbconfig/20230812-000602-ladsgroup.json

2023-08-11

  • 23:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P50544 and previous config saved to /var/cache/conftool/dbconfig/20230811-235056-ladsgroup.json
  • 23:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P50543 and previous config saved to /var/cache/conftool/dbconfig/20230811-233549-ladsgroup.json
  • 23:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1206 (T342617)', diff saved to https://phabricator.wikimedia.org/P50542 and previous config saved to /var/cache/conftool/dbconfig/20230811-233320-ladsgroup.json
  • 23:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 23:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 23:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T342617)', diff saved to https://phabricator.wikimedia.org/P50541 and previous config saved to /var/cache/conftool/dbconfig/20230811-233259-ladsgroup.json
  • 23:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T342617)', diff saved to https://phabricator.wikimedia.org/P50540 and previous config saved to /var/cache/conftool/dbconfig/20230811-232043-ladsgroup.json
  • 23:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P50539 and previous config saved to /var/cache/conftool/dbconfig/20230811-231753-ladsgroup.json
  • 23:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P50538 and previous config saved to /var/cache/conftool/dbconfig/20230811-230247-ladsgroup.json
  • 22:49 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 22:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T342617)', diff saved to https://phabricator.wikimedia.org/P50537 and previous config saved to /var/cache/conftool/dbconfig/20230811-224741-ladsgroup.json
  • 22:06 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:04 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 22:04 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:04 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
  • 22:03 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
  • 22:02 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 22:01 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 22:00 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:57 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 21:49 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:48 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T342617)', diff saved to https://phabricator.wikimedia.org/P50536 and previous config saved to /var/cache/conftool/dbconfig/20230811-214142-ladsgroup.json
  • 21:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 21:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 21:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 21:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T342617)', diff saved to https://phabricator.wikimedia.org/P50535 and previous config saved to /var/cache/conftool/dbconfig/20230811-214105-ladsgroup.json
  • 21:28 andrewbogott: rebooting wikitech-static-ord via rackspace UI
  • 21:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P50534 and previous config saved to /var/cache/conftool/dbconfig/20230811-212559-ladsgroup.json
  • 21:17 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:15 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 21:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P50533 and previous config saved to /var/cache/conftool/dbconfig/20230811-211053-ladsgroup.json
  • 21:10 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:10 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
  • 21:08 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
  • 21:06 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 21:06 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:01 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T342617)', diff saved to https://phabricator.wikimedia.org/P50532 and previous config saved to /var/cache/conftool/dbconfig/20230811-210102-ladsgroup.json
  • 21:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 21:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 21:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 21:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 21:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T342617)', diff saved to https://phabricator.wikimedia.org/P50531 and previous config saved to /var/cache/conftool/dbconfig/20230811-210024-ladsgroup.json
  • 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T342617)', diff saved to https://phabricator.wikimedia.org/P50530 and previous config saved to /var/cache/conftool/dbconfig/20230811-205546-ladsgroup.json
  • 20:48 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:46 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124 (duration: 00m 12s)
  • 20:46 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124
  • 20:46 bking@deploy1002: deploy aborted: deploying WDQS on newly-reimaged Bullseye hosts T343124 (duration: 02m 44s)
  • 20:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P50529 and previous config saved to /var/cache/conftool/dbconfig/20230811-204517-ladsgroup.json
  • 20:43 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124
  • 20:31 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs2011.codfw.wmnet with OS bullseye
  • 20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P50528 and previous config saved to /var/cache/conftool/dbconfig/20230811-203011-ladsgroup.json
  • 20:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T342617)', diff saved to https://phabricator.wikimedia.org/P50527 and previous config saved to /var/cache/conftool/dbconfig/20230811-201505-ladsgroup.json
  • 20:08 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2011.codfw.wmnet with reason: host reimage
  • 20:05 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2011.codfw.wmnet with reason: host reimage
  • 20:02 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:02 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 20:02 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124 (duration: 00m 41s)
  • 20:02 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:01 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124
  • 19:44 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2011.codfw.wmnet with OS bullseye
  • 19:38 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:38 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
  • 19:37 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
  • 19:34 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 19:33 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2010.codfw.wmnet with OS bullseye
  • 19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T342617)', diff saved to https://phabricator.wikimedia.org/P50526 and previous config saved to /var/cache/conftool/dbconfig/20230811-191548-ladsgroup.json
  • 19:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 19:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T342617)', diff saved to https://phabricator.wikimedia.org/P50525 and previous config saved to /var/cache/conftool/dbconfig/20230811-191527-ladsgroup.json
  • 19:06 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2010.codfw.wmnet with reason: host reimage
  • 19:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum1002.eqiad.wmnet with OS bookworm
  • 19:03 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2010.codfw.wmnet with reason: host reimage
  • 19:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 19:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 19:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T342617)', diff saved to https://phabricator.wikimedia.org/P50524 and previous config saved to /var/cache/conftool/dbconfig/20230811-190208-ladsgroup.json
  • 19:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P50523 and previous config saved to /var/cache/conftool/dbconfig/20230811-190021-ladsgroup.json
  • 18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P50522 and previous config saved to /var/cache/conftool/dbconfig/20230811-184701-ladsgroup.json
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P50521 and previous config saved to /var/cache/conftool/dbconfig/20230811-184514-ladsgroup.json
  • 18:42 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2010.codfw.wmnet with OS bullseye
  • 18:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum1002.eqiad.wmnet with reason: host reimage
  • 18:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T342617)', diff saved to https://phabricator.wikimedia.org/P50520 and previous config saved to /var/cache/conftool/dbconfig/20230811-183431-ladsgroup.json
  • 18:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 18:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 18:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T342617)', diff saved to https://phabricator.wikimedia.org/P50519 and previous config saved to /var/cache/conftool/dbconfig/20230811-183410-ladsgroup.json
  • 18:31 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum1002.eqiad.wmnet with reason: host reimage
  • 18:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P50518 and previous config saved to /var/cache/conftool/dbconfig/20230811-183155-ladsgroup.json
  • 18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T342617)', diff saved to https://phabricator.wikimedia.org/P50517 and previous config saved to /var/cache/conftool/dbconfig/20230811-183008-ladsgroup.json
  • 18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P50516 and previous config saved to /var/cache/conftool/dbconfig/20230811-181904-ladsgroup.json
  • 18:17 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum1002.eqiad.wmnet with OS bookworm
  • 18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T342617)', diff saved to https://phabricator.wikimedia.org/P50515 and previous config saved to /var/cache/conftool/dbconfig/20230811-181649-ladsgroup.json
  • 18:14 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:14 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
  • 18:12 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
  • 18:09 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 18:08 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:05 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 18:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P50514 and previous config saved to /var/cache/conftool/dbconfig/20230811-180358-ladsgroup.json
  • 18:02 sukhe: reload icinga on alert1001
  • 17:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T342617)', diff saved to https://phabricator.wikimedia.org/P50513 and previous config saved to /var/cache/conftool/dbconfig/20230811-174851-ladsgroup.json
  • 17:43 topranks: removing routing for former ns2.wikimedia.org IP 91.198.174.239 from esams CRs T343942
  • 17:33 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124 (duration: 00m 44s)
  • 17:32 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124
  • 17:20 sukhe@cumin2002: END (ERROR) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=97) rolling restart_daemons on A:wikidough and A:wikidough
  • 17:17 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:13 sukhe@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough and A:wikidough
  • 17:07 sukhe: running agent on dns-rec to remove old ns2 IP
  • 16:52 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T342617)', diff saved to https://phabricator.wikimedia.org/P50512 and previous config saved to /var/cache/conftool/dbconfig/20230811-165033-ladsgroup.json
  • 16:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 16:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T342617)', diff saved to https://phabricator.wikimedia.org/P50511 and previous config saved to /var/cache/conftool/dbconfig/20230811-165013-ladsgroup.json
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P50510 and previous config saved to /var/cache/conftool/dbconfig/20230811-163506-ladsgroup.json
  • 16:32 sukhe: running dummy authdns-update
  • 16:27 sukhe: running agent on A:dns-rec to remove ns2-v4 IP: T329219
  • 16:23 sukhe: running dummy authdns-update
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P50508 and previous config saved to /var/cache/conftool/dbconfig/20230811-161959-ladsgroup.json
  • 16:17 sukhe: running agent on A:cumin or A:dns-rec or A:netbox to remove dns300x from authdns_servers: T329219
  • 16:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum1001.eqiad.wmnet with OS bookworm
  • 16:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T342617)', diff saved to https://phabricator.wikimedia.org/P50507 and previous config saved to /var/cache/conftool/dbconfig/20230811-161025-ladsgroup.json
  • 16:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 16:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 16:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T342617)', diff saved to https://phabricator.wikimedia.org/P50506 and previous config saved to /var/cache/conftool/dbconfig/20230811-160953-ladsgroup.json
  • 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T342617)', diff saved to https://phabricator.wikimedia.org/P50505 and previous config saved to /var/cache/conftool/dbconfig/20230811-160453-ladsgroup.json
  • 15:54 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 15:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P50504 and previous config saved to /var/cache/conftool/dbconfig/20230811-155447-ladsgroup.json
  • 15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P50503 and previous config saved to /var/cache/conftool/dbconfig/20230811-153941-ladsgroup.json
  • 15:37 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124 (duration: 00m 22s)
  • 15:37 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: deploying WDQS on newly-reimaged Bullseye hosts T343124
  • 15:27 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:27 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
  • 15:26 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
  • 15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T342617)', diff saved to https://phabricator.wikimedia.org/P50502 and previous config saved to /var/cache/conftool/dbconfig/20230811-152433-ladsgroup.json
  • 15:24 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 15:23 inflatador: bking@deploy1002 'deploying WDQS on newly-reimaged Bullseye hosts T343124'
  • 15:18 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 15:18 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: f1a6177 (duration: 00m 42s)
  • 15:17 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: f1a6177
  • 15:09 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:08 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
  • 15:08 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update DNS additions esams move. - cmooney@cumin1001"
  • 15:07 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs2009.codfw.wmnet with OS bullseye
  • 15:05 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 15:05 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 15:03 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs2008.codfw.wmnet with OS bullseye
  • 15:02 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: f1a6177 (duration: 00m 50s)
  • 15:01 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: f1a6177
  • 15:01 bking@deploy1002: deploy aborted: f1a6177 (duration: 00m 05s)
  • 15:01 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: f1a6177
  • 14:53 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs[2008-2009].codfw.wmnet with reason: T343124
  • 14:53 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs[2008-2009].codfw.wmnet with reason: T343124
  • 14:49 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2009.codfw.wmnet with reason: host reimage
  • 14:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum1001.eqiad.wmnet with reason: host reimage
  • 14:44 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2008.codfw.wmnet with reason: host reimage
  • 14:42 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum1001.eqiad.wmnet with reason: host reimage
  • 14:41 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2009.codfw.wmnet with reason: host reimage
  • 14:40 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2008.codfw.wmnet with reason: host reimage
  • 14:31 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 14:29 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 14:29 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 14:28 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum1001.eqiad.wmnet with OS bookworm
  • 14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T342617)', diff saved to https://phabricator.wikimedia.org/P50501 and previous config saved to /var/cache/conftool/dbconfig/20230811-142611-ladsgroup.json
  • 14:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 14:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T342617)', diff saved to https://phabricator.wikimedia.org/P50500 and previous config saved to /var/cache/conftool/dbconfig/20230811-142550-ladsgroup.json
  • 14:21 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2008.codfw.wmnet with OS bullseye
  • 14:21 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2009.codfw.wmnet with OS bullseye
  • 14:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P50496 and previous config saved to /var/cache/conftool/dbconfig/20230811-141043-ladsgroup.json
  • 13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P50494 and previous config saved to /var/cache/conftool/dbconfig/20230811-135537-ladsgroup.json
  • 13:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T342617)', diff saved to https://phabricator.wikimedia.org/P50493 and previous config saved to /var/cache/conftool/dbconfig/20230811-134804-ladsgroup.json
  • 13:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 13:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 13:42 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 13:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T342617)', diff saved to https://phabricator.wikimedia.org/P50492 and previous config saved to /var/cache/conftool/dbconfig/20230811-134030-ladsgroup.json
  • 13:22 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 13:01 fabfur@cumin1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet
  • 12:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 12:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 12:05 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4045.ulsfo.wmnet with OS bullseye
  • 12:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 12:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T342617)', diff saved to https://phabricator.wikimedia.org/P50490 and previous config saved to /var/cache/conftool/dbconfig/20230811-120211-ladsgroup.json
  • 12:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 12:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 12:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T342617)', diff saved to https://phabricator.wikimedia.org/P50489 and previous config saved to /var/cache/conftool/dbconfig/20230811-120150-ladsgroup.json
  • 11:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P50486 and previous config saved to /var/cache/conftool/dbconfig/20230811-114644-ladsgroup.json
  • 11:44 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
  • 11:41 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
  • 11:36 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on 29 hosts with reason: Downtime esams hosts prior to migration week.
  • 11:35 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on 29 hosts with reason: Downtime esams hosts prior to migration week.
  • 11:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P50485 and previous config saved to /var/cache/conftool/dbconfig/20230811-113138-ladsgroup.json
  • 11:26 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on 16 hosts with reason: Downtime esams network kit prior to migration week.
  • 11:26 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on 16 hosts with reason: Downtime esams network kit prior to migration week.
  • 11:21 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS bullseye
  • 11:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T342617)', diff saved to https://phabricator.wikimedia.org/P50484 and previous config saved to /var/cache/conftool/dbconfig/20230811-111631-ladsgroup.json
  • 11:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2005.codfw.wmnet
  • 10:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2005.codfw.wmnet
  • 10:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 10:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 10:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T342617)', diff saved to https://phabricator.wikimedia.org/P50482 and previous config saved to /var/cache/conftool/dbconfig/20230811-104210-ladsgroup.json
  • 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P50481 and previous config saved to /var/cache/conftool/dbconfig/20230811-102704-ladsgroup.json
  • 10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1221 (T342617)', diff saved to https://phabricator.wikimedia.org/P50480 and previous config saved to /var/cache/conftool/dbconfig/20230811-102009-ladsgroup.json
  • 10:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 10:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T342617)', diff saved to https://phabricator.wikimedia.org/P50479 and previous config saved to /var/cache/conftool/dbconfig/20230811-101930-ladsgroup.json
  • 10:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P50478 and previous config saved to /var/cache/conftool/dbconfig/20230811-101157-ladsgroup.json
  • 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P50477 and previous config saved to /var/cache/conftool/dbconfig/20230811-100424-ladsgroup.json
  • 09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T342617)', diff saved to https://phabricator.wikimedia.org/P50476 and previous config saved to /var/cache/conftool/dbconfig/20230811-095651-ladsgroup.json
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P50475 and previous config saved to /var/cache/conftool/dbconfig/20230811-094918-ladsgroup.json
  • 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T342617)', diff saved to https://phabricator.wikimedia.org/P50474 and previous config saved to /var/cache/conftool/dbconfig/20230811-094118-ladsgroup.json
  • 09:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 09:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 09:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T342617)', diff saved to https://phabricator.wikimedia.org/P50473 and previous config saved to /var/cache/conftool/dbconfig/20230811-093412-ladsgroup.json
  • 09:31 topranks: Withdrawing anycast prefixes 198.35.27.0/24 (authdns), 185.71.138.0/24 & 2001:67c:930::/48 (wikidough) from esams/knams in BGP
  • 09:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:05 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 09:00 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 09:00 topranks: depool esams site until next week for knams POP migration / rebuild
  • 09:00 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 08:59 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:59 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:34 moritzm: installing intel-microcode security updates on bookworm/bullseye
  • 08:32 elukey: expand kubelet partition on ml-serve2001 - T339231
  • 08:31 elukey: restart kubelet on ml-serve1001 - T343900
  • 08:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 08:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 08:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T342617)', diff saved to https://phabricator.wikimedia.org/P50472 and previous config saved to /var/cache/conftool/dbconfig/20230811-081815-ladsgroup.json
  • 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T342617)', diff saved to https://phabricator.wikimedia.org/P50471 and previous config saved to /var/cache/conftool/dbconfig/20230811-081139-ladsgroup.json
  • 08:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 08:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T342617)', diff saved to https://phabricator.wikimedia.org/P50470 and previous config saved to /var/cache/conftool/dbconfig/20230811-081118-ladsgroup.json
  • 08:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve2001.codfw.wmnet with reason: Expand the kubelet disk partition
  • 08:04 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve2001.codfw.wmnet with reason: Expand the kubelet disk partition
  • 08:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P50469 and previous config saved to /var/cache/conftool/dbconfig/20230811-080309-ladsgroup.json
  • 07:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM rpki1001.eqiad.wmnet
  • 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P50468 and previous config saved to /var/cache/conftool/dbconfig/20230811-075612-ladsgroup.json
  • 07:54 ayounsi@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM rpki1001.eqiad.wmnet
  • 07:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM rpki2002.codfw.wmnet
  • 07:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P50467 and previous config saved to /var/cache/conftool/dbconfig/20230811-074803-ladsgroup.json
  • 07:47 ayounsi@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM rpki2002.codfw.wmnet
  • 07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P50466 and previous config saved to /var/cache/conftool/dbconfig/20230811-074105-ladsgroup.json
  • 07:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T342617)', diff saved to https://phabricator.wikimedia.org/P50465 and previous config saved to /var/cache/conftool/dbconfig/20230811-073257-ladsgroup.json
  • 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T342617)', diff saved to https://phabricator.wikimedia.org/P50464 and previous config saved to /var/cache/conftool/dbconfig/20230811-072559-ladsgroup.json
  • 06:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T342617)', diff saved to https://phabricator.wikimedia.org/P50463 and previous config saved to /var/cache/conftool/dbconfig/20230811-061250-ladsgroup.json
  • 05:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P50462 and previous config saved to /var/cache/conftool/dbconfig/20230811-055744-ladsgroup.json
  • 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T342617)', diff saved to https://phabricator.wikimedia.org/P50461 and previous config saved to /var/cache/conftool/dbconfig/20230811-054649-ladsgroup.json
  • 05:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 05:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T342617)', diff saved to https://phabricator.wikimedia.org/P50460 and previous config saved to /var/cache/conftool/dbconfig/20230811-054628-ladsgroup.json
  • 05:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P50459 and previous config saved to /var/cache/conftool/dbconfig/20230811-054238-ladsgroup.json
  • 05:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T342617)', diff saved to https://phabricator.wikimedia.org/P50458 and previous config saved to /var/cache/conftool/dbconfig/20230811-053847-ladsgroup.json
  • 05:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 05:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 05:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T342617)', diff saved to https://phabricator.wikimedia.org/P50457 and previous config saved to /var/cache/conftool/dbconfig/20230811-053826-ladsgroup.json
  • 05:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P50456 and previous config saved to /var/cache/conftool/dbconfig/20230811-053122-ladsgroup.json
  • 05:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T342617)', diff saved to https://phabricator.wikimedia.org/P50455 and previous config saved to /var/cache/conftool/dbconfig/20230811-052731-ladsgroup.json
  • 05:23 oblivian@deploy1002: Synchronized private/PrivateSettings.php: Adding proxy vendors (duration: 07m 33s)
  • 05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P50454 and previous config saved to /var/cache/conftool/dbconfig/20230811-052320-ladsgroup.json
  • 05:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P50453 and previous config saved to /var/cache/conftool/dbconfig/20230811-051616-ladsgroup.json
  • 05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P50452 and previous config saved to /var/cache/conftool/dbconfig/20230811-050814-ladsgroup.json
  • 05:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T342617)', diff saved to https://phabricator.wikimedia.org/P50451 and previous config saved to /var/cache/conftool/dbconfig/20230811-050110-ladsgroup.json
  • 04:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T342617)', diff saved to https://phabricator.wikimedia.org/P50450 and previous config saved to /var/cache/conftool/dbconfig/20230811-045307-ladsgroup.json
  • 03:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T342617)', diff saved to https://phabricator.wikimedia.org/P50449 and previous config saved to /var/cache/conftool/dbconfig/20230811-031400-ladsgroup.json
  • 03:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 03:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 03:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112 (T342617)', diff saved to https://phabricator.wikimedia.org/P50448 and previous config saved to /var/cache/conftool/dbconfig/20230811-031339-ladsgroup.json
  • 03:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1132 (T342617)', diff saved to https://phabricator.wikimedia.org/P50447 and previous config saved to /var/cache/conftool/dbconfig/20230811-030454-ladsgroup.json
  • 03:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 03:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 03:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T342617)', diff saved to https://phabricator.wikimedia.org/P50446 and previous config saved to /var/cache/conftool/dbconfig/20230811-030433-ladsgroup.json
  • 02:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112', diff saved to https://phabricator.wikimedia.org/P50445 and previous config saved to /var/cache/conftool/dbconfig/20230811-025833-ladsgroup.json
  • 02:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P50444 and previous config saved to /var/cache/conftool/dbconfig/20230811-024927-ladsgroup.json
  • 02:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112', diff saved to https://phabricator.wikimedia.org/P50443 and previous config saved to /var/cache/conftool/dbconfig/20230811-024327-ladsgroup.json
  • 02:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P50442 and previous config saved to /var/cache/conftool/dbconfig/20230811-023420-ladsgroup.json
  • 02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112 (T342617)', diff saved to https://phabricator.wikimedia.org/P50441 and previous config saved to /var/cache/conftool/dbconfig/20230811-022820-ladsgroup.json
  • 02:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T342617)', diff saved to https://phabricator.wikimedia.org/P50440 and previous config saved to /var/cache/conftool/dbconfig/20230811-021914-ladsgroup.json
  • 02:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1199 (T342617)', diff saved to https://phabricator.wikimedia.org/P50439 and previous config saved to /var/cache/conftool/dbconfig/20230811-020724-ladsgroup.json
  • 02:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 02:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 02:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T342617)', diff saved to https://phabricator.wikimedia.org/P50438 and previous config saved to /var/cache/conftool/dbconfig/20230811-020703-ladsgroup.json
  • 01:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P50437 and previous config saved to /var/cache/conftool/dbconfig/20230811-015156-ladsgroup.json
  • 01:43 ryankemper: [WDQS] `ryankemper@wdqs2007:~$ sudo pool` (Caught up on lag)
  • 01:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P50436 and previous config saved to /var/cache/conftool/dbconfig/20230811-013650-ladsgroup.json
  • 01:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T342617)', diff saved to https://phabricator.wikimedia.org/P50435 and previous config saved to /var/cache/conftool/dbconfig/20230811-012144-ladsgroup.json
  • 00:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2112 (T342617)', diff saved to https://phabricator.wikimedia.org/P50434 and previous config saved to /var/cache/conftool/dbconfig/20230811-004036-ladsgroup.json
  • 00:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 00:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 00:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T342617)', diff saved to https://phabricator.wikimedia.org/P50433 and previous config saved to /var/cache/conftool/dbconfig/20230811-003243-ladsgroup.json
  • 00:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 00:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance

2023-08-10

  • 22:55 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@ff0a21b]: (no justification provided) (duration: 00m 20s)
  • 22:55 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@ff0a21b]: (no justification provided)
  • 22:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 22:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 22:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 22:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 22:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 22:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 22:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 22:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 22:12 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 22:10 urbanecm@deploy1002: Finished scap: Backport for GlobalRenameUser: Ensure old username is in canonical form (T343958) (duration: 09m 48s)
  • 22:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T342617)', diff saved to https://phabricator.wikimedia.org/P50432 and previous config saved to /var/cache/conftool/dbconfig/20230810-220820-ladsgroup.json
  • 22:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 22:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 22:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T342617)', diff saved to https://phabricator.wikimedia.org/P50431 and previous config saved to /var/cache/conftool/dbconfig/20230810-220759-ladsgroup.json
  • 22:03 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 22:02 urbanecm@deploy1002: urbanecm: Backport for GlobalRenameUser: Ensure old username is in canonical form (T343958) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 22:00 urbanecm@deploy1002: Started scap: Backport for GlobalRenameUser: Ensure old username is in canonical form (T343958)
  • 21:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P50430 and previous config saved to /var/cache/conftool/dbconfig/20230810-215253-ladsgroup.json
  • 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P50429 and previous config saved to /var/cache/conftool/dbconfig/20230810-213747-ladsgroup.json
  • 21:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T342617)', diff saved to https://phabricator.wikimedia.org/P50428 and previous config saved to /var/cache/conftool/dbconfig/20230810-212241-ladsgroup.json
  • 21:21 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs2007.codfw.wmnet with OS bullseye
  • 20:40 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:38 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: f1a6177 (duration: 00m 42s)
  • 20:37 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: f1a6177
  • 20:34 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: f1a6177 (duration: 00m 16s)
  • 20:34 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: f1a6177
  • 19:24 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@b5a1d04]: (no justification provided) (duration: 00m 09s)
  • 19:24 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@b5a1d04]: (no justification provided)
  • 19:18 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:18 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge ganeti changes - sukhe@cumin2002"
  • 19:16 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge ganeti changes - sukhe@cumin2002"
  • 19:14 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 18:55 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@4312d99]: (no justification provided) (duration: 00m 20s)
  • 18:55 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@4312d99]: (no justification provided)
  • 18:43 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: host reimage
  • 18:40 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: host reimage
  • 18:25 urbanecm@deploy1002: Finished scap: Backport for ltwiki: Disable Growth features (duration: 10m 05s)
  • 18:21 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2007.codfw.wmnet with OS bullseye
  • 18:18 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 18:17 urbanecm@deploy1002: urbanecm: Backport for ltwiki: Disable Growth features synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 18:15 urbanecm@deploy1002: Started scap: Backport for ltwiki: Disable Growth features
  • 18:12 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum6002.drmrs.wmnet with OS bookworm
  • 18:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1190 (T342617)', diff saved to https://phabricator.wikimedia.org/P50426 and previous config saved to /var/cache/conftool/dbconfig/20230810-180656-ladsgroup.json
  • 18:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 18:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 17:46 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
  • 17:43 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
  • 17:21 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum6002.drmrs.wmnet with OS bookworm
  • 17:06 cstone: payments-wiki upgraded from 5b250aed to e094ea1f
  • 16:15 sukhe: running authdns-update to update ns2 and point it to nsa.wikimedia.org
  • 15:30 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe
  • 15:20 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe
  • 14:35 jforrester@deploy1002: Finished scap: Backport for wikifunctions: Allow transwiki import from Wikidata (T343365) (duration: 09m 22s)
  • 14:28 jforrester@deploy1002: stang and jforrester: Continuing with sync
  • 14:27 jforrester@deploy1002: stang and jforrester: Backport for wikifunctions: Allow transwiki import from Wikidata (T343365) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:25 jforrester@deploy1002: Started scap: Backport for wikifunctions: Allow transwiki import from Wikidata (T343365)
  • 14:22 jforrester@deploy1002: Finished scap: Backport for Wikifunctions: Tell WikiLambda to stash results in our bespoke cache (T342753) (duration: 08m 15s)
  • 14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T342617)', diff saved to https://phabricator.wikimedia.org/P50423 and previous config saved to /var/cache/conftool/dbconfig/20230810-142117-ladsgroup.json
  • 14:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 14:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T342617)', diff saved to https://phabricator.wikimedia.org/P50422 and previous config saved to /var/cache/conftool/dbconfig/20230810-142053-ladsgroup.json
  • 14:16 jforrester@deploy1002: jforrester: Continuing with sync
  • 14:16 jforrester@deploy1002: jforrester: Backport for Wikifunctions: Tell WikiLambda to stash results in our bespoke cache (T342753) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:14 jforrester@deploy1002: Started scap: Backport for Wikifunctions: Tell WikiLambda to stash results in our bespoke cache (T342753)
  • 14:12 jforrester@deploy1002: Finished scap: Backport for Add wikifunctions-staff to wmgPrivilegedGroups (T342868) (duration: 08m 35s)
  • 14:06 jforrester@deploy1002: jforrester: Continuing with sync
  • 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P50421 and previous config saved to /var/cache/conftool/dbconfig/20230810-140546-ladsgroup.json
  • 14:05 jforrester@deploy1002: jforrester: Backport for Add wikifunctions-staff to wmgPrivilegedGroups (T342868) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:04 jforrester@deploy1002: Started scap: Backport for Add wikifunctions-staff to wmgPrivilegedGroups (T342868)
  • 14:01 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-test-coord1001.eqiad.wmnet with OS bullseye
  • 13:57 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:52 Emperor: restart puppet and repool ms-fe2009 after testing T211661
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P50420 and previous config saved to /var/cache/conftool/dbconfig/20230810-135040-ladsgroup.json
  • 13:47 Emperor: depool and stop puppet on ms-fe2009 to test updated rewrite.py T211661
  • 13:45 oblivian@deploy1002: Finished scap: Backport for Add wikifunctions object cache (T297815) (duration: 09m 09s)
  • 13:38 oblivian@deploy1002: oblivian: Continuing with sync
  • 13:37 oblivian@deploy1002: oblivian: Backport for Add wikifunctions object cache (T297815) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:36 oblivian@deploy1002: Started scap: Backport for Add wikifunctions object cache (T297815)
  • 13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T342617)', diff saved to https://phabricator.wikimedia.org/P50419 and previous config saved to /var/cache/conftool/dbconfig/20230810-133534-ladsgroup.json
  • 13:33 samtar@deploy1002: Finished scap: Backport for IS: Enable Phonos on medium projects (T336763) (duration: 10m 58s)
  • 13:26 samtar@deploy1002: samtar: Continuing with sync
  • 13:24 samtar@deploy1002: samtar: Backport for IS: Enable Phonos on medium projects (T336763) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:22 samtar@deploy1002: Started scap: Backport for IS: Enable Phonos on medium projects (T336763)
  • 13:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1092.eqiad.wmnet with OS bullseye
  • 13:14 TheresNoTime: `[samtar@mwmaint1002 ~]$ foreachwiki sql.php /srv/mediawiki-staging/php-1.41.0-wmf.20/extensions/CheckUser/schema/mysql/cu_useragent_clienthints_map.sql` for T258105
  • 13:09 TheresNoTime: `[samtar@mwmaint1002 ~]$ foreachwiki sql.php /srv/mediawiki-staging/php-1.41.0-wmf.20/extensions/CheckUser/schema/mysql/cu_useragent_clienthints.sql` for T258105
  • 12:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1092.eqiad.wmnet with reason: host reimage
  • 12:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1092.eqiad.wmnet with reason: host reimage
  • 12:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 12:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 12:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 12:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 12:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T342617)', diff saved to https://phabricator.wikimedia.org/P50418 and previous config saved to /var/cache/conftool/dbconfig/20230810-122626-ladsgroup.json
  • 12:22 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1092.eqiad.wmnet with OS bullseye
  • 12:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 12:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P50417 and previous config saved to /var/cache/conftool/dbconfig/20230810-121120-ladsgroup.json
  • 12:08 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1091.eqiad.wmnet with OS bullseye
  • 11:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast3007.wikimedia.org
  • 11:58 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:58 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3007.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 11:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 11:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P50416 and previous config saved to /var/cache/conftool/dbconfig/20230810-115614-ladsgroup.json
  • 11:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 11:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 11:48 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-coord1001.eqiad.wmnet with reason: host reimage
  • 11:45 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1091.eqiad.wmnet with reason: host reimage
  • 11:45 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-coord1001.eqiad.wmnet with reason: host reimage
  • 11:45 taavi@deploy1002: Finished scap: Backport for GlobalRename: Ensure status database rows use the normalized name (T343956) (duration: 10m 17s)
  • 11:44 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3007.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:42 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1091.eqiad.wmnet with reason: host reimage
  • 11:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T342617)', diff saved to https://phabricator.wikimedia.org/P50415 and previous config saved to /var/cache/conftool/dbconfig/20230810-114108-ladsgroup.json
  • 11:40 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add manufacture to network devices - jbond@cumin1001 - T329669"
  • 11:39 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add manufacture to network devices - jbond@cumin1001 - T329669"
  • 11:39 taavi@deploy1002: taavi and urbanecm: Continuing with sync
  • 11:36 taavi@deploy1002: taavi and urbanecm: Backport for GlobalRename: Ensure status database rows use the normalized name (T343956) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 11:35 taavi@deploy1002: Started scap: Backport for GlobalRename: Ensure status database rows use the normalized name (T343956)
  • 11:34 taavi@deploy1002: Finished scap: Backport for throttle: remove expired rules, throttle: add rules for Wikimania 2023 (T343595) (duration: 11m 30s)
  • 11:32 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-coord1001.eqiad.wmnet with OS bullseye
  • 11:27 taavi@deploy1002: taavi: Continuing with sync
  • 11:24 taavi@deploy1002: taavi: Backport for throttle: remove expired rules, throttle: add rules for Wikimania 2023 (T343595) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 11:23 taavi@deploy1002: Started scap: Backport for throttle: remove expired rules, throttle: add rules for Wikimania 2023 (T343595)
  • 11:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum6002.drmrs.wmnet with OS bookworm
  • 11:14 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1091.eqiad.wmnet with OS bullseye
  • 10:55 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1090.eqiad.wmnet with OS bullseye
  • 10:46 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
  • 10:45 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
  • 10:36 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
  • 10:36 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
  • 10:34 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
  • 10:33 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
  • 10:32 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
  • 10:32 jiji@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
  • 10:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1090.eqiad.wmnet with reason: host reimage
  • 10:29 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1090.eqiad.wmnet with reason: host reimage
  • 10:23 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1010.eqiad.wmnet
  • 10:17 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-tool1010.eqiad.wmnet
  • 10:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1007.eqiad.wmnet
  • 10:16 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1090.eqiad.wmnet with OS bullseye
  • 10:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-tool1007.eqiad.wmnet
  • 10:10 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum6002.drmrs.wmnet with OS bookworm
  • 09:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host karapace1001.eqiad.wmnet
  • 09:09 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-client1002.eqiad.wmnet
  • 09:09 urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=arwiki --logwiki=metawiki 'Qwertyoruiop' '3h6 1'
  • 09:08 urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki 'Mittzy' 'Mittzy (usurped)'
  • 09:08 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host karapace1001.eqiad.wmnet
  • 09:07 urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=amwiki --logwiki=metawiki 'Jean-Mahmood' 'User92259453'
  • 09:07 urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'Garciajaysonpinolkwani98' 'Ne_Shokot_Pinolkwane'
  • 09:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1004.eqiad.wmnet
  • 09:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1002.eqiad.wmnet
  • 09:06 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-airflow1004.eqiad.wmnet
  • 09:05 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-airflow1002.eqiad.wmnet
  • 09:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-client1002.eqiad.wmnet
  • 09:04 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-db1001.eqiad.wmnet
  • 09:03 urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'CHUniZH' 'Musik CH' # T343867
  • 08:57 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-db1001.eqiad.wmnet
  • 08:46 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:42 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast3007.wikimedia.org
  • 08:36 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 08:26 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 08:21 godog: put back business hours americas for sre business hours escalation - T343812
  • 08:21 godog: put back business hours americas for sre business hours escalation
  • 08:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast5004.wikimedia.org
  • 08:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast5004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 07:59 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast5004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 07:52 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast5004.wikimedia.org
  • 07:19 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host bast5004.wikimedia.org
  • 07:19 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host bast5004.wikimedia.org with OS bookworm
  • 06:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T342617)', diff saved to https://phabricator.wikimedia.org/P50414 and previous config saved to /var/cache/conftool/dbconfig/20230810-063611-ladsgroup.json
  • 06:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 06:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 06:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 06:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 06:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T342617)', diff saved to https://phabricator.wikimedia.org/P50413 and previous config saved to /var/cache/conftool/dbconfig/20230810-063523-ladsgroup.json
  • 06:23 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
  • 06:20 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-replica-eqiad
  • 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P50412 and previous config saved to /var/cache/conftool/dbconfig/20230810-062017-ladsgroup.json
  • 06:08 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart (exit_code=0) rolling restart_daemons on A:maps-replica-codfw
  • 06:05 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-replica-codfw
  • 06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P50411 and previous config saved to /var/cache/conftool/dbconfig/20230810-060511-ladsgroup.json
  • 05:59 moritzm: installing tiff security updates
  • 05:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T342617)', diff saved to https://phabricator.wikimedia.org/P50410 and previous config saved to /var/cache/conftool/dbconfig/20230810-055005-ladsgroup.json
  • 05:32 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast5004.wikimedia.org with OS bookworm
  • 05:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast5004.wikimedia.org - jmm@cumin2002"
  • 05:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast5004.wikimedia.org - jmm@cumin2002"
  • 05:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bast5004.wikimedia.org on all recursors
  • 05:30 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast5004.wikimedia.org on all recursors
  • 05:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 05:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast5004.wikimedia.org - jmm@cumin2002"
  • 05:29 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast5004.wikimedia.org - jmm@cumin2002"
  • 05:27 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 05:27 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast5004.wikimedia.org
  • 05:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1015.eqiad.wmnet
  • 04:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T342617)', diff saved to https://phabricator.wikimedia.org/P50409 and previous config saved to /var/cache/conftool/dbconfig/20230810-044643-ladsgroup.json
  • 04:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 04:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 04:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T342617)', diff saved to https://phabricator.wikimedia.org/P50408 and previous config saved to /var/cache/conftool/dbconfig/20230810-044622-ladsgroup.json
  • 04:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P50407 and previous config saved to /var/cache/conftool/dbconfig/20230810-043116-ladsgroup.json
  • 04:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P50406 and previous config saved to /var/cache/conftool/dbconfig/20230810-041610-ladsgroup.json
  • 04:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T342617)', diff saved to https://phabricator.wikimedia.org/P50405 and previous config saved to /var/cache/conftool/dbconfig/20230810-040104-ladsgroup.json
  • 03:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 03:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 02:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 02:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 02:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T342617)', diff saved to https://phabricator.wikimedia.org/P50404 and previous config saved to /var/cache/conftool/dbconfig/20230810-024531-ladsgroup.json
  • 02:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P50403 and previous config saved to /var/cache/conftool/dbconfig/20230810-023025-ladsgroup.json
  • 02:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P50402 and previous config saved to /var/cache/conftool/dbconfig/20230810-021518-ladsgroup.json
  • 02:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T342617)', diff saved to https://phabricator.wikimedia.org/P50401 and previous config saved to /var/cache/conftool/dbconfig/20230810-020012-ladsgroup.json
  • 01:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T342617)', diff saved to https://phabricator.wikimedia.org/P50400 and previous config saved to /var/cache/conftool/dbconfig/20230810-014731-ladsgroup.json
  • 01:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P50399 and previous config saved to /var/cache/conftool/dbconfig/20230810-013225-ladsgroup.json
  • 01:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P50398 and previous config saved to /var/cache/conftool/dbconfig/20230810-011718-ladsgroup.json
  • 01:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1214 (T342617)', diff saved to https://phabricator.wikimedia.org/P50397 and previous config saved to /var/cache/conftool/dbconfig/20230810-011228-ladsgroup.json
  • 01:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 01:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 01:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T342617)', diff saved to https://phabricator.wikimedia.org/P50396 and previous config saved to /var/cache/conftool/dbconfig/20230810-011207-ladsgroup.json
  • 01:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T342617)', diff saved to https://phabricator.wikimedia.org/P50395 and previous config saved to /var/cache/conftool/dbconfig/20230810-010212-ladsgroup.json
  • 00:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P50394 and previous config saved to /var/cache/conftool/dbconfig/20230810-005701-ladsgroup.json
  • 00:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P50393 and previous config saved to /var/cache/conftool/dbconfig/20230810-004154-ladsgroup.json
  • 00:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T342617)', diff saved to https://phabricator.wikimedia.org/P50392 and previous config saved to /var/cache/conftool/dbconfig/20230810-002648-ladsgroup.json
  • 00:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T342617)', diff saved to https://phabricator.wikimedia.org/P50391 and previous config saved to /var/cache/conftool/dbconfig/20230810-001437-ladsgroup.json
  • 00:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 00:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 00:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T342617)', diff saved to https://phabricator.wikimedia.org/P50390 and previous config saved to /var/cache/conftool/dbconfig/20230810-001414-ladsgroup.json

2023-08-09

  • 23:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P50389 and previous config saved to /var/cache/conftool/dbconfig/20230809-235908-ladsgroup.json
  • 23:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P50388 and previous config saved to /var/cache/conftool/dbconfig/20230809-234402-ladsgroup.json
  • 23:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1211 (T342617)', diff saved to https://phabricator.wikimedia.org/P50387 and previous config saved to /var/cache/conftool/dbconfig/20230809-234146-ladsgroup.json
  • 23:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 23:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 23:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T342617)', diff saved to https://phabricator.wikimedia.org/P50386 and previous config saved to /var/cache/conftool/dbconfig/20230809-234125-ladsgroup.json
  • 23:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T342617)', diff saved to https://phabricator.wikimedia.org/P50385 and previous config saved to /var/cache/conftool/dbconfig/20230809-232855-ladsgroup.json
  • 23:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P50384 and previous config saved to /var/cache/conftool/dbconfig/20230809-232619-ladsgroup.json
  • 23:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P50383 and previous config saved to /var/cache/conftool/dbconfig/20230809-231112-ladsgroup.json
  • 23:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T342617)', diff saved to https://phabricator.wikimedia.org/P50382 and previous config saved to /var/cache/conftool/dbconfig/20230809-230339-ladsgroup.json
  • 23:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 23:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 22:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T342617)', diff saved to https://phabricator.wikimedia.org/P50381 and previous config saved to /var/cache/conftool/dbconfig/20230809-225605-ladsgroup.json
  • 22:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T342617)', diff saved to https://phabricator.wikimedia.org/P50380 and previous config saved to /var/cache/conftool/dbconfig/20230809-224114-ladsgroup.json
  • 22:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 22:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 22:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T342617)', diff saved to https://phabricator.wikimedia.org/P50379 and previous config saved to /var/cache/conftool/dbconfig/20230809-224053-ladsgroup.json
  • 22:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P50378 and previous config saved to /var/cache/conftool/dbconfig/20230809-222547-ladsgroup.json
  • 22:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P50377 and previous config saved to /var/cache/conftool/dbconfig/20230809-221041-ladsgroup.json
  • 22:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1209 (T342617)', diff saved to https://phabricator.wikimedia.org/P50376 and previous config saved to /var/cache/conftool/dbconfig/20230809-220433-ladsgroup.json
  • 22:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 22:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 22:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T342617)', diff saved to https://phabricator.wikimedia.org/P50375 and previous config saved to /var/cache/conftool/dbconfig/20230809-220412-ladsgroup.json
  • 21:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T342617)', diff saved to https://phabricator.wikimedia.org/P50373 and previous config saved to /var/cache/conftool/dbconfig/20230809-215535-ladsgroup.json
  • 21:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P50372 and previous config saved to /var/cache/conftool/dbconfig/20230809-214905-ladsgroup.json
  • 21:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P50371 and previous config saved to /var/cache/conftool/dbconfig/20230809-213359-ladsgroup.json
  • 21:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T342617)', diff saved to https://phabricator.wikimedia.org/P50369 and previous config saved to /var/cache/conftool/dbconfig/20230809-212042-ladsgroup.json
  • 21:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 21:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 21:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T342617)', diff saved to https://phabricator.wikimedia.org/P50368 and previous config saved to /var/cache/conftool/dbconfig/20230809-212021-ladsgroup.json
  • 21:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T342617)', diff saved to https://phabricator.wikimedia.org/P50367 and previous config saved to /var/cache/conftool/dbconfig/20230809-211853-ladsgroup.json
  • 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T342617)', diff saved to https://phabricator.wikimedia.org/P50366 and previous config saved to /var/cache/conftool/dbconfig/20230809-210856-ladsgroup.json
  • 21:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 21:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T342617)', diff saved to https://phabricator.wikimedia.org/P50365 and previous config saved to /var/cache/conftool/dbconfig/20230809-210835-ladsgroup.json
  • 21:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P50364 and previous config saved to /var/cache/conftool/dbconfig/20230809-210514-ladsgroup.json
  • 20:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P50363 and previous config saved to /var/cache/conftool/dbconfig/20230809-205329-ladsgroup.json
  • 20:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P50362 and previous config saved to /var/cache/conftool/dbconfig/20230809-205008-ladsgroup.json
  • 20:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P50361 and previous config saved to /var/cache/conftool/dbconfig/20230809-203822-ladsgroup.json
  • 20:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T343718)', diff saved to https://phabricator.wikimedia.org/P50360 and previous config saved to /var/cache/conftool/dbconfig/20230809-203731-ladsgroup.json
  • 20:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T342617)', diff saved to https://phabricator.wikimedia.org/P50359 and previous config saved to /var/cache/conftool/dbconfig/20230809-203502-ladsgroup.json
  • 20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1203 (T342617)', diff saved to https://phabricator.wikimedia.org/P50358 and previous config saved to /var/cache/conftool/dbconfig/20230809-203041-ladsgroup.json
  • 20:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 20:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T342617)', diff saved to https://phabricator.wikimedia.org/P50357 and previous config saved to /var/cache/conftool/dbconfig/20230809-203020-ladsgroup.json
  • 20:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T342617)', diff saved to https://phabricator.wikimedia.org/P50356 and previous config saved to /var/cache/conftool/dbconfig/20230809-202316-ladsgroup.json
  • 20:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P50355 and previous config saved to /var/cache/conftool/dbconfig/20230809-202225-ladsgroup.json
  • 20:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P50354 and previous config saved to /var/cache/conftool/dbconfig/20230809-201514-ladsgroup.json
  • 20:09 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts contint2001.wikimedia.org
  • 20:09 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:09 aokoth@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: contint2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - aokoth@cumin1001"
  • 20:08 aokoth@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: contint2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - aokoth@cumin1001"
  • 20:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P50353 and previous config saved to /var/cache/conftool/dbconfig/20230809-200718-ladsgroup.json
  • 20:05 aokoth@cumin1001: START - Cookbook sre.dns.netbox
  • 20:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P50352 and previous config saved to /var/cache/conftool/dbconfig/20230809-200007-ladsgroup.json
  • 19:59 aokoth@cumin1001: START - Cookbook sre.hosts.decommission for hosts contint2001.wikimedia.org
  • 19:58 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on contint2001.wikimedia.org with reason: Decommissioning
  • 19:58 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on contint2001.wikimedia.org with reason: Decommissioning
  • 19:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T343718)', diff saved to https://phabricator.wikimedia.org/P50351 and previous config saved to /var/cache/conftool/dbconfig/20230809-195212-ladsgroup.json
  • 19:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T342617)', diff saved to https://phabricator.wikimedia.org/P50350 and previous config saved to /var/cache/conftool/dbconfig/20230809-194501-ladsgroup.json
  • 19:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T342617)', diff saved to https://phabricator.wikimedia.org/P50349 and previous config saved to /var/cache/conftool/dbconfig/20230809-193623-ladsgroup.json
  • 19:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 19:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 19:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T342617)', diff saved to https://phabricator.wikimedia.org/P50348 and previous config saved to /var/cache/conftool/dbconfig/20230809-193559-ladsgroup.json
  • 19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T343718)', diff saved to https://phabricator.wikimedia.org/P50347 and previous config saved to /var/cache/conftool/dbconfig/20230809-192818-ladsgroup.json
  • 19:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 19:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T343718)', diff saved to https://phabricator.wikimedia.org/P50346 and previous config saved to /var/cache/conftool/dbconfig/20230809-192746-ladsgroup.json
  • 19:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P50345 and previous config saved to /var/cache/conftool/dbconfig/20230809-192053-ladsgroup.json
  • 19:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P50344 and previous config saved to /var/cache/conftool/dbconfig/20230809-191240-ladsgroup.json
  • 19:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P50343 and previous config saved to /var/cache/conftool/dbconfig/20230809-190547-ladsgroup.json
  • 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1193 (T342617)', diff saved to https://phabricator.wikimedia.org/P50342 and previous config saved to /var/cache/conftool/dbconfig/20230809-185805-ladsgroup.json
  • 18:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 18:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T342617)', diff saved to https://phabricator.wikimedia.org/P50341 and previous config saved to /var/cache/conftool/dbconfig/20230809-185745-ladsgroup.json
  • 18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P50340 and previous config saved to /var/cache/conftool/dbconfig/20230809-185734-ladsgroup.json
  • 18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T342617)', diff saved to https://phabricator.wikimedia.org/P50339 and previous config saved to /var/cache/conftool/dbconfig/20230809-185040-ladsgroup.json
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P50338 and previous config saved to /var/cache/conftool/dbconfig/20230809-184238-ladsgroup.json
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T343718)', diff saved to https://phabricator.wikimedia.org/P50337 and previous config saved to /var/cache/conftool/dbconfig/20230809-184228-ladsgroup.json
  • 18:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T343718)', diff saved to https://phabricator.wikimedia.org/P50336 and previous config saved to /var/cache/conftool/dbconfig/20230809-184018-ladsgroup.json
  • 18:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 18:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 18:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 18:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 18:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T343718)', diff saved to https://phabricator.wikimedia.org/P50335 and previous config saved to /var/cache/conftool/dbconfig/20230809-183952-ladsgroup.json
  • 18:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P50334 and previous config saved to /var/cache/conftool/dbconfig/20230809-182726-ladsgroup.json
  • 18:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P50333 and previous config saved to /var/cache/conftool/dbconfig/20230809-182446-ladsgroup.json
  • 18:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T342617)', diff saved to https://phabricator.wikimedia.org/P50332 and previous config saved to /var/cache/conftool/dbconfig/20230809-181219-ladsgroup.json
  • 18:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P50331 and previous config saved to /var/cache/conftool/dbconfig/20230809-180940-ladsgroup.json
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2165 (T342617)', diff saved to https://phabricator.wikimedia.org/P50330 and previous config saved to /var/cache/conftool/dbconfig/20230809-180143-ladsgroup.json
  • 18:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 18:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T342617)', diff saved to https://phabricator.wikimedia.org/P50329 and previous config saved to /var/cache/conftool/dbconfig/20230809-180122-ladsgroup.json
  • 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T343718)', diff saved to https://phabricator.wikimedia.org/P50328 and previous config saved to /var/cache/conftool/dbconfig/20230809-175434-ladsgroup.json
  • 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P50327 and previous config saved to /var/cache/conftool/dbconfig/20230809-174616-ladsgroup.json
  • 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P50326 and previous config saved to /var/cache/conftool/dbconfig/20230809-173110-ladsgroup.json
  • 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T343718)', diff saved to https://phabricator.wikimedia.org/P50325 and previous config saved to /var/cache/conftool/dbconfig/20230809-172803-ladsgroup.json
  • 17:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 17:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 17:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1192 (T342617)', diff saved to https://phabricator.wikimedia.org/P50324 and previous config saved to /var/cache/conftool/dbconfig/20230809-172507-ladsgroup.json
  • 17:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 17:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 17:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T342617)', diff saved to https://phabricator.wikimedia.org/P50323 and previous config saved to /var/cache/conftool/dbconfig/20230809-172447-ladsgroup.json
  • 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T342617)', diff saved to https://phabricator.wikimedia.org/P50322 and previous config saved to /var/cache/conftool/dbconfig/20230809-171604-ladsgroup.json
  • 17:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 17:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 17:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50321 and previous config saved to /var/cache/conftool/dbconfig/20230809-171533-ladsgroup.json
  • 17:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P50320 and previous config saved to /var/cache/conftool/dbconfig/20230809-170940-ladsgroup.json
  • 17:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 17:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 17:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T343718)', diff saved to https://phabricator.wikimedia.org/P50319 and previous config saved to /var/cache/conftool/dbconfig/20230809-170351-ladsgroup.json
  • 17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P50318 and previous config saved to /var/cache/conftool/dbconfig/20230809-170027-ladsgroup.json
  • 16:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P50317 and previous config saved to /var/cache/conftool/dbconfig/20230809-165434-ladsgroup.json
  • 16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P50316 and previous config saved to /var/cache/conftool/dbconfig/20230809-164844-ladsgroup.json
  • 16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P50315 and previous config saved to /var/cache/conftool/dbconfig/20230809-164520-ladsgroup.json
  • 16:44 elukey: temporarily bump miscweb bugzilla pods from 4 to 8 in k8s wikikube codfw
  • 16:42 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 16:41 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 16:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T342617)', diff saved to https://phabricator.wikimedia.org/P50314 and previous config saved to /var/cache/conftool/dbconfig/20230809-163928-ladsgroup.json
  • 16:38 elukey: temporarily bump miscweb bugzilla pods from 2 to 4 in k8s wikikube codfw
  • 16:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P50313 and previous config saved to /var/cache/conftool/dbconfig/20230809-163338-ladsgroup.json
  • 16:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50312 and previous config saved to /var/cache/conftool/dbconfig/20230809-163014-ladsgroup.json
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T342617)', diff saved to https://phabricator.wikimedia.org/P50311 and previous config saved to /var/cache/conftool/dbconfig/20230809-162913-ladsgroup.json
  • 16:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 16:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 16:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 16:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T342617)', diff saved to https://phabricator.wikimedia.org/P50310 and previous config saved to /var/cache/conftool/dbconfig/20230809-162836-ladsgroup.json
  • 16:22 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 16:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T343718)', diff saved to https://phabricator.wikimedia.org/P50308 and previous config saved to /var/cache/conftool/dbconfig/20230809-161832-ladsgroup.json
  • 16:17 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 16:15 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-test-master1001.eqiad.wmnet with OS bullseye
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P50307 and previous config saved to /var/cache/conftool/dbconfig/20230809-161330-ladsgroup.json
  • 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P50306 and previous config saved to /var/cache/conftool/dbconfig/20230809-155824-ladsgroup.json
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T343718)', diff saved to https://phabricator.wikimedia.org/P50305 and previous config saved to /var/cache/conftool/dbconfig/20230809-155137-ladsgroup.json
  • 15:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T342617)', diff saved to https://phabricator.wikimedia.org/P50304 and previous config saved to /var/cache/conftool/dbconfig/20230809-155127-ladsgroup.json
  • 15:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 15:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P50303 and previous config saved to /var/cache/conftool/dbconfig/20230809-155116-ladsgroup.json
  • 15:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T342617)', diff saved to https://phabricator.wikimedia.org/P50302 and previous config saved to /var/cache/conftool/dbconfig/20230809-155106-ladsgroup.json
  • 15:50 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1001.eqiad.wmnet with reason: host reimage
  • 15:49 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:48 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:47 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 15:47 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 15:47 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1001.eqiad.wmnet with reason: host reimage
  • 15:45 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:44 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:43 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1089.eqiad.wmnet with OS bullseye
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T342617)', diff saved to https://phabricator.wikimedia.org/P50301 and previous config saved to /var/cache/conftool/dbconfig/20230809-154317-ladsgroup.json
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P50300 and previous config saved to /var/cache/conftool/dbconfig/20230809-153610-ladsgroup.json
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P50299 and previous config saved to /var/cache/conftool/dbconfig/20230809-153600-ladsgroup.json
  • 15:29 hnowlan: disabling puppet on A:cp to test r/947372
  • 15:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P50298 and previous config saved to /var/cache/conftool/dbconfig/20230809-152103-ladsgroup.json
  • 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P50297 and previous config saved to /var/cache/conftool/dbconfig/20230809-152053-ladsgroup.json
  • 15:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1089.eqiad.wmnet with reason: host reimage
  • 15:17 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1089.eqiad.wmnet with reason: host reimage
  • 15:06 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-master1001.eqiad.wmnet with OS bullseye
  • 15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P50295 and previous config saved to /var/cache/conftool/dbconfig/20230809-150557-ladsgroup.json
  • 15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T342617)', diff saved to https://phabricator.wikimedia.org/P50294 and previous config saved to /var/cache/conftool/dbconfig/20230809-150547-ladsgroup.json
  • 15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P50293 and previous config saved to /var/cache/conftool/dbconfig/20230809-150443-ladsgroup.json
  • 15:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 15:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 14:58 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum6002.drmrs.wmnet with OS bookworm
  • 14:57 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1089.eqiad.wmnet with OS bullseye
  • 14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T342617)', diff saved to https://phabricator.wikimedia.org/P50292 and previous config saved to /var/cache/conftool/dbconfig/20230809-145714-ladsgroup.json
  • 14:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1088.eqiad.wmnet with OS bullseye
  • 14:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 14:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T342617)', diff saved to https://phabricator.wikimedia.org/P50291 and previous config saved to /var/cache/conftool/dbconfig/20230809-145653-ladsgroup.json
  • 14:49 TheresNoTime: `[samtar@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki wikifunctionswiki --fix` for T342964
  • 14:48 samtar@deploy1002: Finished scap: Backport for core-namespaces: Remove dupe wikifunctions alias (T342964) (duration: 14m 21s)
  • 14:42 samtar@deploy1002: samtar: Continuing with sync
  • 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P50290 and previous config saved to /var/cache/conftool/dbconfig/20230809-144147-ladsgroup.json
  • 14:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 14:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T343718)', diff saved to https://phabricator.wikimedia.org/P50289 and previous config saved to /var/cache/conftool/dbconfig/20230809-144022-ladsgroup.json
  • 14:36 samtar@deploy1002: samtar: Backport for core-namespaces: Remove dupe wikifunctions alias (T342964) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:34 samtar@deploy1002: Started scap: Backport for core-namespaces: Remove dupe wikifunctions alias (T342964)
  • 14:34 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1088.eqiad.wmnet with reason: host reimage
  • 14:31 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1088.eqiad.wmnet with reason: host reimage
  • 14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P50288 and previous config saved to /var/cache/conftool/dbconfig/20230809-142640-ladsgroup.json
  • 14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P50287 and previous config saved to /var/cache/conftool/dbconfig/20230809-142515-ladsgroup.json
  • 14:24 moritzm: installing sudo bugfix updates from Bookworm 12.1 point release
  • 14:21 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
  • 14:18 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1088.eqiad.wmnet with OS bullseye
  • 14:17 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
  • 14:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T342617)', diff saved to https://phabricator.wikimedia.org/P50285 and previous config saved to /var/cache/conftool/dbconfig/20230809-141134-ladsgroup.json
  • 14:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P50284 and previous config saved to /var/cache/conftool/dbconfig/20230809-141009-ladsgroup.json
  • 14:09 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-test-master1002.eqiad.wmnet with OS bullseye
  • 14:07 moritzm: restarting FPM on mediawiki canaries to pick up tiff update
  • 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T342617)', diff saved to https://phabricator.wikimedia.org/P50283 and previous config saved to /var/cache/conftool/dbconfig/20230809-140551-ladsgroup.json
  • 14:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 14:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50282 and previous config saved to /var/cache/conftool/dbconfig/20230809-140531-ladsgroup.json
  • 13:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1087.eqiad.wmnet with OS bullseye
  • 13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T343718)', diff saved to https://phabricator.wikimedia.org/P50281 and previous config saved to /var/cache/conftool/dbconfig/20230809-135503-ladsgroup.json
  • 13:54 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 13:54 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum6002.drmrs.wmnet with OS bookworm
  • 13:54 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1223 (T343718)', diff saved to https://phabricator.wikimedia.org/P50280 and previous config saved to /var/cache/conftool/dbconfig/20230809-135356-ladsgroup.json
  • 13:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 13:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T343718)', diff saved to https://phabricator.wikimedia.org/P50279 and previous config saved to /var/cache/conftool/dbconfig/20230809-135324-ladsgroup.json
  • 13:52 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 13:52 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 13:52 moritzm: installing tiff security updates
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P50278 and previous config saved to /var/cache/conftool/dbconfig/20230809-135024-ladsgroup.json
  • 13:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-master1002.eqiad.wmnet with reason: host reimage
  • 13:47 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-master1002.eqiad.wmnet with reason: host reimage
  • 13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T342617)', diff saved to https://phabricator.wikimedia.org/P50277 and previous config saved to /var/cache/conftool/dbconfig/20230809-134136-ladsgroup.json
  • 13:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 13:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T342617)', diff saved to https://phabricator.wikimedia.org/P50276 and previous config saved to /var/cache/conftool/dbconfig/20230809-134115-ladsgroup.json
  • 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P50275 and previous config saved to /var/cache/conftool/dbconfig/20230809-133818-ladsgroup.json
  • 13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P50274 and previous config saved to /var/cache/conftool/dbconfig/20230809-133518-ladsgroup.json
  • 13:33 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-master1002.eqiad.wmnet with OS bullseye
  • 13:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1087.eqiad.wmnet with reason: host reimage
  • 13:29 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1087.eqiad.wmnet with reason: host reimage
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P50273 and previous config saved to /var/cache/conftool/dbconfig/20230809-132609-ladsgroup.json
  • 13:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T342617)', diff saved to https://phabricator.wikimedia.org/P50272 and previous config saved to /var/cache/conftool/dbconfig/20230809-132446-ladsgroup.json
  • 13:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 13:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 13:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T342617)', diff saved to https://phabricator.wikimedia.org/P50271 and previous config saved to /var/cache/conftool/dbconfig/20230809-132424-ladsgroup.json
  • 13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P50270 and previous config saved to /var/cache/conftool/dbconfig/20230809-132312-ladsgroup.json
  • 13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50269 and previous config saved to /var/cache/conftool/dbconfig/20230809-132012-ladsgroup.json
  • 13:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 13:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 13:12 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1087.eqiad.wmnet with OS bullseye
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P50268 and previous config saved to /var/cache/conftool/dbconfig/20230809-131103-ladsgroup.json
  • 13:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P50267 and previous config saved to /var/cache/conftool/dbconfig/20230809-130918-ladsgroup.json
  • 13:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T343718)', diff saved to https://phabricator.wikimedia.org/P50266 and previous config saved to /var/cache/conftool/dbconfig/20230809-130805-ladsgroup.json
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1212 (T343718)', diff saved to https://phabricator.wikimedia.org/P50265 and previous config saved to /var/cache/conftool/dbconfig/20230809-130557-ladsgroup.json
  • 13:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 13:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T343718)', diff saved to https://phabricator.wikimedia.org/P50264 and previous config saved to /var/cache/conftool/dbconfig/20230809-130518-ladsgroup.json
  • 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T342617)', diff saved to https://phabricator.wikimedia.org/P50263 and previous config saved to /var/cache/conftool/dbconfig/20230809-125555-ladsgroup.json
  • 12:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P50262 and previous config saved to /var/cache/conftool/dbconfig/20230809-125412-ladsgroup.json
  • 12:53 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/apertium: apply
  • 12:53 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/apertium: apply
  • 12:52 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
  • 12:51 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/apertium: apply
  • 12:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P50261 and previous config saved to /var/cache/conftool/dbconfig/20230809-125012-ladsgroup.json
  • 12:49 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 12:49 dcausse: restarting blazegraph on wdqs1007 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 12:48 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 12:48 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 12:48 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 12:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 12:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 12:40 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1086.eqiad.wmnet with OS bullseye
  • 12:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T342617)', diff saved to https://phabricator.wikimedia.org/P50260 and previous config saved to /var/cache/conftool/dbconfig/20230809-123906-ladsgroup.json
  • 12:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P50259 and previous config saved to /var/cache/conftool/dbconfig/20230809-123506-ladsgroup.json
  • 12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T343718)', diff saved to https://phabricator.wikimedia.org/P50258 and previous config saved to /var/cache/conftool/dbconfig/20230809-122000-ladsgroup.json
  • 12:19 jayme@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 12:19 jayme@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 12:18 jayme@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 12:18 jayme@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 12:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T343718)', diff saved to https://phabricator.wikimedia.org/P50257 and previous config saved to /var/cache/conftool/dbconfig/20230809-121852-ladsgroup.json
  • 12:18 jayme@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 12:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 12:18 jayme@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 12:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 12:18 jayme@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 12:18 jayme@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 12:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T343718)', diff saved to https://phabricator.wikimedia.org/P50256 and previous config saved to /var/cache/conftool/dbconfig/20230809-121831-ladsgroup.json
  • 12:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1086.eqiad.wmnet with reason: host reimage
  • 12:14 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1086.eqiad.wmnet with reason: host reimage
  • 12:13 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 12:12 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 12:12 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 12:11 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 12:11 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 12:11 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 12:11 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 12:11 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 12:10 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 12:09 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 12:08 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 12:08 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 12:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P50255 and previous config saved to /var/cache/conftool/dbconfig/20230809-120325-ladsgroup.json
  • 12:01 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1086.eqiad.wmnet with OS bullseye
  • 12:01 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1085.eqiad.wmnet with OS bullseye
  • 11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T342617)', diff saved to https://phabricator.wikimedia.org/P50254 and previous config saved to /var/cache/conftool/dbconfig/20230809-115534-ladsgroup.json
  • 11:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 11:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T342617)', diff saved to https://phabricator.wikimedia.org/P50253 and previous config saved to /var/cache/conftool/dbconfig/20230809-115227-ladsgroup.json
  • 11:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 11:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T342617)', diff saved to https://phabricator.wikimedia.org/P50252 and previous config saved to /var/cache/conftool/dbconfig/20230809-115206-ladsgroup.json
  • 11:49 ladsgroup@deploy1002: Finished scap: Backport for sdwiki: set 'wgTranslateNumerals' to false (T268203) (duration: 09m 22s)
  • 11:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P50251 and previous config saved to /var/cache/conftool/dbconfig/20230809-114819-ladsgroup.json
  • 11:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 11:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 11:41 ladsgroup@deploy1002: kaleembhatti and ladsgroup: Continuing with sync
  • 11:41 ladsgroup@deploy1002: kaleembhatti and ladsgroup: Backport for sdwiki: set 'wgTranslateNumerals' to false (T268203) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 11:39 ladsgroup@deploy1002: Started scap: Backport for sdwiki: set 'wgTranslateNumerals' to false (T268203)
  • 11:38 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1085.eqiad.wmnet with reason: host reimage
  • 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P50250 and previous config saved to /var/cache/conftool/dbconfig/20230809-113659-ladsgroup.json
  • 11:35 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1085.eqiad.wmnet with reason: host reimage
  • 11:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T343718)', diff saved to https://phabricator.wikimedia.org/P50249 and previous config saved to /var/cache/conftool/dbconfig/20230809-113312-ladsgroup.json
  • 11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T343718)', diff saved to https://phabricator.wikimedia.org/P50248 and previous config saved to /var/cache/conftool/dbconfig/20230809-113205-ladsgroup.json
  • 11:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 11:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 11:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T343718)', diff saved to https://phabricator.wikimedia.org/P50247 and previous config saved to /var/cache/conftool/dbconfig/20230809-113144-ladsgroup.json
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P50246 and previous config saved to /var/cache/conftool/dbconfig/20230809-112153-ladsgroup.json
  • 11:20 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1085.eqiad.wmnet with OS bullseye
  • 11:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P50245 and previous config saved to /var/cache/conftool/dbconfig/20230809-111638-ladsgroup.json
  • 11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T342617)', diff saved to https://phabricator.wikimedia.org/P50244 and previous config saved to /var/cache/conftool/dbconfig/20230809-111141-ladsgroup.json
  • 11:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T342617)', diff saved to https://phabricator.wikimedia.org/P50243 and previous config saved to /var/cache/conftool/dbconfig/20230809-110647-ladsgroup.json
  • 11:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 11:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 11:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P50242 and previous config saved to /var/cache/conftool/dbconfig/20230809-110132-ladsgroup.json
  • 10:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P50241 and previous config saved to /var/cache/conftool/dbconfig/20230809-105635-ladsgroup.json
  • 10:56 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 10:55 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 10:55 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 10:55 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 10:54 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 10:54 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 10:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T343718)', diff saved to https://phabricator.wikimedia.org/P50240 and previous config saved to /var/cache/conftool/dbconfig/20230809-104625-ladsgroup.json
  • 10:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T343718)', diff saved to https://phabricator.wikimedia.org/P50239 and previous config saved to /var/cache/conftool/dbconfig/20230809-104518-ladsgroup.json
  • 10:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 10:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T343718)', diff saved to https://phabricator.wikimedia.org/P50238 and previous config saved to /var/cache/conftool/dbconfig/20230809-104457-ladsgroup.json
  • 10:44 _joe_: ran requestctl commit, which removed the comma removal from the requestctl output as per T305582
  • 10:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P50237 and previous config saved to /var/cache/conftool/dbconfig/20230809-104128-ladsgroup.json
  • 10:36 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1084.eqiad.wmnet with OS bullseye
  • 10:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P50236 and previous config saved to /var/cache/conftool/dbconfig/20230809-102951-ladsgroup.json
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T342617)', diff saved to https://phabricator.wikimedia.org/P50235 and previous config saved to /var/cache/conftool/dbconfig/20230809-102622-ladsgroup.json
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T342617)', diff saved to https://phabricator.wikimedia.org/P50234 and previous config saved to /var/cache/conftool/dbconfig/20230809-101946-ladsgroup.json
  • 10:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 10:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P50233 and previous config saved to /var/cache/conftool/dbconfig/20230809-101444-ladsgroup.json
  • 10:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1002.eqiad.wmnet
  • 10:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1084.eqiad.wmnet with reason: host reimage
  • 10:09 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1084.eqiad.wmnet with reason: host reimage
  • 10:08 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-coord1002.eqiad.wmnet
  • 10:07 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 10:07 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 10:05 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-master1002.eqiad.wmnet
  • 09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T343718)', diff saved to https://phabricator.wikimedia.org/P50232 and previous config saved to /var/cache/conftool/dbconfig/20230809-095938-ladsgroup.json
  • 09:58 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-master1002.eqiad.wmnet
  • 09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T343718)', diff saved to https://phabricator.wikimedia.org/P50231 and previous config saved to /var/cache/conftool/dbconfig/20230809-095730-ladsgroup.json
  • 09:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 09:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 09:55 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 09:55 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 09:55 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 09:55 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 09:54 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1084.eqiad.wmnet with OS bullseye
  • 09:48 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/apertium: apply
  • 09:48 jayme@deploy1002: helmfile [staging] START helmfile.d/services/apertium: apply
  • 09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T342617)', diff saved to https://phabricator.wikimedia.org/P50230 and previous config saved to /var/cache/conftool/dbconfig/20230809-093715-ladsgroup.json
  • 09:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 09:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 09:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 09:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T342617)', diff saved to https://phabricator.wikimedia.org/P50229 and previous config saved to /var/cache/conftool/dbconfig/20230809-093341-ladsgroup.json
  • 09:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 09:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 09:31 hnowlan: disabling puppet on A:cp to test 945558
  • 09:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 09:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 09:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 09:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50228 and previous config saved to /var/cache/conftool/dbconfig/20230809-092319-ladsgroup.json
  • 09:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 09:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50227 and previous config saved to /var/cache/conftool/dbconfig/20230809-092258-ladsgroup.json
  • 09:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P50226 and previous config saved to /var/cache/conftool/dbconfig/20230809-090750-ladsgroup.json
  • 09:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 09:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 09:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 09:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 09:02 ladsgroup@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 09:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 08:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P50225 and previous config saved to /var/cache/conftool/dbconfig/20230809-085244-ladsgroup.json
  • 08:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50224 and previous config saved to /var/cache/conftool/dbconfig/20230809-083738-ladsgroup.json
  • 08:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 08:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T342617)', diff saved to https://phabricator.wikimedia.org/P50223 and previous config saved to /var/cache/conftool/dbconfig/20230809-083319-ladsgroup.json
  • 08:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 08:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 08:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 08:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 07:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1003.eqiad.wmnet
  • 07:52 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
  • 07:12 kartik@deploy1002: Finished scap: Backport for testwiki: Enable Section Translation for 7 Wikipedias (T343211) (duration: 09m 58s)
  • 07:05 kartik@deploy1002: kartik: Continuing with sync
  • 07:03 kartik@deploy1002: kartik: Backport for testwiki: Enable Section Translation for 7 Wikipedias (T343211) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:02 kartik@deploy1002: Started scap: Backport for testwiki: Enable Section Translation for 7 Wikipedias (T343211)
  • 06:52 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jkieserman out of all services on: 33 hosts
  • 06:51 root@cumin2002: START - Cookbook sre.idm.logout Logging Jkieserman out of all services on: 33 hosts
  • 06:51 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jkieserman out of all services on: 716 hosts
  • 06:51 root@cumin2002: START - Cookbook sre.idm.logout Logging Jkieserman out of all services on: 716 hosts
  • 06:47 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jkieserman out of all services on: 1309 hosts
  • 06:46 root@cumin2002: START - Cookbook sre.idm.logout Logging Jkieserman out of all services on: 1309 hosts
  • 06:46 root@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Jmads out of all services on: 1309 hosts
  • 06:46 root@cumin2002: START - Cookbook sre.idm.logout Logging Jmads out of all services on: 1309 hosts
  • 06:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50222 and previous config saved to /var/cache/conftool/dbconfig/20230809-061826-ladsgroup.json
  • 06:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 06:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 01:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50219 and previous config saved to /var/cache/conftool/dbconfig/20230809-013145-ladsgroup.json
  • 01:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 01:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 01:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T342617)', diff saved to https://phabricator.wikimedia.org/P50218 and previous config saved to /var/cache/conftool/dbconfig/20230809-013124-ladsgroup.json
  • 01:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P50217 and previous config saved to /var/cache/conftool/dbconfig/20230809-011618-ladsgroup.json
  • 01:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P50216 and previous config saved to /var/cache/conftool/dbconfig/20230809-010112-ladsgroup.json
  • 00:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T342617)', diff saved to https://phabricator.wikimedia.org/P50215 and previous config saved to /var/cache/conftool/dbconfig/20230809-004605-ladsgroup.json
  • 00:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 00:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50214 and previous config saved to /var/cache/conftool/dbconfig/20230809-003817-ladsgroup.json
  • 00:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P50213 and previous config saved to /var/cache/conftool/dbconfig/20230809-002310-ladsgroup.json
  • 00:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P50212 and previous config saved to /var/cache/conftool/dbconfig/20230809-000804-ladsgroup.json

2023-08-08

  • 23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50211 and previous config saved to /var/cache/conftool/dbconfig/20230808-235258-ladsgroup.json
  • 22:33 urbanecm: mwmaint1002: stop persistRevisionThreadItems.php frwiki instance because of T343859 (cc T315510)
  • 22:04 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177] (wcqs): f1a6177 (duration: 00m 17s)
  • 22:03 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177] (wcqs): f1a6177
  • 21:57 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:46 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:46 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wcqs1003.eqiad.wmnet with OS bullseye
  • 21:22 brett: Exported varnish-modules 0.15.0-4 for bookworm-wikimedia (T342154)
  • 21:18 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage
  • 21:15 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage
  • 21:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 108
  • 21:06 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 108
  • 21:04 bking@cumin1001: conftool action : set/pooled=no; selector: name=wcqs1003.eqiad.wmnet,service=wcqs
  • 21:02 bking@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wcqs,name=eqiad
  • 21:02 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wcqs1003.eqiad.wmnet with OS bullseye
  • 20:58 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177] (wcqs): f1a6177 (duration: 00m 17s)
  • 20:58 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177] (wcqs): f1a6177
  • 20:57 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wcqs1002.eqiad.wmnet with OS bullseye
  • 20:52 bking@deploy1002: Finished deploy [wdqs/wdqs@f1a6177] (wcqs): f1a6177 (duration: 00m 18s)
  • 20:52 bking@deploy1002: Started deploy [wdqs/wdqs@f1a6177] (wcqs): f1a6177
  • 20:43 urbanecm@deploy1002: Finished scap: Backport for Deploy to CN language wikis (T335886) (duration: 09m 08s)
  • 20:41 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@f1a6177]: whitelist new qlever endpoints take 4 (forgot git pull) T339347 (duration: 10m 44s)
  • 20:37 urbanecm@deploy1002: ksarabia and urbanecm: Continuing with sync
  • 20:36 urbanecm@deploy1002: ksarabia and urbanecm: Backport for Deploy to CN language wikis (T335886) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:34 urbanecm@deploy1002: Started scap: Backport for Deploy to CN language wikis (T335886)
  • 20:31 urbanecm: mwmaint1002: `foreachwikiindblist 'group2 & s6' extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --current --all --touched-after=20230615000000` (T315510)
  • 20:30 urbanecm: mwmaint1002: `foreachwikiindblist 'group2 & s5' extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --current --all --touched-after=20230615000000` (T315353)
  • 20:30 ryankemper@deploy1002: Started deploy [wdqs/wdqs@f1a6177]: whitelist new qlever endpoints take 4 (forgot git pull) T339347
  • 20:30 urbanecm: mwmaint1002: `foreachwikiindblist 'group2 & s3' extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --current --all --touched-after=20230615000000` (T315353)
  • 20:29 urbanecm: mwmaint1002: `foreachwikiindblist 'group2 & s2' extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --current --all --touched-after=20230615000000` (T315353)
  • 20:24 urbanecm@deploy1002: Finished scap: Backport for Enable wgDiscussionToolsEnablePermalinksBackend on s2/s3/s5/s6 group2 (T315353) (duration: 10m 55s)
  • 20:17 urbanecm@deploy1002: urbanecm and matmarex: Continuing with sync
  • 20:16 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@aa5f5b7]: whitelist new qlever endpoints take 3 T339347 (duration: 02m 54s)
  • 20:14 urbanecm@deploy1002: urbanecm and matmarex: Backport for Enable wgDiscussionToolsEnablePermalinksBackend on s2/s3/s5/s6 group2 (T315353) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:14 ryankemper: [WDQS] Lag caught up on `wdqs1006`; repooled -> `ryankemper@wdqs1006:~$ sudo pool`
  • 20:13 urbanecm@deploy1002: Started scap: Backport for Enable wgDiscussionToolsEnablePermalinksBackend on s2/s3/s5/s6 group2 (T315353)
  • 20:13 ryankemper@deploy1002: Started deploy [wdqs/wdqs@aa5f5b7]: whitelist new qlever endpoints take 3 T339347
  • 19:28 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wcqs[1001-1003].eqiad.wmnet with reason: T331300
  • 19:28 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wcqs[1001-1003].eqiad.wmnet with reason: T331300
  • 19:23 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 19:06 ryankemper: [WDQS] Depooled `wdqs1006` while it catches up on 7 hours of lag
  • 19:05 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@aa5f5b7]: whitelist new qlever endpoints take 2 (duration: 11m 34s)
  • 18:54 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum4001.ulsfo.wmnet with OS bullseye
  • 18:54 ryankemper@deploy1002: Started deploy [wdqs/wdqs@aa5f5b7]: whitelist new qlever endpoints take 2
  • 18:49 bking@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wcqs,name=eqiad
  • 18:48 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: whitelist new qlever endpoints (duration: 03m 08s)
  • 18:45 ryankemper@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: whitelist new qlever endpoints
  • 18:45 ryankemper@deploy1002: deploy aborted: 0.3.124 (duration: 01m 50s)
  • 18:43 ryankemper@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 18:38 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum4001.ulsfo.wmnet with reason: host reimage
  • 18:27 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum4001.ulsfo.wmnet with reason: host reimage
  • 18:12 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum4001.ulsfo.wmnet with OS bullseye
  • 18:12 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host durum4001.ulsfo.wmnet with OS bookworm
  • 17:56 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1001.eqiad.wmnet with OS bullseye
  • 17:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum4001.ulsfo.wmnet with reason: host reimage
  • 17:52 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum4001.ulsfo.wmnet with reason: host reimage
  • 17:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T342617)', diff saved to https://phabricator.wikimedia.org/P50209 and previous config saved to /var/cache/conftool/dbconfig/20230808-175101-ladsgroup.json
  • 17:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 17:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 17:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T342617)', diff saved to https://phabricator.wikimedia.org/P50208 and previous config saved to /var/cache/conftool/dbconfig/20230808-175040-ladsgroup.json
  • 17:41 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1002.eqiad.wmnet with reason: host reimage
  • 17:38 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1001.eqiad.wmnet with reason: host reimage
  • 17:37 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1002.eqiad.wmnet with reason: host reimage
  • 17:35 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1001.eqiad.wmnet with reason: host reimage
  • 17:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P50207 and previous config saved to /var/cache/conftool/dbconfig/20230808-173534-ladsgroup.json
  • 17:31 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum4001.ulsfo.wmnet with OS bookworm
  • 17:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1083.eqiad.wmnet with OS bullseye
  • 17:24 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wcqs1002.eqiad.wmnet with OS bullseye
  • 17:22 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wcqs1001.eqiad.wmnet with OS bullseye
  • 17:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P50206 and previous config saved to /var/cache/conftool/dbconfig/20230808-172027-ladsgroup.json
  • 17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T342617)', diff saved to https://phabricator.wikimedia.org/P50205 and previous config saved to /var/cache/conftool/dbconfig/20230808-170521-ladsgroup.json
  • 17:01 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1083.eqiad.wmnet with reason: host reimage
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T342617)', diff saved to https://phabricator.wikimedia.org/P50204 and previous config saved to /var/cache/conftool/dbconfig/20230808-165824-ladsgroup.json
  • 16:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 16:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T342617)', diff saved to https://phabricator.wikimedia.org/P50203 and previous config saved to /var/cache/conftool/dbconfig/20230808-165803-ladsgroup.json
  • 16:58 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1083.eqiad.wmnet with reason: host reimage
  • 16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P50202 and previous config saved to /var/cache/conftool/dbconfig/20230808-164256-ladsgroup.json
  • 16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P50201 and previous config saved to /var/cache/conftool/dbconfig/20230808-162750-ladsgroup.json
  • 16:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum6002.drmrs.wmnet with OS bookworm
  • 16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T342617)', diff saved to https://phabricator.wikimedia.org/P50200 and previous config saved to /var/cache/conftool/dbconfig/20230808-161244-ladsgroup.json
  • 15:53 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1083.eqiad.wmnet with OS bullseye
  • 15:44 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1082.eqiad.wmnet with OS bullseye
  • 15:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
  • 15:37 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
  • 15:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1082.eqiad.wmnet with reason: host reimage
  • 15:19 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1082.eqiad.wmnet with reason: host reimage
  • 15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P50197 and previous config saved to /var/cache/conftool/dbconfig/20230808-151637-ladsgroup.json
  • 15:14 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum6002.drmrs.wmnet with OS bookworm
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P50196 and previous config saved to /var/cache/conftool/dbconfig/20230808-150131-ladsgroup.json
  • 14:54 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum6001.drmrs.wmnet with OS bookworm
  • 14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P50195 and previous config saved to /var/cache/conftool/dbconfig/20230808-144625-ladsgroup.json
  • 14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P50194 and previous config saved to /var/cache/conftool/dbconfig/20230808-143119-ladsgroup.json
  • 14:10 _joe_: updated conftool, requestctl on puppetmasters to 2.3.1 to fix bugs with requestctl log
  • 14:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P50192 and previous config saved to /var/cache/conftool/dbconfig/20230808-140331-ladsgroup.json
  • 14:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 14:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 14:03 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1082.eqiad.wmnet with OS bullseye
  • 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P50190 and previous config saved to /var/cache/conftool/dbconfig/20230808-135847-ladsgroup.json
  • 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T343718)', diff saved to https://phabricator.wikimedia.org/P50189 and previous config saved to /var/cache/conftool/dbconfig/20230808-135636-ladsgroup.json
  • 13:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 13:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 13:47 ladsgroup@deploy1002: Finished scap: Backport for Stop writing to old columns of externallinks in ruwikinews (T342683) (duration: 10m 00s)
  • 13:46 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
  • 13:43 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
  • 13:41 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 13:39 ladsgroup@deploy1002: ladsgroup: Backport for Stop writing to old columns of externallinks in ruwikinews (T342683) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:37 ladsgroup@deploy1002: Started scap: Backport for Stop writing to old columns of externallinks in ruwikinews (T342683)
  • 13:36 taavi@deploy1002: Finished scap: Backport for newiki: Fix templateeditor config (T343257) (duration: 09m 49s)
  • 13:36 volans: set platform to null on all devices and VMs in Netbox - T336623
  • 13:29 taavi@deploy1002: taavi and stang: Continuing with sync
  • 13:27 taavi@deploy1002: taavi and stang: Backport for newiki: Fix templateeditor config (T343257) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:26 taavi@deploy1002: Started scap: Backport for newiki: Fix templateeditor config (T343257)
  • 13:21 sukhe: reprepro -C main include bookworm-wikimedia gdnsd_3.99.0~alpha2-2_amd64.changes: T342154
  • 13:19 taavi@deploy1002: Finished scap: Backport for Update piwiki legacy vector logo (T305950), Update idwiktionary old vector logo (T341175) (duration: 10m 48s)
  • 13:18 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 13:18 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 13:18 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum6001.drmrs.wmnet with OS bookworm
  • 13:17 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host durum6001.drmrs.wmnet with OS bookworm
  • 13:12 taavi@deploy1002: anzx and taavi: Continuing with sync
  • 13:09 taavi@deploy1002: anzx and taavi: Backport for Update piwiki legacy vector logo (T305950), Update idwiktionary old vector logo (T341175) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:08 taavi@deploy1002: Started scap: Backport for Update piwiki legacy vector logo (T305950), Update idwiktionary old vector logo (T341175)
  • 13:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum6001.drmrs.wmnet with OS bookworm
  • 13:02 sukhe: reprepro -C main include bookworm-wikimedia anycast-healthchecker_0.9.1-1+wmf12u1_amd64.changes: T342154
  • 12:57 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124 (duration: 00m 46s)
  • 12:57 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124
  • 12:40 samtar@deploy1002: Finished scap: Backport for IS: Ensure edit recovery is disabled (T342858) (duration: 08m 18s)
  • 12:34 samtar@deploy1002: samtar: Continuing with sync
  • 12:34 samtar@deploy1002: samtar: Backport for IS: Ensure edit recovery is disabled (T342858) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 12:32 samtar@deploy1002: Started scap: Backport for IS: Ensure edit recovery is disabled (T342858)
  • 12:28 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 12:26 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:25 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:25 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 12:24 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 10:36 claime: deploying mw-on-k8s - https://gerrit.wikimedia.org/r/945798
  • 10:21 taavi: update T343294 mitigations
  • 10:00 volans: restart ferm on mirror1001 to pick new IP address for debian syncproxy2
  • 09:52 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 09:52 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 09:44 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 09:43 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T342617)', diff saved to https://phabricator.wikimedia.org/P50188 and previous config saved to /var/cache/conftool/dbconfig/20230808-093835-ladsgroup.json
  • 09:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 09:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T342617)', diff saved to https://phabricator.wikimedia.org/P50187 and previous config saved to /var/cache/conftool/dbconfig/20230808-093814-ladsgroup.json
  • 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P50186 and previous config saved to /var/cache/conftool/dbconfig/20230808-092308-ladsgroup.json
  • 09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T342617)', diff saved to https://phabricator.wikimedia.org/P50185 and previous config saved to /var/cache/conftool/dbconfig/20230808-091119-ladsgroup.json
  • 09:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 09:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T342617)', diff saved to https://phabricator.wikimedia.org/P50184 and previous config saved to /var/cache/conftool/dbconfig/20230808-091058-ladsgroup.json
  • 09:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P50183 and previous config saved to /var/cache/conftool/dbconfig/20230808-090801-ladsgroup.json
  • 08:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P50182 and previous config saved to /var/cache/conftool/dbconfig/20230808-085551-ladsgroup.json
  • 08:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T342617)', diff saved to https://phabricator.wikimedia.org/P50181 and previous config saved to /var/cache/conftool/dbconfig/20230808-085255-ladsgroup.json
  • 08:45 jynus: restart debmonitor2003 services
  • 08:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P50180 and previous config saved to /var/cache/conftool/dbconfig/20230808-084045-ladsgroup.json
  • 08:33 elukey: powercycle ml-serve2004 - mgmt console without tty available, DIMM errors in getsel
  • 08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T342617)', diff saved to https://phabricator.wikimedia.org/P50179 and previous config saved to /var/cache/conftool/dbconfig/20230808-082539-ladsgroup.json
  • 07:07 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:07 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:07 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 07:07 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 07:06 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:06 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 02:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T342617)', diff saved to https://phabricator.wikimedia.org/P50178 and previous config saved to /var/cache/conftool/dbconfig/20230808-022547-ladsgroup.json
  • 02:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 02:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 02:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T342617)', diff saved to https://phabricator.wikimedia.org/P50177 and previous config saved to /var/cache/conftool/dbconfig/20230808-022526-ladsgroup.json
  • 02:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P50176 and previous config saved to /var/cache/conftool/dbconfig/20230808-021020-ladsgroup.json
  • 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P50175 and previous config saved to /var/cache/conftool/dbconfig/20230808-015513-ladsgroup.json
  • 01:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T342617)', diff saved to https://phabricator.wikimedia.org/P50174 and previous config saved to /var/cache/conftool/dbconfig/20230808-014007-ladsgroup.json
  • 00:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T342617)', diff saved to https://phabricator.wikimedia.org/P50173 and previous config saved to /var/cache/conftool/dbconfig/20230808-005439-ladsgroup.json
  • 00:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 00:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 00:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T342617)', diff saved to https://phabricator.wikimedia.org/P50172 and previous config saved to /var/cache/conftool/dbconfig/20230808-005418-ladsgroup.json
  • 00:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P50171 and previous config saved to /var/cache/conftool/dbconfig/20230808-003911-ladsgroup.json
  • 00:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P50170 and previous config saved to /var/cache/conftool/dbconfig/20230808-002405-ladsgroup.json
  • 00:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T342617)', diff saved to https://phabricator.wikimedia.org/P50169 and previous config saved to /var/cache/conftool/dbconfig/20230808-000859-ladsgroup.json

2023-08-07

  • 23:28 krinkle@deploy1002: Finished scap: Backport for api: Fix broken /api/index.html rendering (T113114) (duration: 09m 00s)
  • 23:23 krinkle@deploy1002: krinkle: Continuing with sync
  • 23:21 krinkle@deploy1002: krinkle: Backport for api: Fix broken /api/index.html rendering (T113114) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 23:19 krinkle@deploy1002: Started scap: Backport for api: Fix broken /api/index.html rendering (T113114)
  • 22:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-jumbo1015.eqiad.wmnet
  • 22:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-jumbo1015.eqiad.wmnet
  • 22:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-jumbo1014.eqiad.wmnet
  • 22:43 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-jumbo1014.eqiad.wmnet
  • 22:43 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-jumbo1013.eqiad.wmnet
  • 22:38 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-jumbo1013.eqiad.wmnet
  • 22:38 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-jumbo1012.eqiad.wmnet
  • 22:30 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-jumbo1012.eqiad.wmnet
  • 22:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-jumbo1011.eqiad.wmnet
  • 22:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-jumbo1011.eqiad.wmnet
  • 22:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-jumbo1010.eqiad.wmnet
  • 22:17 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-jumbo1010.eqiad.wmnet
  • 22:04 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=no; selector: name=wcqs2003.codfw.wmnet
  • 21:50 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:43 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1081.eqiad.wmnet with OS bullseye
  • 21:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1081.eqiad.wmnet with reason: host reimage
  • 21:17 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1081.eqiad.wmnet with reason: host reimage
  • 21:05 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:03 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1081.eqiad.wmnet with OS bullseye
  • 21:03 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wcqs2003.codfw.wmnet with OS bullseye
  • 21:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1080.eqiad.wmnet with OS bullseye
  • 20:53 urbanecm@deploy1002: Finished scap: Backport for unset orwikisource logo and resize pawikisource logo (T341255) (duration: 08m 09s)
  • 20:47 urbanecm@deploy1002: jdlrobson and urbanecm: Continuing with sync
  • 20:46 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for unset orwikisource logo and resize pawikisource logo (T341255) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:45 urbanecm@deploy1002: Started scap: Backport for unset orwikisource logo and resize pawikisource logo (T341255)
  • 20:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1080.eqiad.wmnet with reason: host reimage
  • 20:38 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1080.eqiad.wmnet with reason: host reimage
  • 20:24 urbanecm: mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki=enwiki --current --all --start '["18618299"]' # T315510
  • 20:24 urbanecm@deploy1002: Finished scap: Backport for ThreadItemStore: Ignore duplicates caused by duplicate executions (T323080 T341811), Update wikisource wordmarks and taglines (T341255), update idwiktionary legacy vector logo (T341175) (duration: 10m 22s)
  • 20:21 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1080.eqiad.wmnet with OS bullseye
  • 20:19 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs2003.codfw.wmnet with reason: host reimage
  • 20:18 urbanecm@deploy1002: urbanecm and jdlrobson and anzx and matmarex: Continuing with sync
  • 20:16 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs2003.codfw.wmnet with reason: host reimage
  • 20:15 urbanecm@deploy1002: urbanecm and jdlrobson and anzx and matmarex: Backport for ThreadItemStore: Ignore duplicates caused by duplicate executions (T323080 T341811), Update wikisource wordmarks and taglines (T341255), update idwiktionary legacy vector logo (T341175) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet,
  • 20:14 urbanecm@deploy1002: Started scap: Backport for ThreadItemStore: Ignore duplicates caused by duplicate executions (T323080 T341811), Update wikisource wordmarks and taglines (T341255), update idwiktionary legacy vector logo (T341175)
  • 20:13 urbanecm@deploy1002: Finished scap: Backport for Fix finnish projects, remove unused SVG/PNGs, resize wikiversity (T343278), Wikivoyage logos should always be on a single line (T343279) (duration: 11m 18s)
  • 20:08 urbanecm@deploy1002: jdlrobson and urbanecm: Continuing with sync
  • 20:04 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for Fix finnish projects, remove unused SVG/PNGs, resize wikiversity (T343278), Wikivoyage logos should always be on a single line (T343279) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:02 urbanecm@deploy1002: Started scap: Backport for Fix finnish projects, remove unused SVG/PNGs, resize wikiversity (T343278), Wikivoyage logos should always be on a single line (T343279)
  • 20:00 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wcqs2003.codfw.wmnet with OS bullseye
  • 19:18 cstone: payments-wiki upgraded from 32fe72a9 to 5b250aed
  • 19:15 jgreen@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:15 jgreen@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frbast2001.frack.codfw.wmnet from DNS for decommissioning - jgreen@cumin1001"
  • 19:14 jgreen@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frbast2001.frack.codfw.wmnet from DNS for decommissioning - jgreen@cumin1001"
  • 19:12 jgreen@cumin1001: START - Cookbook sre.dns.netbox
  • 19:12 jgreen@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:12 jgreen@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frbast1001.frack.eqiad.wmnet from DNS for decommissioning - jgreen@cumin1001"
  • 19:11 jgreen@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frbast1001.frack.eqiad.wmnet from DNS for decommissioning - jgreen@cumin1001"
  • 19:09 jgreen@cumin1001: START - Cookbook sre.dns.netbox
  • 18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T342617)', diff saved to https://phabricator.wikimedia.org/P50168 and previous config saved to /var/cache/conftool/dbconfig/20230807-185732-ladsgroup.json
  • 18:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 18:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T342617)', diff saved to https://phabricator.wikimedia.org/P50167 and previous config saved to /var/cache/conftool/dbconfig/20230807-185710-ladsgroup.json
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P50166 and previous config saved to /var/cache/conftool/dbconfig/20230807-184204-ladsgroup.json
  • 18:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P50165 and previous config saved to /var/cache/conftool/dbconfig/20230807-182657-ladsgroup.json
  • 18:21 krinkle@deploy1002: Finished scap: Backport for mc: Remove mcrouter-with-onhost-tier from ParserCache (T264604) (duration: 09m 07s)
  • 18:16 krinkle@deploy1002: krinkle: Continuing with sync
  • 18:14 krinkle@deploy1002: krinkle: Backport for mc: Remove mcrouter-with-onhost-tier from ParserCache (T264604) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 18:12 krinkle@deploy1002: Started scap: Backport for mc: Remove mcrouter-with-onhost-tier from ParserCache (T264604)
  • 18:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T342617)', diff saved to https://phabricator.wikimedia.org/P50164 and previous config saved to /var/cache/conftool/dbconfig/20230807-181151-ladsgroup.json
  • 17:59 jgreen@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:59 jgreen@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frmon2001.frack.codfw.wmnet from DNS for decommissioning - jgreen@cumin1001"
  • 17:58 jgreen@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frmon2001.frack.codfw.wmnet from DNS for decommissioning - jgreen@cumin1001"
  • 17:56 jgreen@cumin1001: START - Cookbook sre.dns.netbox
  • 17:55 jgreen@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 17:54 jgreen@cumin1001: START - Cookbook sre.dns.netbox
  • 17:47 jgreen@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:46 jgreen@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frmon1001.frack.eqiad.wmnet from DNS for decommissioning - jgreen@cumin1001"
  • 17:46 jgreen@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frmon1001.frack.eqiad.wmnet from DNS for decommissioning - jgreen@cumin1001"
  • 17:42 jgreen@cumin1001: START - Cookbook sre.dns.netbox
  • 17:36 jgreen@cumin1001: START - Cookbook sre.dns.netbox
  • 17:34 jgreen@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:34 jgreen@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frdev1001 from DNS for decommissioning - jgreen@cumin1001"
  • 17:33 jgreen@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove frdev1001 from DNS for decommissioning - jgreen@cumin1001"
  • 17:31 jgreen@cumin1001: START - Cookbook sre.dns.netbox
  • 17:22 jgreen@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:22 jgreen@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: civi1001.frack.eqiad.wmnet - jgreen@cumin1001"
  • 17:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1079.eqiad.wmnet with OS bullseye
  • 17:22 jgreen@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: civi1001.frack.eqiad.wmnet - jgreen@cumin1001"
  • 17:19 jgreen@cumin1001: START - Cookbook sre.dns.netbox
  • 17:02 inflatador: bking@puppetmaster1001 removing unused(?) puppet cert search.svc.eqiad.wmnet T343319
  • 16:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1079.eqiad.wmnet with reason: host reimage
  • 16:56 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1079.eqiad.wmnet with reason: host reimage
  • 16:47 inflatador: bking@puppetmaster1001 removing unused(?) puppet cert search.svc.codfw.wmnet T343319
  • 16:40 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1079.eqiad.wmnet with OS bullseye
  • 16:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T342617)', diff saved to https://phabricator.wikimedia.org/P50163 and previous config saved to /var/cache/conftool/dbconfig/20230807-163421-ladsgroup.json
  • 16:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 16:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 16:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1078.eqiad.wmnet with OS bullseye
  • 16:18 jforrester@deploy1002: Finished scap: Backport for Wikifunctions: Allow logged-in users to edit object labels, aliases, and descriptions (T343400) (duration: 07m 11s)
  • 16:13 jforrester@deploy1002: jforrester: Continuing with sync
  • 16:13 jforrester@deploy1002: jforrester: Backport for Wikifunctions: Allow logged-in users to edit object labels, aliases, and descriptions (T343400) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 16:11 jforrester@deploy1002: Started scap: Backport for Wikifunctions: Allow logged-in users to edit object labels, aliases, and descriptions (T343400)
  • 15:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1078.eqiad.wmnet with reason: host reimage
  • 15:55 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1078.eqiad.wmnet with reason: host reimage
  • 15:53 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1078.eqiad.wmnet with OS bullseye
  • 15:50 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1078.eqiad.wmnet with OS bullseye
  • 15:42 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:41 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:35 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 15:35 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1078.eqiad.wmnet with OS bullseye
  • 15:35 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 15:35 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1078.eqiad.wmnet with OS bullseye
  • 15:34 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:34 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:36 zabe@deploy1002: Finished scap: T343294 (duration: 07m 13s)
  • 14:29 zabe@deploy1002: Started scap: T343294
  • 14:14 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1078.eqiad.wmnet with OS bullseye
  • 14:14 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host an-worker1078.eqiad.wmnet
  • 14:10 btullis@cumin1001: START - Cookbook sre.hosts.dhcp for host an-worker1078.eqiad.wmnet
  • 14:08 btullis@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1078.eqiad.wmnet']
  • 14:08 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1078.eqiad.wmnet']
  • 14:07 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1078.eqiad.wmnet with OS bullseye
  • 14:02 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-ctrl1002.eqiad.wmnet
  • 13:59 elukey@deploy1002: Finished scap: Backport for ext-ORES: revert all wikis to use ORES instead of Lift Wing (T343308) (duration: 06m 49s)
  • 13:58 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1078.eqiad.wmnet with OS bullseye
  • 13:56 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl1002.eqiad.wmnet
  • 13:53 elukey@deploy1002: elukey: Continuing with sync
  • 13:53 elukey@deploy1002: elukey: Backport for ext-ORES: revert all wikis to use ORES instead of Lift Wing (T343308) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:52 elukey@deploy1002: Started scap: Backport for ext-ORES: revert all wikis to use ORES instead of Lift Wing (T343308)
  • 13:51 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php idwiktionary --fix --add-prefix=BROKEN # T341175
  • 13:51 urbanecm@deploy1002: Finished scap: Backport for idwiktionary change wgSiteName, wgMetaNamespace and add project namespace alias (T341175) (duration: 09m 12s)
  • 13:45 urbanecm@deploy1002: urbanecm and anzx: Continuing with sync
  • 13:43 urbanecm@deploy1002: urbanecm and anzx: Backport for idwiktionary change wgSiteName, wgMetaNamespace and add project namespace alias (T341175) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:41 urbanecm@deploy1002: Started scap: Backport for idwiktionary change wgSiteName, wgMetaNamespace and add project namespace alias (T341175)
  • 13:26 urbanecm@deploy1002: Finished scap: Backport for Revert "enwiki: temp enable emergencyCaptcha" (duration: 06m 59s)
  • 13:19 urbanecm@deploy1002: Started scap: Backport for Revert "enwiki: temp enable emergencyCaptcha"
  • 13:19 urbanecm@deploy1002: Finished scap: Backport for Update knwiktionary logos (T343662), Write new for event table migration on all wikis (T330158), zhwiki: Grant "suppressredirect" to autoreviewer (T343711) (duration: 13m 54s)
  • 13:13 urbanecm@deploy1002: anzx and dreamyjazz and stang and urbanecm: Continuing with sync
  • 13:06 urbanecm@deploy1002: anzx and dreamyjazz and stang and urbanecm: Backport for Update knwiktionary logos (T343662), Write new for event table migration on all wikis (T330158), zhwiki: Grant "suppressredirect" to autoreviewer (T343711) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-d
  • 13:05 urbanecm@deploy1002: Started scap: Backport for Update knwiktionary logos (T343662), Write new for event table migration on all wikis (T330158), zhwiki: Grant "suppressredirect" to autoreviewer (T343711)
  • 12:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 12:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 12:17 dcausse: repooling wdqs1004
  • 11:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 11:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 10:53 ladsgroup@deploy1002: Finished scap: Backport for Stop writing to the old externallinks columns in testwiki (T342683) (duration: 08m 06s)
  • 10:48 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 10:47 ladsgroup@deploy1002: ladsgroup: Backport for Stop writing to the old externallinks columns in testwiki (T342683) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 10:45 ladsgroup@deploy1002: Started scap: Backport for Stop writing to the old externallinks columns in testwiki (T342683)
  • 10:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 10:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 10:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 10:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 10:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 10:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 10:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 10:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 10:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 10:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 10:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1138 (T342617)', diff saved to https://phabricator.wikimedia.org/P50158 and previous config saved to /var/cache/conftool/dbconfig/20230807-100805-ladsgroup.json
  • 10:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 10:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 10:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 10:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 09:23 dcausse: restarting blazegraph on wdqs1004
  • 08:31 elukey@deploy1002: Finished scap: Backport for ext-ORES: force cswiki to use the ORES settings/backend (T343308) (duration: 14m 50s)
  • 08:25 elukey@deploy1002: elukey: Continuing with sync
  • 08:24 elukey@deploy1002: elukey: Backport for ext-ORES: force cswiki to use the ORES settings/backend (T343308) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 100%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50157 and previous config saved to /var/cache/conftool/dbconfig/20230807-081639-root.json
  • 08:16 elukey@deploy1002: Started scap: Backport for ext-ORES: force cswiki to use the ORES settings/backend (T343308)
  • 08:08 godog: start docker-image-prune-old on alert hosts - T329939
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 75%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50156 and previous config saved to /var/cache/conftool/dbconfig/20230807-080133-root.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 50%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50155 and previous config saved to /var/cache/conftool/dbconfig/20230807-074628-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 25%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50154 and previous config saved to /var/cache/conftool/dbconfig/20230807-073123-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 10%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50153 and previous config saved to /var/cache/conftool/dbconfig/20230807-071618-root.json
  • 07:11 marostegui: Depool clouddb1015 T334650
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 5%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50152 and previous config saved to /var/cache/conftool/dbconfig/20230807-070113-root.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 3%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50151 and previous config saved to /var/cache/conftool/dbconfig/20230807-064608-root.json
  • 06:33 kart_: Updated cxserver to 2023-08-03-132800-production (T338602, T333969, T343211)
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 1%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50150 and previous config saved to /var/cache/conftool/dbconfig/20230807-063104-root.json
  • 06:28 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 06:28 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 06:26 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:25 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 06:22 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 06:22 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1224 upgrade to mariadb 10.6', diff saved to https://phabricator.wikimedia.org/P50149 and previous config saved to /var/cache/conftool/dbconfig/20230807-061653-root.json
  • 06:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Update wheels for Aerleon 1.6.0 upgrade - ayounsi@cumin1001
  • 06:09 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Update wheels for Aerleon 1.6.0 upgrade - ayounsi@cumin1001

2023-08-05

  • 05:57 _joe_: mounting the volume under /srv/dataimport on both puppetmaster frontends
  • 05:53 _joe_: creating logical volume "dataimport" on the puppetmaster frontends
  • 02:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 02:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 01:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 01:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 01:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T342617)', diff saved to https://phabricator.wikimedia.org/P50148 and previous config saved to /var/cache/conftool/dbconfig/20230805-013831-ladsgroup.json
  • 01:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P50147 and previous config saved to /var/cache/conftool/dbconfig/20230805-012325-ladsgroup.json
  • 01:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P50146 and previous config saved to /var/cache/conftool/dbconfig/20230805-010819-ladsgroup.json
  • 00:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T342617)', diff saved to https://phabricator.wikimedia.org/P50145 and previous config saved to /var/cache/conftool/dbconfig/20230805-005312-ladsgroup.json
  • 00:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T342617)', diff saved to https://phabricator.wikimedia.org/P50144 and previous config saved to /var/cache/conftool/dbconfig/20230805-003155-ladsgroup.json
  • 00:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P50143 and previous config saved to /var/cache/conftool/dbconfig/20230805-001649-ladsgroup.json
  • 00:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P50142 and previous config saved to /var/cache/conftool/dbconfig/20230805-000143-ladsgroup.json

2023-08-04

  • 23:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T342617)', diff saved to https://phabricator.wikimedia.org/P50141 and previous config saved to /var/cache/conftool/dbconfig/20230804-234637-ladsgroup.json
  • 23:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1222 (T342617)', diff saved to https://phabricator.wikimedia.org/P50140 and previous config saved to /var/cache/conftool/dbconfig/20230804-234121-ladsgroup.json
  • 23:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 23:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 23:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T342617)', diff saved to https://phabricator.wikimedia.org/P50139 and previous config saved to /var/cache/conftool/dbconfig/20230804-234101-ladsgroup.json
  • 23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P50138 and previous config saved to /var/cache/conftool/dbconfig/20230804-232555-ladsgroup.json
  • 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P50137 and previous config saved to /var/cache/conftool/dbconfig/20230804-231048-ladsgroup.json
  • 23:00 tzatziki: removing 1 file for legal compliance
  • 22:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T342617)', diff saved to https://phabricator.wikimedia.org/P50136 and previous config saved to /var/cache/conftool/dbconfig/20230804-225542-ladsgroup.json
  • 22:33 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124 (duration: 00m 54s)
  • 22:32 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124
  • 22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T342617)', diff saved to https://phabricator.wikimedia.org/P50135 and previous config saved to /var/cache/conftool/dbconfig/20230804-222905-ladsgroup.json
  • 22:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 22:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 22:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50134 and previous config saved to /var/cache/conftool/dbconfig/20230804-222845-ladsgroup.json
  • 22:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T342617)', diff saved to https://phabricator.wikimedia.org/P50133 and previous config saved to /var/cache/conftool/dbconfig/20230804-221915-ladsgroup.json
  • 22:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 22:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T342617)', diff saved to https://phabricator.wikimedia.org/P50132 and previous config saved to /var/cache/conftool/dbconfig/20230804-221855-ladsgroup.json
  • 22:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P50131 and previous config saved to /var/cache/conftool/dbconfig/20230804-221338-ladsgroup.json
  • 22:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P50130 and previous config saved to /var/cache/conftool/dbconfig/20230804-220348-ladsgroup.json
  • 21:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P50129 and previous config saved to /var/cache/conftool/dbconfig/20230804-215832-ladsgroup.json
  • 21:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P50128 and previous config saved to /var/cache/conftool/dbconfig/20230804-214842-ladsgroup.json
  • 21:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50127 and previous config saved to /var/cache/conftool/dbconfig/20230804-214326-ladsgroup.json
  • 21:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T342617)', diff saved to https://phabricator.wikimedia.org/P50126 and previous config saved to /var/cache/conftool/dbconfig/20230804-213336-ladsgroup.json
  • 21:20 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124 (duration: 00m 44s)
  • 21:19 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124
  • 21:16 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124 (duration: 00m 09s)
  • 21:16 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124
  • 21:16 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124 (duration: 00m 15s)
  • 21:15 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7] (wcqs): 0.3.124
  • 20:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T342617)', diff saved to https://phabricator.wikimedia.org/P50125 and previous config saved to /var/cache/conftool/dbconfig/20230804-205647-ladsgroup.json
  • 20:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 20:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 20:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T342617)', diff saved to https://phabricator.wikimedia.org/P50124 and previous config saved to /var/cache/conftool/dbconfig/20230804-205626-ladsgroup.json
  • 20:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P50123 and previous config saved to /var/cache/conftool/dbconfig/20230804-204120-ladsgroup.json
  • 20:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50122 and previous config saved to /var/cache/conftool/dbconfig/20230804-203351-ladsgroup.json
  • 20:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 20:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 20:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T342617)', diff saved to https://phabricator.wikimedia.org/P50121 and previous config saved to /var/cache/conftool/dbconfig/20230804-203330-ladsgroup.json
  • 20:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P50120 and previous config saved to /var/cache/conftool/dbconfig/20230804-202613-ladsgroup.json
  • 20:21 brett: imported libvmod-querysort package in bookworm-wikimedia (T342154)
  • 20:18 jforrester@deploy1002: Finished scap: Backport for ApiFunctionCall: Check calls for Z16K2 and deny those too (duration: 34m 04s)
  • 20:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P50119 and previous config saved to /var/cache/conftool/dbconfig/20230804-201824-ladsgroup.json
  • 20:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T342617)', diff saved to https://phabricator.wikimedia.org/P50118 and previous config saved to /var/cache/conftool/dbconfig/20230804-201107-ladsgroup.json
  • 20:08 jforrester@deploy1002: jforrester: Continuing with sync
  • 20:04 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on wcqs2002.codfw.wmnet with reason: T323921
  • 20:04 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on wcqs2002.codfw.wmnet with reason: T323921
  • 20:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P50116 and previous config saved to /var/cache/conftool/dbconfig/20230804-200317-ladsgroup.json
  • 20:02 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 19:58 jforrester@deploy1002: jforrester: Backport for ApiFunctionCall: Check calls for Z16K2 and deny those too synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 19:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T342617)', diff saved to https://phabricator.wikimedia.org/P50115 and previous config saved to /var/cache/conftool/dbconfig/20230804-194811-ladsgroup.json
  • 19:44 jforrester@deploy1002: Started scap: Backport for ApiFunctionCall: Check calls for Z16K2 and deny those too
  • 19:17 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:12 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on wcqs2002.codfw.wmnet with reason: T323921
  • 19:12 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on wcqs2002.codfw.wmnet with reason: T323921
  • 19:11 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wcqs2002.codfw.wmnet with OS bullseye
  • 19:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T342617)', diff saved to https://phabricator.wikimedia.org/P50114 and previous config saved to /var/cache/conftool/dbconfig/20230804-190152-ladsgroup.json
  • 19:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 19:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 19:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50113 and previous config saved to /var/cache/conftool/dbconfig/20230804-190131-ladsgroup.json
  • 18:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P50112 and previous config saved to /var/cache/conftool/dbconfig/20230804-184625-ladsgroup.json
  • 18:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T342617)', diff saved to https://phabricator.wikimedia.org/P50111 and previous config saved to /var/cache/conftool/dbconfig/20230804-183927-ladsgroup.json
  • 18:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 18:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 18:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50110 and previous config saved to /var/cache/conftool/dbconfig/20230804-183906-ladsgroup.json
  • 18:34 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs2002.codfw.wmnet with reason: host reimage
  • 18:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P50109 and previous config saved to /var/cache/conftool/dbconfig/20230804-183118-ladsgroup.json
  • 18:31 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs2002.codfw.wmnet with reason: host reimage
  • 18:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P50108 and previous config saved to /var/cache/conftool/dbconfig/20230804-182400-ladsgroup.json
  • 18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50107 and previous config saved to /var/cache/conftool/dbconfig/20230804-181612-ladsgroup.json
  • 18:15 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wcqs2002.codfw.wmnet with OS bullseye
  • 18:14 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on wcqs2001.codfw.wmnet with reason: T323921
  • 18:13 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on wcqs2001.codfw.wmnet with reason: T323921
  • 18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P50106 and previous config saved to /var/cache/conftool/dbconfig/20230804-180854-ladsgroup.json
  • 18:08 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50105 and previous config saved to /var/cache/conftool/dbconfig/20230804-175348-ladsgroup.json
  • 17:27 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:24 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wcqs2001.codfw.wmnet with OS bullseye
  • 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50104 and previous config saved to /var/cache/conftool/dbconfig/20230804-165753-ladsgroup.json
  • 16:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 16:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T342617)', diff saved to https://phabricator.wikimedia.org/P50103 and previous config saved to /var/cache/conftool/dbconfig/20230804-165731-ladsgroup.json
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50102 and previous config saved to /var/cache/conftool/dbconfig/20230804-164356-ladsgroup.json
  • 16:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T342617)', diff saved to https://phabricator.wikimedia.org/P50101 and previous config saved to /var/cache/conftool/dbconfig/20230804-164335-ladsgroup.json
  • 16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P50100 and previous config saved to /var/cache/conftool/dbconfig/20230804-164225-ladsgroup.json
  • 16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P50099 and previous config saved to /var/cache/conftool/dbconfig/20230804-162829-ladsgroup.json
  • 16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P50098 and previous config saved to /var/cache/conftool/dbconfig/20230804-162719-ladsgroup.json
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P50097 and previous config saved to /var/cache/conftool/dbconfig/20230804-161322-ladsgroup.json
  • 16:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T342617)', diff saved to https://phabricator.wikimedia.org/P50096 and previous config saved to /var/cache/conftool/dbconfig/20230804-161212-ladsgroup.json
  • 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T342617)', diff saved to https://phabricator.wikimedia.org/P50095 and previous config saved to /var/cache/conftool/dbconfig/20230804-155816-ladsgroup.json
  • 15:18 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs2001.codfw.wmnet with reason: host reimage
  • 15:16 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs2001.codfw.wmnet with reason: host reimage
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T342617)', diff saved to https://phabricator.wikimedia.org/P50094 and previous config saved to /var/cache/conftool/dbconfig/20230804-151435-ladsgroup.json
  • 15:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 15:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 15:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 15:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T342617)', diff saved to https://phabricator.wikimedia.org/P50093 and previous config saved to /var/cache/conftool/dbconfig/20230804-151409-ladsgroup.json
  • 15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T342617)', diff saved to https://phabricator.wikimedia.org/P50092 and previous config saved to /var/cache/conftool/dbconfig/20230804-150310-ladsgroup.json
  • 15:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50091 and previous config saved to /var/cache/conftool/dbconfig/20230804-150232-ladsgroup.json
  • 15:00 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wcqs2001.codfw.wmnet with OS bullseye
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P50090 and previous config saved to /var/cache/conftool/dbconfig/20230804-145903-ladsgroup.json
  • 14:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2195.codfw.wmnet with OS bullseye
  • 14:54 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P50089 and previous config saved to /var/cache/conftool/dbconfig/20230804-144726-ladsgroup.json
  • 14:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P50088 and previous config saved to /var/cache/conftool/dbconfig/20230804-144357-ladsgroup.json
  • 14:40 sbassett: Deployed updated mitigation for T336027
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P50087 and previous config saved to /var/cache/conftool/dbconfig/20230804-143219-ladsgroup.json
  • 14:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2190.codfw.wmnet with OS bullseye
  • 14:31 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2195.codfw.wmnet with reason: host reimage
  • 14:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T342617)', diff saved to https://phabricator.wikimedia.org/P50086 and previous config saved to /var/cache/conftool/dbconfig/20230804-142851-ladsgroup.json
  • 14:27 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:27 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 14:26 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4005.wikimedia.org
  • 14:25 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync Hiera after adding bast4005 - jmm@cumin2002"
  • 14:25 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2195.codfw.wmnet with reason: host reimage
  • 14:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2193.codfw.wmnet with OS bullseye
  • 14:25 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:23 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:23 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync Hiera after adding bast4005 - jmm@cumin2002"
  • 14:22 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4005.wikimedia.org
  • 14:20 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:20 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 14:18 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:17 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50085 and previous config saved to /var/cache/conftool/dbconfig/20230804-141713-ladsgroup.json
  • 14:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast4005.wikimedia.org
  • 14:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4005.wikimedia.org with OS bookworm
  • 14:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2190.codfw.wmnet with reason: host reimage
  • 14:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2193.codfw.wmnet with reason: host reimage
  • 14:08 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2190.codfw.wmnet with reason: host reimage
  • 14:07 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:07 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 14:05 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2193.codfw.wmnet with reason: host reimage
  • 14:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2195.codfw.wmnet with OS bullseye
  • 14:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4005.wikimedia.org with reason: host reimage
  • 14:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2195']
  • 13:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4005.wikimedia.org with reason: host reimage
  • 13:50 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2195']
  • 13:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2195.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:48 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2190.codfw.wmnet with OS bullseye
  • 13:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2190.codfw.wmnet with OS bullseye
  • 13:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2190.codfw.wmnet with OS bullseye
  • 13:45 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2193.codfw.wmnet with OS bullseye
  • 13:39 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4005.wikimedia.org with OS bookworm
  • 13:30 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2195.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:13 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast4005.wikimedia.org - jmm@cumin2002"
  • 13:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast4005.wikimedia.org - jmm@cumin2002"
  • 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bast4005.wikimedia.org on all recursors
  • 13:12 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast4005.wikimedia.org on all recursors
  • 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast4005.wikimedia.org - jmm@cumin2002"
  • 13:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast4005.wikimedia.org - jmm@cumin2002"
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T342617)', diff saved to https://phabricator.wikimedia.org/P50084 and previous config saved to /var/cache/conftool/dbconfig/20230804-130622-ladsgroup.json
  • 13:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 13:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T342617)', diff saved to https://phabricator.wikimedia.org/P50083 and previous config saved to /var/cache/conftool/dbconfig/20230804-130601-ladsgroup.json
  • 13:02 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:01 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T342617)', diff saved to https://phabricator.wikimedia.org/P50082 and previous config saved to /var/cache/conftool/dbconfig/20230804-130142-ladsgroup.json
  • 13:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 13:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast4005.wikimedia.org
  • 13:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast3007.wikimedia.org
  • 12:59 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:59 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:58 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 12:57 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 12:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P50081 and previous config saved to /var/cache/conftool/dbconfig/20230804-125055-ladsgroup.json
  • 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast3007.wikimedia.org
  • 12:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3007.wikimedia.org
  • 12:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P50080 and previous config saved to /var/cache/conftool/dbconfig/20230804-123548-ladsgroup.json
  • 12:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast3007.wikimedia.org
  • 12:32 godog: bounce prometheus@k8s on prometheus100[56] to test failure to reload certs
  • 12:25 jforrester@deploy1002: Synchronized php-1.41.0-wmf.20/extensions/WikiLambda: T343380 and T343400 (duration: 10m 12s)
  • 12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T342617)', diff saved to https://phabricator.wikimedia.org/P50079 and previous config saved to /var/cache/conftool/dbconfig/20230804-122042-ladsgroup.json
  • 12:16 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 12:14 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 12:14 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 12:13 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 12:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast3007.wikimedia.org
  • 12:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast3007.wikimedia.org with OS bookworm
  • 12:05 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 12:04 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 11:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 11:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T342617)', diff saved to https://phabricator.wikimedia.org/P50077 and previous config saved to /var/cache/conftool/dbconfig/20230804-115224-ladsgroup.json
  • 11:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast3007.wikimedia.org with reason: host reimage
  • 11:48 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast3007.wikimedia.org with reason: host reimage
  • 11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T342617)', diff saved to https://phabricator.wikimedia.org/P50076 and previous config saved to /var/cache/conftool/dbconfig/20230804-113848-ladsgroup.json
  • 11:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 11:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P50075 and previous config saved to /var/cache/conftool/dbconfig/20230804-113718-ladsgroup.json
  • 11:30 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on contint2001.wikimedia.org with reason: Decommissioning
  • 11:30 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on contint2001.wikimedia.org with reason: Decommissioning
  • 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P50074 and previous config saved to /var/cache/conftool/dbconfig/20230804-112212-ladsgroup.json
  • 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T342617)', diff saved to https://phabricator.wikimedia.org/P50073 and previous config saved to /var/cache/conftool/dbconfig/20230804-110705-ladsgroup.json
  • 11:02 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast3007.wikimedia.org with OS bookworm
  • 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast3007.wikimedia.org - jmm@cumin2002"
  • 10:38 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast3007.wikimedia.org - jmm@cumin2002"
  • 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bast3007.wikimedia.org on all recursors
  • 10:38 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast3007.wikimedia.org on all recursors
  • 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast3007.wikimedia.org - jmm@cumin2002"
  • 10:37 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast3007.wikimedia.org - jmm@cumin2002"
  • 10:33 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:33 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast3007.wikimedia.org
  • 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 10:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T342617)', diff saved to https://phabricator.wikimedia.org/P50072 and previous config saved to /var/cache/conftool/dbconfig/20230804-102347-ladsgroup.json
  • 10:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 10:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 10:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 10:15 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin1001.eqiad.wmnet
  • 10:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cumin1001.eqiad.wmnet
  • 08:00 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1026.eqiad.wmnet with OS bullseye
  • 07:51 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 398203
  • 07:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 398203
  • 07:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 139901
  • 07:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 139901
  • 07:37 moritzm: installing Django security updates
  • 07:34 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1026.eqiad.wmnet with reason: host reimage
  • 07:31 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1026.eqiad.wmnet with reason: host reimage
  • 07:19 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1026.eqiad.wmnet with OS bullseye
  • 03:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2194.codfw.wmnet with OS bullseye
  • 03:20 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 03:12 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 03:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2192.codfw.wmnet with OS bullseye
  • 03:03 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 03:00 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2194.codfw.wmnet with reason: host reimage
  • 02:53 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2194.codfw.wmnet with reason: host reimage
  • 02:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2192.codfw.wmnet with reason: host reimage
  • 02:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2192.codfw.wmnet with reason: host reimage
  • 02:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2194.codfw.wmnet with OS bullseye
  • 02:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2194']
  • 02:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2194']
  • 02:30 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2194']
  • 02:26 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host db2193.codfw.wmnet with OS bullseye
  • 02:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2193.codfw.wmnet with OS bullseye
  • 02:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2192.codfw.wmnet with OS bullseye
  • 02:19 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2194']
  • 02:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2193']
  • 01:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2194.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2192']
  • 00:51 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2193']
  • 00:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2193.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:45 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2192']
  • 00:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2192']
  • 00:45 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2192']
  • 00:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2192.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2190.codfw.wmnet with OS bullseye
  • 00:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2190.codfw.wmnet with OS bullseye
  • 00:38 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2194.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:33 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2190.codfw.wmnet with OS bullseye
  • 00:27 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2193.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:26 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2192.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2191.codfw.wmnet with OS bullseye
  • 00:25 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:24 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2189.codfw.wmnet with OS bullseye
  • 00:18 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:15 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2188.codfw.wmnet with OS bullseye
  • 00:09 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2191.codfw.wmnet with reason: host reimage
  • 00:07 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:06 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2191.codfw.wmnet with reason: host reimage

2023-08-03

  • 23:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2189.codfw.wmnet with reason: host reimage
  • 23:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2189.codfw.wmnet with reason: host reimage
  • 23:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2188.codfw.wmnet with reason: host reimage
  • 23:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2188.codfw.wmnet with reason: host reimage
  • 23:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2191.codfw.wmnet with OS bullseye
  • 23:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2190.codfw.wmnet with OS bullseye
  • 23:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2190']
  • 23:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2191']
  • 23:36 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2189.codfw.wmnet with OS bullseye
  • 23:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2191']
  • 23:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2190']
  • 23:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2188.codfw.wmnet with OS bullseye
  • 23:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2190.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2191.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2189']
  • 23:19 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2189']
  • 23:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2188']
  • 23:18 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2188']
  • 23:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2188']
  • 23:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2189']
  • 23:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2189']
  • 23:04 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2188']
  • 22:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2188.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2189.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:39 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2191.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:38 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2190.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:22 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2189.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:22 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host db2188.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:19 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:19 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: setup switch port and DNS for db2188-db2195 - pt1979@cumin2002"
  • 22:18 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: setup switch port and DNS for db2188-db2195 - pt1979@cumin2002"
  • 22:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 20:59 jforrester@deploy1002: Synchronized php-1.41.0-wmf.20/extensions/WikiLambda/: T343402 and T343380 (duration: 07m 50s)
  • 20:56 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 20:55 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 20:55 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 20:54 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 20:52 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 20:51 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 20:49 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 20:49 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 20:49 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 20:49 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 20:39 thcipriani: end UTC late backport
  • 20:36 thcipriani@deploy1002: Finished scap: Backport for pawikisource: add audiobook namespace alias (T343410) (duration: 10m 39s)
  • 20:30 thcipriani@deploy1002: anzx and thcipriani: Continuing with sync
  • 20:27 thcipriani@deploy1002: anzx and thcipriani: Backport for pawikisource: add audiobook namespace alias (T343410) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:26 thcipriani@deploy1002: Started scap: Backport for pawikisource: add audiobook namespace alias (T343410)
  • 20:23 thcipriani@deploy1002: Finished scap: Backport for Write new on group1 except wikidatawiki for event table migration (T330158) (duration: 15m 54s)
  • 20:17 thcipriani@deploy1002: dreamyjazz and thcipriani: Continuing with sync
  • 20:09 thcipriani@deploy1002: dreamyjazz and thcipriani: Backport for Write new on group1 except wikidatawiki for event table migration (T330158) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:07 thcipriani@deploy1002: Started scap: Backport for Write new on group1 except wikidatawiki for event table migration (T330158)
  • 20:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lists2001.codfw.wmnet with OS bookworm
  • 20:04 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:54 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:53 dancy: dancy@deploy1002 rebuilt and synchronized wikiversions files group2 wikis to 1.41.0-wmf.20 refs T340248
  • 19:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lists2001.codfw.wmnet with reason: host reimage
  • 19:35 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lists2001.codfw.wmnet with reason: host reimage
  • 19:31 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 19:30 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 19:26 dancy@deploy1002: Finished scap: Backport for Fix mobile search text overlapping (T343397) (duration: 09m 33s)
  • 19:20 dancy@deploy1002: jdlrobson and dancy: Continuing with sync
  • 19:20 dancy@deploy1002: jdlrobson and dancy: Backport for Fix mobile search text overlapping (T343397) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 19:16 dancy@deploy1002: Started scap: Backport for Fix mobile search text overlapping (T343397)
  • 19:12 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 19:12 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 19:11 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lists2001.codfw.wmnet with OS bookworm
  • 17:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host titan2002.codfw.wmnet with OS bookworm
  • 17:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:17 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:17 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:17 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:14 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:12 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:11 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 16:56 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1025.eqiad.wmnet with OS bullseye
  • 16:56 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cgoubert@cumin1001"
  • 16:40 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cgoubert@cumin1001"
  • 16:28 jforrester@deploy1002: Finished scap: Backport for Fix unsafe validator to not reach into undefined keys (T343393) (duration: 10m 57s)
  • 16:26 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:22 jforrester@deploy1002: jforrester: Continuing with sync
  • 16:19 jforrester@deploy1002: jforrester: Backport for Fix unsafe validator to not reach into undefined keys (T343393) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 16:19 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1025.eqiad.wmnet with reason: host reimage
  • 16:18 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 16:18 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 16:17 jforrester@deploy1002: Started scap: Backport for Fix unsafe validator to not reach into undefined keys (T343393)
  • 16:15 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1025.eqiad.wmnet with reason: host reimage
  • 16:14 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Rename kubernetes10[25-26] - cgoubert@cumin1001 - T343306"
  • 16:13 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Rename kubernetes10[25-26] - cgoubert@cumin1001 - T343306"
  • 16:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on titan2002.codfw.wmnet with reason: host reimage
  • 16:04 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on titan2002.codfw.wmnet with reason: host reimage
  • 16:02 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 16:01 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 15:47 moritzm: installing pandoc security updates
  • 15:43 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host titan2002.codfw.wmnet with OS bookworm
  • 15:40 fabfur: imported `varnishkafka` package in bookworm-wikimedia (T342154)
  • 15:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['titan2002']
  • 15:30 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan2002']
  • 15:24 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 15:24 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 15:23 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 15:23 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 15:23 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 15:22 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 15:22 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 15:22 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 15:22 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 15:21 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 15:21 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 15:20 moritzm: installing glibc security updates on bookworm
  • 15:20 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 15:20 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 15:19 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 15:19 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 15:13 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 15:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 15:13 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 15:13 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 15:12 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 15:11 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 15:11 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 15:11 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 15:10 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 15:10 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 15:10 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 15:09 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 15:09 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 15:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['titan2002']
  • 15:07 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan2002']
  • 15:05 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1025.eqiad.wmnet with OS bullseye
  • 15:02 claime: Run homer on lsw1-f3-eqiad for kubernetes102[5-6] imaging - T343306
  • 14:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host titan2001.codfw.wmnet with OS bookworm
  • 14:46 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:22 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 14:22 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 14:21 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 14:21 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 14:19 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on titan2001.codfw.wmnet with reason: host reimage
  • 14:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on titan2001.codfw.wmnet with reason: host reimage
  • 13:58 jforrester@deploy1002: Finished scap: Backport for [Wikifunctions] Allow logged-in users to make function calls again (duration: 08m 24s)
  • 13:51 jforrester@deploy1002: jforrester: Continuing with sync
  • 13:51 jforrester@deploy1002: jforrester: Backport for [Wikifunctions] Allow logged-in users to make function calls again synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:49 jforrester@deploy1002: Started scap: Backport for [Wikifunctions] Allow logged-in users to make function calls again
  • 13:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['titan2002']
  • 13:46 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan2002']
  • 13:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['titan2002']
  • 13:45 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan2002']
  • 13:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['titan2001']
  • 13:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan2001']
  • 13:30 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host titan2001.codfw.wmnet with OS bookworm
  • 13:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['titan2002']
  • 13:26 taavi: taavi@mwmaint1002 ~ $ mwscript namespaceDupes.php pawikisource --fix --add-prefix "BROKEN " # T343410
  • 13:23 taavi@deploy1002: Finished scap: Backport for pawikisource: create audiobook namespace (T343410) (duration: 13m 01s)
  • 13:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['titan2001']
  • 13:17 taavi@deploy1002: taavi and anzx: Continuing with sync
  • 13:13 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan2002']
  • 13:12 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan2001']
  • 13:12 taavi@deploy1002: taavi and anzx: Backport for pawikisource: create audiobook namespace (T343410) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:12 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 13:12 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 13:10 taavi@deploy1002: Started scap: Backport for pawikisource: create audiobook namespace (T343410)
  • 12:41 jforrester@deploy1002: Finished scap: Backport for WikiLambda: Add PHP code for Z2K5/'short descriptions' (T343396) (duration: 09m 41s)
  • 12:34 jforrester@deploy1002: jforrester: Continuing with sync
  • 12:33 jforrester@deploy1002: jforrester: Backport for WikiLambda: Add PHP code for Z2K5/'short descriptions' (T343396) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 12:31 taavi: updated T343294 mitigations
  • 12:31 jforrester@deploy1002: Started scap: Backport for WikiLambda: Add PHP code for Z2K5/'short descriptions' (T343396)
  • 12:15 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@54c0898] (releasing): (no justification provided) (duration: 00m 42s)
  • 12:15 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@54c0898] (releasing): (no justification provided)
  • 12:02 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 12:02 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 12:02 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 12:02 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 12:02 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 12:02 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 12:02 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 12:02 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 12:02 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 12:02 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 12:01 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 12:01 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 12:01 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 12:01 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 12:01 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 12:01 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 12:01 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 12:01 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 12:01 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 12:00 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:49 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 11:48 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 11:48 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 11:48 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 11:48 cgoubert@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:48 cgoubert@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:48 cgoubert@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 11:48 cgoubert@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 11:48 cgoubert@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:48 cgoubert@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 11:48 cgoubert@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 11:48 cgoubert@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 11:47 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 11:47 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 11:47 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 11:47 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 11:47 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 11:47 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 11:46 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 11:46 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 11:46 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 11:46 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 11:45 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:45 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:45 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 11:45 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 11:44 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:44 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:44 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:44 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:23 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:23 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T342617)', diff saved to https://phabricator.wikimedia.org/P50070 and previous config saved to /var/cache/conftool/dbconfig/20230803-110028-ladsgroup.json
  • 11:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T342617)', diff saved to https://phabricator.wikimedia.org/P50069 and previous config saved to /var/cache/conftool/dbconfig/20230803-110000-ladsgroup.json
  • 10:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P50068 and previous config saved to /var/cache/conftool/dbconfig/20230803-104521-ladsgroup.json
  • 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P50067 and previous config saved to /var/cache/conftool/dbconfig/20230803-104454-ladsgroup.json
  • 10:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P50066 and previous config saved to /var/cache/conftool/dbconfig/20230803-103015-ladsgroup.json
  • 10:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P50065 and previous config saved to /var/cache/conftool/dbconfig/20230803-102948-ladsgroup.json
  • 10:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T342617)', diff saved to https://phabricator.wikimedia.org/P50062 and previous config saved to /var/cache/conftool/dbconfig/20230803-101509-ladsgroup.json
  • 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T342617)', diff saved to https://phabricator.wikimedia.org/P50061 and previous config saved to /var/cache/conftool/dbconfig/20230803-101441-ladsgroup.json
  • 10:13 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 10:11 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 10:11 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 10:11 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 10:11 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 10:10 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 10:10 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 10:09 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 10:01 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 09:59 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2129 (T342617)', diff saved to https://phabricator.wikimedia.org/P50059 and previous config saved to /var/cache/conftool/dbconfig/20230803-092338-ladsgroup.json
  • 09:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 09:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 09:21 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:17 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 09:16 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 09:03 claime: Deploying rename changes for mw149[7-8] to kubernetes102[5-6] - T343306
  • 09:03 moritzm: installing systemd bugfix updates from Bookworm 12.1 point release
  • 08:55 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 08:55 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 08:53 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 08:53 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 08:44 moritzm: installing yajl security updates
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 100%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50058 and previous config saved to /var/cache/conftool/dbconfig/20230803-084103-root.json
  • 08:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T342617)', diff saved to https://phabricator.wikimedia.org/P50057 and previous config saved to /var/cache/conftool/dbconfig/20230803-083845-ladsgroup.json
  • 08:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 08:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 08:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T342617)', diff saved to https://phabricator.wikimedia.org/P50056 and previous config saved to /var/cache/conftool/dbconfig/20230803-083824-ladsgroup.json
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 75%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50055 and previous config saved to /var/cache/conftool/dbconfig/20230803-082558-root.json
  • 08:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P50054 and previous config saved to /var/cache/conftool/dbconfig/20230803-082318-ladsgroup.json
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 50%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50053 and previous config saved to /var/cache/conftool/dbconfig/20230803-081053-root.json
  • 08:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P50052 and previous config saved to /var/cache/conftool/dbconfig/20230803-080812-ladsgroup.json
  • 07:59 moritzm: installing Linux 5.10.179 on Buster hosts with Linux 5.10
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 25%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50051 and previous config saved to /var/cache/conftool/dbconfig/20230803-075548-root.json
  • 07:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T342617)', diff saved to https://phabricator.wikimedia.org/P50050 and previous config saved to /var/cache/conftool/dbconfig/20230803-075305-ladsgroup.json
  • 07:51 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: GitLab 16 major version upgrade
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 10%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50049 and previous config saved to /var/cache/conftool/dbconfig/20230803-074044-root.json
  • 07:39 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 07:38 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 07:36 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
  • 07:36 elukey@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 5%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50048 and previous config saved to /var/cache/conftool/dbconfig/20230803-072539-root.json
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 3%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50047 and previous config saved to /var/cache/conftool/dbconfig/20230803-071034-root.json
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2129 (re)pooling @ 1%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P50046 and previous config saved to /var/cache/conftool/dbconfig/20230803-065529-root.json
  • 06:35 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: GitLab 16 major version upgrade
  • 06:33 oblivian@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 06:33 oblivian@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 06:33 oblivian@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 06:33 oblivian@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 06:31 kart_: Updated MinT to 2023-08-02-142037-production (T338292)
  • 06:30 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 06:25 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 06:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T342617)', diff saved to https://phabricator.wikimedia.org/P50045 and previous config saved to /var/cache/conftool/dbconfig/20230803-061827-ladsgroup.json
  • 06:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 06:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 06:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T342617)', diff saved to https://phabricator.wikimedia.org/P50044 and previous config saved to /var/cache/conftool/dbconfig/20230803-061817-ladsgroup.json
  • 06:17 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 06:11 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 06:07 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 06:05 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 06:04 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 06:03 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 06:03 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 06:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P50043 and previous config saved to /var/cache/conftool/dbconfig/20230803-060311-ladsgroup.json
  • 06:02 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2129 T343296', diff saved to https://phabricator.wikimedia.org/P50042 and previous config saved to /var/cache/conftool/dbconfig/20230803-060241-marostegui.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2114 to s6 primary T343296', diff saved to https://phabricator.wikimedia.org/P50041 and previous config saved to /var/cache/conftool/dbconfig/20230803-060055-marostegui.json
  • 06:00 marostegui: Starting s6 codfw failover from db2129 to db2114 - T343296
  • 05:52 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 05:52 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 05:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P50040 and previous config saved to /var/cache/conftool/dbconfig/20230803-054805-ladsgroup.json
  • 05:46 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 05:46 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2114 with weight 0 T343296', diff saved to https://phabricator.wikimedia.org/P50039 and previous config saved to /var/cache/conftool/dbconfig/20230803-054418-marostegui.json
  • 05:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s6 T343296
  • 05:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s6 T343296
  • 05:34 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 05:34 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 05:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T342617)', diff saved to https://phabricator.wikimedia.org/P50038 and previous config saved to /var/cache/conftool/dbconfig/20230803-053259-ladsgroup.json
  • 03:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T342617)', diff saved to https://phabricator.wikimedia.org/P50037 and previous config saved to /var/cache/conftool/dbconfig/20230803-035940-ladsgroup.json
  • 03:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 03:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 03:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T342617)', diff saved to https://phabricator.wikimedia.org/P50036 and previous config saved to /var/cache/conftool/dbconfig/20230803-035917-ladsgroup.json
  • 03:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P50035 and previous config saved to /var/cache/conftool/dbconfig/20230803-034411-ladsgroup.json
  • 03:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P50034 and previous config saved to /var/cache/conftool/dbconfig/20230803-032905-ladsgroup.json
  • 03:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T342617)', diff saved to https://phabricator.wikimedia.org/P50033 and previous config saved to /var/cache/conftool/dbconfig/20230803-031359-ladsgroup.json
  • 02:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host titan2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 02:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 02:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T342617)', diff saved to https://phabricator.wikimedia.org/P50032 and previous config saved to /var/cache/conftool/dbconfig/20230803-021643-ladsgroup.json
  • 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P50031 and previous config saved to /var/cache/conftool/dbconfig/20230803-020137-ladsgroup.json
  • 01:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P50030 and previous config saved to /var/cache/conftool/dbconfig/20230803-014629-ladsgroup.json
  • 01:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T342617)', diff saved to https://phabricator.wikimedia.org/P50029 and previous config saved to /var/cache/conftool/dbconfig/20230803-014503-ladsgroup.json
  • 01:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 01:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 01:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 01:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 01:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T342617)', diff saved to https://phabricator.wikimedia.org/P50028 and previous config saved to /var/cache/conftool/dbconfig/20230803-014426-ladsgroup.json
  • 01:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T342617)', diff saved to https://phabricator.wikimedia.org/P50027 and previous config saved to /var/cache/conftool/dbconfig/20230803-013123-ladsgroup.json
  • 01:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P50026 and previous config saved to /var/cache/conftool/dbconfig/20230803-012920-ladsgroup.json
  • 01:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P50025 and previous config saved to /var/cache/conftool/dbconfig/20230803-011414-ladsgroup.json
  • 00:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T342617)', diff saved to https://phabricator.wikimedia.org/P50024 and previous config saved to /var/cache/conftool/dbconfig/20230803-005908-ladsgroup.json
  • 00:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T342617)', diff saved to https://phabricator.wikimedia.org/P50023 and previous config saved to /var/cache/conftool/dbconfig/20230803-003939-ladsgroup.json
  • 00:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 00:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 00:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T342617)', diff saved to https://phabricator.wikimedia.org/P50022 and previous config saved to /var/cache/conftool/dbconfig/20230803-003916-ladsgroup.json
  • 00:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P50021 and previous config saved to /var/cache/conftool/dbconfig/20230803-002410-ladsgroup.json
  • 00:13 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host titan2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host titan2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P50020 and previous config saved to /var/cache/conftool/dbconfig/20230803-000904-ladsgroup.json

2023-08-02

  • 23:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T342617)', diff saved to https://phabricator.wikimedia.org/P50019 and previous config saved to /var/cache/conftool/dbconfig/20230802-235358-ladsgroup.json
  • 23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T342617)', diff saved to https://phabricator.wikimedia.org/P50018 and previous config saved to /var/cache/conftool/dbconfig/20230802-232528-ladsgroup.json
  • 23:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 23:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T342617)', diff saved to https://phabricator.wikimedia.org/P50017 and previous config saved to /var/cache/conftool/dbconfig/20230802-232507-ladsgroup.json
  • 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P50016 and previous config saved to /var/cache/conftool/dbconfig/20230802-231001-ladsgroup.json
  • 23:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T342617)', diff saved to https://phabricator.wikimedia.org/P50015 and previous config saved to /var/cache/conftool/dbconfig/20230802-230127-ladsgroup.json
  • 23:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 23:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 23:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T342617)', diff saved to https://phabricator.wikimedia.org/P50014 and previous config saved to /var/cache/conftool/dbconfig/20230802-230106-ladsgroup.json
  • 22:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P50013 and previous config saved to /var/cache/conftool/dbconfig/20230802-225454-ladsgroup.json
  • 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P50012 and previous config saved to /var/cache/conftool/dbconfig/20230802-224559-ladsgroup.json
  • 22:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lists2001']
  • 22:45 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lists2001']
  • 22:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['lists2001']
  • 22:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T342617)', diff saved to https://phabricator.wikimedia.org/P50011 and previous config saved to /var/cache/conftool/dbconfig/20230802-223948-ladsgroup.json
  • 22:39 krinkle@deploy1002: Finished scap: Backport for noc: Remove ?blame=1 from highlight.php URLs (duration: 08m 07s)
  • 22:36 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host titan2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lists2001']
  • 22:34 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:34 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: setup switch port and DNS for titan200[1-2] - pt1979@cumin2002"
  • 22:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lists2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: setup switch port and DNS for titan200[1-2] - pt1979@cumin2002"
  • 22:32 krinkle@deploy1002: reedy and krinkle: Continuing with sync
  • 22:32 krinkle@deploy1002: reedy and krinkle: Backport for noc: Remove ?blame=1 from highlight.php URLs synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 22:31 krinkle@deploy1002: Started scap: Backport for noc: Remove ?blame=1 from highlight.php URLs
  • 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P50010 and previous config saved to /var/cache/conftool/dbconfig/20230802-223053-ladsgroup.json
  • 22:30 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 22:21 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host lists2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:20 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add switch interface and DNS for lists2001 - pt1979@cumin2002"
  • 22:19 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add switch interface and DNS for lists2001 - pt1979@cumin2002"
  • 22:18 krinkle@deploy1002: Finished scap: Backport for Profiler: Sync minor changes with arc-lamp.git package (T337873) (duration: 11m 02s)
  • 22:17 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 22:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T342617)', diff saved to https://phabricator.wikimedia.org/P50009 and previous config saved to /var/cache/conftool/dbconfig/20230802-221547-ladsgroup.json
  • 22:12 krinkle@deploy1002: krinkle: Continuing with sync
  • 22:09 krinkle@deploy1002: krinkle: Backport for Profiler: Sync minor changes with arc-lamp.git package (T337873) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 22:07 krinkle@deploy1002: Started scap: Backport for Profiler: Sync minor changes with arc-lamp.git package (T337873)
  • 21:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T342617)', diff saved to https://phabricator.wikimedia.org/P50008 and previous config saved to /var/cache/conftool/dbconfig/20230802-212412-ladsgroup.json
  • 21:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 21:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T342617)', diff saved to https://phabricator.wikimedia.org/P50007 and previous config saved to /var/cache/conftool/dbconfig/20230802-212352-ladsgroup.json
  • 21:10 dancy@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.20 refs T340248 (duration: 06m 21s)
  • 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P50006 and previous config saved to /var/cache/conftool/dbconfig/20230802-210846-ladsgroup.json
  • 21:04 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.20 refs T340248
  • 20:55 dancy@deploy1002: Finished scap: Backport for Revert "LocalisationCache: Load only core data if possible" (T342418 T343375) (duration: 08m 47s)
  • 20:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P50005 and previous config saved to /var/cache/conftool/dbconfig/20230802-205339-ladsgroup.json
  • 20:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T342617)', diff saved to https://phabricator.wikimedia.org/P50004 and previous config saved to /var/cache/conftool/dbconfig/20230802-204941-ladsgroup.json
  • 20:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 20:49 dancy@deploy1002: dancy: Continuing with sync
  • 20:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 20:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T342617)', diff saved to https://phabricator.wikimedia.org/P50003 and previous config saved to /var/cache/conftool/dbconfig/20230802-204919-ladsgroup.json
  • 20:48 dancy@deploy1002: dancy: Backport for Revert "LocalisationCache: Load only core data if possible" (T342418 T343375) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:46 dancy@deploy1002: Started scap: Backport for Revert "LocalisationCache: Load only core data if possible" (T342418 T343375)
  • 20:41 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@cbce175]: Deploy latest for Airflow analytics instance. (duration: 00m 20s)
  • 20:41 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@cbce175]: Deploy latest for Airflow analytics instance.
  • 20:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T342617)', diff saved to https://phabricator.wikimedia.org/P50002 and previous config saved to /var/cache/conftool/dbconfig/20230802-203833-ladsgroup.json
  • 20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P50001 and previous config saved to /var/cache/conftool/dbconfig/20230802-203413-ladsgroup.json
  • 20:29 dancy@deploy1002: Finished scap: Backport for Add validator userright for pawikisource (T341428) (duration: 20m 49s)
  • 20:23 dancy@deploy1002: dancy and soda: Continuing with sync
  • 20:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P50000 and previous config saved to /var/cache/conftool/dbconfig/20230802-201907-ladsgroup.json
  • 20:10 dancy@deploy1002: dancy and soda: Backport for Add validator userright for pawikisource (T341428) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:08 dancy@deploy1002: Started scap: Backport for Add validator userright for pawikisource (T341428)
  • 20:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T342617)', diff saved to https://phabricator.wikimedia.org/P49999 and previous config saved to /var/cache/conftool/dbconfig/20230802-200401-ladsgroup.json
  • 19:46 xcollazo@deploy1002: Finished deploy [analytics/refinery@27def33] (hadoop-test): Special refinery deploy to fix mediwiki_history_denormalize TEST [analytics/refinery@27def33] (duration: 01m 59s)
  • 19:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T342617)', diff saved to https://phabricator.wikimedia.org/P49998 and previous config saved to /var/cache/conftool/dbconfig/20230802-194518-ladsgroup.json
  • 19:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 19:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 19:44 xcollazo@deploy1002: Started deploy [analytics/refinery@27def33] (hadoop-test): Special refinery deploy to fix mediwiki_history_denormalize TEST [analytics/refinery@27def33]
  • 19:43 xcollazo@deploy1002: Finished deploy [analytics/refinery@27def33] (thin): Special refinery deploy to fix mediwiki_history_denormalize THIN [analytics/refinery@27def33] (duration: 00m 04s)
  • 19:43 xcollazo@deploy1002: Started deploy [analytics/refinery@27def33] (thin): Special refinery deploy to fix mediwiki_history_denormalize THIN [analytics/refinery@27def33]
  • 19:41 xcollazo@deploy1002: Finished deploy [analytics/refinery@27def33]: Special refinery deploy to fix mediwiki_history_denormalize [analytics/refinery@27def33] (duration: 07m 48s)
  • 19:39 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 19:39 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 19:34 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 19:34 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 19:34 xcollazo@deploy1002: Started deploy [analytics/refinery@27def33]: Special refinery deploy to fix mediwiki_history_denormalize [analytics/refinery@27def33]
  • 18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['es2025']
  • 18:32 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.20 refs T340248
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T342617)', diff saved to https://phabricator.wikimedia.org/P49997 and previous config saved to /var/cache/conftool/dbconfig/20230802-182059-ladsgroup.json
  • 18:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 18:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 18:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T342617)', diff saved to https://phabricator.wikimedia.org/P49996 and previous config saved to /var/cache/conftool/dbconfig/20230802-182038-ladsgroup.json
  • 18:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 18:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 18:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T342617)', diff saved to https://phabricator.wikimedia.org/P49995 and previous config saved to /var/cache/conftool/dbconfig/20230802-181724-ladsgroup.json
  • 18:16 dancy@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.20 refs T340248 (duration: 06m 38s)
  • 18:10 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.20 refs T340248
  • 18:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P49994 and previous config saved to /var/cache/conftool/dbconfig/20230802-180532-ladsgroup.json
  • 18:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P49993 and previous config saved to /var/cache/conftool/dbconfig/20230802-180218-ladsgroup.json
  • 17:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P49991 and previous config saved to /var/cache/conftool/dbconfig/20230802-175026-ladsgroup.json
  • 17:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P49990 and previous config saved to /var/cache/conftool/dbconfig/20230802-174712-ladsgroup.json
  • 17:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T342617)', diff saved to https://phabricator.wikimedia.org/P49989 and previous config saved to /var/cache/conftool/dbconfig/20230802-173520-ladsgroup.json
  • 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T342617)', diff saved to https://phabricator.wikimedia.org/P49988 and previous config saved to /var/cache/conftool/dbconfig/20230802-173206-ladsgroup.json
  • 16:58 samtar@deploy1002: Finished scap: Backport for enwiki: temp enable emergencyCaptcha (duration: 07m 48s)
  • 16:52 samtar@deploy1002: samtar: Continuing with sync
  • 16:52 samtar@deploy1002: samtar: Backport for enwiki: temp enable emergencyCaptcha synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 16:51 samtar@deploy1002: Started scap: Backport for enwiki: temp enable emergencyCaptcha
  • 16:46 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 16:46 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 16:46 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 16:46 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 16:41 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2025']
  • 16:02 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudservices1006.eqiad.wmnet']
  • 16:02 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudservices1006.eqiad.wmnet']
  • 15:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2016.codfw.wmnet with OS bullseye
  • 15:59 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:58 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T342617)', diff saved to https://phabricator.wikimedia.org/P49985 and previous config saved to /var/cache/conftool/dbconfig/20230802-155618-ladsgroup.json
  • 15:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 15:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T342617)', diff saved to https://phabricator.wikimedia.org/P49984 and previous config saved to /var/cache/conftool/dbconfig/20230802-155558-ladsgroup.json
  • 15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T342617)', diff saved to https://phabricator.wikimedia.org/P49983 and previous config saved to /var/cache/conftool/dbconfig/20230802-155319-ladsgroup.json
  • 15:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 15:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 15:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T342617)', diff saved to https://phabricator.wikimedia.org/P49982 and previous config saved to /var/cache/conftool/dbconfig/20230802-155258-ladsgroup.json
  • 15:51 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: GitLab minor version upgrade
  • 15:45 cgoubert@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1026
  • 15:45 cgoubert@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1026
  • 15:45 cgoubert@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1025
  • 15:45 cgoubert@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1025
  • 15:43 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:43 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix kubernetes10[25-26] main interfaces - cgoubert@cumin1001"
  • 15:43 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix kubernetes10[25-26] main interfaces - cgoubert@cumin1001"
  • 15:42 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns3002.wikimedia.org
  • 15:41 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P49981 and previous config saved to /var/cache/conftool/dbconfig/20230802-154051-ladsgroup.json
  • 15:40 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 15:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2016.codfw.wmnet with reason: host reimage
  • 15:38 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host dns3002.wikimedia.org
  • 15:37 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 15:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P49980 and previous config saved to /var/cache/conftool/dbconfig/20230802-153751-ladsgroup.json
  • 15:36 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2016.codfw.wmnet with reason: host reimage
  • 15:30 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['es2025']
  • 15:30 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2025']
  • 15:28 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['es2025']
  • 15:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2025']
  • 15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P49979 and previous config saved to /var/cache/conftool/dbconfig/20230802-152545-ladsgroup.json
  • 15:25 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 15:24 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 15:24 brett: Remove dns3002 from cr2-esams and cr3-esams routes in prep for reboot - T335835
  • 15:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P49978 and previous config saved to /var/cache/conftool/dbconfig/20230802-152245-ladsgroup.json
  • 15:16 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host pc2016.codfw.wmnet with OS bullseye
  • 15:16 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 0:00:00 on config-master2001.codfw.wmnet,config-master1001.eqiad.wmnet with reason: WIP hosts to be setup
  • 15:15 volans@cumin1001: START - Cookbook sre.hosts.downtime for 8 days, 0:00:00 on config-master2001.codfw.wmnet,config-master1001.eqiad.wmnet with reason: WIP hosts to be setup
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T342617)', diff saved to https://phabricator.wikimedia.org/P49977 and previous config saved to /var/cache/conftool/dbconfig/20230802-151038-ladsgroup.json
  • 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T342617)', diff saved to https://phabricator.wikimedia.org/P49976 and previous config saved to /var/cache/conftool/dbconfig/20230802-150739-ladsgroup.json
  • 15:07 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 15:04 moritzm: installing gst-plugins-base1.0 security updates
  • 14:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['pc2016']
  • 14:58 elukey@deploy1002: Finished scap: Backport for ext-ORES: avoid Lift Wing calls for fiwiki (T343308) (duration: 09m 08s)
  • 14:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudservices1006.eqiad.wmnet']
  • 14:52 elukey@deploy1002: elukey: Continuing with sync
  • 14:52 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1026.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:51 elukey@deploy1002: elukey: Backport for ext-ORES: avoid Lift Wing calls for fiwiki (T343308) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:50 volans@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1026.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:49 elukey@deploy1002: Started scap: Backport for ext-ORES: avoid Lift Wing calls for fiwiki (T343308)
  • 14:48 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1025.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:46 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pc2016']
  • 14:44 moritzm: installing iperf3 security updates
  • 14:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudservices1006.eqiad.wmnet']
  • 14:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudservices1006.eqiad.wmnet']
  • 14:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudservices1006.eqiad.wmnet']
  • 14:43 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudservices1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:42 volans@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1025.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:41 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host kubernetes1025.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:41 volans@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1025.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:39 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:39 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mw[1497-1498] to kubernetes[1025-1026] - cgoubert@cumin1001"
  • 14:38 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: GitLab minor version upgrade
  • 14:38 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rename mw[1497-1498] to kubernetes[1025-1026] - cgoubert@cumin1001"
  • 14:35 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: GitLab minor version upgrade
  • 14:35 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 14:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc2016.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:26 sbassett: Deployed updated mitigation for T336027
  • 14:19 fabfur: importing python-logstash in bookworm-wikimedia (T342154)
  • 14:19 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw[1497-1498].eqiad.wmnet
  • 14:19 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:19 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1497-1498].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1001"
  • 14:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['es2025']
  • 14:19 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2025']
  • 14:18 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1497-1498].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1001"
  • 14:17 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on people1004.eqiad.wmnet with reason: Resizing disk
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T342617)', diff saved to https://phabricator.wikimedia.org/P49975 and previous config saved to /var/cache/conftool/dbconfig/20230802-141719-ladsgroup.json
  • 14:17 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on people1004.eqiad.wmnet with reason: Resizing disk
  • 14:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 14:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 14:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T342617)', diff saved to https://phabricator.wikimedia.org/P49974 and previous config saved to /var/cache/conftool/dbconfig/20230802-141640-ladsgroup.json
  • 14:15 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 14:15 fabfur: importing varnish and libvarnishapi2 in bookworm-wikimedia (T342154)
  • 14:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2015.codfw.wmnet with OS bullseye
  • 14:13 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:08 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:06 cgoubert@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1497-1498].eqiad.wmnet
  • 14:05 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw[1497-1498].eqiad.wment
  • 14:05 cgoubert@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:03 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 14:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti2014']
  • 14:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P49973 and previous config saved to /var/cache/conftool/dbconfig/20230802-140134-ladsgroup.json
  • 13:57 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['es2025']
  • 13:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2025']
  • 13:56 claime: Decommissioning mw1497 and mw1498 - T343306
  • 13:55 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudservices1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:54 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudservices1006
  • 13:54 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudservices1006
  • 13:52 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:52 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudservices1006 - jclark@cumin1001"
  • 13:51 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudservices1006 - jclark@cumin1001"
  • 13:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2015.codfw.wmnet with reason: host reimage
  • 13:49 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 13:46 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2015.codfw.wmnet with reason: host reimage
  • 13:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P49971 and previous config saved to /var/cache/conftool/dbconfig/20230802-134628-ladsgroup.json
  • 13:36 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:35 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Inject LanguageNameLookupFactory into WikibaseValueFormatterBuilders (T281726) (duration: 08m 39s)
  • 13:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T342617)', diff saved to https://phabricator.wikimedia.org/P49970 and previous config saved to /var/cache/conftool/dbconfig/20230802-133122-ladsgroup.json
  • 13:29 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Continuing with sync
  • 13:28 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for Inject LanguageNameLookupFactory into WikibaseValueFormatterBuilders (T281726) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P49969 and previous config saved to /var/cache/conftool/dbconfig/20230802-132819-ladsgroup.json
  • 13:26 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host pc2016.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:26 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Inject LanguageNameLookupFactory into WikibaseValueFormatterBuilders (T281726)
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T342617)', diff saved to https://phabricator.wikimedia.org/P49968 and previous config saved to /var/cache/conftool/dbconfig/20230802-132632-ladsgroup.json
  • 13:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 13:26 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for uzwiki: Install WikiLove (T343270) (duration: 09m 58s)
  • 13:26 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: GitLab minor version upgrade
  • 13:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 13:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host pc2015.codfw.wmnet with OS bullseye
  • 13:21 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti2014']
  • 13:20 lucaswerkmeister-wmde@deploy1002: stang and lucaswerkmeister-wmde: Continuing with sync
  • 13:17 lucaswerkmeister-wmde@deploy1002: stang and lucaswerkmeister-wmde: Backport for uzwiki: Install WikiLove (T343270) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:16 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for uzwiki: Install WikiLove (T343270)
  • 13:13 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php uzwiki wikilove # Create extension tables for Wikilove on uzwiki (T343270)
  • 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P49967 and previous config saved to /var/cache/conftool/dbconfig/20230802-131314-ladsgroup.json
  • 13:12 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf 'https://en.wikipedia.org/static/images/project-logos/simplewiktionary.png\n' | mwscript purgeList.php # T343084
  • 13:11 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for simplewiktionary: Update project logo (T343084) (duration: 08m 13s)
  • 13:06 lucaswerkmeister-wmde@deploy1002: stang and lucaswerkmeister-wmde: Continuing with sync
  • 13:05 lucaswerkmeister-wmde@deploy1002: stang and lucaswerkmeister-wmde: Backport for simplewiktionary: Update project logo (T343084) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:03 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for simplewiktionary: Update project logo (T343084)
  • 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P49965 and previous config saved to /var/cache/conftool/dbconfig/20230802-125810-ladsgroup.json
  • 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P49964 and previous config saved to /var/cache/conftool/dbconfig/20230802-124305-ladsgroup.json
  • 12:42 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics_product@8bba01c]: Redeploy of analytics_product Airflow instance (duration: 00m 08s)
  • 12:42 xcollazo@deploy1002: Started deploy [airflow-dags/analytics_product@8bba01c]: Redeploy of analytics_product Airflow instance
  • 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1184 T342284', diff saved to https://phabricator.wikimedia.org/P49963 and previous config saved to /var/cache/conftool/dbconfig/20230802-123228-ladsgroup.json
  • 12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T342617)', diff saved to https://phabricator.wikimedia.org/P49962 and previous config saved to /var/cache/conftool/dbconfig/20230802-122816-ladsgroup.json
  • 12:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 12:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T342617)', diff saved to https://phabricator.wikimedia.org/P49961 and previous config saved to /var/cache/conftool/dbconfig/20230802-122756-ladsgroup.json
  • 12:21 dcausse@deploy1002: Finished deploy [airflow-dags/search@8bba01c]: search: do not use hive partitions to wait for wmf_raw.mediawiki_page (duration: 00m 11s)
  • 12:21 dcausse@deploy1002: Started deploy [airflow-dags/search@8bba01c]: search: do not use hive partitions to wait for wmf_raw.mediawiki_page
  • 12:19 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=mw1498.eqiad.wmnet
  • 12:19 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=mw1497.eqiad.wmnet
  • 12:19 claime: Depool mw1497 and mw1498 for reimage as wikikube nodes - T343306
  • 12:18 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on people2003.codfw.wmnet with reason: Resizing disk
  • 12:17 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on people2003.codfw.wmnet with reason: Resizing disk
  • 12:13 claime: Repool mw1451 and mw1452, more recent servers will be used - T343306
  • 12:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P49960 and previous config saved to /var/cache/conftool/dbconfig/20230802-121249-ladsgroup.json
  • 12:11 jelto: update gitlab-ce package to 16.0.8-ce.0
  • 12:09 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=mw1452.eqiad.wmnet
  • 12:09 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=mw1451.eqiad.wmnet
  • 12:09 claime: Depool mw1451 and mw1452 for reimage as wikikube nodes - T343306
  • 11:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P49959 and previous config saved to /var/cache/conftool/dbconfig/20230802-115743-ladsgroup.json
  • 11:57 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: GitLab minor version upgrade
  • 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T342617)', diff saved to https://phabricator.wikimedia.org/P49958 and previous config saved to /var/cache/conftool/dbconfig/20230802-114237-ladsgroup.json
  • 11:41 moritzm: installing libxml2 security updates
  • 11:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 11:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 11:40 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest2002.codfw.wmnet
  • 11:40 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest2002.codfw.wmnet
  • 11:18 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 11:18 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 11:17 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 11:17 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 10:41 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 10:40 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: GitLab minor version upgrade
  • 10:40 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 10:37 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: GitLab minor version upgrade
  • 10:30 samtar@deploy1002: Finished scap: Backport for Revert "enwiki: temp enable emergencyCaptcha" (duration: 07m 33s)
  • 10:24 samtar@deploy1002: samtar: Continuing with sync
  • 10:24 samtar@deploy1002: samtar: Backport for Revert "enwiki: temp enable emergencyCaptcha" synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 10:22 samtar@deploy1002: Started scap: Backport for Revert "enwiki: temp enable emergencyCaptcha"
  • 10:02 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: GitLab minor version upgrade
  • 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T342617)', diff saved to https://phabricator.wikimedia.org/P49954 and previous config saved to /var/cache/conftool/dbconfig/20230802-095428-ladsgroup.json
  • 09:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 09:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 09:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 09:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 09:39 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:37 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 09:24 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: GitLab minor version upgrade
  • 09:20 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:18 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 09:17 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 09:15 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 09:13 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 09:12 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 09:02 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 09:02 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 09:01 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 09:01 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 08:53 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: GitLab minor version upgrade
  • 08:39 jelto: downgrade gitlab-ce package to 15.11.13-ce.0
  • 08:15 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 08:07 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 07:28 taavi: mwscript namespaceDupes.php idwikisource --fix --add-prefix "BROKEN " # T341173
  • 07:19 taavi@deploy1002: Finished scap: Backport for idwikisource change wgSiteName, wgMetaNamespace and add project namespace alias (T341173), Change idwikisource logos (T341173) (duration: 11m 43s)
  • 07:18 moritzm: installing Linux 5.10.179-3 on bullseye hosts
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repooling after replacing its memory', diff saved to https://phabricator.wikimedia.org/P49951 and previous config saved to /var/cache/conftool/dbconfig/20230802-071441-root.json
  • 07:13 taavi@deploy1002: anzx and taavi: Continuing with sync
  • 07:09 taavi@deploy1002: anzx and taavi: Backport for idwikisource change wgSiteName, wgMetaNamespace and add project namespace alias (T341173), Change idwikisource logos (T341173) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:07 taavi@deploy1002: Started scap: Backport for idwikisource change wgSiteName, wgMetaNamespace and add project namespace alias (T341173), Change idwikisource logos (T341173)
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repooling after replacing its memory', diff saved to https://phabricator.wikimedia.org/P49950 and previous config saved to /var/cache/conftool/dbconfig/20230802-065936-root.json
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repooling after replacing its memory', diff saved to https://phabricator.wikimedia.org/P49949 and previous config saved to /var/cache/conftool/dbconfig/20230802-064431-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repooling after replacing its memory', diff saved to https://phabricator.wikimedia.org/P49948 and previous config saved to /var/cache/conftool/dbconfig/20230802-062925-root.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 10%: Repooling after replacing its memory', diff saved to https://phabricator.wikimedia.org/P49947 and previous config saved to /var/cache/conftool/dbconfig/20230802-061420-root.json
  • 06:13 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 06:12 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 06:12 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 06:12 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 06:11 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 06:10 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 5%: Repooling after replacing its memory', diff saved to https://phabricator.wikimedia.org/P49946 and previous config saved to /var/cache/conftool/dbconfig/20230802-055916-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 3%: Repooling after replacing its memory', diff saved to https://phabricator.wikimedia.org/P49945 and previous config saved to /var/cache/conftool/dbconfig/20230802-054411-root.json
  • 05:33 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: enabling emergency captcha on enwiki - T343294 (take 2) (duration: 06m 40s)
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 1%: Repooling after replacing its memory', diff saved to https://phabricator.wikimedia.org/P49944 and previous config saved to /var/cache/conftool/dbconfig/20230802-052906-root.json
  • 05:23 marostegui: Stop mariadb on es2025 for onsite maintenance dbmaint codfw T343254
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2025 T343254', diff saved to https://phabricator.wikimedia.org/P49943 and previous config saved to /var/cache/conftool/dbconfig/20230802-052021-root.json
  • 05:11 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: enabling emergency captcha on enwiki - T343294 (duration: 06m 36s)
  • 04:49 _joe_: running scap pull on mwmaint1002 to pick up the noc.w.o changes
  • 01:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['pc2015']
  • 01:24 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pc2015']
  • 01:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc2015.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:14 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc2016.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:04 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host pc2016.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:02 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:02 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: setup switch port and DNS for pc2016 - pt1979@cumin2002"
  • 01:02 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: setup switch port and DNS for pc2016 - pt1979@cumin2002"
  • 00:59 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 00:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2005-dev.codfw.wmnet with OS bullseye
  • 00:58 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:52 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host pc2015.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:41 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:41 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: setup switch interfaces and DNS for pc201[5-6] - pt1979@cumin2002"
  • 00:40 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: setup switch interfaces and DNS for pc201[5-6] - pt1979@cumin2002"
  • 00:38 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 00:37 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 00:37 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:36 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 00:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2004-dev.codfw.wmnet with OS bullseye
  • 00:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:29 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2006-dev.codfw.wmnet with OS bullseye
  • 00:29 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:25 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2005-dev.codfw.wmnet with reason: host reimage
  • 00:17 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2005-dev.codfw.wmnet with reason: host reimage
  • 00:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
  • 00:11 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
  • 00:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2006-dev.codfw.wmnet with reason: host reimage
  • 00:06 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2006-dev.codfw.wmnet with reason: host reimage

2023-08-01

  • 23:13 eileen: config revision changed from 8b3a46c3 to f5e6425b - updated process control (added segmentation_aging job - rollback if it doesn't work)
  • 22:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt2006-dev.codfw.wmnet with OS bullseye
  • 22:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt2005-dev.codfw.wmnet with OS bullseye
  • 22:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt2004-dev.codfw.wmnet with OS bullseye
  • 22:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2006-dev']
  • 22:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2005-dev']
  • 22:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2004-dev']
  • 22:19 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2006-dev']
  • 22:18 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2005-dev']
  • 22:17 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2004-dev']
  • 22:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2006-dev']
  • 22:14 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt2004-dev']
  • 22:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2005-dev']
  • 22:11 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2005-dev']
  • 22:11 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt2005-dev']
  • 22:10 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2004-dev']
  • 22:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2004-dev']
  • 22:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2006-dev']
  • 22:01 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2005-dev']
  • 21:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2004-dev']
  • 21:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2008-dev.codfw.wmnet with OS bullseye
  • 21:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2007-dev.codfw.wmnet with OS bullseye
  • 21:46 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:29 jforrester@deploy1002: Finished scap: Backport for Wikifunctions: Log the 'WikiLambda' warnings and above logs (duration: 10m 22s)
  • 21:23 jforrester@deploy1002: jforrester: Continuing with sync
  • 21:20 jforrester@deploy1002: jforrester: Backport for Wikifunctions: Log the 'WikiLambda' warnings and above logs synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 21:19 jforrester@deploy1002: Started scap: Backport for Wikifunctions: Log the 'WikiLambda' warnings and above logs
  • 21:16 jforrester@deploy1002: Finished scap: Backport for Wikifunctions: Restrict wikilambda-execute to functioneers for now (duration: 09m 03s)
  • 21:10 jforrester@deploy1002: jforrester: Continuing with sync
  • 21:09 jforrester@deploy1002: jforrester: Backport for Wikifunctions: Restrict wikilambda-execute to functioneers for now synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 21:08 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:07 jforrester@deploy1002: Started scap: Backport for Wikifunctions: Restrict wikilambda-execute to functioneers for now
  • 21:05 jforrester@deploy1002: Synchronized ./php-1.41.0-wmf.20/extensions/WikiLambda/: T343253 T343256 (duration: 07m 23s)
  • 20:59 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:55 jforrester@deploy1002: Synchronized ./php-1.41.0-wmf.19/extensions/WikiLambda/: T343253 T343256 (duration: 06m 58s)
  • 20:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2008-dev.codfw.wmnet with reason: host reimage
  • 20:49 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2008-dev.codfw.wmnet with reason: host reimage
  • 20:44 urbanecm@deploy1002: Finished scap: Backport for Write new on group0 for event table migration (T330158) (duration: 21m 46s)
  • 20:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2007-dev.codfw.wmnet with reason: host reimage
  • 20:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2008-dev.codfw.wmnet with OS bullseye
  • 20:42 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:39 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2007-dev.codfw.wmnet with reason: host reimage
  • 20:38 urbanecm@deploy1002: urbanecm and dreamyjazz: Continuing with sync
  • 20:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:28 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudnet2008-dev.codfw.wmnet with OS bullseye
  • 20:23 urbanecm@deploy1002: urbanecm and dreamyjazz: Backport for Write new on group0 for event table migration (T330158) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:22 urbanecm@deploy1002: Started scap: Backport for Write new on group0 for event table migration (T330158)
  • 20:19 urbanecm@deploy1002: Finished scap: Backport for Design: Provide wordmarks/taglines for Wikiversity projects (T341256), Provide wordmarks for Wikivoyage projects (T341259) (duration: 09m 41s)
  • 20:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudnet2007-dev.codfw.wmnet with OS bullseye
  • 20:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2007-dev.codfw.wmnet with OS bullseye
  • 20:17 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2008-dev.codfw.wmnet with reason: host reimage
  • 20:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2008-dev.codfw.wmnet with reason: host reimage
  • 20:13 urbanecm@deploy1002: urbanecm and jdlrobson: Continuing with sync
  • 20:11 urbanecm@deploy1002: urbanecm and jdlrobson: Backport for Design: Provide wordmarks/taglines for Wikiversity projects (T341256), Provide wordmarks for Wikivoyage projects (T341259) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:10 urbanecm@deploy1002: Started scap: Backport for Design: Provide wordmarks/taglines for Wikiversity projects (T341256), Provide wordmarks for Wikivoyage projects (T341259)
  • 20:08 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T342617)', diff saved to https://phabricator.wikimedia.org/P49941 and previous config saved to /var/cache/conftool/dbconfig/20230801-200444-ladsgroup.json
  • 19:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudnet2008-dev']
  • 19:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2008-dev.codfw.wmnet with OS bullseye
  • 19:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2006-dev.codfw.wmnet with OS bullseye
  • 19:52 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2007-dev.codfw.wmnet with reason: host reimage
  • 19:51 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet2008-dev']
  • 19:50 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P49940 and previous config saved to /var/cache/conftool/dbconfig/20230801-194938-ladsgroup.json
  • 19:48 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2007-dev.codfw.wmnet with reason: host reimage
  • 19:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2008-dev']
  • 19:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P49939 and previous config saved to /var/cache/conftool/dbconfig/20230801-193432-ladsgroup.json
  • 19:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2006-dev.codfw.wmnet with reason: host reimage
  • 19:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2006-dev.codfw.wmnet with reason: host reimage
  • 19:28 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2008-dev']
  • 19:28 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2007-dev.codfw.wmnet with OS bullseye
  • 19:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2007-dev']
  • 19:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T342617)', diff saved to https://phabricator.wikimedia.org/P49938 and previous config saved to /var/cache/conftool/dbconfig/20230801-191925-ladsgroup.json
  • 19:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 19:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 19:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49937 and previous config saved to /var/cache/conftool/dbconfig/20230801-191709-ladsgroup.json
  • 19:11 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2007-dev']
  • 19:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2006-dev']
  • 19:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudnet2007-dev']
  • 19:05 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2006-dev']
  • 19:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P49936 and previous config saved to /var/cache/conftool/dbconfig/20230801-190203-ladsgroup.json
  • 19:01 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet2007-dev']
  • 18:56 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2006-dev.codfw.wmnet with OS bullseye
  • 18:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P49935 and previous config saved to /var/cache/conftool/dbconfig/20230801-184657-ladsgroup.json
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T342617)', diff saved to https://phabricator.wikimedia.org/P49934 and previous config saved to /var/cache/conftool/dbconfig/20230801-184220-ladsgroup.json
  • 18:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 18:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49933 and previous config saved to /var/cache/conftool/dbconfig/20230801-184159-ladsgroup.json
  • 18:39 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudnet2007-dev']
  • 18:37 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 18:37 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 18:36 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 18:36 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:35 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:33 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:33 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:33 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:33 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49932 and previous config saved to /var/cache/conftool/dbconfig/20230801-183151-ladsgroup.json
  • 18:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudnet2008-dev']
  • 18:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P49931 and previous config saved to /var/cache/conftool/dbconfig/20230801-182653-ladsgroup.json
  • 18:21 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet2007-dev']
  • 18:17 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudnet2008-dev']
  • 18:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2007-dev']
  • 18:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2006-dev']
  • 18:15 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.20 refs T340248
  • 18:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2007-dev']
  • 18:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2006-dev']
  • 18:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P49930 and previous config saved to /var/cache/conftool/dbconfig/20230801-181147-ladsgroup.json
  • 18:05 fabfur: adding dns3001 on cr2-esams and cr3-esams routing for ns2 (T335835)
  • 17:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt2006-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49929 and previous config saved to /var/cache/conftool/dbconfig/20230801-175641-ladsgroup.json
  • 17:55 fabfur: running authdns-update on dns1004 to revert ntp.esams to dns3001 (T335835)
  • 17:48 fabfur: running puppet on 'A:cumin or A:dns-rec or A:netbox' (https://gerrit.wikimedia.org/r/c/operations/puppet/+/944286) (T335835)
  • 17:42 fabfur: started bird and enabled puppet on dns3001 (T335835)
  • 17:41 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns3001.wikimedia.org
  • 17:37 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns3001.wikimedia.org
  • 17:36 fabfur: stopped bird and disabled puppet on dns3001 (T335835)
  • 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49928 and previous config saved to /var/cache/conftool/dbconfig/20230801-173130-ladsgroup.json
  • 17:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 17:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T342617)', diff saved to https://phabricator.wikimedia.org/P49927 and previous config saved to /var/cache/conftool/dbconfig/20230801-173109-ladsgroup.json
  • 17:26 fabfur: running puppet on 'A:cumin or A:dns-rec or A:netbox' (https://gerrit.wikimedia.org/r/c/operations/puppet/+/944286) (T335835)
  • 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P49926 and previous config saved to /var/cache/conftool/dbconfig/20230801-171603-ladsgroup.json
  • 17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49925 and previous config saved to /var/cache/conftool/dbconfig/20230801-171120-ladsgroup.json
  • 17:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 17:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T342617)', diff saved to https://phabricator.wikimedia.org/P49924 and previous config saved to /var/cache/conftool/dbconfig/20230801-171059-ladsgroup.json
  • 17:09 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@ee544cb]: Update kartotherian to e28ea7ef (T334668 T332985 T332664 T329924) (duration: 04m 25s)
  • 17:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@ee544cb]: Update kartotherian to e28ea7ef (T334668 T332985 T332664 T329924)
  • 17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P49923 and previous config saved to /var/cache/conftool/dbconfig/20230801-170057-ladsgroup.json
  • 16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P49922 and previous config saved to /var/cache/conftool/dbconfig/20230801-165553-ladsgroup.json
  • 16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T342617)', diff saved to https://phabricator.wikimedia.org/P49921 and previous config saved to /var/cache/conftool/dbconfig/20230801-164550-ladsgroup.json
  • 16:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P49920 and previous config saved to /var/cache/conftool/dbconfig/20230801-164047-ladsgroup.json
  • 16:38 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudvirt2006-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt2005-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt2004-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T342617)', diff saved to https://phabricator.wikimedia.org/P49919 and previous config saved to /var/cache/conftool/dbconfig/20230801-162541-ladsgroup.json
  • 16:23 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:23 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:22 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 16:22 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 16:21 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 16:20 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 16:07 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 16:06 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 16:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1210 (T342617)', diff saved to https://phabricator.wikimedia.org/P49918 and previous config saved to /var/cache/conftool/dbconfig/20230801-160006-ladsgroup.json
  • 16:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 15:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 15:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T342617)', diff saved to https://phabricator.wikimedia.org/P49917 and previous config saved to /var/cache/conftool/dbconfig/20230801-155945-ladsgroup.json
  • 15:52 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudvirt2005-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:49 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudvirt2004-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudnet2008-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P49916 and previous config saved to /var/cache/conftool/dbconfig/20230801-154439-ladsgroup.json
  • 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T342617)', diff saved to https://phabricator.wikimedia.org/P49915 and previous config saved to /var/cache/conftool/dbconfig/20230801-154242-ladsgroup.json
  • 15:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 15:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49914 and previous config saved to /var/cache/conftool/dbconfig/20230801-154220-ladsgroup.json
  • 15:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudnet2007-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P49913 and previous config saved to /var/cache/conftool/dbconfig/20230801-153155-ladsgroup.json
  • 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P49912 and previous config saved to /var/cache/conftool/dbconfig/20230801-152933-ladsgroup.json
  • 15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P49911 and previous config saved to /var/cache/conftool/dbconfig/20230801-152714-ladsgroup.json
  • 15:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudnet2008-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P49910 and previous config saved to /var/cache/conftool/dbconfig/20230801-151650-ladsgroup.json
  • 15:15 moritzm: bounce ferm on dse-k8s-ctrl1001
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T342617)', diff saved to https://phabricator.wikimedia.org/P49909 and previous config saved to /var/cache/conftool/dbconfig/20230801-151427-ladsgroup.json
  • 15:14 apine@deploy1002: Finished scap: Backport for Move wikifunctions.org from locked-down to limited deployment (T342820) (duration: 07m 45s)
  • 15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P49908 and previous config saved to /var/cache/conftool/dbconfig/20230801-151208-ladsgroup.json
  • 15:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol2008-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:08 apine@deploy1002: jforrester and apine: Continuing with sync
  • 15:07 apine@deploy1002: jforrester and apine: Backport for Move wikifunctions.org from locked-down to limited deployment (T342820) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 15:06 apine@deploy1002: Started scap: Backport for Move wikifunctions.org from locked-down to limited deployment (T342820)
  • 15:05 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudnet2007-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P49907 and previous config saved to /var/cache/conftool/dbconfig/20230801-150146-ladsgroup.json
  • 14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49906 and previous config saved to /var/cache/conftool/dbconfig/20230801-145702-ladsgroup.json
  • 14:47 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add config-master[12]001 - jbond@cumin1001 - T341717"
  • 14:46 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add config-master[12]001 - jbond@cumin1001 - T341717"
  • 14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P49905 and previous config saved to /var/cache/conftool/dbconfig/20230801-144641-ladsgroup.json
  • 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T342617)', diff saved to https://phabricator.wikimedia.org/P49904 and previous config saved to /var/cache/conftool/dbconfig/20230801-143930-ladsgroup.json
  • 14:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 14:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T342617)', diff saved to https://phabricator.wikimedia.org/P49903 and previous config saved to /var/cache/conftool/dbconfig/20230801-143909-ladsgroup.json
  • 14:38 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host dse-k8s-ctrl1001.eqiad.wmnet
  • 14:34 Lucas_WMDE: UTC afternoon backport+config window done (one change, then some k8s issues, which are resolved for now)
  • 14:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudcontrol2008-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:27 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl1001.eqiad.wmnet
  • 14:26 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1011.eqiad.wmnet
  • 14:25 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 14:25 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 14:25 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 14:24 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 14:24 cgoubert@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:24 cgoubert@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:24 cgoubert@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 14:24 cgoubert@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P49902 and previous config saved to /var/cache/conftool/dbconfig/20230801-142403-ladsgroup.json
  • 14:22 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-tool1011.eqiad.wmnet
  • 14:22 cgoubert@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:22 cgoubert@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 14:21 cgoubert@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 14:21 cgoubert@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 14:21 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 14:20 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 14:20 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 14:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1005.eqiad.wmnet
  • 14:19 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 14:19 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 14:18 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 14:18 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 14:17 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 14:16 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 14:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-tool1005.eqiad.wmnet
  • 14:15 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 14:15 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 14:14 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 14:14 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 14:13 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 14:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 14:13 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 14:13 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 14:13 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 14:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49901 and previous config saved to /var/cache/conftool/dbconfig/20230801-141144-ladsgroup.json
  • 14:11 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 14:11 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 14:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T342617)', diff saved to https://phabricator.wikimedia.org/P49900 and previous config saved to /var/cache/conftool/dbconfig/20230801-141123-ladsgroup.json
  • 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P49899 and previous config saved to /var/cache/conftool/dbconfig/20230801-140856-ladsgroup.json
  • 14:07 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bookworm
  • 14:05 fabfur: running authdns-update on dns1004 to move ntp.esams to dns3002 (https://gerrit.wikimedia.org/r/c/operations/dns/+/944232) (T335835)
  • 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P49897 and previous config saved to /var/cache/conftool/dbconfig/20230801-135617-ladsgroup.json
  • 13:54 fabfur: removing dns3001 from cr2-esams and cr3-esams routing for reboot (T335835)
  • 13:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-worker1001.eqiad.wmnet
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T342617)', diff saved to https://phabricator.wikimedia.org/P49896 and previous config saved to /var/cache/conftool/dbconfig/20230801-135350-ladsgroup.json
  • 13:50 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 13:50 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 13:49 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 13:49 cgoubert@deploy1002: Started scap: (no justification provided)
  • 13:47 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 13:46 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-worker1001.eqiad.wmnet
  • 13:46 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 13:46 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 13:45 jbond@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host config-master2001.codfw.wmnet
  • 13:45 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host config-master2001.codfw.wmnet with OS bookworm
  • 13:45 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-druid1001.eqiad.wmnet
  • 13:43 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for btmwiktionary: Add project logo (T343004) (duration: 32m 32s)
  • 13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P49895 and previous config saved to /var/cache/conftool/dbconfig/20230801-134111-ladsgroup.json
  • 13:39 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-druid1001.eqiad.wmnet
  • 13:35 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-presto1001.eqiad.wmnet
  • 13:33 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on config-master2001.codfw.wmnet with reason: host reimage
  • 13:33 jbond@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host config-master1001.eqiad.wmnet
  • 13:33 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host config-master1001.eqiad.wmnet with OS bookworm
  • 13:32 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 13:31 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-presto1001.eqiad.wmnet
  • 13:30 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on config-master2001.codfw.wmnet with reason: host reimage
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T342617)', diff saved to https://phabricator.wikimedia.org/P49891 and previous config saved to /var/cache/conftool/dbconfig/20230801-132604-ladsgroup.json
  • 13:24 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Continuing with sync
  • 13:22 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for btmwiktionary: Add project logo (T343004) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:19 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on config-master1001.eqiad.wmnet with reason: host reimage
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T342617)', diff saved to https://phabricator.wikimedia.org/P49890 and previous config saved to /var/cache/conftool/dbconfig/20230801-131946-ladsgroup.json
  • 13:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 13:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T342617)', diff saved to https://phabricator.wikimedia.org/P49889 and previous config saved to /var/cache/conftool/dbconfig/20230801-131925-ladsgroup.json
  • 13:16 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on config-master1001.eqiad.wmnet with reason: host reimage
  • 13:12 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host config-master2001.codfw.wmnet with OS bookworm
  • 13:11 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM config-master2001.codfw.wmnet - jbond@cumin2002"
  • 13:11 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM config-master2001.codfw.wmnet - jbond@cumin2002"
  • 13:11 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for btmwiktionary: Add project logo (T343004)
  • 13:10 jbond@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master2001.codfw.wmnet on all recursors
  • 13:10 jbond@cumin2002: START - Cookbook sre.dns.wipe-cache config-master2001.codfw.wmnet on all recursors
  • 13:10 jbond@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:10 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM config-master2001.codfw.wmnet - jbond@cumin2002"
  • 13:09 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM config-master2001.codfw.wmnet - jbond@cumin2002"
  • 13:06 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host config-master1001.eqiad.wmnet with OS bookworm
  • 13:06 jbond@cumin2002: START - Cookbook sre.dns.netbox
  • 13:06 jbond@cumin2002: START - Cookbook sre.ganeti.makevm for new host config-master2001.codfw.wmnet
  • 13:05 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM config-master1001.eqiad.wmnet - jbond@cumin1001"
  • 13:05 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM config-master1001.eqiad.wmnet - jbond@cumin1001"
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P49888 and previous config saved to /var/cache/conftool/dbconfig/20230801-130419-ladsgroup.json
  • 13:02 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master1001.eqiad.wmnet on all recursors
  • 13:02 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache config-master1001.eqiad.wmnet on all recursors
  • 13:02 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:01 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM config-master1001.eqiad.wmnet - jbond@cumin1001"
  • 13:00 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM config-master1001.eqiad.wmnet - jbond@cumin1001"
  • 12:58 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 12:58 jbond@cumin1001: START - Cookbook sre.ganeti.makevm for new host config-master1001.eqiad.wmnet
  • 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P49887 and previous config saved to /var/cache/conftool/dbconfig/20230801-124912-ladsgroup.json
  • 12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T342617)', diff saved to https://phabricator.wikimedia.org/P49886 and previous config saved to /var/cache/conftool/dbconfig/20230801-124508-ladsgroup.json
  • 12:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 12:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 12:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 12:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T342617)', diff saved to https://phabricator.wikimedia.org/P49885 and previous config saved to /var/cache/conftool/dbconfig/20230801-124442-ladsgroup.json
  • 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T342617)', diff saved to https://phabricator.wikimedia.org/P49883 and previous config saved to /var/cache/conftool/dbconfig/20230801-123406-ladsgroup.json
  • 12:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1003.eqiad.wmnet
  • 12:30 jbond@cumin1001: END (PASS) - Cookbook sre.ganeti.resource_report (exit_code=0)
  • 12:30 jbond@cumin1001: START - Cookbook sre.ganeti.resource_report
  • 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P49882 and previous config saved to /var/cache/conftool/dbconfig/20230801-122936-ladsgroup.json
  • 12:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1003.eqiad.wmnet
  • 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P49881 and previous config saved to /var/cache/conftool/dbconfig/20230801-121430-ladsgroup.json
  • 12:11 fabfur: imported purged package into bookworm-wikimedia (https://gerrit.wikimedia.org/r/c/operations/software/purged/+/944177) T342154
  • 12:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host analytics1076.eqiad.wmnet with OS bullseye
  • 12:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host analytics1077.eqiad.wmnet with OS bullseye
  • 12:03 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1002.eqiad.wmnet
  • 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T342617)', diff saved to https://phabricator.wikimedia.org/P49880 and previous config saved to /var/cache/conftool/dbconfig/20230801-115924-ladsgroup.json
  • 11:57 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1002.eqiad.wmnet
  • 11:55 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1001.eqiad.wmnet
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T342617)', diff saved to https://phabricator.wikimedia.org/P49879 and previous config saved to /var/cache/conftool/dbconfig/20230801-115110-ladsgroup.json
  • 11:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 11:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 11:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-conf1001.eqiad.wmnet
  • 11:38 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1077.eqiad.wmnet with reason: host reimage
  • 11:36 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1076.eqiad.wmnet with reason: host reimage
  • 11:33 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1077.eqiad.wmnet with reason: host reimage
  • 11:33 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1076.eqiad.wmnet with reason: host reimage
  • 11:22 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1077.eqiad.wmnet with OS bullseye
  • 11:21 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1076.eqiad.wmnet with OS bullseye
  • 11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T342617)', diff saved to https://phabricator.wikimedia.org/P49878 and previous config saved to /var/cache/conftool/dbconfig/20230801-111829-ladsgroup.json
  • 11:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 11:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T342617)', diff saved to https://phabricator.wikimedia.org/P49877 and previous config saved to /var/cache/conftool/dbconfig/20230801-111808-ladsgroup.json
  • 11:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 11:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49876 and previous config saved to /var/cache/conftool/dbconfig/20230801-110858-ladsgroup.json
  • 11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P49875 and previous config saved to /var/cache/conftool/dbconfig/20230801-110302-ladsgroup.json
  • 10:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P49874 and previous config saved to /var/cache/conftool/dbconfig/20230801-105352-ladsgroup.json
  • 10:51 hnowlan@deploy1002: Finished deploy [restbase/deploy@8eb62f2]: Add gpewiki and btmwiktionary (T335988, T336116) (duration: 20m 29s)
  • 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P49873 and previous config saved to /var/cache/conftool/dbconfig/20230801-104755-ladsgroup.json
  • 10:45 moritzm: update d-i images to bookworm 12.1 T343121
  • 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P49872 and previous config saved to /var/cache/conftool/dbconfig/20230801-103846-ladsgroup.json
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T342617)', diff saved to https://phabricator.wikimedia.org/P49871 and previous config saved to /var/cache/conftool/dbconfig/20230801-103249-ladsgroup.json
  • 10:31 hnowlan@deploy1002: Started deploy [restbase/deploy@8eb62f2]: Add gpewiki and btmwiktionary (T335988, T336116)
  • 10:28 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host analytics1076.eqiad.wmnet with OS bullseye
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49870 and previous config saved to /var/cache/conftool/dbconfig/20230801-102340-ladsgroup.json
  • 10:21 fabfur: imported prometheus-varnishkafka-exporter package into bookworm-wikimedia (https://gerrit.wikimedia.org/r/c/operations/debs/prometheus-varnishkafka-exporter/+/944169) T342154
  • 10:18 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host analytics1077.eqiad.wmnet with OS bullseye
  • 09:47 urbanecm@deploy1002: Finished scap: Backport for Revert "Fixes: Echo notification count disappears on load in mobile skin" (T335273 T343192) (duration: 11m 35s)
  • 09:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T342617)', diff saved to https://phabricator.wikimedia.org/P49869 and previous config saved to /var/cache/conftool/dbconfig/20230801-094538-ladsgroup.json
  • 09:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 09:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 09:40 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 09:39 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 09:38 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 09:37 urbanecm@deploy1002: urbanecm: Backport for Revert "Fixes: Echo notification count disappears on load in mobile skin" (T335273 T343192) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T342617)', diff saved to https://phabricator.wikimedia.org/P49868 and previous config saved to /var/cache/conftool/dbconfig/20230801-093717-ladsgroup.json
  • 09:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 09:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 09:35 urbanecm@deploy1002: Started scap: Backport for Revert "Fixes: Echo notification count disappears on load in mobile skin" (T335273 T343192)
  • 09:33 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1076.eqiad.wmnet with OS bullseye
  • 09:33 btullis@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host analytics1076.eqiad.wmnet with OS bullseye
  • 09:32 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 09:21 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
  • 09:15 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
  • 09:12 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 09:11 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 09:03 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1077.eqiad.wmnet with OS bullseye
  • 09:03 btullis@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host analytics1077.eqiad.wmnet with OS bullseye
  • 09:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 09:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 08:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 08:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 08:40 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1077.eqiad.wmnet with OS bullseye
  • 08:38 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1076.eqiad.wmnet with OS bullseye
  • 08:33 urbanecm@deploy1002: Finished scap: Backport for GrowthExperiments: enable AddLink task frontend in 10th round of wikis (T308135) (duration: 10m 52s)
  • 08:30 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 08:29 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 08:27 urbanecm@deploy1002: sgimeno and urbanecm: Continuing with sync
  • 08:24 urbanecm@deploy1002: sgimeno and urbanecm: Backport for GrowthExperiments: enable AddLink task frontend in 10th round of wikis (T308135) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 08:22 urbanecm@deploy1002: Started scap: Backport for GrowthExperiments: enable AddLink task frontend in 10th round of wikis (T308135)
  • 08:22 moritzm: installing Linux 4.19.289 on Buster hosts
  • 08:17 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 08:17 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
  • 07:49 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 07:49 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 07:44 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 07:44 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 07:41 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 07:41 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 07:37 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nmaphophe out of all services on: 732 hosts
  • 07:37 root@cumin2002: START - Cookbook sre.idm.logout Logging Nmaphophe out of all services on: 732 hosts
  • 07:37 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nmaphophe out of all services on: 24 hosts
  • 07:37 root@cumin2002: START - Cookbook sre.idm.logout Logging Nmaphophe out of all services on: 24 hosts
  • 07:37 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nmaphophe out of all services on: 1277 hosts
  • 07:36 root@cumin2002: START - Cookbook sre.idm.logout Logging Nmaphophe out of all services on: 1277 hosts
  • 07:07 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 07:07 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 06:54 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 06:54 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 06:24 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 05:48 kart_: cxserver: Remove Youdao MT service (T329137)
  • 05:46 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 05:45 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 05:41 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 05:41 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 05:36 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:36 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 05:26 kart_: Updated cxserver to 2023-07-13-063245-production (T340953)
  • 05:24 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 05:23 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 05:18 marostegui: dbmaint s4 testcommonswiki eqiad T343174
  • 05:16 marostegui: dbmaint s4 labswiki (wikitech) eqiad T343175
  • 05:15 marostegui: dbmaint s4 testcommonswiki eqiad T343175
  • 05:12 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 05:12 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 05:07 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:06 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 03:57 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.18 (duration: 02m 09s)
  • 03:54 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.20 refs T340248 (duration: 52m 06s)
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.20 refs T340248
  • 02:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T342617)', diff saved to https://phabricator.wikimedia.org/P49867 and previous config saved to /var/cache/conftool/dbconfig/20230801-023010-ladsgroup.json
  • 02:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P49866 and previous config saved to /var/cache/conftool/dbconfig/20230801-021504-ladsgroup.json
  • 01:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P49865 and previous config saved to /var/cache/conftool/dbconfig/20230801-015958-ladsgroup.json
  • 01:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T342617)', diff saved to https://phabricator.wikimedia.org/P49864 and previous config saved to /var/cache/conftool/dbconfig/20230801-014452-ladsgroup.json
  • 00:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol2007-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol2006-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 00:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 00:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T342617)', diff saved to https://phabricator.wikimedia.org/P49863 and previous config saved to /var/cache/conftool/dbconfig/20230801-004000-ladsgroup.json
  • 00:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P49862 and previous config saved to /var/cache/conftool/dbconfig/20230801-002454-ladsgroup.json
  • 00:21 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudcontrol2007-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudcontrol2006-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:15 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:15 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new cloud nodes DNS and switch config - pt1979@cumin2002"
  • 00:14 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new cloud nodes DNS and switch config - pt1979@cumin2002"
  • 00:11 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 00:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P49861 and previous config saved to /var/cache/conftool/dbconfig/20230801-000948-ladsgroup.json