Server Admin Log/Archive 78

From Wikitech


2024-04-30

  • 23:04 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7014.magru.wmnet with OS bullseye
  • 23:04 fabfur@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
  • 22:58 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7013.magru.wmnet with OS bullseye
  • 22:56 fabfur@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
  • 22:35 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7013.magru.wmnet with reason: host reimage
  • 22:33 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7014.magru.wmnet with reason: host reimage
  • 22:32 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7013.magru.wmnet with reason: host reimage
  • 22:30 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7014.magru.wmnet with reason: host reimage
  • 22:18 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1005.eqiad.wmnet with OS bullseye
  • 22:05 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp7013.magru.wmnet with OS bullseye
  • 22:04 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp7014.magru.wmnet with OS bullseye
  • 22:02 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1005.eqiad.wmnet with reason: host reimage
  • 21:56 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1005.eqiad.wmnet with reason: host reimage
  • 21:50 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cephadm1001.eqiad.wmnet
  • 21:50 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:50 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cephadm1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
  • 21:49 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cephadm1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
  • 21:37 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bullseye
  • 21:33 mutante: grafana2001 - sudo -u loki /usr/bin/loki -config.file=/etc/loki/loki-local-config.yaml in an attempt to debug issue on grafana-next.wikimedia.org
  • 21:18 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1004.eqiad.wmnet with OS bullseye
  • 21:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir1002.eqiad.wmnet with OS bookworm
  • 21:03 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1004.eqiad.wmnet with reason: host reimage
  • 20:58 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1004.eqiad.wmnet with reason: host reimage
  • 20:56 btullis@cumin1002: START - Cookbook sre.dns.netbox
  • 20:55 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7015.magru.wmnet with OS bullseye
  • 20:55 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin1002"
  • 20:54 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin1002"
  • 20:51 btullis@cumin1002: START - Cookbook sre.hosts.decommission for hosts cephadm1001.eqiad.wmnet
  • 20:50 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dns7002.wikimedia.org with reason: reimaged again
  • 20:50 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dns7002.wikimedia.org with reason: reimaged again
  • 20:49 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns7002.wikimedia.org with OS bookworm
  • 20:46 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir1002.eqiad.wmnet with reason: host reimage
  • 20:43 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir1002.eqiad.wmnet with reason: host reimage
  • 20:40 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lists1004.wikimedia.org with OS bookworm
  • 20:40 aokoth@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aokoth@cumin1002"
  • 20:39 cjming: end of UTC late backport window
  • 20:39 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1004.eqiad.wmnet with OS bullseye
  • 20:38 cjming@deploy1002: Finished scap: Backport for Deploy a11y settings to testwiki (T362147) (duration: 21m 00s)
  • 20:38 aokoth@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aokoth@cumin1002"
  • 20:37 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7016.magru.wmnet with OS bullseye
  • 20:37 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin1002"
  • 20:36 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin1002"
  • 20:31 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7015.magru.wmnet with reason: host reimage
  • 20:30 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir1002.eqiad.wmnet with OS bookworm
  • 20:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir1001.eqiad.wmnet with OS bookworm
  • 20:27 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7015.magru.wmnet with reason: host reimage
  • 20:26 cjming@deploy1002: ksarabia and cjming: Continuing with sync
  • 20:21 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lists1004.wikimedia.org with reason: host reimage
  • 20:20 cjming@deploy1002: ksarabia and cjming: Backport for Deploy a11y settings to testwiki (T362147) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:18 aokoth@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lists1004.wikimedia.org with reason: host reimage
  • 20:17 cjming@deploy1002: Started scap: Backport for Deploy a11y settings to testwiki (T362147)
  • 20:16 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns7002.wikimedia.org with reason: host reimage
  • 20:13 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7016.magru.wmnet with reason: host reimage
  • 20:11 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns7002.wikimedia.org with reason: host reimage
  • 20:10 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7016.magru.wmnet with reason: host reimage
  • 20:10 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir1001.eqiad.wmnet with reason: host reimage
  • 20:07 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir1001.eqiad.wmnet with reason: host reimage
  • 20:03 aokoth@cumin1002: START - Cookbook sre.hosts.reimage for host lists1004.wikimedia.org with OS bookworm
  • 20:01 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lists1004.wikimedia.org with OS bookworm
  • 19:59 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye
  • 19:55 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir1001.eqiad.wmnet with OS bookworm
  • 19:53 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ncredir1001.eqiad.wmnet with OS bookworm
  • 19:46 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7008.magru.wmnet with OS bullseye
  • 19:46 fabfur@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
  • 19:45 fabfur@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
  • 19:45 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7006.magru.wmnet with OS bullseye
  • 19:45 fabfur@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
  • 19:44 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7016.magru.wmnet with OS bullseye
  • 19:44 fabfur@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
  • 19:41 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7007.magru.wmnet with OS bullseye
  • 19:41 fabfur@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
  • 19:40 fabfur@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
  • 19:38 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host dns7002.wikimedia.org with OS bookworm
  • 19:32 sukhe: sudo ipmitool -I lanplus -H "dns7002.mgmt.magru.wmnet" -U root -E chassis power cycle
  • 19:29 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7003.magru.wmnet with OS bullseye
  • 19:27 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns7002.wikimedia.org with OS bookworm
  • 19:22 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir1001.eqiad.wmnet with OS bookworm
  • 19:19 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7008.magru.wmnet with reason: host reimage
  • 19:16 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7006.magru.wmnet with reason: host reimage
  • 19:15 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7008.magru.wmnet with reason: host reimage
  • 19:15 aokoth@cumin1002: START - Cookbook sre.hosts.reimage for host lists1004.wikimedia.org with OS bookworm
  • 19:14 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7007.magru.wmnet with reason: host reimage
  • 19:14 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7006.magru.wmnet with reason: host reimage
  • 19:12 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7007.magru.wmnet with reason: host reimage
  • 19:08 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:04 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7003.magru.wmnet with reason: host reimage
  • 19:01 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7003.magru.wmnet with reason: host reimage
  • 18:59 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:57 sukhe@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns7001
  • 18:56 sukhe@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host dns7001
  • 18:55 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:53 herron: updated alertmanager IRC alert text format. for details please see https://gerrit.wikimedia.org/r/c/operations/puppet/+/1019840
  • 18:49 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host dns7002.wikimedia.org with OS bookworm
  • 18:48 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp7008.magru.wmnet with OS bullseye
  • 18:47 sukhe: sudo cumin -b1 -s10 'C:confd and *.esams.wmnet' 'systemctl restart confd'
  • 18:46 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp7006.magru.wmnet with OS bullseye
  • 18:46 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye
  • 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd._tcp.esams.wmnet on all recursors
  • 18:41 sukhe@cumin1002: START - Cookbook sre.dns.wipe-cache _etcd._tcp.esams.wmnet on all recursors
  • 18:40 sukhe: running authdns-update for CR 1025800
  • 18:38 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp7014']
  • 18:38 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7014']
  • 18:32 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7003.magru.wmnet with OS bullseye
  • 18:31 sukhe@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dns7001.magru.wmnet']
  • 18:31 sukhe@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns7001.magru.wmnet']
  • 18:31 sukhe@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dns7001.magru.wmnet']
  • 18:31 sukhe@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns7001.magru.wmnet']
  • 18:30 aokoth@cumin1002: START - Cookbook sre.hosts.provision for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:26 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7014.magru.wmnet with OS bullseye
  • 18:21 xcollazo@deploy1002: Finished deploy [analytics/refinery@4836095] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4836095f] (duration: 02m 52s)
  • 18:18 aokoth@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lists1004
  • 18:18 xcollazo@deploy1002: Started deploy [analytics/refinery@4836095] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4836095f]
  • 18:18 xcollazo@deploy1002: Finished deploy [analytics/refinery@4836095] (thin): Regular analytics weekly train THIN [analytics/refinery@4836095f] (duration: 03m 57s)
  • 18:17 aokoth@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host lists1004
  • 18:16 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:16 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge pending changes - sukhe@cumin1002"
  • 18:15 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge pending changes - sukhe@cumin1002"
  • 18:14 xcollazo@deploy1002: Started deploy [analytics/refinery@4836095] (thin): Regular analytics weekly train THIN [analytics/refinery@4836095f]
  • 18:13 xcollazo@deploy1002: Finished deploy [analytics/refinery@4836095]: Regular analytics weekly train [analytics/refinery@4836095f] (duration: 16m 16s)
  • 18:13 sukhe@cumin1002: START - Cookbook sre.dns.netbox
  • 18:12 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7014.magru.wmnet with OS bullseye
  • 18:11 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7014.magru.wmnet with OS bullseye
  • 18:11 sukhe@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns7001.wikimedia.org with OS bookworm
  • 18:11 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7015.magru.wmnet with OS bullseye
  • 18:09 sukhe: running cookbook -d sre.dns.netbox "test"
  • 18:06 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye
  • 18:04 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7014.magru.wmnet with OS bullseye
  • 18:03 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "running manually for cp7013 - sukhe@cumin1002"
  • 18:03 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host dns7001.wikimedia.org with OS bookworm
  • 18:02 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "running manually for cp7013 - sukhe@cumin1002"
  • 17:57 xcollazo@deploy1002: Started deploy [analytics/refinery@4836095]: Regular analytics weekly train [analytics/refinery@4836095f]
  • 17:56 swfrench@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: etcd replication maintenance - T358636 (duration: 55m 11s)
  • 17:54 swfrench-wmf: putting etcd back in read-write mode for T358636
  • 17:09 swfrench-wmf: disabling etcd replication into conf2005 for T358636
  • 17:03 swfrench-wmf: putting etcd in read-only mode for T358636
  • 17:01 swfrench@deploy1002: Locking from deployment [ALL REPOSITORIES]: etcd replication maintenance - T358636
  • 16:56 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7004.magru.wmnet with OS bullseye
  • 16:55 ejegg: payments-wiki upgraded from c7ab847d to c4f43931
  • 16:53 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7013.magru.wmnet with OS bullseye
  • 16:53 sukhe@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin1002"
  • 16:53 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin1002"
  • 16:52 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7005.magru.wmnet with OS bullseye
  • 16:52 fabfur@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
  • 16:51 fabfur@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
  • 16:36 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7002.magru.wmnet with OS bullseye
  • 16:33 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7012.magru.wmnet with OS bullseye
  • 16:33 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin1002"
  • 16:30 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin1002"
  • 16:30 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7013.magru.wmnet with reason: host reimage
  • 16:27 fabfur@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp7003.magru.wmnet with OS bullseye
  • 16:27 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7004.magru.wmnet with reason: host reimage
  • 16:25 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7013.magru.wmnet with reason: host reimage
  • 16:25 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7004.magru.wmnet with reason: host reimage
  • 16:24 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7005.magru.wmnet with reason: host reimage
  • 16:22 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1003.eqiad.wmnet with OS bullseye
  • 16:22 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7005.magru.wmnet with reason: host reimage
  • 16:18 stevemunene@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:18 stevemunene@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:16 elukey@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=inference,name=codfw
  • 16:16 elukey@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*.eqiad.wmnet: Move to PKI Truststore - elukey@cumin1002
  • 16:12 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7003.magru.wmnet with reason: host reimage
  • 16:10 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7002.magru.wmnet with reason: host reimage
  • 16:09 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 16:08 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns7001.wikimedia.org with OS bullseye
  • 16:07 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7012.magru.wmnet with reason: host reimage
  • 16:06 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: sync on main
  • 16:05 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7003.magru.wmnet with reason: host reimage
  • 16:05 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7002.magru.wmnet with reason: host reimage
  • 16:04 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7012.magru.wmnet with reason: host reimage
  • 16:02 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1003.eqiad.wmnet with reason: host reimage
  • 15:59 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7013.magru.wmnet with OS bullseye
  • 15:58 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1003.eqiad.wmnet with reason: host reimage
  • 15:58 elukey@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*.eqiad.wmnet: Move to PKI Truststore - elukey@cumin1002
  • 15:57 elukey@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore200[5,6].codfw.wmnet: Move to PKI Truststore - elukey@cumin1002
  • 15:56 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp7004.magru.wmnet with OS bullseye
  • 15:56 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp7005.magru.wmnet with OS bullseye
  • 15:44 elukey@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore200[5,6].codfw.wmnet: Move to PKI Truststore - elukey@cumin1002
  • 15:40 elukey@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2004.codfw.wmnet: Move to PKI Truststore - elukey@cumin1002
  • 15:39 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7012.magru.wmnet with OS bullseye
  • 15:38 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1003.eqiad.wmnet with OS bullseye
  • 15:36 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp7003.magru.wmnet with OS bullseye
  • 15:36 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp7002.magru.wmnet with OS bullseye
  • 15:33 elukey@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2004.codfw.wmnet: Move to PKI Truststore - elukey@cumin1002
  • 15:32 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 15:32 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 15:31 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host dns7001.wikimedia.org with OS bullseye
  • 15:28 elukey: move Cassandra instances on session store nodes to a new Java Truststore to support PKI - T352647
  • 15:27 elukey: depool liftwing codfw for a couple of hours to test eqiad capabilities to handle the traffic
  • 15:27 elukey@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=inference,name=codfw
  • 15:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 15:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 15:11 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7011.magru.wmnet with OS bullseye
  • 15:11 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin1002"
  • 15:09 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin1002"
  • 15:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['mw2382']
  • 15:06 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2382']
  • 15:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2382.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 15:03 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1002.eqiad.wmnet with OS bullseye
  • 14:59 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7010.magru.wmnet with OS bullseye
  • 14:59 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin1002"
  • 14:58 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin1002"
  • 14:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti7004.magru.wmnet with OS bookworm
  • 14:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 14:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 14:50 sukhe@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns7001.wikimedia.org with OS bullseye
  • 14:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mw2382.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 14:45 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7011.magru.wmnet with reason: host reimage
  • 14:45 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ncredir3004.esams.wmnet
  • 14:45 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ncredir3003.esams.wmnet
  • 14:43 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7001.magru.wmnet with OS bullseye
  • 14:43 fabfur@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
  • 14:42 fabfur@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
  • 14:42 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1002.eqiad.wmnet with reason: host reimage
  • 14:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7011.magru.wmnet with reason: host reimage
  • 14:39 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1002.eqiad.wmnet with reason: host reimage
  • 14:35 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=ncredir3003.esams.wmnet
  • 14:34 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7010.magru.wmnet with reason: host reimage
  • 14:34 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7009.magru.wmnet with OS bullseye
  • 14:34 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin1002"
  • 14:33 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin1002"
  • 14:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7004.magru.wmnet with reason: host reimage
  • 14:31 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7010.magru.wmnet with reason: host reimage
  • 14:31 bblack@cumin1002: conftool action : set/pooled=yes; selector: name=ncredir3003.esams.wmnet
  • 14:31 bblack@cumin1002: conftool action : set/pooled=no; selector: name=ncredir3003.esams.wmnet
  • 14:29 moritzm: installing gnutls28 security updates on buster
  • 14:27 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7004.magru.wmnet with reason: host reimage
  • 14:21 moritzm: installing Java 8 security updates
  • 14:21 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7004.magru.wmnet with reason: will be reimaged soon
  • 14:21 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7004.magru.wmnet with reason: will be reimaged soon
  • 14:19 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1002.eqiad.wmnet with OS bullseye
  • 14:19 dcausse@deploy1002: Finished deploy [airflow-dags/search@ab19bcd]: wdqs: deduplicate side-output events (T362508) (duration: 00m 29s)
  • 14:19 dcausse@deploy1002: Started deploy [airflow-dags/search@ab19bcd]: wdqs: deduplicate side-output events (T362508)
  • 14:16 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7001.magru.wmnet with reason: host reimage
  • 14:15 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7011.magru.wmnet with OS bullseye
  • 14:13 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7001.magru.wmnet with reason: host reimage
  • 14:10 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.copy (exit_code=99) Will create a clone of db2114.codfw.wmnet onto db1125.eqiad.wmnet
  • 14:10 arnaudb@cumin1002: START - Cookbook sre.mysql.copy Will create a clone of db2114.codfw.wmnet onto db1125.eqiad.wmnet
  • 14:10 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7009.magru.wmnet with reason: host reimage
  • 14:07 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp[7002-7003].magru.wmnet with reason: will be reimaged soon
  • 14:07 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7009.magru.wmnet with reason: host reimage
  • 14:06 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp[7002-7003].magru.wmnet with reason: will be reimaged soon
  • 14:06 urbanecm@deploy1002: Finished scap: Backport for Turn on ParserMigration extension everywhere, Quiet ParserMigration notice for 30 days after acknowledgement (duration: 23m 40s)
  • 14:03 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7010.magru.wmnet with OS bullseye
  • 13:54 urbanecm@deploy1002: urbanecm and cscott: Continuing with sync
  • 13:54 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti7004.magru.wmnet with OS bookworm
  • 13:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti7003.magru.wmnet with OS bookworm
  • 13:53 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 13:51 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host dns7001.wikimedia.org with OS bullseye
  • 13:47 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 13:47 urbanecm@deploy1002: urbanecm and cscott: Backport for Turn on ParserMigration extension everywhere, Quiet ParserMigration notice for 30 days after acknowledgement synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db1233.eqiad.wmnet
  • 13:43 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp7001.magru.wmnet with OS bullseye
  • 13:42 urbanecm@deploy1002: Started scap: Backport for Turn on ParserMigration extension everywhere, Quiet ParserMigration notice for 30 days after acknowledgement
  • 13:42 urbanecm@deploy1002: Finished scap: Backport for [itwiki] Create a new 'arbcom' usergroup (T363805) (duration: 20m 09s)
  • 13:39 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7009.magru.wmnet with OS bullseye
  • 13:35 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db1233.eqiad.wmnet
  • 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db1222.eqiad.wmnet
  • 13:35 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7009.magru.wmnet with OS bullseye
  • 13:28 urbanecm@deploy1002: superpes and urbanecm: Continuing with sync
  • 13:27 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db1222.eqiad.wmnet
  • 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7003.magru.wmnet with reason: host reimage
  • 13:26 urbanecm@deploy1002: superpes and urbanecm: Backport for [itwiki] Create a new 'arbcom' usergroup (T363805) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db1197.eqiad.wmnet
  • 13:26 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 13:26 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 13:25 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.copy (exit_code=99) Will create a clone of db2114.codfw.wmnet onto db1125.eqiad.wmnet
  • 13:25 arnaudb@cumin1002: START - Cookbook sre.mysql.copy Will create a clone of db2114.codfw.wmnet onto db1125.eqiad.wmnet
  • 13:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7003.magru.wmnet with reason: host reimage
  • 13:22 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.copy (exit_code=99) Will create a clone of db2114.codfw.wmnet onto db1125.eqiad.wmnet
  • 13:22 arnaudb@cumin1002: START - Cookbook sre.mysql.copy Will create a clone of db2114.codfw.wmnet onto db1125.eqiad.wmnet
  • 13:21 urbanecm@deploy1002: Started scap: Backport for [itwiki] Create a new 'arbcom' usergroup (T363805)
  • 13:21 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.copy (exit_code=99) Will create a clone of db1125.eqiad.wmnet onto db2114.codfw.wmnet
  • 13:20 arnaudb@cumin1002: START - Cookbook sre.mysql.copy Will create a clone of db1125.eqiad.wmnet onto db2114.codfw.wmnet
  • 13:19 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db1197.eqiad.wmnet
  • 13:15 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.copy (exit_code=99) Will create a clone of db1125.eqiad.wmnet onto db2114.codfw.wmnet
  • 13:15 arnaudb@cumin1002: START - Cookbook sre.mysql.copy Will create a clone of db1125.eqiad.wmnet onto db2114.codfw.wmnet
  • 13:12 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.copy (exit_code=99) Will create a clone of db1125.eqiad.wmnet onto db2114.codfw.wmnet
  • 13:12 arnaudb@cumin1002: START - Cookbook sre.mysql.copy Will create a clone of db1125.eqiad.wmnet onto db2114.codfw.wmnet
  • 13:11 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.copy (exit_code=99) Will create a clone of db1125.eqiad.wmnet onto db2114.codfw.wmnet
  • 13:11 arnaudb@cumin1002: START - Cookbook sre.mysql.copy Will create a clone of db1125.eqiad.wmnet onto db2114.codfw.wmnet
  • 13:03 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.copy (exit_code=99) Will create a clone of db1125.eqiad.wmnet onto db2114.codfw.wmnet
  • 13:03 arnaudb@cumin1002: START - Cookbook sre.mysql.copy Will create a clone of db1125.eqiad.wmnet onto db2114.codfw.wmnet
  • 13:02 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7009.magru.wmnet with OS bullseye
  • 12:58 moritzm: installing util-linux security updates
  • 12:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.copy (exit_code=0) Will create a clone of db1125.eqiad.wmnet onto db2114.codfw.wmnet
  • 12:58 arnaudb@cumin1002: START - Cookbook sre.mysql.copy Will create a clone of db1125.eqiad.wmnet onto db2114.codfw.wmnet
  • 12:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.copy (exit_code=0) Will create a clone of db1125.eqiad.wmnet onto db2114.codfw.wmnet
  • 12:58 arnaudb@cumin1002: START - Cookbook sre.mysql.copy Will create a clone of db1125.eqiad.wmnet onto db2114.codfw.wmnet
  • 12:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db1188.eqiad.wmnet
  • 12:55 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to bullseye-wikimedia (forward port of latest Java 8 security updates)
  • 12:51 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti7003.magru.wmnet with OS bookworm
  • 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti7002.magru.wmnet with OS bookworm
  • 12:49 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 12:46 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 12:45 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 12:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7002.magru.wmnet with reason: host reimage
  • 12:24 jforrester@deploy1002: Finished deploy [integration/docroot@b88f9e1]: Update VisualEditor links, post-JSDoc (b88f9e1674) (duration: 00m 06s)
  • 12:24 jforrester@deploy1002: Started deploy [integration/docroot@b88f9e1]: Update VisualEditor links, post-JSDoc (b88f9e1674)
  • 12:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7002.magru.wmnet with reason: host reimage
  • 12:20 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db1182.eqiad.wmnet
  • 12:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db1162.eqiad.wmnet
  • 12:09 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:09 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:01 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db1162.eqiad.wmnet
  • 11:50 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti7002.magru.wmnet with OS bookworm
  • 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti7001.magru.wmnet with OS bookworm
  • 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 11:43 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1001.eqiad.wmnet
  • 11:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 11:37 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host cephosd1001.eqiad.wmnet
  • 11:36 marostegui@cumin1002: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P61495 and previous config saved to /var/cache/conftool/dbconfig/20240430-113640-root.json
  • 11:21 marostegui@cumin1002: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P61494 and previous config saved to /var/cache/conftool/dbconfig/20240430-112135-root.json
  • 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2207.codfw.wmnet
  • 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7001.magru.wmnet with reason: host reimage
  • 11:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7001.magru.wmnet with reason: host reimage
  • 11:08 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:07 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:07 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:06 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:06 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:06 marostegui@cumin1002: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P61493 and previous config saved to /var/cache/conftool/dbconfig/20240430-110629-root.json
  • 11:06 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 11:04 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db2207.codfw.wmnet
  • 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2204.codfw.wmnet
  • 10:54 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db2204.codfw.wmnet
  • 10:54 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1001.eqiad.wmnet with OS bullseye
  • 10:53 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2189.codfw.wmnet
  • 10:51 marostegui@cumin1002: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P61492 and previous config saved to /var/cache/conftool/dbconfig/20240430-105124-root.json
  • 10:48 Dreamy_Jazz: Security deploy finished
  • 10:47 logmsgbot: dreamyjazz Deployed security patch for T338419
  • 10:45 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db2189.codfw.wmnet
  • 10:43 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti7001.magru.wmnet with OS bookworm
  • 10:39 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lists1004.eqiad.wmnet
  • 10:39 aokoth@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:39 aokoth@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lists1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - aokoth@cumin1002"
  • 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2175.codfw.wmnet
  • 10:38 aokoth@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lists1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - aokoth@cumin1002"
  • 10:37 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
  • 10:36 marostegui@cumin1002: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P61491 and previous config saved to /var/cache/conftool/dbconfig/20240430-103618-root.json
  • 10:35 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
  • 10:33 logmsgbot: dreamyjazz Deployed security patch for T338419
  • 10:32 aokoth@cumin1002: START - Cookbook sre.dns.netbox
  • 10:27 aokoth@cumin1002: START - Cookbook sre.hosts.decommission for hosts lists1004.eqiad.wmnet
  • 10:25 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db2175.codfw.wmnet
  • 10:24 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2148.codfw.wmnet
  • 10:21 marostegui@cumin1002: dbctl commit (dc=all): 'db1158 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P61490 and previous config saved to /var/cache/conftool/dbconfig/20240430-102113-root.json
  • 10:16 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1001.eqiad.wmnet with OS bullseye
  • 10:15 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db2148.codfw.wmnet
  • 10:08 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2138.codfw.wmnet
  • 10:06 marostegui@cumin1002: dbctl commit (dc=all): 'db1158 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P61489 and previous config saved to /var/cache/conftool/dbconfig/20240430-100607-root.json
  • 10:00 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db2138.codfw.wmnet
  • 09:58 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2125.codfw.wmnet
  • 09:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T360332)', diff saved to https://phabricator.wikimedia.org/P61488 and previous config saved to /var/cache/conftool/dbconfig/20240430-095745-arnaudb.json
  • 09:53 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 09:51 marostegui@cumin1002: dbctl commit (dc=all): 'Push es6 codfw config T355424', diff saved to https://phabricator.wikimedia.org/P61487 and previous config saved to /var/cache/conftool/dbconfig/20240430-095119-marostegui.json
  • 09:47 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db2125.codfw.wmnet
  • 09:46 marostegui@cumin1002: dbctl commit (dc=all): 'Push es6 eqiad section T355285', diff saved to https://phabricator.wikimedia.org/P61486 and previous config saved to /var/cache/conftool/dbconfig/20240430-094635-marostegui.json
  • 09:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P61485 and previous config saved to /var/cache/conftool/dbconfig/20240430-094237-arnaudb.json
  • 09:41 jayme@cumin1002: conftool action : set/pooled=inactive; selector: name=mw2382.codfw.wmnet
  • 09:39 Dreamy_Jazz: Starting security deploy on tmux session
  • 09:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1158.eqiad.wmnet with OS bookworm
  • 09:30 marostegui@deploy1002: Finished scap: Backport for etcd.php: Add es6 (T355285 T355424) (duration: 15m 01s)
  • 09:29 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 09:29 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:28 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:28 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on mw2382.codfw.wmnet with reason: Degraded RAID/storage controller issues
  • 09:28 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on mw2382.codfw.wmnet with reason: Degraded RAID/storage controller issues
  • 09:28 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:27 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:27 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:27 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P61484 and previous config saved to /var/cache/conftool/dbconfig/20240430-092729-arnaudb.json
  • 09:26 jayme: draining mw2382.codfw.wmnet - T362938
  • 09:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db2151 (re)pooling @ 100%: Post replag', diff saved to https://phabricator.wikimedia.org/P61483 and previous config saved to /var/cache/conftool/dbconfig/20240430-092230-arnaudb.json
  • 09:18 volans@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2002.codfw.wmnet with reason: Update Netbox dependencies for netbox-next - volans@cumin1002
  • 09:17 marostegui@deploy1002: marostegui: Continuing with sync
  • 09:17 marostegui@deploy1002: marostegui: Backport for etcd.php: Add es6 (T355285 T355424) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T360332)', diff saved to https://phabricator.wikimedia.org/P61482 and previous config saved to /var/cache/conftool/dbconfig/20240430-091556-arnaudb.json
  • 09:15 marostegui@deploy1002: Started scap: Backport for etcd.php: Add es6 (T355285 T355424)
  • 09:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T360332)', diff saved to https://phabricator.wikimedia.org/P61481 and previous config saved to /var/cache/conftool/dbconfig/20240430-091221-arnaudb.json
  • 09:11 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage
  • 09:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1157 (T352010)', diff saved to https://phabricator.wikimedia.org/P61480 and previous config saved to /var/cache/conftool/dbconfig/20240430-091049-ladsgroup.json
  • 09:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 09:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 09:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2179 (T360332)', diff saved to https://phabricator.wikimedia.org/P61479 and previous config saved to /var/cache/conftool/dbconfig/20240430-091002-arnaudb.json
  • 09:09 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 09:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 09:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage
  • 09:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db2151 (re)pooling @ 75%: Post replag', diff saved to https://phabricator.wikimedia.org/P61478 and previous config saved to /var/cache/conftool/dbconfig/20240430-090724-arnaudb.json
  • 09:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P61477 and previous config saved to /var/cache/conftool/dbconfig/20240430-090048-arnaudb.json
  • 08:54 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1158.eqiad.wmnet with OS bookworm
  • 08:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P61476 and previous config saved to /var/cache/conftool/dbconfig/20240430-085441-root.json
  • 08:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db2151 (re)pooling @ 50%: Post replag', diff saved to https://phabricator.wikimedia.org/P61475 and previous config saved to /var/cache/conftool/dbconfig/20240430-085219-arnaudb.json
  • 08:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1158', diff saved to https://phabricator.wikimedia.org/P61474 and previous config saved to /var/cache/conftool/dbconfig/20240430-085129-root.json
  • 08:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P61473 and previous config saved to /var/cache/conftool/dbconfig/20240430-084926-root.json
  • 08:48 volans@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2002.codfw.wmnet with reason: Update Netbox dependencies for netbox-next - volans@cumin1002
  • 08:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P61472 and previous config saved to /var/cache/conftool/dbconfig/20240430-084541-arnaudb.json
  • 08:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P61471 and previous config saved to /var/cache/conftool/dbconfig/20240430-083935-root.json
  • 08:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db2151 (re)pooling @ 25%: Post replag', diff saved to https://phabricator.wikimedia.org/P61470 and previous config saved to /var/cache/conftool/dbconfig/20240430-083713-arnaudb.json
  • 08:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P61469 and previous config saved to /var/cache/conftool/dbconfig/20240430-083420-root.json
  • 08:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T360332)', diff saved to https://phabricator.wikimedia.org/P61468 and previous config saved to /var/cache/conftool/dbconfig/20240430-083033-arnaudb.json
  • 08:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2121 (T360332)', diff saved to https://phabricator.wikimedia.org/P61467 and previous config saved to /var/cache/conftool/dbconfig/20240430-082753-arnaudb.json
  • 08:27 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 08:27 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 08:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P61466 and previous config saved to /var/cache/conftool/dbconfig/20240430-082430-root.json
  • 08:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db2151 (re)pooling @ 5%: Post replag', diff saved to https://phabricator.wikimedia.org/P61465 and previous config saved to /var/cache/conftool/dbconfig/20240430-082208-arnaudb.json
  • 08:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Removes db2114, repools db2151', diff saved to https://phabricator.wikimedia.org/P61464 and previous config saved to /var/cache/conftool/dbconfig/20240430-082200-arnaudb.json
  • 08:21 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.3 refs T361397
  • 08:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P61463 and previous config saved to /var/cache/conftool/dbconfig/20240430-081915-root.json
  • 08:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P61462 and previous config saved to /var/cache/conftool/dbconfig/20240430-080924-root.json
  • 08:08 godog: bounce prometheus@k8s in eqiad - T343529
  • 08:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P61461 and previous config saved to /var/cache/conftool/dbconfig/20240430-080409-root.json
  • 07:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P61460 and previous config saved to /var/cache/conftool/dbconfig/20240430-075418-root.json
  • 07:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P61459 and previous config saved to /var/cache/conftool/dbconfig/20240430-074903-root.json
  • 07:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P61458 and previous config saved to /var/cache/conftool/dbconfig/20240430-073912-root.json
  • 07:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P61457 and previous config saved to /var/cache/conftool/dbconfig/20240430-073358-root.json
  • 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P61456 and previous config saved to /var/cache/conftool/dbconfig/20240430-072406-root.json
  • 07:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1174.eqiad.wmnet with OS bookworm
  • 07:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P61455 and previous config saved to /var/cache/conftool/dbconfig/20240430-071852-root.json
  • 07:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 07:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 07:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1174.eqiad.wmnet with reason: host reimage
  • 07:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1174.eqiad.wmnet with reason: host reimage
  • 06:48 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1174.eqiad.wmnet with OS bookworm
  • 06:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P61454 and previous config saved to /var/cache/conftool/dbconfig/20240430-064720-root.json
  • 05:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1157.eqiad.wmnet with OS bookworm
  • 05:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2129 from API serving', diff saved to https://phabricator.wikimedia.org/P61453 and previous config saved to /var/cache/conftool/dbconfig/20240430-054943-arnaudb.json
  • 05:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 05:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 05:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1157.eqiad.wmnet with reason: host reimage
  • 05:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1157.eqiad.wmnet with reason: host reimage
  • 05:19 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1157.eqiad.wmnet with OS bookworm
  • 05:14 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1157 T363672', diff saved to https://phabricator.wikimedia.org/P61452 and previous config saved to /var/cache/conftool/dbconfig/20240430-051419-root.json
  • 05:13 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write T363672', diff saved to https://phabricator.wikimedia.org/P61451 and previous config saved to /var/cache/conftool/dbconfig/20240430-051332-root.json
  • 05:13 marostegui@cumin1002: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - T363672', diff saved to https://phabricator.wikimedia.org/P61450 and previous config saved to /var/cache/conftool/dbconfig/20240430-051312-root.json
  • 05:12 marostegui: Starting s3 eqiad failover from db1157 to db1223 - T363672
  • 05:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 05:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 04:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s3 T363672
  • 04:55 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1223 with weight 0 T363672', diff saved to https://phabricator.wikimedia.org/P61449 and previous config saved to /var/cache/conftool/dbconfig/20240430-045541-marostegui.json
  • 04:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s3 T363672
  • 04:05 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.3 refs T361397 (duration: 59m 27s)
  • 03:05 mwpresync@deploy1002: Started scap: testwikis wikis to 1.43.0-wmf.3 refs T361397
  • 03:03 mwpresync@deploy1002: Pruned MediaWiki: 1.42.0-wmf.26 (duration: 03m 03s)
  • 02:16 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7004.magru.wmnet with OS bullseye
  • 02:16 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin1002"
  • 02:15 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin1002"
  • 01:55 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7004.magru.wmnet with reason: host reimage
  • 01:53 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7004.magru.wmnet with reason: host reimage
  • 01:24 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7004.magru.wmnet with OS bullseye
  • 00:45 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7004.magru.wmnet with OS bullseye

2024-04-29

  • 23:56 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7004.magru.wmnet with OS bullseye
  • 22:57 eileen: civicrm upgraded from e95e03d9 to 393e1deb
  • 21:19 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs7002']
  • 21:14 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti7004.mgmt.magru.wmnet with reboot policy FORCED
  • 21:06 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti7004.mgmt.magru.wmnet with reboot policy FORCED
  • 21:06 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti7001.mgmt.magru.wmnet with reboot policy FORCED
  • 21:06 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs7002']
  • 21:06 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['lvs7002']
  • 21:06 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs7002']
  • 21:06 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti7002.mgmt.magru.wmnet with reboot policy FORCED
  • 21:06 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti7003.mgmt.magru.wmnet with reboot policy FORCED
  • 21:02 cjming: end of UTC late backport window
  • 21:02 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs7002.mgmt.magru.wmnet with reboot policy FORCED
  • 21:01 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns7001']
  • 20:58 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti7003.mgmt.magru.wmnet with reboot policy FORCED
  • 20:58 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti7002.mgmt.magru.wmnet with reboot policy FORCED
  • 20:58 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti7001.mgmt.magru.wmnet with reboot policy FORCED
  • 20:58 mforns@deploy1002: Finished deploy [airflow-dags/analytics@8c9c32c]: (no justification provided) (duration: 00m 30s)
  • 20:57 mforns@deploy1002: Started deploy [airflow-dags/analytics@8c9c32c]: (no justification provided)
  • 20:54 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns7001']
  • 20:41 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs7003']
  • 20:41 robh@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 20:39 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 20:39 cjming@deploy1002: Finished scap: Backport for Turn off DiscussionTools A/B test, and enable features on those wikis (T341491) (duration: 16m 06s)
  • 20:37 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7003.magru.wmnet with OS bullseye
  • 20:37 fabfur@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
  • 20:35 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs7003']
  • 20:29 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti7004']
  • 20:29 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs7001']
  • 20:28 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dns7001']
  • 20:28 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns7001']
  • 20:28 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti7003']
  • 20:28 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti7002']
  • 20:27 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti7001']
  • 20:27 topranks: re-announcing magru prefixes to EdgeUno
  • 20:27 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns7002']
  • 20:26 cjming@deploy1002: cjming and esanders: Continuing with sync
  • 20:26 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs7002.mgmt.magru.wmnet with reboot policy FORCED
  • 20:26 cjming@deploy1002: cjming and esanders: Backport for Turn off DiscussionTools A/B test, and enable features on those wikis (T341491) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:24 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['lvs7002']
  • 20:24 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs7002']
  • 20:23 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs7001']
  • 20:23 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti7004']
  • 20:23 cjming@deploy1002: Started scap: Backport for Turn off DiscussionTools A/B test, and enable features on those wikis (T341491)
  • 20:23 fabfur@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
  • 20:22 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti7003']
  • 20:22 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti7002']
  • 20:22 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti7001']
  • 20:22 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs7001.mgmt.magru.wmnet with reboot policy FORCED
  • 20:22 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs7003.mgmt.magru.wmnet with reboot policy FORCED
  • 20:21 cjming@deploy1002: Finished scap: Backport for Shift writes to SUP, 1st batch (T363475) (duration: 16m 17s)
  • 20:20 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dns7001']
  • 20:20 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns7001']
  • 20:19 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dns7001']
  • 20:19 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns7001']
  • 20:19 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns7002']
  • 20:19 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dns7001']
  • 20:19 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns7001']
  • 20:17 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti7004.mgmt.magru.wmnet with reboot policy FORCED
  • 20:16 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti7001.mgmt.magru.wmnet with reboot policy FORCED
  • 20:16 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti7003.mgmt.magru.wmnet with reboot policy FORCED
  • 20:15 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti7002.mgmt.magru.wmnet with reboot policy FORCED
  • 20:15 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns7001.mgmt.magru.wmnet with reboot policy FORCED
  • 20:14 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns7002.mgmt.magru.wmnet with reboot policy FORCED
  • 20:09 cjming@deploy1002: cjming and pfischer: Continuing with sync
  • 20:09 cjming@deploy1002: cjming and pfischer: Backport for Shift writes to SUP, 1st batch (T363475) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:09 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7002.magru.wmnet with OS bullseye
  • 20:09 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin1002"
  • 20:07 topranks: withdrawing prefixes from EdgeUno transit in magru to test paths via second transit
  • 20:06 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs7003.mgmt.magru.wmnet with reboot policy FORCED
  • 20:05 cjming@deploy1002: Started scap: Backport for Shift writes to SUP, 1st batch (T363475)
  • 20:05 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs7001.mgmt.magru.wmnet with reboot policy FORCED
  • 20:05 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti7004.mgmt.magru.wmnet with reboot policy FORCED
  • 20:04 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti7003.mgmt.magru.wmnet with reboot policy FORCED
  • 20:04 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti7002.mgmt.magru.wmnet with reboot policy FORCED
  • 20:03 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti7001.mgmt.magru.wmnet with reboot policy FORCED
  • 20:03 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns7002.mgmt.magru.wmnet with reboot policy FORCED
  • 20:03 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7003.magru.wmnet with reason: host reimage
  • 20:02 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns7001.mgmt.magru.wmnet with reboot policy FORCED
  • 20:00 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7003.magru.wmnet with reason: host reimage
  • 19:51 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin1002"
  • 19:46 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:46 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru misc hosts - robh@cumin2002"
  • 19:45 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru misc hosts - robh@cumin2002"
  • 19:41 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 19:33 mforns@deploy1002: Finished deploy [analytics/refinery@1693892] (hadoop-test): Fixes queries for Commons Impact Metrics dumps TEST [analytics/refinery@1693892a] (duration: 03m 22s)
  • 19:31 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp7003.magru.wmnet with OS bullseye
  • 19:31 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7002.magru.wmnet with reason: host reimage
  • 19:30 mforns@deploy1002: Started deploy [analytics/refinery@1693892] (hadoop-test): Fixes queries for Commons Impact Metrics dumps TEST [analytics/refinery@1693892a]
  • 19:29 mforns@deploy1002: Finished deploy [analytics/refinery@1693892] (thin): Fixes queries for Commons Impact Metrics dumps THIN [analytics/refinery@1693892a] (duration: 03m 46s)
  • 19:27 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7002.magru.wmnet with reason: host reimage
  • 19:26 mforns@deploy1002: Started deploy [analytics/refinery@1693892] (thin): Fixes queries for Commons Impact Metrics dumps THIN [analytics/refinery@1693892a]
  • 19:25 mforns@deploy1002: Finished deploy [analytics/refinery@1693892]: Fixes to queries for Commons Impact Metrics dumps [analytics/refinery@1693892a] (duration: 13m 58s)
  • 19:11 mforns@deploy1002: Started deploy [analytics/refinery@1693892]: Fixes to queries for Commons Impact Metrics dumps [analytics/refinery@1693892a]
  • 19:08 fabfur@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7003.magru.wmnet with OS bullseye
  • 18:58 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7002.magru.wmnet with OS bullseye
  • 18:50 swfrench-wmf: running authdns-update on dns1004 for T361835
  • 18:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 18:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 18:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 18:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 18:39 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 18:39 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 18:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T361627)', diff saved to https://phabricator.wikimedia.org/P61447 and previous config saved to /var/cache/conftool/dbconfig/20240429-183903-marostegui.json
  • 18:25 denisse: Manually delete unused TLS certificates for thanos-query as part of the CFSSL migration - T360414
  • 18:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P61446 and previous config saved to /var/cache/conftool/dbconfig/20240429-182355-marostegui.json
  • 18:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P61445 and previous config saved to /var/cache/conftool/dbconfig/20240429-180848-marostegui.json
  • 18:02 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp7016']
  • 18:02 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp7014']
  • 18:02 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp7015']
  • 18:01 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp7013']
  • 18:00 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp7012']
  • 17:59 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp7011']
  • 17:59 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp7010']
  • 17:56 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7016']
  • 17:56 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7015']
  • 17:55 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7014']
  • 17:54 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7013']
  • 17:54 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7012']
  • 17:53 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp7003.magru.wmnet with OS bullseye
  • 17:53 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7011']
  • 17:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T361627)', diff saved to https://phabricator.wikimedia.org/P61444 and previous config saved to /var/cache/conftool/dbconfig/20240429-175340-marostegui.json
  • 17:53 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7010']
  • 17:49 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp7009']
  • 17:49 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp7007']
  • 17:49 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp7005']
  • 17:49 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp7004']
  • 17:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1213 (T361627)', diff saved to https://phabricator.wikimedia.org/P61443 and previous config saved to /var/cache/conftool/dbconfig/20240429-174856-marostegui.json
  • 17:48 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp7008']
  • 17:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 17:48 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp7006']
  • 17:48 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp7003']
  • 17:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 17:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T361627)', diff saved to https://phabricator.wikimedia.org/P61442 and previous config saved to /var/cache/conftool/dbconfig/20240429-174834-marostegui.json
  • 17:42 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7009']
  • 17:42 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7008']
  • 17:42 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7007']
  • 17:42 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7006']
  • 17:42 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7005']
  • 17:42 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7004']
  • 17:42 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7003']
  • 17:41 root@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[29-42]*: Move Cassandra to PKI - root@cumin1002
  • 17:40 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7006.mgmt.magru.wmnet with reboot policy FORCED
  • 17:38 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7014.mgmt.magru.wmnet with reboot policy FORCED
  • 17:38 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7016.mgmt.magru.wmnet with reboot policy FORCED
  • 17:37 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7012.mgmt.magru.wmnet with reboot policy FORCED
  • 17:37 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7008.mgmt.magru.wmnet with reboot policy FORCED
  • 17:36 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7004.mgmt.magru.wmnet with reboot policy FORCED
  • 17:36 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1005.eqiad.wmnet with OS bullseye
  • 17:36 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7010.mgmt.magru.wmnet with reboot policy FORCED
  • 17:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P61441 and previous config saved to /var/cache/conftool/dbconfig/20240429-173326-marostegui.json
  • 17:29 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7006.mgmt.magru.wmnet with reboot policy FORCED
  • 17:28 ebernhardson@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:28 ebernhardson@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:26 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7016.mgmt.magru.wmnet with reboot policy FORCED
  • 17:26 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7014.mgmt.magru.wmnet with reboot policy FORCED
  • 17:25 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp7006.mgmt.magru.wmnet with reboot policy FORCED
  • 17:25 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7012.mgmt.magru.wmnet with reboot policy FORCED
  • 17:24 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7010.mgmt.magru.wmnet with reboot policy FORCED
  • 17:24 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7008.mgmt.magru.wmnet with reboot policy FORCED
  • 17:24 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7006.mgmt.magru.wmnet with reboot policy FORCED
  • 17:24 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7004.mgmt.magru.wmnet with reboot policy FORCED
  • 17:22 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:22 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rack b4 cp hosts - robh@cumin2002"
  • 17:22 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rack b4 cp hosts - robh@cumin2002"
  • 17:20 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1005.eqiad.wmnet with reason: host reimage
  • 17:19 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 17:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P61440 and previous config saved to /var/cache/conftool/dbconfig/20240429-171818-marostegui.json
  • 17:17 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7015.mgmt.magru.wmnet with reboot policy FORCED
  • 17:17 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1005.eqiad.wmnet with reason: host reimage
  • 17:15 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7013.mgmt.magru.wmnet with reboot policy FORCED
  • 17:14 ebernhardson@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:13 ebernhardson@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:13 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7002.magru.wmnet with OS bullseye
  • 17:11 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7011.mgmt.magru.wmnet with reboot policy FORCED
  • 17:06 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7015.mgmt.magru.wmnet with reboot policy FORCED
  • 17:03 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7013.mgmt.magru.wmnet with reboot policy FORCED
  • 17:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T361627)', diff saved to https://phabricator.wikimedia.org/P61439 and previous config saved to /var/cache/conftool/dbconfig/20240429-170311-marostegui.json
  • 17:02 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7009.mgmt.magru.wmnet with reboot policy FORCED
  • 16:59 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1005.eqiad.wmnet with OS bullseye
  • 16:58 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7011.mgmt.magru.wmnet with reboot policy FORCED
  • 16:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1210 (T361627)', diff saved to https://phabricator.wikimedia.org/P61438 and previous config saved to /var/cache/conftool/dbconfig/20240429-165728-marostegui.json
  • 16:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 16:57 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7007.mgmt.magru.wmnet with reboot policy FORCED
  • 16:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 16:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T361627)', diff saved to https://phabricator.wikimedia.org/P61437 and previous config saved to /var/cache/conftool/dbconfig/20240429-165705-marostegui.json
  • 16:50 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7009.mgmt.magru.wmnet with reboot policy FORCED
  • 16:49 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7005.mgmt.magru.wmnet with reboot policy FORCED
  • 16:45 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7007.mgmt.magru.wmnet with reboot policy FORCED
  • 16:44 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7003.mgmt.magru.wmnet with reboot policy FORCED
  • 16:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P61436 and previous config saved to /var/cache/conftool/dbconfig/20240429-164158-marostegui.json
  • 16:37 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7005.mgmt.magru.wmnet with reboot policy FORCED
  • 16:32 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7003.mgmt.magru.wmnet with reboot policy FORCED
  • 16:30 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:30 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rack b3 cp hosts - robh@cumin2002"
  • 16:29 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp7002.magru.wmnet with OS bullseye
  • 16:29 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rack b3 cp hosts - robh@cumin2002"
  • 16:27 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 16:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P61435 and previous config saved to /var/cache/conftool/dbconfig/20240429-162650-marostegui.json
  • 16:26 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp7002']
  • 16:23 jayme@cumin1002: conftool action : set/pooled=yes; selector: name=kubestagemaster2003.codfw.wmnet
  • 16:23 jayme@cumin1002: conftool action : set/weight=10; selector: name=kubestagemaster2003.codfw.wmnet
  • 16:20 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7002']
  • 16:20 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cp7002']
  • 16:19 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7002']
  • 16:19 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp7002']
  • 16:19 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7002']
  • 16:18 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7002.magru.wmnet with OS bullseye
  • 16:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T361627)', diff saved to https://phabricator.wikimedia.org/P61434 and previous config saved to /var/cache/conftool/dbconfig/20240429-161143-marostegui.json
  • 16:10 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 14m 10s)
  • 16:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T361627)', diff saved to https://phabricator.wikimedia.org/P61433 and previous config saved to /var/cache/conftool/dbconfig/20240429-160859-marostegui.json
  • 16:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 16:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 16:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T361627)', diff saved to https://phabricator.wikimedia.org/P61432 and previous config saved to /var/cache/conftool/dbconfig/20240429-160836-marostegui.json
  • 16:06 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp7002.magru.wmnet with OS bullseye
  • 15:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2114 T363713', diff saved to https://phabricator.wikimedia.org/P61431 and previous config saved to /var/cache/conftool/dbconfig/20240429-155838-arnaudb.json
  • 15:56 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 14m 38s)
  • 15:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote db2129 to s6 primary T363713', diff saved to https://phabricator.wikimedia.org/P61430 and previous config saved to /var/cache/conftool/dbconfig/20240429-155557-arnaudb.json
  • 15:55 arnaudb: Starting s6 codfw failover from db2114 to db2129 - T363713
  • 15:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P61429 and previous config saved to /var/cache/conftool/dbconfig/20240429-155328-marostegui.json
  • 15:44 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:43 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:39 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P61428 and previous config saved to /var/cache/conftool/dbconfig/20240429-153821-marostegui.json
  • 15:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 15:37 swfrench@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 15:36 swfrench@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:35 swfrench@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:35 root@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[29-42]*: Move Cassandra to PKI - root@cumin1002
  • 15:34 swfrench@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:32 swfrench@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:28 arnaudb@cumin1002: dbctl commit (dc=all): 'depool db2151', diff saved to https://phabricator.wikimedia.org/P61427 and previous config saved to /var/cache/conftool/dbconfig/20240429-152809-arnaudb.json
  • 15:25 elukey@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1028.eqiad.wmnet: Move to PKI TLS certs - elukey@cumin1002
  • 15:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T361627)', diff saved to https://phabricator.wikimedia.org/P61426 and previous config saved to /var/cache/conftool/dbconfig/20240429-152314-marostegui.json
  • 15:23 robh@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp7002']
  • 15:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T361627)', diff saved to https://phabricator.wikimedia.org/P61425 and previous config saved to /var/cache/conftool/dbconfig/20240429-152029-marostegui.json
  • 15:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 15:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 15:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T361627)', diff saved to https://phabricator.wikimedia.org/P61424 and previous config saved to /var/cache/conftool/dbconfig/20240429-152006-marostegui.json
  • 15:19 robh@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7002']
  • 15:15 elukey@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1028.eqiad.wmnet: Move to PKI TLS certs - elukey@cumin1002
  • 15:14 robh@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7002.mgmt.magru.wmnet with reboot policy FORCED
  • 15:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P61423 and previous config saved to /var/cache/conftool/dbconfig/20240429-150459-marostegui.json
  • 15:03 robh@cumin1002: START - Cookbook sre.hosts.provision for host cp7002.mgmt.magru.wmnet with reboot policy FORCED
  • 15:02 robh@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:02 robh@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7002 setup - robh@cumin1002"
  • 15:01 robh@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7002 setup - robh@cumin1002"
  • 14:59 robh@cumin1002: START - Cookbook sre.dns.netbox
  • 14:54 elukey@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase20[23-35]*: Roll out PKI TLS certs - elukey@cumin1002
  • 14:53 arnaudb@cumin1002: dbctl commit (dc=all): 'db2179 (re)pooling @ 100%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61422 and previous config saved to /var/cache/conftool/dbconfig/20240429-145306-arnaudb.json
  • 14:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s6 T363713
  • 14:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db2129 with weight 0 T363713', diff saved to https://phabricator.wikimedia.org/P61421 and previous config saved to /var/cache/conftool/dbconfig/20240429-145203-arnaudb.json
  • 14:51 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s6 T363713
  • 14:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P61420 and previous config saved to /var/cache/conftool/dbconfig/20240429-144951-marostegui.json
  • 14:41 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2002-dev.codfw.wmnet with OS bookworm
  • 14:38 godog: add 120G to prometheus/k8s in codfw
  • 14:38 arnaudb@cumin1002: dbctl commit (dc=all): 'db2179 (re)pooling @ 75%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61418 and previous config saved to /var/cache/conftool/dbconfig/20240429-143800-arnaudb.json
  • 14:37 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 14:36 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 14:36 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 14:35 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 14:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T361627)', diff saved to https://phabricator.wikimedia.org/P61417 and previous config saved to /var/cache/conftool/dbconfig/20240429-143444-marostegui.json
  • 14:32 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1004.eqiad.wmnet with OS bullseye
  • 14:30 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1183 (T361627)', diff saved to https://phabricator.wikimedia.org/P61416 and previous config saved to /var/cache/conftool/dbconfig/20240429-143053-marostegui.json
  • 14:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 14:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 14:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T361627)', diff saved to https://phabricator.wikimedia.org/P61415 and previous config saved to /var/cache/conftool/dbconfig/20240429-143030-marostegui.json
  • 14:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db2179 (re)pooling @ 50%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61414 and previous config saved to /var/cache/conftool/dbconfig/20240429-142254-arnaudb.json
  • 14:17 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1004.eqiad.wmnet with reason: host reimage
  • 14:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P61413 and previous config saved to /var/cache/conftool/dbconfig/20240429-141523-marostegui.json
  • 14:14 reedy@deploy1002: Synchronized php-1.43.0-wmf.2/extensions/TimedMediaHandler/: T363550 (duration: 14m 42s)
  • 14:14 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2002-dev.codfw.wmnet with reason: host reimage
  • 14:11 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1004.eqiad.wmnet with reason: host reimage
  • 14:11 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2002-dev.codfw.wmnet with reason: host reimage
  • 14:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db2179 (re)pooling @ 25%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61412 and previous config saved to /var/cache/conftool/dbconfig/20240429-140748-arnaudb.json
  • 14:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P61411 and previous config saved to /var/cache/conftool/dbconfig/20240429-140015-marostegui.json
  • 13:53 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1004.eqiad.wmnet with OS bullseye
  • 13:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db2179 (re)pooling @ 10%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61410 and previous config saved to /var/cache/conftool/dbconfig/20240429-135241-arnaudb.json
  • 13:50 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt2002-dev.codfw.wmnet with OS bookworm
  • 13:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T361627)', diff saved to https://phabricator.wikimedia.org/P61409 and previous config saved to /var/cache/conftool/dbconfig/20240429-134507-marostegui.json
  • 13:43 dcausse@deploy1002: Finished scap: Backport for Revert "cirrus: Shift autocomplete traffic to codfw" (T363516) (duration: 16m 02s)
  • 13:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T361627)', diff saved to https://phabricator.wikimedia.org/P61408 and previous config saved to /var/cache/conftool/dbconfig/20240429-134115-marostegui.json
  • 13:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 13:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 13:40 sukhe: sudo cumin -b1 -s10 "A:cp-text" "run-puppet-agent --enable 'merging CR 1025357'"
  • 13:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2213.codfw.wmnet with reason: Maintenance
  • 13:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2213.codfw.wmnet with reason: Maintenance
  • 13:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db2179 (re)pooling @ 5%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61407 and previous config saved to /var/cache/conftool/dbconfig/20240429-133736-arnaudb.json
  • 13:30 dcausse@deploy1002: dcausse: Continuing with sync
  • 13:29 dcausse@deploy1002: dcausse: Backport for Revert "cirrus: Shift autocomplete traffic to codfw" (T363516) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:29 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2179.codfw.wmnet with OS bookworm
  • 13:27 dcausse@deploy1002: Started scap: Backport for Revert "cirrus: Shift autocomplete traffic to codfw" (T363516)
  • 13:26 sukhe: sudo cumin "A:cp-text" "disable-puppet 'merging CR 1025357'"
  • 13:24 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1003.eqiad.wmnet with OS bullseye
  • 13:23 dcausse@deploy1002: Finished scap: Backport for CommonSettings: change jobrunner xff to mw-jobrunner (duration: 16m 10s)
  • 13:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P61406 and previous config saved to /var/cache/conftool/dbconfig/20240429-131728-root.json
  • 13:09 dcausse@deploy1002: hnowlan and dcausse: Continuing with sync
  • 13:09 dcausse@deploy1002: hnowlan and dcausse: Backport for CommonSettings: change jobrunner xff to mw-jobrunner synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:08 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1087.eqiad.wmnet
  • 13:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2179.codfw.wmnet with reason: host reimage
  • 13:07 dcausse@deploy1002: Started scap: Backport for CommonSettings: change jobrunner xff to mw-jobrunner
  • 13:06 arnaudb@cumin1002: dbctl commit (dc=all): 'fix weights', diff saved to https://phabricator.wikimedia.org/P61405 and previous config saved to /var/cache/conftool/dbconfig/20240429-130652-arnaudb.json
  • 13:04 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2179.codfw.wmnet with reason: host reimage
  • 13:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P61403 and previous config saved to /var/cache/conftool/dbconfig/20240429-130222-root.json
  • 13:02 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1087.eqiad.wmnet
  • 12:58 elukey@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[23-35]*: Roll out PKI TLS certs - elukey@cumin1002
  • 12:57 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1003.eqiad.wmnet with reason: host reimage
  • 12:54 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1003.eqiad.wmnet with reason: host reimage
  • 12:53 elukey@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase2022.codfw.wmnet: Move to PKI TLS certs - elukey@cumin1002
  • 12:49 brouberol@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:48 brouberol@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 12:48 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:47 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:47 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db2179.codfw.wmnet with OS bookworm
  • 12:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P61402 and previous config saved to /var/cache/conftool/dbconfig/20240429-124716-root.json
  • 12:47 brouberol@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:46 brouberol@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:24 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 12:22 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P61398 and previous config saved to /var/cache/conftool/dbconfig/20240429-122246-root.json
  • 12:18 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 12:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1180 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P61397 and previous config saved to /var/cache/conftool/dbconfig/20240429-121704-root.json
  • 12:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db2140 with weight 0 T363688', diff saved to https://phabricator.wikimedia.org/P61396 and previous config saved to /var/cache/conftool/dbconfig/20240429-121559-arnaudb.json
  • 12:14 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 33 hosts with reason: Primary switchover s4 T363688
  • 12:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 12:14 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 33 hosts with reason: Primary switchover s4 T363688
  • 12:10 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1002.eqiad.wmnet
  • 12:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P61395 and previous config saved to /var/cache/conftool/dbconfig/20240429-120740-root.json
  • 12:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1180 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P61394 and previous config saved to /var/cache/conftool/dbconfig/20240429-120159-root.json
  • 12:01 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host cephosd1002.eqiad.wmnet
  • 11:52 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P61393 and previous config saved to /var/cache/conftool/dbconfig/20240429-115234-root.json
  • 11:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1180 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P61392 and previous config saved to /var/cache/conftool/dbconfig/20240429-113850-root.json
  • 11:37 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P61391 and previous config saved to /var/cache/conftool/dbconfig/20240429-113728-root.json
  • 11:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P61390 and previous config saved to /var/cache/conftool/dbconfig/20240429-113640-root.json
  • 11:35 marostegui@cumin1002: END (ERROR) - Cookbook sre.mysql.clone (exit_code=97) of db1180.eqiad.wmnet onto es1036.eqiad.wmnet
  • 11:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T361627)', diff saved to https://phabricator.wikimedia.org/P61389 and previous config saved to /var/cache/conftool/dbconfig/20240429-113445-marostegui.json
  • 11:32 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1002.eqiad.wmnet with OS bullseye
  • 11:22 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P61388 and previous config saved to /var/cache/conftool/dbconfig/20240429-112223-root.json
  • 11:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P61387 and previous config saved to /var/cache/conftool/dbconfig/20240429-112134-root.json
  • 11:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P61386 and previous config saved to /var/cache/conftool/dbconfig/20240429-111938-marostegui.json
  • 11:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P61385 and previous config saved to /var/cache/conftool/dbconfig/20240429-110717-root.json
  • 11:06 marostegui@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P61384 and previous config saved to /var/cache/conftool/dbconfig/20240429-110628-root.json
  • 11:05 jayme@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Added kubestagemaster2003 - jayme@cumin1002"
  • 11:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P61383 and previous config saved to /var/cache/conftool/dbconfig/20240429-110430-marostegui.json
  • 11:03 jayme@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Added kubestagemaster2003 - jayme@cumin1002"
  • 10:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1191.eqiad.wmnet with OS bookworm
  • 10:54 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephadm1001.eqiad.wmnet with OS bookworm
  • 10:54 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1002.eqiad.wmnet with reason: host reimage
  • 10:52 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P61382 and previous config saved to /var/cache/conftool/dbconfig/20240429-105212-root.json
  • 10:51 marostegui@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P61381 and previous config saved to /var/cache/conftool/dbconfig/20240429-105122-root.json
  • 10:51 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1002.eqiad.wmnet with reason: host reimage
  • 10:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T361627)', diff saved to https://phabricator.wikimedia.org/P61380 and previous config saved to /var/cache/conftool/dbconfig/20240429-104923-marostegui.json
  • 10:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T361627)', diff saved to https://phabricator.wikimedia.org/P61379 and previous config saved to /var/cache/conftool/dbconfig/20240429-104501-marostegui.json
  • 10:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2211.codfw.wmnet with reason: Maintenance
  • 10:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2211.codfw.wmnet with reason: Maintenance
  • 10:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2201.codfw.wmnet with reason: Maintenance
  • 10:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2201.codfw.wmnet with reason: Maintenance
  • 10:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T361627)', diff saved to https://phabricator.wikimedia.org/P61378 and previous config saved to /var/cache/conftool/dbconfig/20240429-104152-marostegui.json
  • 10:40 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephadm1001.eqiad.wmnet with reason: host reimage
  • 10:37 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephadm1001.eqiad.wmnet with reason: host reimage
  • 10:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage
  • 10:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P61377 and previous config saved to /var/cache/conftool/dbconfig/20240429-103617-root.json
  • 10:35 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 10:35 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1180.eqiad.wmnet onto es1036.eqiad.wmnet
  • 10:35 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 10:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1180', diff saved to https://phabricator.wikimedia.org/P61376 and previous config saved to /var/cache/conftool/dbconfig/20240429-103436-marostegui.json
  • 10:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage
  • 10:33 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1002.eqiad.wmnet with OS bullseye
  • 10:26 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephadm1001.eqiad.wmnet with OS bookworm
  • 10:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P61375 and previous config saved to /var/cache/conftool/dbconfig/20240429-102644-marostegui.json
  • 10:26 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephadm1001.eqiad.wmnet with OS bullseye
  • 10:25 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1001.eqiad.wmnet with OS bullseye
  • 10:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P61374 and previous config saved to /var/cache/conftool/dbconfig/20240429-102111-root.json
  • 10:20 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS bookworm
  • 10:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1191 T362745', diff saved to https://phabricator.wikimedia.org/P61373 and previous config saved to /var/cache/conftool/dbconfig/20240429-101908-marostegui.json
  • 10:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P61372 and previous config saved to /var/cache/conftool/dbconfig/20240429-101525-root.json
  • 10:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P61371 and previous config saved to /var/cache/conftool/dbconfig/20240429-101137-marostegui.json
  • 10:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2121.codfw.wmnet with OS bookworm
  • 10:06 marostegui@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P61370 and previous config saved to /var/cache/conftool/dbconfig/20240429-100605-root.json
  • 10:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P61369 and previous config saved to /var/cache/conftool/dbconfig/20240429-100018-root.json
  • 09:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T361627)', diff saved to https://phabricator.wikimedia.org/P61368 and previous config saved to /var/cache/conftool/dbconfig/20240429-095629-marostegui.json
  • 09:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T361627)', diff saved to https://phabricator.wikimedia.org/P61367 and previous config saved to /var/cache/conftool/dbconfig/20240429-095259-marostegui.json
  • 09:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 09:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 09:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T361627)', diff saved to https://phabricator.wikimedia.org/P61366 and previous config saved to /var/cache/conftool/dbconfig/20240429-095237-marostegui.json
  • 09:49 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 09:48 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 09:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2121.codfw.wmnet with reason: host reimage
  • 09:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P61365 and previous config saved to /var/cache/conftool/dbconfig/20240429-094512-root.json
  • 09:43 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
  • 09:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2121.codfw.wmnet with reason: host reimage
  • 09:42 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 09:42 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 09:41 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
  • 09:39 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 09:39 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 09:38 ladsgroup@deploy1002: Finished scap: Backport for rdbms: Protect against stale cache in LB::getMaxLag() (T361824) (duration: 20m 15s)
  • 09:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P61364 and previous config saved to /var/cache/conftool/dbconfig/20240429-093729-marostegui.json
  • 09:36 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
  • 09:36 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
  • 09:35 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
  • 09:35 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
  • 09:31 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
  • 09:31 btullis@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
  • 09:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P61363 and previous config saved to /var/cache/conftool/dbconfig/20240429-093007-root.json
  • 09:25 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 09:25 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2121.codfw.wmnet with OS bookworm
  • 09:24 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1001.eqiad.wmnet with OS bullseye
  • 09:23 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephadm1001.eqiad.wmnet with OS bullseye
  • 09:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P61362 and previous config saved to /var/cache/conftool/dbconfig/20240429-092222-marostegui.json
  • 09:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2218 from api', diff saved to https://phabricator.wikimedia.org/P61361 and previous config saved to /var/cache/conftool/dbconfig/20240429-092213-marostegui.json
  • 09:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2121 T363668', diff saved to https://phabricator.wikimedia.org/P61360 and previous config saved to /var/cache/conftool/dbconfig/20240429-092104-root.json
  • 09:20 ladsgroup@deploy1002: ladsgroup: Backport for rdbms: Protect against stale cache in LB::getMaxLag() (T361824) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:20 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2218 to s7 primary T363668', diff saved to https://phabricator.wikimedia.org/P61359 and previous config saved to /var/cache/conftool/dbconfig/20240429-092029-marostegui.json
  • 09:20 marostegui: Starting s7 codfw failover from db2121 to db2218 - T363668
  • 09:18 ladsgroup@deploy1002: Started scap: Backport for rdbms: Protect against stale cache in LB::getMaxLag() (T361824)
  • 09:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P61358 and previous config saved to /var/cache/conftool/dbconfig/20240429-091500-root.json
  • 09:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T361627)', diff saved to https://phabricator.wikimedia.org/P61357 and previous config saved to /var/cache/conftool/dbconfig/20240429-090701-marostegui.json
  • 09:04 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagemaster2003.codfw.wmnet to plain
  • 09:03 jayme@cumin1002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagemaster2003.codfw.wmnet to plain
  • 09:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T361627)', diff saved to https://phabricator.wikimedia.org/P61356 and previous config saved to /var/cache/conftool/dbconfig/20240429-090329-marostegui.json
  • 09:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 09:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 09:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T361627)', diff saved to https://phabricator.wikimedia.org/P61355 and previous config saved to /var/cache/conftool/dbconfig/20240429-090317-marostegui.json
  • 09:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 T363668
  • 09:00 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2218 with weight 0 T363668', diff saved to https://phabricator.wikimedia.org/P61354 and previous config saved to /var/cache/conftool/dbconfig/20240429-090046-marostegui.json
  • 09:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 28 hosts with reason: Primary switchover s7 T363668
  • 08:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P61353 and previous config saved to /var/cache/conftool/dbconfig/20240429-085953-root.json
  • 08:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2159 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P61352 and previous config saved to /var/cache/conftool/dbconfig/20240429-085829-root.json
  • 08:54 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host kubestagemaster2003.codfw.wmnet
  • 08:54 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2003.codfw.wmnet with OS bullseye
  • 08:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P61351 and previous config saved to /var/cache/conftool/dbconfig/20240429-084808-marostegui.json
  • 08:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1223.eqiad.wmnet with OS bookworm
  • 08:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P61350 and previous config saved to /var/cache/conftool/dbconfig/20240429-084447-root.json
  • 08:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2159 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P61349 and previous config saved to /var/cache/conftool/dbconfig/20240429-084323-root.json
  • 08:40 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2003.codfw.wmnet with reason: host reimage
  • 08:37 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2003.codfw.wmnet with reason: host reimage
  • 08:33 taavi@deploy1002: Finished scap: Backport for Fix disabling TOTP keys with scratch tokens (T363548) (duration: 15m 27s)
  • 08:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P61348 and previous config saved to /var/cache/conftool/dbconfig/20240429-083301-marostegui.json
  • 08:29 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 08:28 marostegui@cumin1002: dbctl commit (dc=all): 'db2159 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P61347 and previous config saved to /var/cache/conftool/dbconfig/20240429-082817-root.json
  • 08:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1223.eqiad.wmnet with reason: host reimage
  • 08:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1223.eqiad.wmnet with reason: host reimage
  • 08:21 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster2003.codfw.wmnet with OS bullseye
  • 08:20 marostegui@cumin1002: dbctl commit (dc=all): 'es1023 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P61346 and previous config saved to /var/cache/conftool/dbconfig/20240429-082056-root.json
  • 08:20 taavi@deploy1002: taavi: Continuing with sync
  • 08:20 taavi@deploy1002: taavi: Backport for Fix disabling TOTP keys with scratch tokens (T363548) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:18 taavi@deploy1002: Started scap: Backport for Fix disabling TOTP keys with scratch tokens (T363548)
  • 08:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T361627)', diff saved to https://phabricator.wikimedia.org/P61345 and previous config saved to /var/cache/conftool/dbconfig/20240429-081754-marostegui.json
  • 08:16 jayme@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM kubestagemaster2003.codfw.wmnet - jayme@cumin1002"
  • 08:15 jayme@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM kubestagemaster2003.codfw.wmnet - jayme@cumin1002"
  • 08:15 jayme@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kubestagemaster2003.codfw.wmnet on all recursors
  • 08:15 jayme@cumin1002: START - Cookbook sre.dns.wipe-cache kubestagemaster2003.codfw.wmnet on all recursors
  • 08:15 jayme@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:15 jayme@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM kubestagemaster2003.codfw.wmnet - jayme@cumin1002"
  • 08:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1212 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P61344 and previous config saved to /var/cache/conftool/dbconfig/20240429-081455-root.json
  • 08:14 jayme@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM kubestagemaster2003.codfw.wmnet - jayme@cumin1002"
  • 08:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T361627)', diff saved to https://phabricator.wikimedia.org/P61343 and previous config saved to /var/cache/conftool/dbconfig/20240429-081323-marostegui.json
  • 08:13 marostegui@cumin1002: dbctl commit (dc=all): 'db2159 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P61342 and previous config saved to /var/cache/conftool/dbconfig/20240429-081312-root.json
  • 08:13 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 08:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 08:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T361627)', diff saved to https://phabricator.wikimedia.org/P61341 and previous config saved to /var/cache/conftool/dbconfig/20240429-081254-marostegui.json
  • 08:11 jayme@cumin1002: START - Cookbook sre.dns.netbox
  • 08:11 jayme@cumin1002: START - Cookbook sre.ganeti.makevm for new host kubestagemaster2003.codfw.wmnet
  • 08:09 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1223.eqiad.wmnet with OS bookworm
  • 08:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1223', diff saved to https://phabricator.wikimedia.org/P61340 and previous config saved to /var/cache/conftool/dbconfig/20240429-080710-root.json
  • 08:05 marostegui@cumin1002: dbctl commit (dc=all): 'es1023 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P61339 and previous config saved to /var/cache/conftool/dbconfig/20240429-080550-root.json
  • 08:04 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 08:04 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 08:04 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 08:04 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 08:04 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 08:04 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 08:00 dcausse: restarting blazegraph on wdqs1019 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 07:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1212 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P61338 and previous config saved to /var/cache/conftool/dbconfig/20240429-075949-root.json
  • 07:59 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 07:59 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 07:59 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 07:58 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 07:58 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 07:58 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 07:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2159 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P61337 and previous config saved to /var/cache/conftool/dbconfig/20240429-075806-root.json
  • 07:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P61336 and previous config saved to /var/cache/conftool/dbconfig/20240429-075746-marostegui.json
  • 07:52 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 07:52 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 07:52 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 07:52 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 07:52 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 07:52 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 07:50 marostegui@cumin1002: dbctl commit (dc=all): 'es1023 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P61335 and previous config saved to /var/cache/conftool/dbconfig/20240429-075045-root.json
  • 07:49 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 07:48 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 07:48 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 07:48 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 07:48 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 07:47 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 07:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1212 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P61334 and previous config saved to /var/cache/conftool/dbconfig/20240429-074444-root.json
  • 07:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2159 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P61333 and previous config saved to /var/cache/conftool/dbconfig/20240429-074301-root.json
  • 07:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P61332 and previous config saved to /var/cache/conftool/dbconfig/20240429-074238-marostegui.json
  • 07:37 marostegui: Drop machinevision tables on commonswiki T362229
  • 07:35 marostegui@cumin1002: dbctl commit (dc=all): 'es1023 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P61331 and previous config saved to /var/cache/conftool/dbconfig/20240429-073539-root.json
  • 07:35 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 07:34 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 07:34 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 07:34 marostegui: Drop machinevision tables on testcommonswiki T362229
  • 07:34 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 07:34 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 07:33 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 07:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1212 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P61330 and previous config saved to /var/cache/conftool/dbconfig/20240429-072937-root.json
  • 07:27 marostegui@cumin1002: dbctl commit (dc=all): 'db2159 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P61329 and previous config saved to /var/cache/conftool/dbconfig/20240429-072755-root.json
  • 07:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T361627)', diff saved to https://phabricator.wikimedia.org/P61328 and previous config saved to /var/cache/conftool/dbconfig/20240429-072731-marostegui.json
  • 07:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2159.codfw.wmnet with OS bookworm
  • 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T361627)', diff saved to https://phabricator.wikimedia.org/P61327 and previous config saved to /var/cache/conftool/dbconfig/20240429-072404-marostegui.json
  • 07:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 07:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 07:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T361627)', diff saved to https://phabricator.wikimedia.org/P61326 and previous config saved to /var/cache/conftool/dbconfig/20240429-072341-marostegui.json
  • 07:20 marostegui@cumin1002: dbctl commit (dc=all): 'es1023 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P61325 and previous config saved to /var/cache/conftool/dbconfig/20240429-072033-root.json
  • 07:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1212 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P61324 and previous config saved to /var/cache/conftool/dbconfig/20240429-071431-root.json
  • 07:13 slyngs: Upgrade idm.wikimedia.org to Bitu 0.7.0
  • 07:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: host reimage
  • 07:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P61323 and previous config saved to /var/cache/conftool/dbconfig/20240429-070834-marostegui.json
  • 07:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2159.codfw.wmnet with reason: host reimage
  • 07:05 marostegui@cumin1002: dbctl commit (dc=all): 'es1023 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P61322 and previous config saved to /var/cache/conftool/dbconfig/20240429-070527-root.json
  • 06:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1212 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P61321 and previous config saved to /var/cache/conftool/dbconfig/20240429-065926-root.json
  • 06:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P61320 and previous config saved to /var/cache/conftool/dbconfig/20240429-065326-marostegui.json
  • 06:50 marostegui@cumin1002: dbctl commit (dc=all): 'es1023 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P61319 and previous config saved to /var/cache/conftool/dbconfig/20240429-065022-root.json
  • 06:48 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2159.codfw.wmnet with OS bookworm
  • 06:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2159', diff saved to https://phabricator.wikimedia.org/P61318 and previous config saved to /var/cache/conftool/dbconfig/20240429-064717-root.json
  • 06:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS bookworm
  • 06:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1212 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P61317 and previous config saved to /var/cache/conftool/dbconfig/20240429-064420-root.json
  • 06:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1023.eqiad.wmnet with OS bookworm
  • 06:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T361627)', diff saved to https://phabricator.wikimedia.org/P61316 and previous config saved to /var/cache/conftool/dbconfig/20240429-063819-marostegui.json
  • 06:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2128 (T361627)', diff saved to https://phabricator.wikimedia.org/P61315 and previous config saved to /var/cache/conftool/dbconfig/20240429-063450-marostegui.json
  • 06:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 06:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 06:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 06:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 06:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T361627)', diff saved to https://phabricator.wikimedia.org/P61314 and previous config saved to /var/cache/conftool/dbconfig/20240429-063412-marostegui.json
  • 06:24 marostegui: Restart sanitarium instances in eqiad T363276
  • 06:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage
  • 06:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1023.eqiad.wmnet with reason: host reimage
  • 06:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage
  • 06:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P61313 and previous config saved to /var/cache/conftool/dbconfig/20240429-061905-marostegui.json
  • 06:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1023.eqiad.wmnet with reason: host reimage
  • 06:14 marostegui: Restart sanitarium instances in codfw T363276
  • 06:06 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS bookworm
  • 06:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1212', diff saved to https://phabricator.wikimedia.org/P61312 and previous config saved to /var/cache/conftool/dbconfig/20240429-060423-root.json
  • 06:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P61311 and previous config saved to /var/cache/conftool/dbconfig/20240429-060358-marostegui.json
  • 06:02 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es1023.eqiad.wmnet with OS bookworm
  • 05:58 marostegui@deploy1002: Finished scap: Backport for Revert "db-production.php: Disable writes on es5" (duration: 14m 47s)
  • 05:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T361627)', diff saved to https://phabricator.wikimedia.org/P61310 and previous config saved to /var/cache/conftool/dbconfig/20240429-054850-marostegui.json
  • 05:46 marostegui@deploy1002: marostegui: Continuing with sync
  • 05:46 marostegui@deploy1002: marostegui: Backport for Revert "db-production.php: Disable writes on es5" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 05:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2123 (T361627)', diff saved to https://phabricator.wikimedia.org/P61309 and previous config saved to /var/cache/conftool/dbconfig/20240429-054519-marostegui.json
  • 05:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 05:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 05:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 05:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 05:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T352010)', diff saved to https://phabricator.wikimedia.org/P61308 and previous config saved to /var/cache/conftool/dbconfig/20240429-054413-ladsgroup.json
  • 05:43 marostegui@deploy1002: Started scap: Backport for Revert "db-production.php: Disable writes on es5"
  • 05:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1023 T361548', diff saved to https://phabricator.wikimedia.org/P61306 and previous config saved to /var/cache/conftool/dbconfig/20240429-054158-marostegui.json
  • 05:40 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es1024 to es5 primary T361548', diff saved to https://phabricator.wikimedia.org/P61305 and previous config saved to /var/cache/conftool/dbconfig/20240429-054035-marostegui.json
  • 05:40 marostegui: Starting es5 eqiad failover from es1023 to es1024 T361548
  • 05:35 marostegui@deploy1002: Finished scap: Backport for db-production.php: Disable writes on es5 (T361548) (duration: 26m 58s)
  • 05:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P61304 and previous config saved to /var/cache/conftool/dbconfig/20240429-052906-ladsgroup.json
  • 05:23 marostegui@cumin1002: dbctl commit (dc=all): 'Set es1024 with weight 0 T361548', diff saved to https://phabricator.wikimedia.org/P61303 and previous config saved to /var/cache/conftool/dbconfig/20240429-052311-root.json
  • 05:22 marostegui@deploy1002: marostegui: Continuing with sync
  • 05:22 marostegui@deploy1002: marostegui: Backport for db-production.php: Disable writes on es5 (T361548) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 05:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P61302 and previous config saved to /var/cache/conftool/dbconfig/20240429-051359-ladsgroup.json
  • 05:08 marostegui@deploy1002: Started scap: Backport for db-production.php: Disable writes on es5 (T361548)
  • 05:05 root@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 T322187
  • 05:04 root@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es5 T322187
  • 04:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T352010)', diff saved to https://phabricator.wikimedia.org/P61301 and previous config saved to /var/cache/conftool/dbconfig/20240429-045851-ladsgroup.json
  • 03:11 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudbackup1003.eqiad.wmnet with OS bookworm
  • 02:17 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudbackup1003.eqiad.wmnet with reason: host reimage
  • 02:14 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudbackup1003.eqiad.wmnet with reason: host reimage
  • 01:42 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudbackup1003.eqiad.wmnet with OS bookworm

2024-04-28

  • 20:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T352010)', diff saved to https://phabricator.wikimedia.org/P61300 and previous config saved to /var/cache/conftool/dbconfig/20240428-200522-ladsgroup.json
  • 20:05 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 20:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 20:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T352010)', diff saved to https://phabricator.wikimedia.org/P61299 and previous config saved to /var/cache/conftool/dbconfig/20240428-200500-ladsgroup.json
  • 19:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P61298 and previous config saved to /var/cache/conftool/dbconfig/20240428-194952-ladsgroup.json
  • 19:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P61297 and previous config saved to /var/cache/conftool/dbconfig/20240428-193445-ladsgroup.json
  • 19:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T352010)', diff saved to https://phabricator.wikimedia.org/P61296 and previous config saved to /var/cache/conftool/dbconfig/20240428-191938-ladsgroup.json
  • 07:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T352010)', diff saved to https://phabricator.wikimedia.org/P61295 and previous config saved to /var/cache/conftool/dbconfig/20240428-074511-ladsgroup.json
  • 07:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 07:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 07:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T352010)', diff saved to https://phabricator.wikimedia.org/P61294 and previous config saved to /var/cache/conftool/dbconfig/20240428-074448-ladsgroup.json
  • 07:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T352010)', diff saved to https://phabricator.wikimedia.org/P61293 and previous config saved to /var/cache/conftool/dbconfig/20240428-073827-ladsgroup.json
  • 07:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P61292 and previous config saved to /var/cache/conftool/dbconfig/20240428-072941-ladsgroup.json
  • 07:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P61291 and previous config saved to /var/cache/conftool/dbconfig/20240428-072320-ladsgroup.json
  • 07:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P61290 and previous config saved to /var/cache/conftool/dbconfig/20240428-071434-ladsgroup.json
  • 07:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P61289 and previous config saved to /var/cache/conftool/dbconfig/20240428-070812-ladsgroup.json
  • 06:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T352010)', diff saved to https://phabricator.wikimedia.org/P61288 and previous config saved to /var/cache/conftool/dbconfig/20240428-065927-ladsgroup.json
  • 06:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T352010)', diff saved to https://phabricator.wikimedia.org/P61287 and previous config saved to /var/cache/conftool/dbconfig/20240428-065305-ladsgroup.json

2024-04-27

  • 23:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T352010)', diff saved to https://phabricator.wikimedia.org/P61286 and previous config saved to /var/cache/conftool/dbconfig/20240427-231136-ladsgroup.json
  • 23:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 23:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 23:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T352010)', diff saved to https://phabricator.wikimedia.org/P61285 and previous config saved to /var/cache/conftool/dbconfig/20240427-231112-ladsgroup.json
  • 22:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P61284 and previous config saved to /var/cache/conftool/dbconfig/20240427-225604-ladsgroup.json
  • 22:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp7001.magru.wmnet with OS bullseye
  • 22:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P61283 and previous config saved to /var/cache/conftool/dbconfig/20240427-224057-ladsgroup.json
  • 22:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T352010)', diff saved to https://phabricator.wikimedia.org/P61282 and previous config saved to /var/cache/conftool/dbconfig/20240427-222548-ladsgroup.json
  • 21:16 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cp7001.magru.wmnet with OS bullseye
  • 21:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp7001']
  • 21:06 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7001']
  • 21:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp7001']
  • 20:42 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp7001']
  • 20:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7001.mgmt.magru.wmnet with reboot policy FORCED
  • 20:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cp7001.mgmt.magru.wmnet with reboot policy FORCED
  • 20:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T352010)', diff saved to https://phabricator.wikimedia.org/P61281 and previous config saved to /var/cache/conftool/dbconfig/20240427-202602-ladsgroup.json
  • 20:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 20:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 20:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T352010)', diff saved to https://phabricator.wikimedia.org/P61280 and previous config saved to /var/cache/conftool/dbconfig/20240427-202539-ladsgroup.json
  • 20:25 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp7001.mgmt.magru.wmnet with reboot policy FORCED
  • 20:25 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cp7001.mgmt.magru.wmnet with reboot policy FORCED
  • 20:23 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:23 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7001 DNS add - pt1979@cumin2002"
  • 20:22 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7001 DNS add - pt1979@cumin2002"
  • 20:20 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 20:15 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 20:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P61279 and previous config saved to /var/cache/conftool/dbconfig/20240427-201031-ladsgroup.json
  • 19:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P61278 and previous config saved to /var/cache/conftool/dbconfig/20240427-195524-ladsgroup.json
  • 19:46 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:46 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for magru PDUs - cmooney@cumin1002"
  • 19:45 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for magru PDUs - cmooney@cumin1002"
  • 19:43 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 19:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T352010)', diff saved to https://phabricator.wikimedia.org/P61277 and previous config saved to /var/cache/conftool/dbconfig/20240427-194017-ladsgroup.json
  • 19:39 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:39 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for magru PDUs - cmooney@cumin1002"
  • 19:38 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for magru PDUs - cmooney@cumin1002"
  • 19:34 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 17:55 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru oob - ayounsi@cumin1002"
  • 17:42 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru oob - ayounsi@cumin1002"
  • 17:39 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 17:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru - ayounsi@cumin1002"
  • 17:03 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru - ayounsi@cumin1002"
  • 17:01 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 14:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2212 (T352010)', diff saved to https://phabricator.wikimedia.org/P61275 and previous config saved to /var/cache/conftool/dbconfig/20240427-144642-ladsgroup.json
  • 14:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2212.codfw.wmnet with reason: Maintenance
  • 14:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2212.codfw.wmnet with reason: Maintenance
  • 09:51 volans: manually upgraded wmflib in Netbox's venv on netbox1002/2002
  • 08:58 volans: restarted uwsgi on netbox1002 to pick up the latest wmflib with magru
  • 07:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T352010)', diff saved to https://phabricator.wikimedia.org/P61274 and previous config saved to /var/cache/conftool/dbconfig/20240427-075233-ladsgroup.json
  • 07:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 07:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 07:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T352010)', diff saved to https://phabricator.wikimedia.org/P61273 and previous config saved to /var/cache/conftool/dbconfig/20240427-075210-ladsgroup.json
  • 07:43 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 07:43 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 07:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T352010)', diff saved to https://phabricator.wikimedia.org/P61272 and previous config saved to /var/cache/conftool/dbconfig/20240427-074250-ladsgroup.json
  • 07:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P61271 and previous config saved to /var/cache/conftool/dbconfig/20240427-073703-ladsgroup.json
  • 07:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P61270 and previous config saved to /var/cache/conftool/dbconfig/20240427-072742-ladsgroup.json
  • 07:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P61269 and previous config saved to /var/cache/conftool/dbconfig/20240427-072155-ladsgroup.json
  • 07:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P61268 and previous config saved to /var/cache/conftool/dbconfig/20240427-071235-ladsgroup.json
  • 07:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T352010)', diff saved to https://phabricator.wikimedia.org/P61267 and previous config saved to /var/cache/conftool/dbconfig/20240427-070648-ladsgroup.json
  • 06:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T352010)', diff saved to https://phabricator.wikimedia.org/P61266 and previous config saved to /var/cache/conftool/dbconfig/20240427-065728-ladsgroup.json
  • 00:51 urandom: rebooting puppetserver1001.eqiad.wmnet via drac

2024-04-26

  • 23:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T352010)', diff saved to https://phabricator.wikimedia.org/P61265 and previous config saved to /var/cache/conftool/dbconfig/20240426-231316-ladsgroup.json
  • 23:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 23:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 23:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T352010)', diff saved to https://phabricator.wikimedia.org/P61264 and previous config saved to /var/cache/conftool/dbconfig/20240426-231252-ladsgroup.json
  • 22:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P61263 and previous config saved to /var/cache/conftool/dbconfig/20240426-225744-ladsgroup.json
  • 22:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P61262 and previous config saved to /var/cache/conftool/dbconfig/20240426-224235-ladsgroup.json
  • 22:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T352010)', diff saved to https://phabricator.wikimedia.org/P61261 and previous config saved to /var/cache/conftool/dbconfig/20240426-222728-ladsgroup.json
  • 22:24 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lists2001.wikimedia.org with OS bookworm
  • 22:07 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lists2001.wikimedia.org with reason: host reimage
  • 22:04 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lists2001.wikimedia.org with reason: host reimage
  • 21:43 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host lists2001.wikimedia.org with OS bookworm
  • 21:38 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lists2001.wikimedia.org with OS bullseye
  • 21:38 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - dzahn@cumin2002"
  • 21:37 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - dzahn@cumin2002"
  • 21:25 amastilovic@deploy1002: Finished deploy [airflow-dags/analytics@33b39d9]: (no justification provided) (duration: 00m 28s)
  • 21:24 amastilovic@deploy1002: Started deploy [airflow-dags/analytics@33b39d9]: (no justification provided)
  • 21:21 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lists2001.wikimedia.org with reason: host reimage
  • 21:18 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lists2001.wikimedia.org with reason: host reimage
  • 21:01 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host lists2001.wikimedia.org with OS bullseye
  • 19:11 mutante: LDAP - added linafaridwmde to groups wmde and nda (T362959)
  • 19:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T352010)', diff saved to https://phabricator.wikimedia.org/P61260 and previous config saved to /var/cache/conftool/dbconfig/20240426-190909-ladsgroup.json
  • 19:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 19:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 19:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 19:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 19:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T352010)', diff saved to https://phabricator.wikimedia.org/P61259 and previous config saved to /var/cache/conftool/dbconfig/20240426-190842-ladsgroup.json
  • 18:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P61258 and previous config saved to /var/cache/conftool/dbconfig/20240426-185335-ladsgroup.json
  • 18:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P61257 and previous config saved to /var/cache/conftool/dbconfig/20240426-183827-ladsgroup.json
  • 18:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T352010)', diff saved to https://phabricator.wikimedia.org/P61256 and previous config saved to /var/cache/conftool/dbconfig/20240426-182320-ladsgroup.json
  • 17:57 dancy@deploy1002: Finished scap: Testing T325530 (duration: 09m 14s)
  • 17:48 dancy@deploy1002: Started scap: Testing T325530
  • 17:47 dancy@deploy1002: Installation of scap version "4.80.0" completed for 325 hosts
  • 17:47 dancy@deploy1002: Installing scap version "4.80.0" for 325 hosts
  • 17:27 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=elastic110[3-7]\.eqiad\.wmnet
  • 17:15 eoghan@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lists2001.wikimedia.org with OS bookworm
  • 17:02 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 16:57 bking@cumin2002: conftool action : set/pooled=no; selector: name=elastic1107\.eqiad\.wmnet
  • 16:36 bking@cumin2002: conftool action : set/weight=20:pooled=yes; selector: name=elastic1107\.eqiad\.wmnet
  • 16:35 bking@cumin2002: conftool action : set/pooled=no; selector: name=elastic1106\.eqiad\.wmnet
  • 16:33 bking@cumin2002: conftool action : set/weight=20:pooled=yes; selector: name=elastic1106\.eqiad\.wmnet
  • 16:32 bking@cumin2002: conftool action : set/pooled=no; selector: name=elastic1105\.eqiad\.wmnet
  • 16:30 denisse: Delete the unused Prometheus PoP TLS certificates in the private repository as part of the cergen to CFSSL migration - T360414
  • 16:22 eoghan@cumin1002: START - Cookbook sre.hosts.reimage for host lists2001.wikimedia.org with OS bookworm
  • 16:20 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 16:20 eoghan@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lists2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:19 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 16:18 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 16:18 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 16:17 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 16:17 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 16:16 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 16:06 vgutierrez: repool ncredir6001
  • 15:56 bking@cumin2002: conftool action : set/pooled=no; selector: name=elastic1104\.eqiad\.wmnet
  • 15:56 bking@cumin2002: conftool action : set/pooled=no; selector: name=elastic1103\.eqiad\.wmnet
  • 15:55 vgutierrez: depool ncredir6001
  • 15:53 eoghan@cumin1002: START - Cookbook sre.hosts.provision for host lists2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:46 denisse: Enabling Puppet on the Prometheus PoP hosts as part of the cergen to CFSSL migration - T360414
  • 15:43 bking@cumin2002: conftool action : set/weight=20:pooled=yes; selector: name=elastic1105\.eqiad\.wmnet
  • 15:36 eoghan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:34 eoghan@cumin1002: START - Cookbook sre.dns.netbox
  • 15:31 bking@cumin2002: conftool action : set/weight=20:pooled=yes; selector: name=elastic1104\.eqiad\.wmnet
  • 15:29 denisse: testing patch #1023917 on prometheus6002 - T360414
  • 15:28 denisse: testing patch #1023917 on prometheus6002
  • 15:26 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubestagemaster2003.codfw.wmnet
  • 15:26 jayme@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:25 jayme@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubestagemaster2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jayme@cumin1002"
  • 15:25 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on prometheus6002.drmrs.wmnet,prometheus5002.eqsin.wmnet,prometheus3003.esams.wmnet,prometheus4002.ulsfo.wmnet with reason: Downtiming the Prometheus PoP hosts as part of the cergen to CFSSL migration - T360414
  • 15:25 denisse: Disabling Puppet on the Prometheus PoP hosts as part of the cergen to CFSSL migration - T360414
  • 15:24 denisse@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on prometheus6002.drmrs.wmnet,prometheus5002.eqsin.wmnet,prometheus3003.esams.wmnet,prometheus4002.ulsfo.wmnet with reason: Downtiming the Prometheus PoP hosts as part of the cergen to CFSSL migration - T360414
  • 15:23 jayme@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubestagemaster2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jayme@cumin1002"
  • 15:22 denisse: Downtiming the Prometheus PoP hosts as part of the cergen to CFSSL migration - T360414
  • 15:20 cmooney@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lists2001
  • 15:19 cmooney@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host lists2001
  • 15:19 cmooney@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:14 bking@cumin2002: conftool action : set/weight=20:pooled=yes; selector: name=elastic1103\.eqiad\.wmnet
  • 15:12 jayme@cumin1002: START - Cookbook sre.dns.netbox
  • 15:07 jayme@cumin1002: START - Cookbook sre.hosts.decommission for hosts kubestagemaster2003.codfw.wmnet
  • 14:48 elukey@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase2021.codfw.wmnet: Move to PKI TLS certs - elukey@cumin1002
  • 14:38 elukey@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2021.codfw.wmnet: Move to PKI TLS certs - elukey@cumin1002
  • 14:15 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephadm1001.eqiad.wmnet with OS bookworm
  • 14:10 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs1014.eqiad.wmnet
  • 14:03 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephadm1001.eqiad.wmnet with reason: host reimage
  • 14:02 eevans@cumin1002: START - Cookbook sre.hosts.reboot-single for host aqs1014.eqiad.wmnet
  • 13:57 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephadm1001.eqiad.wmnet with reason: host reimage
  • 13:48 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagemaster2003.codfw.wmnet to plain
  • 13:47 jayme@cumin1002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagemaster2003.codfw.wmnet to plain
  • 13:45 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephadm1001.eqiad.wmnet with OS bookworm
  • 13:28 akosiaris@cumin1002: conftool action : set/pooled=no; selector: name=elastic110[3-7]\.eqiad\.wmnet
  • 13:28 eoghan@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lists2001.codfw.wmnet
  • 13:28 eoghan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:27 eoghan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lists2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1002"
  • 13:25 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:23 eoghan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lists2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1002"
  • 13:21 eoghan@cumin1002: START - Cookbook sre.dns.netbox
  • 13:14 eoghan@cumin1002: START - Cookbook sre.hosts.decommission for hosts lists2001.codfw.wmnet
  • 12:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 12:44 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 12:44 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 12:27 btullis@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host cephadm1001.eqiad.wmnet
  • 12:26 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cephadm1001.eqiad.wmnet with OS bookworm
  • 12:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T352010)', diff saved to https://phabricator.wikimedia.org/P61251 and previous config saved to /var/cache/conftool/dbconfig/20240426-121951-ladsgroup.json
  • 12:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 12:20 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 12:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T352010)', diff saved to https://phabricator.wikimedia.org/P61250 and previous config saved to /var/cache/conftool/dbconfig/20240426-121939-ladsgroup.json
  • 12:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P61249 and previous config saved to /var/cache/conftool/dbconfig/20240426-120431-ladsgroup.json
  • 11:53 claime: Silencing all alerts matching parse1002.* for 4 days - T363086
  • 11:53 moritzm: uploaded debdeploy 0.0.99.14 to apt.wikimedia.org (for buster/bullseye/bookworm)
  • 11:50 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host kubestagemaster2003.codfw.wmnet
  • 11:50 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2003.codfw.wmnet with OS bullseye
  • 11:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P61248 and previous config saved to /var/cache/conftool/dbconfig/20240426-114923-ladsgroup.json
  • 11:43 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephadm1001.eqiad.wmnet with OS bookworm
  • 11:43 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM cephadm1001.eqiad.wmnet - btullis@cumin1002"
  • 11:43 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM cephadm1001.eqiad.wmnet - btullis@cumin1002"
  • 11:43 btullis@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cephadm1001.eqiad.wmnet on all recursors
  • 11:42 btullis@cumin1002: START - Cookbook sre.dns.wipe-cache cephadm1001.eqiad.wmnet on all recursors
  • 11:42 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:42 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM cephadm1001.eqiad.wmnet - btullis@cumin1002"
  • 11:39 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM cephadm1001.eqiad.wmnet - btullis@cumin1002"
  • 11:36 btullis@cumin1002: START - Cookbook sre.dns.netbox
  • 11:36 btullis@cumin1002: START - Cookbook sre.ganeti.makevm for new host cephadm1001.eqiad.wmnet
  • 11:35 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2003.codfw.wmnet with reason: host reimage
  • 11:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T352010)', diff saved to https://phabricator.wikimedia.org/P61247 and previous config saved to /var/cache/conftool/dbconfig/20240426-113416-ladsgroup.json
  • 11:33 claime: Forcing puppet run on O:alerting_host - T363086
  • 11:32 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2003.codfw.wmnet with reason: host reimage
  • 11:29 claime: Forcing puppet run on deploy server - T363086
  • 11:28 claime: Deactivating puppet for parse1002 - T363086
  • 11:19 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster2003.codfw.wmnet with OS bullseye
  • 11:19 jayme@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM kubestagemaster2003.codfw.wmnet - jayme@cumin1002"
  • 11:19 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab2002.wikimedia.org
  • 11:18 jayme@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM kubestagemaster2003.codfw.wmnet - jayme@cumin1002"
  • 11:18 jayme@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kubestagemaster2003.codfw.wmnet on all recursors
  • 11:17 jayme@cumin1002: START - Cookbook sre.dns.wipe-cache kubestagemaster2003.codfw.wmnet on all recursors
  • 11:17 jayme@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:16 jayme@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM kubestagemaster2003.codfw.wmnet - jayme@cumin1002"
  • 11:15 jayme@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM kubestagemaster2003.codfw.wmnet - jayme@cumin1002"
  • 11:13 jayme@cumin1002: START - Cookbook sre.dns.netbox
  • 11:13 jayme@cumin1002: START - Cookbook sre.ganeti.makevm for new host kubestagemaster2003.codfw.wmnet
  • 11:12 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubestagemaster2003.codfw.wmnet
  • 11:11 jayme@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:11 jayme@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubestagemaster2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jayme@cumin1002"
  • 11:10 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host gitlab2002.wikimedia.org
  • 11:10 jayme@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubestagemaster2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jayme@cumin1002"
  • 11:06 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
  • 10:59 jayme@cumin1002: START - Cookbook sre.dns.netbox
  • 10:55 jayme@cumin1002: START - Cookbook sre.hosts.decommission for hosts kubestagemaster2003.codfw.wmnet
  • 10:54 jayme@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kubestagemaster2003.codfw.wmnet with OS bullseye
  • 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 10:46 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 10:08 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster2003.codfw.wmnet with OS bullseye
  • 10:07 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagemaster2003.codfw.wmnet to plain
  • 10:06 jayme@cumin1002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagemaster2003.codfw.wmnet to plain
  • 10:04 jayme@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=93) for new host kubestagemaster2003.codfw.wmnet
  • 10:03 jayme@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kubestagemaster2003.codfw.wmnet on all recursors
  • 10:02 jayme@cumin1002: START - Cookbook sre.dns.wipe-cache kubestagemaster2003.codfw.wmnet on all recursors
  • 10:00 jayme@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 09:58 jayme@cumin1002: START - Cookbook sre.dns.netbox
  • 09:57 jayme@cumin1002: START - Cookbook sre.ganeti.makevm for new host kubestagemaster2003.codfw.wmnet
  • 09:54 dcausse@deploy1002: Finished scap: Backport for cirrus: Shift autocomplete traffic to codfw (T363516) (duration: 17m 57s)
  • 09:51 joal@deploy1002: Finished deploy [airflow-dags/analytics@e57ae00]: Deploy of Analytics airflow dags for browser-metrics [airflow-dags/analytics@e57ae006] (duration: 00m 27s)
  • 09:50 joal@deploy1002: Started deploy [airflow-dags/analytics@e57ae00]: Deploy of Analytics airflow dags for browser-metrics [airflow-dags/analytics@e57ae006]
  • 09:47 jayme: repooled mw2391.codfw.wmnet
  • 09:42 dcausse@deploy1002: dcausse and ebernhardson: Continuing with sync
  • 09:41 dcausse@deploy1002: dcausse and ebernhardson: Backport for cirrus: Shift autocomplete traffic to codfw (T363516) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:40 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagemaster2003.codfw.wmnet to plain
  • 09:40 jayme@cumin1002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagemaster2003.codfw.wmnet to plain
  • 09:36 dcausse@deploy1002: Started scap: Backport for cirrus: Shift autocomplete traffic to codfw (T363516)
  • 08:57 hashar: Restarted Gerrit
  • 08:41 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
  • 08:34 hashar: Restarted Gerrit replica
  • 08:34 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host kubestagemaster2003.codfw.wmnet
  • 08:33 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2003.codfw.wmnet with OS bullseye
  • 08:18 jayme: depooled mw2391.codfw.wmnet for etcd benchmark
  • 07:57 arnaudb@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 100%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61246 and previous config saved to /var/cache/conftool/dbconfig/20240426-075748-arnaudb.json
  • 07:48 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2003.codfw.wmnet with reason: host reimage
  • 07:45 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2003.codfw.wmnet with reason: host reimage
  • 07:42 arnaudb@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 75%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61245 and previous config saved to /var/cache/conftool/dbconfig/20240426-074243-arnaudb.json
  • 07:30 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster2003.codfw.wmnet with OS bullseye
  • 07:29 jayme@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM kubestagemaster2003.codfw.wmnet - jayme@cumin1002"
  • 07:29 jayme@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM kubestagemaster2003.codfw.wmnet - jayme@cumin1002"
  • 07:29 jayme@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kubestagemaster2003.codfw.wmnet on all recursors
  • 07:28 jayme@cumin1002: START - Cookbook sre.dns.wipe-cache kubestagemaster2003.codfw.wmnet on all recursors
  • 07:28 jayme@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:28 jayme@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM kubestagemaster2003.codfw.wmnet - jayme@cumin1002"
  • 07:27 arnaudb@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 50%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61244 and previous config saved to /var/cache/conftool/dbconfig/20240426-072737-arnaudb.json
  • 07:21 jayme@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM kubestagemaster2003.codfw.wmnet - jayme@cumin1002"
  • 07:19 jayme@cumin1002: START - Cookbook sre.dns.netbox
  • 07:18 jayme@cumin1002: START - Cookbook sre.ganeti.makevm for new host kubestagemaster2003.codfw.wmnet
  • 07:12 arnaudb@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 25%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61243 and previous config saved to /var/cache/conftool/dbconfig/20240426-071231-arnaudb.json
  • 07:08 hashar: Restarting CI Jenkins
  • 07:05 gmodena@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 07:05 gmodena@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 07:03 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 07:02 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 07:01 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 07:01 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 06:57 arnaudb@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 10%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61242 and previous config saved to /var/cache/conftool/dbconfig/20240426-065726-arnaudb.json
  • 06:42 arnaudb@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 5%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61241 and previous config saved to /var/cache/conftool/dbconfig/20240426-064220-arnaudb.json
  • 06:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T352010)', diff saved to https://phabricator.wikimedia.org/P61240 and previous config saved to /var/cache/conftool/dbconfig/20240426-062340-ladsgroup.json
  • 06:23 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 06:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 06:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T352010)', diff saved to https://phabricator.wikimedia.org/P61239 and previous config saved to /var/cache/conftool/dbconfig/20240426-062317-ladsgroup.json
  • 06:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P61238 and previous config saved to /var/cache/conftool/dbconfig/20240426-060810-ladsgroup.json
  • 05:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P61237 and previous config saved to /var/cache/conftool/dbconfig/20240426-055303-ladsgroup.json
  • 05:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T352010)', diff saved to https://phabricator.wikimedia.org/P61236 and previous config saved to /var/cache/conftool/dbconfig/20240426-053756-ladsgroup.json
  • 01:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T352010)', diff saved to https://phabricator.wikimedia.org/P61235 and previous config saved to /var/cache/conftool/dbconfig/20240426-015212-ladsgroup.json
  • 01:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 01:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 01:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T352010)', diff saved to https://phabricator.wikimedia.org/P61234 and previous config saved to /var/cache/conftool/dbconfig/20240426-015149-ladsgroup.json
  • 01:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P61233 and previous config saved to /var/cache/conftool/dbconfig/20240426-013642-ladsgroup.json
  • 01:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P61232 and previous config saved to /var/cache/conftool/dbconfig/20240426-012135-ladsgroup.json
  • 01:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T352010)', diff saved to https://phabricator.wikimedia.org/P61231 and previous config saved to /var/cache/conftool/dbconfig/20240426-010628-ladsgroup.json

2024-04-25

  • 23:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T352010)', diff saved to https://phabricator.wikimedia.org/P61230 and previous config saved to /var/cache/conftool/dbconfig/20240425-231201-ladsgroup.json
  • 22:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P61229 and previous config saved to /var/cache/conftool/dbconfig/20240425-225654-ladsgroup.json
  • 22:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P61228 and previous config saved to /var/cache/conftool/dbconfig/20240425-224146-ladsgroup.json
  • 22:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T352010)', diff saved to https://phabricator.wikimedia.org/P61227 and previous config saved to /var/cache/conftool/dbconfig/20240425-222638-ladsgroup.json
  • 22:23 brett: Extend the prometheus1005 and prometheus1006 logical volumes by an extra 60G due to disks filling up
  • 19:33 ebernhardson: T363516 started manual rebuild of enwiki titlesuggest indices in eqiad
  • 19:12 dancy@deploy1002: Finished scap: Testing (duration: 08m 47s)
  • 19:03 dancy@deploy1002: Started scap: Testing
  • 18:23 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.2 refs T361396
  • 18:22 logmsgbot: nshahquinn-wmf@deploy1002 Finished deploy [airflow-dags/analytics_product@0e9fd9a]: (no justification provided) (duration: 00m 07s)
  • 18:22 logmsgbot: nshahquinn-wmf@deploy1002 Started deploy [airflow-dags/analytics_product@0e9fd9a]: (no justification provided)
  • 18:08 brennen: train 1.43.0-wmf.2 (T361396) status: no current blockers, rolling to group2
  • 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2162 (T352010)', diff saved to https://phabricator.wikimedia.org/P61223 and previous config saved to /var/cache/conftool/dbconfig/20240425-175802-ladsgroup.json
  • 17:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 17:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 17:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T352010)', diff saved to https://phabricator.wikimedia.org/P61222 and previous config saved to /var/cache/conftool/dbconfig/20240425-175739-ladsgroup.json
  • 17:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P61221 and previous config saved to /var/cache/conftool/dbconfig/20240425-174233-ladsgroup.json
  • 17:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P61220 and previous config saved to /var/cache/conftool/dbconfig/20240425-172725-ladsgroup.json
  • 17:16 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd1001.eqiad.wmnet with OS bullseye
  • 17:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T352010)', diff saved to https://phabricator.wikimedia.org/P61219 and previous config saved to /var/cache/conftool/dbconfig/20240425-171329-ladsgroup.json
  • 17:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 17:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 17:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T352010)', diff saved to https://phabricator.wikimedia.org/P61218 and previous config saved to /var/cache/conftool/dbconfig/20240425-171218-ladsgroup.json
  • 16:34 mutante: releases1003 - docker and containerd restarted by manually starting wmf_auto_restart services
  • 15:38 dancy@deploy1002: Finished scap: Testing (duration: 08m 44s)
  • 15:34 mforns@deploy1002: Finished deploy [airflow-dags/analytics@b17acd0]: (no justification provided) (duration: 00m 27s)
  • 15:33 mforns@deploy1002: Started deploy [airflow-dags/analytics@b17acd0]: (no justification provided)
  • 15:29 dancy@deploy1002: Started scap: Testing
  • 15:29 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2155.codfw.wmnet with OS bookworm
  • 15:27 dancy@deploy1002: sync-world aborted: Testing (duration: 01m 33s)
  • 15:26 klausman@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Java 11 security updates - klausman@cumin1002
  • 15:25 dancy@deploy1002: Started scap: Testing
  • 15:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T352010)', diff saved to https://phabricator.wikimedia.org/P61216 and previous config saved to /var/cache/conftool/dbconfig/20240425-151120-ladsgroup.json
  • 15:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 15:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 15:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 15:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 15:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T352010)', diff saved to https://phabricator.wikimedia.org/P61215 and previous config saved to /var/cache/conftool/dbconfig/20240425-151041-ladsgroup.json
  • 15:07 klausman@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Java 11 security updates - klausman@cumin1002
  • 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2155.codfw.wmnet with reason: host reimage
  • 15:03 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2155.codfw.wmnet with reason: host reimage
  • 14:59 klausman@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Java 11 security updates - klausman@cumin1002
  • 14:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P61214 and previous config saved to /var/cache/conftool/dbconfig/20240425-145534-ladsgroup.json
  • 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on db2187.codfw.wmnet with reason: Host has hardware issues
  • 14:53 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 20:00:00 on db2187.codfw.wmnet with reason: Host has hardware issues
  • 14:44 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1234.eqiad.wmnet with reason: Host has hardware issues
  • 14:44 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1234.eqiad.wmnet with reason: Host has hardware issues
  • 14:41 klausman@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Java 11 security updates - klausman@cumin1002
  • 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P61213 and previous config saved to /var/cache/conftool/dbconfig/20240425-144027-ladsgroup.json
  • 14:29 moritzm: installing Java 11 security updates
  • 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T352010)', diff saved to https://phabricator.wikimedia.org/P61212 and previous config saved to /var/cache/conftool/dbconfig/20240425-142520-ladsgroup.json
  • 14:21 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db2155.codfw.wmnet with OS bookworm
  • 14:15 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host db2155.codfw.wmnet with OS bookworm
  • 14:10 root@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1005.eqiad.wmnet with OS bullseye
  • 14:10 root@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - root@cumin1002"
  • 13:47 claime: UTC afternoon backports window closed
  • 13:45 root@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - root@cumin1002"
  • 13:44 cgoubert@deploy1002: Finished scap: Backport for Set conflicting gadget settings for the Cite extension (T362771) (duration: 21m 33s)
  • 13:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2155.codfw.wmnet with reason: host reimage
  • 13:26 cgoubert@deploy1002: cgoubert and wmde-fisch: Continuing with sync
  • 13:26 cgoubert@deploy1002: cgoubert and wmde-fisch: Backport for Set conflicting gadget settings for the Cite extension (T362771) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2155.codfw.wmnet with reason: host reimage
  • 13:23 root@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1005.eqiad.wmnet with reason: host reimage
  • 13:23 cgoubert@deploy1002: Started scap: Backport for Set conflicting gadget settings for the Cite extension (T362771)
  • 13:20 root@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1005.eqiad.wmnet with reason: host reimage
  • 13:19 cgoubert@deploy1002: Finished scap: Backport for ClusterConfigTest: Add mw-on-k8s specific tests (duration: 14m 54s)
  • 13:09 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db2155.codfw.wmnet with OS bookworm
  • 13:08 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2155.codfw.wmnet with OS bullseye
  • 13:07 cgoubert@deploy1002: cgoubert: Continuing with sync
  • 13:07 cgoubert@deploy1002: cgoubert: Backport for ClusterConfigTest: Add mw-on-k8s specific tests synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:04 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db2155.codfw.wmnet with OS bullseye
  • 13:04 cgoubert@deploy1002: Started scap: Backport for ClusterConfigTest: Add mw-on-k8s specific tests
  • 12:44 arnaudb@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2155.codfw.wmnet with OS bookworm
  • 12:05 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db2155.codfw.wmnet with OS bookworm
  • 12:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2155', diff saved to https://phabricator.wikimedia.org/P61211 and previous config saved to /var/cache/conftool/dbconfig/20240425-120409-arnaudb.json
  • 12:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2155,2187].codfw.wmnet with reason: T362746
  • 12:03 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2155,2187].codfw.wmnet with reason: T362746
  • 12:02 root@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1005.eqiad.wmnet with OS bullseye
  • 11:37 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 11:37 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 11:37 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:36 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:17 root@cumin1002: START - Cookbook sre.hosts.reimage for host backup1005.eqiad.wmnet with OS bullseye
  • 11:15 root@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1005.eqiad.wmnet with OS bookworm
  • 11:10 root@cumin1002: START - Cookbook sre.hosts.reimage for host backup1005.eqiad.wmnet with OS bookworm
  • 10:38 arnaudb@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61208 and previous config saved to /var/cache/conftool/dbconfig/20240425-103802-arnaudb.json
  • 10:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61207 and previous config saved to /var/cache/conftool/dbconfig/20240425-102255-arnaudb.json
  • 10:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61206 and previous config saved to /var/cache/conftool/dbconfig/20240425-100748-arnaudb.json
  • 09:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db1241 (re)pooling @ 100%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61205 and previous config saved to /var/cache/conftool/dbconfig/20240425-095459-arnaudb.json
  • 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61204 and previous config saved to /var/cache/conftool/dbconfig/20240425-095242-arnaudb.json
  • 09:39 arnaudb@cumin1002: dbctl commit (dc=all): 'db1241 (re)pooling @ 75%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61203 and previous config saved to /var/cache/conftool/dbconfig/20240425-093954-arnaudb.json
  • 09:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61202 and previous config saved to /var/cache/conftool/dbconfig/20240425-093735-arnaudb.json
  • 09:36 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1004.wikimedia.org
  • 09:29 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host gitlab1004.wikimedia.org
  • 09:29 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1003.wikimedia.org
  • 09:24 arnaudb@cumin1002: dbctl commit (dc=all): 'db1241 (re)pooling @ 50%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61201 and previous config saved to /var/cache/conftool/dbconfig/20240425-092448-arnaudb.json
  • 09:24 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
  • 09:22 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host gitlab1003.wikimedia.org
  • 09:22 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab2003.wikimedia.org
  • 09:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 5%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61200 and previous config saved to /var/cache/conftool/dbconfig/20240425-092229-arnaudb.json
  • 09:21 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd1001.eqiad.wmnet with reason: host reimage
  • 09:18 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1160.eqiad.wmnet with OS bookworm
  • 09:17 jmm@cumin2002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:cloudelastic
  • 09:16 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host gitlab2003.wikimedia.org
  • 09:16 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 09:15 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 09:15 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
  • 09:13 jmm@cumin2002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:cloudelastic
  • 09:09 arnaudb@cumin1002: dbctl commit (dc=all): 'db1241 (re)pooling @ 25%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61199 and previous config saved to /var/cache/conftool/dbconfig/20240425-090942-arnaudb.json
  • 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling restart_daemons on A:ldap-replicas-eqiad
  • 09:06 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 09:06 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling restart_daemons on A:ldap-replicas-eqiad
  • 09:05 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 09:04 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 09:04 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 09:02 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host cephosd1001.eqiad.wmnet with OS bullseye
  • 09:01 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 08:59 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 08:58 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 08:57 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 08:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1160.eqiad.wmnet with reason: host reimage
  • 08:54 arnaudb@cumin1002: dbctl commit (dc=all): 'db1241 (re)pooling @ 10%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61198 and previous config saved to /var/cache/conftool/dbconfig/20240425-085437-arnaudb.json
  • 08:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1160.eqiad.wmnet with reason: host reimage
  • 08:54 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 08:54 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 08:51 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 08:50 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling restart_daemons on A:ldap-replicas-codfw
  • 08:47 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling restart_daemons on A:ldap-replicas-codfw
  • 08:42 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db1160.eqiad.wmnet with OS bookworm
  • 08:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1160.eqiad.wmnet with reason: T362746
  • 08:40 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1160.eqiad.wmnet with reason: T362746
  • 08:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1160', diff saved to https://phabricator.wikimedia.org/P61197 and previous config saved to /var/cache/conftool/dbconfig/20240425-083956-arnaudb.json
  • 08:39 arnaudb@cumin1002: dbctl commit (dc=all): 'db1241 (re)pooling @ 5%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61196 and previous config saved to /var/cache/conftool/dbconfig/20240425-083931-arnaudb.json
  • 08:38 jelto@cumin1002: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
  • 08:34 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 08:32 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 08:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1241.eqiad.wmnet with OS bookworm
  • 08:23 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 08:23 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 08:23 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 08:22 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 08:22 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 08:21 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 08:20 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 08:19 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 08:15 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
  • 08:11 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
  • 08:04 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1241.eqiad.wmnet with reason: host reimage
  • 08:02 hashar@deploy1002: Finished scap: Backport for logging: do not explicitly set blackhole handler (T228838) (duration: 16m 17s)
  • 08:01 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1241.eqiad.wmnet with reason: host reimage
  • 07:59 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
  • 07:56 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
  • 07:50 hashar@deploy1002: hashar: Continuing with sync
  • 07:48 hashar@deploy1002: hashar: Backport for logging: do not explicitly set blackhole handler (T228838) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:47 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db1241.eqiad.wmnet with OS bookworm
  • 07:45 hashar@deploy1002: Started scap: Backport for logging: do not explicitly set blackhole handler (T228838)
  • 07:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1241', diff saved to https://phabricator.wikimedia.org/P61195 and previous config saved to /var/cache/conftool/dbconfig/20240425-074516-arnaudb.json
  • 07:44 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1241.eqiad.wmnet with reason: T362746
  • 07:44 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1241.eqiad.wmnet with reason: T362746
  • 07:38 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
  • 07:33 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
  • 07:15 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 07:08 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 06:58 moritzm: installing glibc security updates
  • 06:34 moritzm: uninstalling redis on netbox hosts; Netbox has been using the central Redis servers for a while now
  • 05:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T352010)', diff saved to https://phabricator.wikimedia.org/P61194 and previous config saved to /var/cache/conftool/dbconfig/20240425-055431-ladsgroup.json
  • 05:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 05:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 05:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T352010)', diff saved to https://phabricator.wikimedia.org/P61193 and previous config saved to /var/cache/conftool/dbconfig/20240425-055408-ladsgroup.json
  • 05:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P61192 and previous config saved to /var/cache/conftool/dbconfig/20240425-053901-ladsgroup.json
  • 05:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2161 (T352010)', diff saved to https://phabricator.wikimedia.org/P61191 and previous config saved to /var/cache/conftool/dbconfig/20240425-053608-ladsgroup.json
  • 05:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 05:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 05:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T352010)', diff saved to https://phabricator.wikimedia.org/P61190 and previous config saved to /var/cache/conftool/dbconfig/20240425-053545-ladsgroup.json
  • 05:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P61189 and previous config saved to /var/cache/conftool/dbconfig/20240425-052354-ladsgroup.json
  • 05:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P61188 and previous config saved to /var/cache/conftool/dbconfig/20240425-052038-ladsgroup.json
  • 05:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T352010)', diff saved to https://phabricator.wikimedia.org/P61187 and previous config saved to /var/cache/conftool/dbconfig/20240425-050845-ladsgroup.json
  • 05:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P61186 and previous config saved to /var/cache/conftool/dbconfig/20240425-050531-ladsgroup.json
  • 04:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T352010)', diff saved to https://phabricator.wikimedia.org/P61185 and previous config saved to /var/cache/conftool/dbconfig/20240425-045023-ladsgroup.json

2024-04-24

  • 21:52 dzahn@cumin2002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: security release T363349
  • 21:36 ryankemper: [Elastic] T361268 Pooled new hosts: `elastic110[3-7]`
  • 21:35 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=yes; selector: name=elastic110[3-7]\.eqiad\.wmnet
  • 20:38 denisse: Disabling Puppet on the Prometheus PoP hosts as part of the cergen to CFSSL migration - T360414
  • 20:38 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on prometheus6002.drmrs.wmnet,prometheus5002.eqsin.wmnet,prometheus3003.esams.wmnet,prometheus4002.ulsfo.wmnet with reason: Downtiming the Prometheus PoP hosts as part of the cergen to CFSSL migration - T360414
  • 20:37 denisse@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on prometheus6002.drmrs.wmnet,prometheus5002.eqsin.wmnet,prometheus3003.esams.wmnet,prometheus4002.ulsfo.wmnet with reason: Downtiming the Prometheus PoP hosts as part of the cergen to CFSSL migration - T360414
  • 20:37 denisse: Downtiming the Prometheus PoP hosts as part of the cergen to CFSSL migration - T360414
  • 20:24 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:24 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:32 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release T363349
  • 19:27 cstone: payments-wiki upgraded from 1895e43b to c7ab847d
  • 19:15 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:15 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:14 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Apply truststore changes — T352647 - eevans@cumin1002
  • 19:08 inflatador: bking@deploy1002 stop `consumer-cloudelastic` release to test alerting T359213
  • 19:07 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:06 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T352010)', diff saved to https://phabricator.wikimedia.org/P61181 and previous config saved to /var/cache/conftool/dbconfig/20240424-190237-ladsgroup.json
  • 19:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 19:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 19:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T352010)', diff saved to https://phabricator.wikimedia.org/P61180 and previous config saved to /var/cache/conftool/dbconfig/20240424-190214-ladsgroup.json
  • 18:57 amastilovic@deploy1002: Finished deploy [airflow-dags/analytics@3f994d5]: (no justification provided) (duration: 00m 28s)
  • 18:57 amastilovic@deploy1002: Started deploy [airflow-dags/analytics@3f994d5]: (no justification provided)
  • 18:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P61179 and previous config saved to /var/cache/conftool/dbconfig/20240424-184707-ladsgroup.json
  • 18:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P61178 and previous config saved to /var/cache/conftool/dbconfig/20240424-183200-ladsgroup.json
  • 18:20 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.2 refs T361396
  • 18:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T352010)', diff saved to https://phabricator.wikimedia.org/P61177 and previous config saved to /var/cache/conftool/dbconfig/20240424-181653-ladsgroup.json
  • 18:14 cstone: payments-wiki upgraded from fb0367a4 to 1895e43b
  • 18:03 brennen: train 1.43.0-wmf.2 (T361396) status: no current blockers, rolling to group1
  • 17:41 btullis@cumin1002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 17:20 btullis@cumin1002: START - Cookbook sre.wdqs.restart
  • 17:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T352010)', diff saved to https://phabricator.wikimedia.org/P61176 and previous config saved to /var/cache/conftool/dbconfig/20240424-170444-ladsgroup.json
  • 17:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 17:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 17:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T352010)', diff saved to https://phabricator.wikimedia.org/P61175 and previous config saved to /var/cache/conftool/dbconfig/20240424-170421-ladsgroup.json
  • 17:03 hmonroy@deploy1002: Finished scap: Backport for [hewiki] enable CodeMirrorV6 and CodeMirrorLineNumberingNamespaces (T357795 T347211) (duration: 20m 36s)
  • 16:51 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Apply truststore changes — T352647 - eevans@cumin1002
  • 16:50 hmonroy@deploy1002: musikanimal and hmonroy: Continuing with sync
  • 16:49 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1010.eqiad.wmnet
  • 16:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P61174 and previous config saved to /var/cache/conftool/dbconfig/20240424-164914-ladsgroup.json
  • 16:45 hmonroy@deploy1002: musikanimal and hmonroy: Backport for [hewiki] enable CodeMirrorV6 and CodeMirrorLineNumberingNamespaces (T357795 T347211) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:43 elukey@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Deploy new TLS Truststore for PKI - elukey@cumin1002
  • 16:42 hmonroy@deploy1002: Started scap: Backport for [hewiki] enable CodeMirrorV6 and CodeMirrorLineNumberingNamespaces (T357795 T347211)
  • 16:42 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host stat1010.eqiad.wmnet
  • 16:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P61173 and previous config saved to /var/cache/conftool/dbconfig/20240424-163407-ladsgroup.json
  • 16:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T352010)', diff saved to https://phabricator.wikimedia.org/P61172 and previous config saved to /var/cache/conftool/dbconfig/20240424-161859-ladsgroup.json
  • 16:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 100%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61171 and previous config saved to /var/cache/conftool/dbconfig/20240424-161112-arnaudb.json
  • 16:05 sukhe: running authdns-update
  • 16:04 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 16:01 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 15:59 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 15:58 sukhe@cumin1002: conftool action : set/pooled=yes; selector: name=dns6002.wikimedia.org
  • 15:56 sukhe@cumin1002: conftool action : set/pooled=no; selector: name=dns6002.wikimedia.org
  • 15:56 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts aqs1014.eqiad.wmnet
  • 15:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 75%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61170 and previous config saved to /var/cache/conftool/dbconfig/20240424-155607-arnaudb.json
  • 15:55 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts aqs1014.eqiad.wmnet
  • 15:55 sukhe@cumin1002: conftool action : set/pooled=yes; selector: name=dns5004.wikimedia.org
  • 15:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts aqs1014.eqiad.wmnet
  • 15:54 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs.
  • 15:54 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts aqs1014.eqiad.wmnet
  • 15:53 sukhe@cumin1002: conftool action : set/pooled=no; selector: name=dns5004.wikimedia.org
  • 15:51 sukhe@cumin1002: conftool action : set/pooled=yes; selector: name=dns5003.wikimedia.org
  • 15:50 sukhe@cumin1002: conftool action : set/pooled=no; selector: name=dns5003.wikimedia.org
  • 15:48 sukhe@cumin1002: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org
  • 15:47 sukhe@cumin1002: conftool action : set/pooled=no; selector: name=dns4004.wikimedia.org
  • 15:45 jmm@cumin2002: END (PASS) - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors (exit_code=0) rolling restart_daemons on A:logstash-collector
  • 15:44 sukhe@cumin1002: conftool action : set/pooled=yes; selector: name=dns4003.wikimedia.org
  • 15:42 sukhe@cumin1002: conftool action : set/pooled=no; selector: name=dns4003.wikimedia.org
  • 15:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 50%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61169 and previous config saved to /var/cache/conftool/dbconfig/20240424-154101-arnaudb.json
  • 15:40 sukhe@cumin1002: conftool action : set/pooled=yes; selector: name=dns3004.wikimedia.org
  • 15:38 sukhe@cumin1002: conftool action : set/pooled=no; selector: name=dns3004.wikimedia.org
  • 15:37 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors rolling restart_daemons on A:logstash-collector
  • 15:37 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 15:35 sukhe@cumin1002: conftool action : set/pooled=yes; selector: name=dns3003.wikimedia.org
  • 15:34 sukhe@cumin1002: conftool action : set/pooled=no; selector: name=dns3003.wikimedia.org
  • 15:32 sukhe@cumin1002: conftool action : set/pooled=yes; selector: name=dns2006.wikimedia.org
  • 15:31 sukhe@cumin1002: conftool action : set/pooled=no; selector: name=dns2006.wikimedia.org
  • 15:29 sukhe@cumin1002: conftool action : set/pooled=yes; selector: name=dns2005.wikimedia.org
  • 15:28 ebysans@deploy1002: Finished deploy [analytics/refinery@a5f2b25] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@a5f2b252] (duration: 02m 51s)
  • 15:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db1199 (re)pooling @ 100%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61168 and previous config saved to /var/cache/conftool/dbconfig/20240424-152811-arnaudb.json
  • 15:27 sukhe@cumin1002: conftool action : set/pooled=no; selector: name=dns2005.wikimedia.org
  • 15:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 25%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61167 and previous config saved to /var/cache/conftool/dbconfig/20240424-152556-arnaudb.json
  • 15:26 ebysans@deploy1002: Started deploy [analytics/refinery@a5f2b25] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@a5f2b252]
  • 15:25 sukhe@cumin1002: conftool action : set/pooled=yes; selector: name=dns2004.wikimedia.org
  • 15:23 ebysans@deploy1002: Finished deploy [analytics/refinery@a5f2b25] (thin): Regular analytics weekly train THIN [analytics/refinery@a5f2b252] (duration: 03m 36s)
  • 15:23 sukhe@cumin1002: conftool action : set/pooled=no; selector: name=dns2004.wikimedia.org
  • 15:21 sukhe@cumin1002: conftool action : set/pooled=yes; selector: name=dns1006.wikimedia.org
  • 15:20 ebysans@deploy1002: Started deploy [analytics/refinery@a5f2b25] (thin): Regular analytics weekly train THIN [analytics/refinery@a5f2b252]
  • 15:20 sukhe@cumin1002: conftool action : set/pooled=no; selector: name=dns1006.wikimedia.org
  • 15:18 sukhe@cumin1002: conftool action : set/pooled=yes; selector: name=dns1005.wikimedia.org
  • 15:16 sukhe@cumin1002: conftool action : set/pooled=no; selector: name=dns1005.wikimedia.org
  • 15:14 sukhe@cumin1002: conftool action : set/pooled=yes; selector: name=dns1004.wikimedia.org
  • 15:13 sukhe@cumin1002: conftool action : set/pooled=no; selector: name=dns1004.wikimedia.org
  • 15:13 ebysans@deploy1002: Finished deploy [analytics/refinery@a5f2b25]: Regular analytics weekly train [analytics/refinery@a5f2b252] (duration: 12m 13s)
  • 15:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db1199 (re)pooling @ 75%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61166 and previous config saved to /var/cache/conftool/dbconfig/20240424-151304-arnaudb.json
  • 15:11 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 10%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61165 and previous config saved to /var/cache/conftool/dbconfig/20240424-151050-arnaudb.json
  • 15:10 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:10 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:10 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:10 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 15:09 fabfur: depooling cp4037 to test tls connection to kafka cluster (T358109)
  • 15:01 ebysans@deploy1002: Started deploy [analytics/refinery@a5f2b25]: Regular analytics weekly train [analytics/refinery@a5f2b252]
  • 15:00 SandraEbele_: starting refinery deployment
  • 14:57 arnaudb@cumin1002: dbctl commit (dc=all): 'db1199 (re)pooling @ 50%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61164 and previous config saved to /var/cache/conftool/dbconfig/20240424-145758-arnaudb.json
  • 14:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 5%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61163 and previous config saved to /var/cache/conftool/dbconfig/20240424-145545-arnaudb.json
  • 14:55 dancy@deploy1002: Installation of scap version "4.79.0" completed for 325 hosts
  • 14:54 dancy@deploy1002: Installing scap version "4.79.0" for 325 hosts
  • 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1190.eqiad.wmnet with OS bookworm
  • 14:52 moritzm: installing exim4/spamassassin on MXes
  • 14:45 moritzm: installing php7.4 security updates (as shipped in Debian, not our internal component)
  • 14:42 arnaudb@cumin1002: dbctl commit (dc=all): 'db1199 (re)pooling @ 25%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61162 and previous config saved to /var/cache/conftool/dbconfig/20240424-144252-arnaudb.json
  • 14:38 sukhe: rolling restart of haproxy, pdns-rec and ntp on A:dnsbox
  • 14:32 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1190.eqiad.wmnet with reason: host reimage
  • 14:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db1242 (re)pooling @ 100%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61160 and previous config saved to /var/cache/conftool/dbconfig/20240424-142905-arnaudb.json
  • 14:27 arnaudb@cumin1002: dbctl commit (dc=all): 'db1199 (re)pooling @ 10%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61159 and previous config saved to /var/cache/conftool/dbconfig/20240424-142747-arnaudb.json
  • 14:27 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1190.eqiad.wmnet with reason: host reimage
  • 14:26 elukey@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Deploy new TLS Truststore for PKI - elukey@cumin1002
  • 14:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org
  • 14:20 sukhe: restarting pdns-rec on dns6001
  • 14:19 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns6001.wikimedia.org
  • 14:19 moritzm: import djangorestframework 3.14.0-2+wmf12u1 to apt.wikimedia.org (bug fix needed for Bitu 0.7.0, https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1068747)
  • 14:14 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db1190.eqiad.wmnet with OS bookworm
  • 14:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db1242 (re)pooling @ 75%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61158 and previous config saved to /var/cache/conftool/dbconfig/20240424-141400-arnaudb.json
  • 14:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1190.eqiad.wmnet with reason: T362746
  • 14:13 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1190.eqiad.wmnet with reason: T362746
  • 14:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1190', diff saved to https://phabricator.wikimedia.org/P61157 and previous config saved to /var/cache/conftool/dbconfig/20240424-141305-arnaudb.json
  • 14:12 arnaudb@cumin1002: dbctl commit (dc=all): 'db1199 (re)pooling @ 5%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61156 and previous config saved to /var/cache/conftool/dbconfig/20240424-141241-arnaudb.json
  • 14:11 elukey@cumin1002: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching A:restbase-codfw: Deploy new TLS Truststore for PKI - elukey@cumin1002
  • 13:59 urbanecm@deploy1002: Finished scap: Backport for Enabled subpages for main namespace in ptwikimedia (T362300), Updated uzwiktionary project namespace name and site name to follow Uzbek grammar (T362620), Revert "Updated uzwiktionary project namespace name and site name to follow" (T362620) (duration: 14m 08s)
  • 13:58 arnaudb@cumin1002: dbctl commit (dc=all): 'db1242 (re)pooling @ 50%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61155 and previous config saved to /var/cache/conftool/dbconfig/20240424-135854-arnaudb.json
  • 13:56 elukey@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Deploy new TLS Truststore for PKI - elukey@cumin1002
  • 13:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1199.eqiad.wmnet with OS bookworm
  • 13:49 elukey@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase2021.codfw.wmnet: Deploy new TLS Truststore for PKI - elukey@cumin1002
  • 13:48 urbanecm@deploy1002: urbanecm and nmw03: Continuing with sync
  • 13:48 urbanecm@deploy1002: urbanecm and nmw03: Backport for Enabled subpages for main namespace in ptwikimedia (T362300), Updated uzwiktionary project namespace name and site name to follow Uzbek grammar (T362620), Revert "Updated uzwiktionary project namespace name and site name to follow" (T362620) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdeb
  • 13:47 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough and A:wikidough
  • 13:45 urbanecm@deploy1002: Started scap: Backport for Enabled subpages for main namespace in ptwikimedia (T362300), Updated uzwiktionary project namespace name and site name to follow Uzbek grammar (T362620), Revert "Updated uzwiktionary project namespace name and site name to follow" (T362620)
  • 13:44 urbanecm@deploy1002: Sync cancelled.
  • 13:44 urbanecm@deploy1002: urbanecm and nmw03: Backport for Enabled subpages for main namespace in ptwikimedia (T362300), Updated uzwiktionary project namespace name and site name to follow Uzbek grammar (T362620), Revert "Updated uzwiktionary project namespace name and site name to follow" (T362620) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdeb
  • 13:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db1242 (re)pooling @ 25%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61153 and previous config saved to /var/cache/conftool/dbconfig/20240424-134349-arnaudb.json
  • 13:42 urbanecm@deploy1002: Started scap: Backport for Enabled subpages for main namespace in ptwikimedia (T362300), Updated uzwiktionary project namespace name and site name to follow Uzbek grammar (T362620), Revert "Updated uzwiktionary project namespace name and site name to follow" (T362620)
  • 13:40 elukey@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2021.codfw.wmnet: Deploy new TLS Truststore for PKI - elukey@cumin1002
  • 13:39 urbanecm@deploy1002: Sync cancelled.
  • 13:38 elukey@cumin1002: END (ERROR) - Cookbook sre.cassandra.roll-restart (exit_code=97) for nodes matching restbase2021.codfw.wmnet: Deploy new TLS Truststore for PKI - elukey@cumin1002
  • 13:37 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:37 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 13:37 elukey@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2021.codfw.wmnet: Deploy new TLS Truststore for PKI - elukey@cumin1002
  • 13:36 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:36 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:35 urbanecm@deploy1002: urbanecm and nmw03: Backport for Enabled subpages for main namespace in ptwikimedia (T362300), Updated uzwiktionary project namespace name and site name to follow Uzbek grammar (T362620) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:34 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1199.eqiad.wmnet with reason: host reimage
  • 13:34 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-durum (exit_code=0) rolling restart_daemons on A:durum
  • 13:32 urbanecm@deploy1002: Started scap: Backport for Enabled subpages for main namespace in ptwikimedia (T362300), Updated uzwiktionary project namespace name and site name to follow Uzbek grammar (T362620)
  • 13:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1199.eqiad.wmnet with reason: host reimage
  • 13:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db1242 (re)pooling @ 10%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61152 and previous config saved to /var/cache/conftool/dbconfig/20240424-132841-arnaudb.json
  • 13:25 urbanecm@deploy1002: Finished scap: Backport for Growth: Enable Levelling up features on all wikis (T348086), WikiEduDashboard: allow removal when course is not synced (T363187), WikiEduDashboard: allow removal when course is not synced (T363187) (duration: 20m 21s)
  • 13:23 jmm@cumin2002: END (PASS) - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors (exit_code=0) rolling restart_daemons on A:logstash-collector
  • 13:18 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db1199.eqiad.wmnet with OS bookworm
  • 13:17 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1199.eqiad.wmnet with reason: T362746
  • 13:17 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1199.eqiad.wmnet with reason: T362746
  • 13:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1199', diff saved to https://phabricator.wikimedia.org/P61151 and previous config saved to /var/cache/conftool/dbconfig/20240424-131702-arnaudb.json
  • 13:15 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors rolling restart_daemons on A:logstash-collector
  • 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:elastic-eqiad
  • 13:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db1242 (re)pooling @ 5%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61150 and previous config saved to /var/cache/conftool/dbconfig/20240424-131336-arnaudb.json
  • 13:12 urbanecm@deploy1002: daimona and urbanecm: Continuing with sync
  • 13:09 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-durum rolling restart_daemons on A:durum
  • 13:07 urbanecm@deploy1002: daimona and urbanecm: Backport for Growth: Enable Levelling up features on all wikis (T348086), WikiEduDashboard: allow removal when course is not synced (T363187), WikiEduDashboard: allow removal when course is not synced (T363187) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:04 urbanecm@deploy1002: Started scap: Backport for Growth: Enable Levelling up features on all wikis (T348086), WikiEduDashboard: allow removal when course is not synced (T363187), WikiEduDashboard: allow removal when course is not synced (T363187)
  • 13:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1242.eqiad.wmnet with OS bookworm
  • 12:59 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 12:58 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 12:57 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough and A:wikidough
  • 12:52 jmm@cumin2002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:elastic-eqiad
  • 12:26 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db1242.eqiad.wmnet with OS bookworm
  • 12:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1242', diff saved to https://phabricator.wikimedia.org/P61149 and previous config saved to /var/cache/conftool/dbconfig/20240424-122520-arnaudb.json
  • 12:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1242.eqiad.wmnet with reason: T362746
  • 12:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1242.eqiad.wmnet with reason: T362746
  • 12:23 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on stat1010.eqiad.wmnet with reason: Connecting GPU power cable
  • 12:23 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on stat1010.eqiad.wmnet with reason: Connecting GPU power cable
  • 12:20 jmm@cumin2002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:elastic-codfw
  • 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:elastic-canary
  • 12:10 jmm@cumin2002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:elastic-canary
  • 11:50 jmm@cumin2002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:relforge
  • 11:49 jmm@cumin2002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:relforge
  • 11:32 arnaudb@cumin1002: dbctl commit (dc=all): 'db1247 (re)pooling @ 100%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61148 and previous config saved to /var/cache/conftool/dbconfig/20240424-113241-arnaudb.json
  • 11:17 arnaudb@cumin1002: dbctl commit (dc=all): 'db1247 (re)pooling @ 75%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61147 and previous config saved to /var/cache/conftool/dbconfig/20240424-111735-arnaudb.json
  • 11:05 taavi@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database kawikisource (T363242)
  • 11:02 arnaudb@cumin1002: dbctl commit (dc=all): 'db1247 (re)pooling @ 50%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61146 and previous config saved to /var/cache/conftool/dbconfig/20240424-110230-arnaudb.json
  • 10:47 arnaudb@cumin1002: dbctl commit (dc=all): 'db1247 (re)pooling @ 25%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61145 and previous config saved to /var/cache/conftool/dbconfig/20240424-104724-arnaudb.json
  • 10:40 taavi@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database kawikisource (T363242)
  • 10:39 arnaudb@cumin1002: dbctl commit (dc=all): 'db1248 (re)pooling @ 100%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61144 and previous config saved to /var/cache/conftool/dbconfig/20240424-103922-arnaudb.json
  • 10:39 taavi@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database mswikisource (T363249)
  • 10:39 taavi@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database mswikisource (T363249)
  • 10:38 taavi@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database kaawiktionary (T363255)
  • 10:38 taavi@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database kaawiktionary (T363255)
  • 10:37 taavi@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database iglwiki (T363262)
  • 10:37 taavi@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database iglwiki (T363262)
  • 10:32 taavi@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database mywikisource (T363269)
  • 10:32 taavi@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database mywikisource (T363269)
  • 10:32 arnaudb@cumin1002: dbctl commit (dc=all): 'db1247 (re)pooling @ 10%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61143 and previous config saved to /var/cache/conftool/dbconfig/20240424-103218-arnaudb.json
  • 10:30 taavi@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database bewwiki
  • 10:30 taavi@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database bewwiki
  • 10:24 arnaudb@cumin1002: dbctl commit (dc=all): 'db1248 (re)pooling @ 75%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61142 and previous config saved to /var/cache/conftool/dbconfig/20240424-102416-arnaudb.json
  • 10:22 taavi@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 10:22 taavi@cumin1002: Added views for new wiki: kuswiki T360302
  • 10:21 taavi@cumin1002: START - Cookbook sre.wikireplicas.add-wiki
  • 10:19 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 151326
  • 10:18 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 151326
  • 10:17 arnaudb@cumin1002: dbctl commit (dc=all): 'db1247 (re)pooling @ 5%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61141 and previous config saved to /var/cache/conftool/dbconfig/20240424-101713-arnaudb.json
  • 10:09 arnaudb@cumin1002: dbctl commit (dc=all): 'db1248 (re)pooling @ 50%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61140 and previous config saved to /var/cache/conftool/dbconfig/20240424-100910-arnaudb.json
  • 10:04 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1247.eqiad.wmnet with OS bookworm
  • 09:54 arnaudb@cumin1002: dbctl commit (dc=all): 'db1248 (re)pooling @ 25%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61139 and previous config saved to /var/cache/conftool/dbconfig/20240424-095405-arnaudb.json
  • 09:45 taavi: echo "https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-tagline-ca-750k.svg" | mwscript purgeList.php --wiki enwiki # T363057
  • 09:44 taavi@deploy1002: Finished scap: Backport for logos: Update cawiki 750k logo tagline (T363057) (duration: 14m 53s)
  • 09:44 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1247.eqiad.wmnet with reason: host reimage
  • 09:41 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1247.eqiad.wmnet with reason: host reimage
  • 09:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T352010)', diff saved to https://phabricator.wikimedia.org/P61138 and previous config saved to /var/cache/conftool/dbconfig/20240424-094027-ladsgroup.json
  • 09:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 09:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 09:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T352010)', diff saved to https://phabricator.wikimedia.org/P61137 and previous config saved to /var/cache/conftool/dbconfig/20240424-094004-ladsgroup.json
  • 09:39 arnaudb@cumin1002: dbctl commit (dc=all): 'db1248 (re)pooling @ 10%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61136 and previous config saved to /var/cache/conftool/dbconfig/20240424-093859-arnaudb.json
  • 09:33 taavi@deploy1002: taavi: Continuing with sync
  • 09:32 taavi@deploy1002: taavi: Backport for logos: Update cawiki 750k logo tagline (T363057) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:29 taavi@deploy1002: Started scap: Backport for logos: Update cawiki 750k logo tagline (T363057)
  • 09:29 claime: 80% of external traffic to mw-on-k8s - T362323
  • 09:28 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db1247.eqiad.wmnet with OS bookworm
  • 09:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1247', diff saved to https://phabricator.wikimedia.org/P61135 and previous config saved to /var/cache/conftool/dbconfig/20240424-092540-arnaudb.json
  • 09:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P61134 and previous config saved to /var/cache/conftool/dbconfig/20240424-092457-ladsgroup.json
  • 09:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1247.eqiad.wmnet with reason: T362746
  • 09:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1247.eqiad.wmnet with reason: T362746
  • 09:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1248 (re)pooling @ 5%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61133 and previous config saved to /var/cache/conftool/dbconfig/20240424-092353-arnaudb.json
  • 09:14 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1248.eqiad.wmnet with OS bookworm
  • 09:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P61132 and previous config saved to /var/cache/conftool/dbconfig/20240424-090950-ladsgroup.json
  • 09:08 elukey: run 'kill `pgrep -u dbad2021`' on all stat nodes to unblock puppet
  • 08:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1248.eqiad.wmnet with reason: host reimage
  • 08:54 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 08:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T352010)', diff saved to https://phabricator.wikimedia.org/P61131 and previous config saved to /var/cache/conftool/dbconfig/20240424-085442-ladsgroup.json
  • 08:54 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 08:54 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 08:54 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 08:53 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 08:53 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 08:53 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 08:53 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 08:52 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1248.eqiad.wmnet with reason: host reimage
  • 08:52 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-eqiad
  • 08:51 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-eqiad
  • 08:47 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-codfw
  • 08:46 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-codfw
  • 08:39 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db1248.eqiad.wmnet with OS bookworm
  • 08:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1248', diff saved to https://phabricator.wikimedia.org/P61130 and previous config saved to /var/cache/conftool/dbconfig/20240424-083736-arnaudb.json
  • 08:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1248.eqiad.wmnet with reason: T362746
  • 08:36 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1248.eqiad.wmnet with reason: T362746
  • 08:08 filippo@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 08:08 filippo@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 06:17 moritzm: installing glibc security updates
  • 04:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T352010)', diff saved to https://phabricator.wikimedia.org/P61129 and previous config saved to /var/cache/conftool/dbconfig/20240424-045230-ladsgroup.json
  • 04:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 04:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 00:15 zabe@deploy1002: Finished scap: Backport for Set timezones for new wikis (T360310 T360303 T363263 T363256 T363250 T363243 T363270), Update interwiki cache (duration: 13m 56s)
  • 00:04 zabe@deploy1002: zabe: Continuing with sync
  • 00:03 zabe@deploy1002: zabe: Backport for Set timezones for new wikis (T360310 T360303 T363263 T363256 T363250 T363243 T363270), Update interwiki cache synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 00:01 zabe@deploy1002: Started scap: Backport for Set timezones for new wikis (T360310 T360303 T363263 T363256 T363250 T363243 T363270), Update interwiki cache

2024-04-23

  • 23:58 eileen: config revision changed from 75af3eb6 to dc53becd
  • 23:57 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=kawikisource --cluster=all 2>&1 | tee /tmp/kawikisource.UpdateSearchIndexConfig.log # T363085
  • 23:56 zabe@deploy1002: Finished scap: Creating kawikisource (T363085) (duration: 14m 40s)
  • 23:45 zabe@deploy1002: zabe: Continuing with sync
  • 23:44 zabe@deploy1002: zabe: Creating kawikisource (T363085) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:42 zabe@deploy1002: Started scap: Creating kawikisource (T363085)
  • 23:41 zabe: create Wikisource Georgian # T363085
  • 23:39 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=mswikisource --cluster=all 2>&1 | tee /tmp/mswikisource.UpdateSearchIndexConfig.log # T363039
  • 23:39 zabe@deploy1002: Finished scap: Creating mswikisource (T363039) (duration: 15m 00s)
  • 23:34 eileen: config revision changed from 974afe9c to 75af3eb6
  • 23:27 zabe@deploy1002: zabe: Continuing with sync
  • 23:26 zabe@deploy1002: zabe: Creating mswikisource (T363039) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:24 zabe@deploy1002: Started scap: Creating mswikisource (T363039)
  • 23:23 zabe: create Wikisource Malay # T363039
  • 23:22 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=kaawiktionary --cluster=all 2>&1 | tee /tmp/kaawiktionary.UpdateSearchIndexConfig.log # T362135
  • 23:20 zabe@deploy1002: Finished scap: Creating kaawiktionary (T362135) (duration: 13m 34s)
  • 23:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T352010)', diff saved to https://phabricator.wikimedia.org/P61128 and previous config saved to /var/cache/conftool/dbconfig/20240423-231923-ladsgroup.json
  • 23:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 23:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 23:10 zabe@deploy1002: zabe: Continuing with sync
  • 23:10 zabe@deploy1002: zabe: Creating kaawiktionary (T362135) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:07 zabe@deploy1002: Started scap: Creating kaawiktionary (T362135)
  • 23:06 zabe: create Wiktionary Karakalpak # T362135
  • 23:06 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs1013.eqiad.wmnet
  • 23:04 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=iglwiki --cluster=all 2>&1 | tee /tmp/iglwiki.UpdateSearchIndexConfig.log # T361644
  • 23:04 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=iglwiki --cluster=all 2>&1 | tee /tmp/iglwiki.UpdateSearchIndexConfig.log # T362135
  • 23:03 zabe@deploy1002: Finished scap: Creating iglwiki (T361644) (duration: 13m 32s)
  • 23:00 eevans@cumin1002: START - Cookbook sre.hosts.reboot-single for host aqs1013.eqiad.wmnet
  • 22:52 zabe@deploy1002: zabe: Continuing with sync
  • 22:52 zabe@deploy1002: zabe: Creating iglwiki (T361644) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:49 zabe@deploy1002: Started scap: Creating iglwiki (T361644)
  • 22:49 zabe: create Wikipedia Igala # T361644
  • 22:47 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=mywikisource --cluster=all 2>&1 | tee /tmp/mywikisource.UpdateSearchIndexConfig.log # T361085
  • 22:47 zabe@deploy1002: Finished scap: Creating mywikisource (T361085) (duration: 13m 45s)
  • 22:36 zabe@deploy1002: zabe: Continuing with sync
  • 22:36 zabe@deploy1002: zabe: Creating mywikisource (T361085) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:33 zabe@deploy1002: Started scap: Creating mywikisource (T361085)
  • 22:33 zabe: create Wikisource Burmese # T361085
  • 22:29 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=kuswiki --cluster=all 2>&1 | tee /tmp/kuswiki.UpdateSearchIndexConfig.log # T359757
  • 22:28 zabe@deploy1002: Finished scap: Creating kuswiki (T359757) (duration: 14m 10s)
  • 22:18 zabe@deploy1002: zabe: Continuing with sync
  • 22:17 zabe@deploy1002: zabe: Creating kuswiki (T359757) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:14 zabe@deploy1002: Started scap: Creating kuswiki (T359757)
  • 22:14 zabe: create Wikipedia Kusaal # T359757
  • 22:11 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=bewwiki --cluster=all 2>&1 | tee /tmp/bewwiki.UpdateSearchIndexConfig.log # T357866
  • 22:11 zabe@deploy1002: Finished scap: Creating bewwiki (T357866) (duration: 12m 59s)
  • 21:58 zabe@deploy1002: Started scap: Creating bewwiki (T357866)
  • 21:58 zabe@deploy1002: Sync cancelled.
  • 21:56 zabe@deploy1002: zabe: Creating bewwiki (T357866) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:53 zabe@deploy1002: Started scap: Creating bewwiki (T357866)
  • 21:52 zabe: create Wikipedia Betawi # T357866
  • 21:46 brennen@deploy1002: Finished scap: Backport for Add afl_var_dump to AbuseLogPager::getQueryInfo (T363213) (duration: 16m 05s)
  • 21:35 brennen@deploy1002: brennen: Continuing with sync
  • 21:33 brennen@deploy1002: brennen: Backport for Add afl_var_dump to AbuseLogPager::getQueryInfo (T363213) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:30 brennen@deploy1002: Started scap: Backport for Add afl_var_dump to AbuseLogPager::getQueryInfo (T363213)
  • 21:18 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:18 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: telxius magru-eqiad - ayounsi@cumin1002"
  • 21:17 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: telxius magru-eqiad - ayounsi@cumin1002"
  • 21:17 zabe: zabe@mwmaint1002:~$ mwscript namespaceDupes.php azwikiquote --fix # T362645
  • 21:16 zabe@deploy1002: Finished scap: Backport for Added namespace alias for Azerbaijani Wikiquote (T362645) (duration: 14m 58s)
  • 21:13 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 21:06 zabe@deploy1002: zabe and nmw03: Continuing with sync
  • 21:04 zabe@deploy1002: zabe and nmw03: Backport for Added namespace alias for Azerbaijani Wikiquote (T362645) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:01 zabe@deploy1002: Started scap: Backport for Added namespace alias for Azerbaijani Wikiquote (T362645)
  • 21:01 zabe@deploy1002: Finished scap: Backport for .nvmrc: Update version from 18.17.0 to 18.20.2, Use dedicated Codex style modules (T362986), Use dedicated Codex style modules (T362986) (duration: 24m 32s)
  • 20:50 zabe@deploy1002: zabe and jdlrobson: Continuing with sync
  • 20:39 zabe@deploy1002: zabe and jdlrobson: Backport for .nvmrc: Update version from 18.17.0 to 18.20.2, Use dedicated Codex style modules (T362986), Use dedicated Codex style modules (T362986) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:36 zabe@deploy1002: Started scap: Backport for .nvmrc: Update version from 18.17.0 to 18.20.2, Use dedicated Codex style modules (T362986), Use dedicated Codex style modules (T362986)
  • 20:35 zabe@deploy1002: Finished scap: Backport for Enable night mode styles on Vector 2022 skin (T362726) (duration: 26m 08s)
  • 20:24 zabe@deploy1002: jdlrobson and zabe: Continuing with sync
  • 20:12 zabe@deploy1002: jdlrobson and zabe: Backport for Enable night mode styles on Vector 2022 skin (T362726) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:09 zabe@deploy1002: Started scap: Backport for Enable night mode styles on Vector 2022 skin (T362726)
  • 19:31 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.2 refs T361396
  • 19:15 brennen@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.2 refs T361396 (duration: 56m 50s)
  • 18:18 brennen@deploy1002: Started scap: testwikis wikis to 1.43.0-wmf.2 refs T361396
  • 18:04 brennen: train 1.43.0-wmf.2 (T361396) status: no current blockers, rolling to group0
  • 18:03 jynus: db1208 aka matomo db (data engineering)
  • 18:02 jynus: add backup user to db1208 T349397
  • 17:09 inflatador: bking@mw1461 "restart rsyslog to reclaim fds T357616"
  • 16:46 zabe@deploy1002: Finished scap: T361041 T362529 (duration: 06m 28s)
  • 16:42 denisse@deploy1002: Finished deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 24.4.0 - T363141 (duration: 00m 12s)
  • 16:42 denisse@deploy1002: Started deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 24.4.0 - T363141
  • 16:41 denisse: Upgrading LibreNMS in production - T363141
  • 16:41 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 16:39 zabe@deploy1002: Started scap: T361041 T362529
  • 16:37 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 16:30 taavi: disable puppet on P:mediawiki::webserver to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/1020920 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/1023436
  • 16:27 arnaudb@cumin1002: dbctl commit (dc=all): 'db2136 (re)pooling @ 100%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61125 and previous config saved to /var/cache/conftool/dbconfig/20240423-162709-arnaudb.json
  • 16:13 denisse: Backing up LibreNMS DB - T363141
  • 16:12 arnaudb@cumin1002: dbctl commit (dc=all): 'db2136 (re)pooling @ 75%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61124 and previous config saved to /var/cache/conftool/dbconfig/20240423-161204-arnaudb.json
  • 15:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db2136 (re)pooling @ 50%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61123 and previous config saved to /var/cache/conftool/dbconfig/20240423-155657-arnaudb.json
  • 15:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 15:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 15:49 brennen@deploy1002: Finished deploy [phabricator/deployment@12abb76]: deploy phab1004 for T363174 (duration: 00m 32s)
  • 15:49 brennen@deploy1002: Started deploy [phabricator/deployment@12abb76]: deploy phab1004 for T363174
  • 15:48 brennen@deploy1002: Finished deploy [phabricator/deployment@12abb76]: test deploy phab2002 for T363174 (duration: 00m 32s)
  • 15:48 brennen@deploy1002: Started deploy [phabricator/deployment@12abb76]: test deploy phab2002 for T363174
  • 15:46 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab.wmfusercontent.org with reason: T363174
  • 15:46 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: T363174
  • 15:46 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phabricator.wikimedia.org with reason: T363174
  • 15:45 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on phabricator.wikimedia.org with reason: T363174
  • 15:45 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: T363174
  • 15:45 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: T363174
  • 15:44 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: T363174
  • 15:44 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: T363174
  • 15:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db2136 (re)pooling @ 25%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61122 and previous config saved to /var/cache/conftool/dbconfig/20240423-154152-arnaudb.json
  • 15:30 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:30 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 15:27 ladsgroup@deploy1002: Finished scap: Backport for logos: revert back the tagline (T363165) (duration: 13m 30s)
  • 15:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 15:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T352010)', diff saved to https://phabricator.wikimedia.org/P61121 and previous config saved to /var/cache/conftool/dbconfig/20240423-152725-ladsgroup.json
  • 15:27 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 15:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db2136 (re)pooling @ 10%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61120 and previous config saved to /var/cache/conftool/dbconfig/20240423-152646-arnaudb.json
  • 15:19 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.roll-restart-reboot-durum (exit_code=1) rolling restart_daemons on A:durum
  • 15:19 moritzm: restarting FPM on phab1004
  • 15:16 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 15:16 ladsgroup@deploy1002: ladsgroup: Backport for logos: revert back the tagline (T363165) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:14 ladsgroup@deploy1002: Started scap: Backport for logos: revert back the tagline (T363165)
  • 15:12 arnaudb@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 100%: post upgrade repool', diff saved to https://phabricator.wikimedia.org/P61119 and previous config saved to /var/cache/conftool/dbconfig/20240423-151240-arnaudb.json
  • 15:12 ladsgroup@deploy1002: Finished scap: Backport for logos: Add the override for 1M variant of fawiki (T363165) (duration: 14m 28s)
  • 15:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P61118 and previous config saved to /var/cache/conftool/dbconfig/20240423-151216-ladsgroup.json
  • 15:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db2136 (re)pooling @ 5%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61117 and previous config saved to /var/cache/conftool/dbconfig/20240423-151140-arnaudb.json
  • 15:10 arnaudb@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2136.codfw.wmnet with OS bookworm
  • 15:08 jmm@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-durum rolling restart_daemons on A:durum
  • 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-eqiad: Deploy new TLS Keystore - PKI - elukey@cumin1002
  • 15:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-durum (exit_code=0) rolling restart_daemons on A:durum-drmrs
  • 15:01 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 15:01 ladsgroup@deploy1002: ladsgroup: Backport for logos: Add the override for 1M variant of fawiki (T363165) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:59 vgutierrez: repool ncredir6001
  • 14:58 ladsgroup@deploy1002: Started scap: Backport for logos: Add the override for 1M variant of fawiki (T363165)
  • 14:57 arnaudb@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 75%: post upgrade repool', diff saved to https://phabricator.wikimedia.org/P61116 and previous config saved to /var/cache/conftool/dbconfig/20240423-145734-arnaudb.json
  • 14:57 ladsgroup@deploy1002: Finished scap: Backport for logos: Add fawiki logo for 1,000,000 article (T363165) (duration: 17m 38s)
  • 14:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P61115 and previous config saved to /var/cache/conftool/dbconfig/20240423-145709-ladsgroup.json
  • 14:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61114 and previous config saved to /var/cache/conftool/dbconfig/20240423-145603-arnaudb.json
  • 14:53 jmm@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-durum rolling restart_daemons on A:durum-drmrs
  • 14:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2136.codfw.wmnet with reason: host reimage
  • 14:47 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2136.codfw.wmnet with reason: host reimage
  • 14:47 jclark@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts parse1002.eqiad.wmnet
  • 14:46 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 14:44 vgutierrez: depool ncredir6001
  • 14:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon1003.wikimedia.org
  • 14:42 arnaudb@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 50%: post upgrade repool', diff saved to https://phabricator.wikimedia.org/P61113 and previous config saved to /var/cache/conftool/dbconfig/20240423-144229-arnaudb.json
  • 14:42 ladsgroup@deploy1002: ladsgroup: Backport for logos: Add fawiki logo for 1,000,000 article (T363165) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T352010)', diff saved to https://phabricator.wikimedia.org/P61112 and previous config saved to /var/cache/conftool/dbconfig/20240423-144202-ladsgroup.json
  • 14:40 arnaudb@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61111 and previous config saved to /var/cache/conftool/dbconfig/20240423-144057-arnaudb.json
  • 14:39 ladsgroup@deploy1002: Started scap: Backport for logos: Add fawiki logo for 1,000,000 article (T363165)
  • 14:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon1003.wikimedia.org
  • 14:35 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts parse1002.eqiad.wmnet
  • 14:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['parse1002.eqiad.wmnet']
  • 14:35 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['parse1002.eqiad.wmnet']
  • 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon2002.wikimedia.org
  • 14:29 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db2136.codfw.wmnet with OS bookworm
  • 14:27 arnaudb@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 25%: post upgrade repool', diff saved to https://phabricator.wikimedia.org/P61110 and previous config saved to /var/cache/conftool/dbconfig/20240423-142723-arnaudb.json
  • 14:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2136', diff saved to https://phabricator.wikimedia.org/P61109 and previous config saved to /var/cache/conftool/dbconfig/20240423-142630-arnaudb.json
  • 14:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2136.codfw.wmnet with reason: T362746
  • 14:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2136.codfw.wmnet with reason: T362746
  • 14:25 arnaudb@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61108 and previous config saved to /var/cache/conftool/dbconfig/20240423-142551-arnaudb.json
  • 14:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2140.codfw.wmnet with OS bookworm
  • 14:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon2002.wikimedia.org
  • 14:21 jmm@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling restart_daemons on A:ncredir
  • 14:14 zabe@deploy1002: Finished scap: Backport for Update interwiki cache (T363093) (duration: 13m 41s)
  • 14:14 jmm@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling restart_daemons on A:ncredir
  • 14:13 effie: upload prometheus-memcached-exporter_0.14.2-2~wmf1_amd64 to bookworm-wikimedia - T350807
  • 14:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61107 and previous config saved to /var/cache/conftool/dbconfig/20240423-141045-arnaudb.json
  • 14:10 jmm@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling restart_daemons on A:ncredir-ulsfo
  • 14:09 jmm@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling restart_daemons on A:ncredir-ulsfo
  • 14:04 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2140.codfw.wmnet with reason: host reimage
  • 14:03 zabe@deploy1002: zabe: Continuing with sync
  • 14:03 zabe@deploy1002: zabe: Backport for Update interwiki cache (T363093) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:01 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2140.codfw.wmnet with reason: host reimage
  • 14:00 zabe@deploy1002: Started scap: Backport for Update interwiki cache (T363093)
  • 13:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 10%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61106 and previous config saved to /var/cache/conftool/dbconfig/20240423-135540-arnaudb.json
  • 13:43 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db2140.codfw.wmnet with OS bookworm
  • 13:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2140.codfw.wmnet with reason: T362746
  • 13:41 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2140.codfw.wmnet with reason: T362746
  • 13:40 arnaudb@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 5%: Post reimage', diff saved to https://phabricator.wikimedia.org/P61105 and previous config saved to /var/cache/conftool/dbconfig/20240423-134034-arnaudb.json
  • 13:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2147.codfw.wmnet with OS bookworm
  • 13:38 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 13:37 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 13:36 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 13:36 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 13:35 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 13:35 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 13:35 moritzm: installing glibc security updates
  • 13:34 elukey@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-eqiad: Deploy new TLS Keystore - PKI - elukey@cumin1002
  • 13:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 100%: Sanitarium master', diff saved to https://phabricator.wikimedia.org/P61103 and previous config saved to /var/cache/conftool/dbconfig/20240423-132633-arnaudb.json
  • 13:19 sukhe: running authdns-update for T362921
  • 13:17 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2147.codfw.wmnet with reason: host reimage
  • 13:15 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2147.codfw.wmnet with reason: host reimage
  • 13:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 75%: Sanitarium master', diff saved to https://phabricator.wikimedia.org/P61102 and previous config saved to /var/cache/conftool/dbconfig/20240423-131128-arnaudb.json
  • 12:58 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db2147.codfw.wmnet with OS bookworm
  • 12:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2147.codfw.wmnet with reason: T362746
  • 12:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2147.codfw.wmnet with reason: T362746
  • 12:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2147', diff saved to https://phabricator.wikimedia.org/P61101 and previous config saved to /var/cache/conftool/dbconfig/20240423-125703-arnaudb.json
  • 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 50%: Sanitarium master', diff saved to https://phabricator.wikimedia.org/P61100 and previous config saved to /var/cache/conftool/dbconfig/20240423-125622-arnaudb.json
  • 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2155.codfw.wmnet with reason: Reimage db2155
  • 12:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2155.codfw.wmnet with reason: Reimage db2155
  • 12:54 arnaudb@cumin1002: dbctl commit (dc=all): 'db2155 depool', diff saved to https://phabricator.wikimedia.org/P61099 and previous config saved to /var/cache/conftool/dbconfig/20240423-125430-arnaudb.json
  • 12:45 hashar@deploy1002: Finished deploy [gerrit/gerrit@ff51759]: Remove registerStyleModule() for Gerrit 3.8 - T354886 (duration: 00m 07s)
  • 12:17 taavi@deploy1002: taavi: Continuing with sync
  • 12:17 taavi@deploy1002: taavi: Backport for Add cawiki 750k logo (T363057) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:13 taavi@deploy1002: Started scap: Backport for Add cawiki 750k logo (T363057)
  • 11:47 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(mw1414.eqiad.wmnet|mw1415.eqiad.wmnet|mw1416.eqiad.wmnet|mw1448.eqiad.wmnet|mw1449.eqiad.wmnet),cluster=kubernetes,service=kubesvc
  • 11:47 claime: Pooling and uncordoning mw1414.eqiad.wmnet,mw1415.eqiad.wmnet,mw1416.eqiad.wmnet,mw1448.eqiad.wmnet,mw1449.eqiad.wmnet - T351074
  • 11:39 claime: Running homer 'cr*eqiad*' commit 'T351074'
  • 11:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: Host has hardware issues
  • 11:38 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: Host has hardware issues
  • 11:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1415.eqiad.wmnet with OS bullseye
  • 11:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1448.eqiad.wmnet with OS bullseye
  • 11:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1416.eqiad.wmnet with OS bullseye
  • 11:30 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1449.eqiad.wmnet with OS bullseye
  • 11:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1414.eqiad.wmnet with OS bullseye
  • 11:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1415.eqiad.wmnet with reason: host reimage
  • 11:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1448.eqiad.wmnet with reason: host reimage
  • 11:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1416.eqiad.wmnet with reason: host reimage
  • 11:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1449.eqiad.wmnet with reason: host reimage
  • 11:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1414.eqiad.wmnet with reason: host reimage
  • 11:08 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1449.eqiad.wmnet with reason: host reimage
  • 11:07 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1448.eqiad.wmnet with reason: host reimage
  • 11:07 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1416.eqiad.wmnet with reason: host reimage
  • 11:06 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1415.eqiad.wmnet with reason: host reimage
  • 11:06 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1414.eqiad.wmnet with reason: host reimage
  • 10:58 arnaudb@cumin1002: dbctl commit (dc=all): 'db2172 (re)pooling @ 100%: post upgrade repool', diff saved to https://phabricator.wikimedia.org/P61098 and previous config saved to /var/cache/conftool/dbconfig/20240423-105812-arnaudb.json
  • 10:55 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1449.eqiad.wmnet with OS bullseye
  • 10:54 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1448.eqiad.wmnet with OS bullseye
  • 10:54 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1416.eqiad.wmnet with OS bullseye
  • 10:53 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1415.eqiad.wmnet with OS bullseye
  • 10:53 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1414.eqiad.wmnet with OS bullseye
  • 10:45 claime: Depooling mw1414.eqiad.wmnet,mw1415.eqiad.wmnet,mw1416.eqiad.wmnet,mw1448.eqiad.wmnet,mw1449.eqiad.wmnet for reimage to kubernetes - T351074
  • 10:43 jayme: kubectl cordon parse1002.eqiad.wmnet - T363086
  • 10:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db2172 (re)pooling @ 75%: post upgrade repool', diff saved to https://phabricator.wikimedia.org/P61097 and previous config saved to /var/cache/conftool/dbconfig/20240423-104306-arnaudb.json
  • 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2196.codfw.wmnet
  • 10:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2172 (re)pooling @ 50%: post upgrade repool', diff saved to https://phabricator.wikimedia.org/P61094 and previous config saved to /var/cache/conftool/dbconfig/20240423-102801-arnaudb.json
  • 10:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1245.eqiad.wmnet with reason: T360116
  • 10:25 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1245.eqiad.wmnet with reason: T360116
  • 10:22 btullis@deploy1002: Finished deploy [analytics/hdfs-tools/deploy@3618aab]: (no justification provided) (duration: 00m 11s)
  • 10:22 btullis@deploy1002: Started deploy [analytics/hdfs-tools/deploy@3618aab]: (no justification provided)
  • 10:17 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db2196.codfw.wmnet
  • 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db1220.eqiad.wmnet
  • 10:12 arnaudb@cumin1002: dbctl commit (dc=all): 'db2172 (re)pooling @ 25%: post upgrade repool', diff saved to https://phabricator.wikimedia.org/P61093 and previous config saved to /var/cache/conftool/dbconfig/20240423-101255-arnaudb.json
  • 09:57 arnaudb@cumin1002: dbctl commit (dc=all): 'db2172 (re)pooling @ 15%: post upgrade repool', diff saved to https://phabricator.wikimedia.org/P61092 and previous config saved to /var/cache/conftool/dbconfig/20240423-095749-arnaudb.json
  • 09:52 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:52 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:44 hashar@deploy1002: Finished scap: Backport for logging: always register udp2log handlers (T228838) (duration: 15m 11s)
  • 09:42 arnaudb@cumin1002: dbctl commit (dc=all): 'db2172 (re)pooling @ 10%: post upgrade repool', diff saved to https://phabricator.wikimedia.org/P61091 and previous config saved to /var/cache/conftool/dbconfig/20240423-094244-arnaudb.json
  • 09:40 arnaudb@cumin1002: dbctl commit (dc=all): 'db2206 (re)pooling @ 100%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P61090 and previous config saved to /var/cache/conftool/dbconfig/20240423-094030-arnaudb.json
  • 09:39 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db1220.eqiad.wmnet
  • 09:33 hashar@deploy1002: hashar: Continuing with sync
  • 09:31 hashar@deploy1002: hashar: Backport for logging: always register udp2log handlers (T228838) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db1179.eqiad.wmnet
  • 09:29 hashar@deploy1002: Started scap: Backport for logging: always register udp2log handlers (T228838)
  • 09:27 arnaudb@cumin1002: dbctl commit (dc=all): 'db2172 (re)pooling @ 5%: post upgrade repool', diff saved to https://phabricator.wikimedia.org/P61089 and previous config saved to /var/cache/conftool/dbconfig/20240423-092738-arnaudb.json
  • 09:25 arnaudb@cumin1002: dbctl commit (dc=all): 'db2206 (re)pooling @ 75%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P61088 and previous config saved to /var/cache/conftool/dbconfig/20240423-092525-arnaudb.json
  • 09:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2172.codfw.wmnet with OS bookworm
  • 09:18 hashar@deploy1002: Finished deploy [gerrit/gerrit@8b4ae00]: wm-zuul-status: filter based solely on change number - T358253 (duration: 00m 07s)
  • 09:18 hashar@deploy1002: Started deploy [gerrit/gerrit@8b4ae00]: wm-zuul-status: filter based solely on change number - T358253
  • 09:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db2206 (re)pooling @ 50%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P61087 and previous config saved to /var/cache/conftool/dbconfig/20240423-091019-arnaudb.json
  • 09:04 hashar: Backport & config window completed
  • 09:04 godog: delete tags for docker-registry.discovery.wmnet/jaeger-es-index-cleaner - T344953
  • 09:02 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2172.codfw.wmnet with reason: host reimage
  • 09:02 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db1179.eqiad.wmnet
  • 09:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2172.codfw.wmnet with reason: host reimage
  • 08:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2215.codfw.wmnet
  • 08:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db2206 (re)pooling @ 25%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P61086 and previous config saved to /var/cache/conftool/dbconfig/20240423-085514-arnaudb.json
  • 08:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2002.wikimedia.org
  • 08:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2002.wikimedia.org
  • 08:43 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db2172.codfw.wmnet with OS bookworm
  • 08:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2172.codfw.wmnet with reason: T362746
  • 08:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2172.codfw.wmnet with reason: T362746
  • 08:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2172', diff saved to https://phabricator.wikimedia.org/P61085 and previous config saved to /var/cache/conftool/dbconfig/20240423-084146-arnaudb.json
  • 08:40 hashar@deploy1002: Finished scap: Backport for ParserOutput: don't complain if TOCHTML is unset from ParserCache (T363107) (duration: 16m 13s)
  • 08:40 arnaudb@cumin1002: dbctl commit (dc=all): 'db2206 (re)pooling @ 10%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P61084 and previous config saved to /var/cache/conftool/dbconfig/20240423-084008-arnaudb.json
  • 08:33 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db2215.codfw.wmnet
  • 08:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2206.codfw.wmnet
  • 08:28 hashar@deploy1002: cscott and hashar: Continuing with sync
  • 08:28 hashar@deploy1002: cscott and hashar: Backport for ParserOutput: don't complain if TOCHTML is unset from ParserCache (T363107) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:28 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db2206.codfw.wmnet
  • 08:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2206.codfw.wmnet with reason: T362746
  • 08:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2206.codfw.wmnet with reason: T362746
  • 08:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2206', diff saved to https://phabricator.wikimedia.org/P61083 and previous config saved to /var/cache/conftool/dbconfig/20240423-082621-arnaudb.json
  • 08:24 hashar@deploy1002: Started scap: Backport for ParserOutput: don't complain if TOCHTML is unset from ParserCache (T363107)
  • 08:20 hashar@deploy1002: Sync cancelled.
  • 08:20 hashar@deploy1002: hashar and cscott: Backport for ParserOutput: don't complain if TOCHTML is unset from ParserCache (T363107) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1002.wikimedia.org
  • 08:15 hashar@deploy1002: Started scap: Backport for ParserOutput: don't complain if TOCHTML is unset from ParserCache (T363107)
  • 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2191.codfw.wmnet
  • 08:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1002.wikimedia.org
  • 08:12 kartik@deploy1002: Finished scap: Backport for CX: Initialize publishNamespace for CXTarget (T349959) (duration: 44m 53s)
  • 08:05 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db2191.codfw.wmnet
  • 08:04 godog: restore sre business hour escalation policy - T350192
  • 07:59 kartik@deploy1002: kartik: Continuing with sync
  • 07:58 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2131.codfw.wmnet
  • 07:42 kartik@deploy1002: kartik: Backport for CX: Initialize publishNamespace for CXTarget (T349959) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:38 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db2131.codfw.wmnet
  • 07:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 07:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 07:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T352010)', diff saved to https://phabricator.wikimedia.org/P61082 and previous config saved to /var/cache/conftool/dbconfig/20240423-073658-ladsgroup.json
  • 07:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2115.codfw.wmnet
  • 07:27 kartik@deploy1002: Started scap: Backport for CX: Initialize publishNamespace for CXTarget (T349959)
  • 07:22 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db2115.codfw.wmnet
  • 07:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P61081 and previous config saved to /var/cache/conftool/dbconfig/20240423-072151-ladsgroup.json
  • 07:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P61080 and previous config saved to /var/cache/conftool/dbconfig/20240423-070643-ladsgroup.json
  • 06:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T352010)', diff saved to https://phabricator.wikimedia.org/P61079 and previous config saved to /var/cache/conftool/dbconfig/20240423-065136-ladsgroup.json
  • 03:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2130 (T352010)', diff saved to https://phabricator.wikimedia.org/P61078 and previous config saved to /var/cache/conftool/dbconfig/20240423-034652-ladsgroup.json
  • 03:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 03:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 03:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T352010)', diff saved to https://phabricator.wikimedia.org/P61077 and previous config saved to /var/cache/conftool/dbconfig/20240423-034628-ladsgroup.json
  • 03:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P61076 and previous config saved to /var/cache/conftool/dbconfig/20240423-033120-ladsgroup.json
  • 03:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P61075 and previous config saved to /var/cache/conftool/dbconfig/20240423-031613-ladsgroup.json
  • 03:05 mwpresync@deploy1002: Pruned MediaWiki: 1.42.0-wmf.25 (duration: 05m 37s)
  • 03:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T352010)', diff saved to https://phabricator.wikimedia.org/P61074 and previous config saved to /var/cache/conftool/dbconfig/20240423-030106-ladsgroup.json
  • 00:24 sukhe@cumin2002: dbctl commit (dc=all): 'depool db1246', diff saved to https://phabricator.wikimedia.org/P61073 and previous config saved to /var/cache/conftool/dbconfig/20240423-002413-sukhe.json

2024-04-22

  • 22:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T352010)', diff saved to https://phabricator.wikimedia.org/P61072 and previous config saved to /var/cache/conftool/dbconfig/20240422-222830-ladsgroup.json
  • 22:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 22:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 16:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T352010)', diff saved to https://phabricator.wikimedia.org/P61071 and previous config saved to /var/cache/conftool/dbconfig/20240422-162340-ladsgroup.json
  • 16:23 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 16:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 14:58 mforns@deploy1002: Finished deploy [airflow-dags/analytics@70946de]: (no justification provided) (duration: 00m 27s)
  • 14:57 mforns@deploy1002: Started deploy [airflow-dags/analytics@70946de]: (no justification provided)
  • 14:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1234.eqiad.wmnet with reason: Down
  • 14:41 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1234.eqiad.wmnet with reason: Down
  • 14:10 mforns@deploy1002: Finished deploy [analytics/refinery@a7af5a6] (hadoop-test): Deploying Commons Impact Metrics dump queries TEST [analytics/refinery@a7af5a6b] (duration: 02m 36s)
  • 14:08 mforns@deploy1002: Started deploy [analytics/refinery@a7af5a6] (hadoop-test): Deploying Commons Impact Metrics dump queries TEST [analytics/refinery@a7af5a6b]
  • 14:05 mforns@deploy1002: Finished deploy [analytics/refinery@a7af5a6] (thin): Deploy Commons Impact Metrics dumps queries THIN [analytics/refinery@a7af5a6b] (duration: 03m 34s)
  • 14:01 mforns@deploy1002: Started deploy [analytics/refinery@a7af5a6] (thin): Deploy Commons Impact Metrics dumps queries THIN [analytics/refinery@a7af5a6b]
  • 14:00 mforns@deploy1002: Finished deploy [analytics/refinery@a7af5a6]: Deploying Commons Impact Metrics dumps queries [analytics/refinery@a7af5a6b] (duration: 13m 02s)
  • 13:47 mforns@deploy1002: Started deploy [analytics/refinery@a7af5a6]: Deploying Commons Impact Metrics dumps queries [analytics/refinery@a7af5a6b]
  • 11:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 11:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 11:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T352010)', diff saved to https://phabricator.wikimedia.org/P61069 and previous config saved to /var/cache/conftool/dbconfig/20240422-112625-ladsgroup.json
  • 11:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P61068 and previous config saved to /var/cache/conftool/dbconfig/20240422-111117-ladsgroup.json
  • 10:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P61067 and previous config saved to /var/cache/conftool/dbconfig/20240422-105610-ladsgroup.json
  • 10:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T352010)', diff saved to https://phabricator.wikimedia.org/P61066 and previous config saved to /var/cache/conftool/dbconfig/20240422-104102-ladsgroup.json
  • 05:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 05:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 05:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T352010)', diff saved to https://phabricator.wikimedia.org/P61065 and previous config saved to /var/cache/conftool/dbconfig/20240422-053006-ladsgroup.json
  • 05:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P61064 and previous config saved to /var/cache/conftool/dbconfig/20240422-051459-ladsgroup.json
  • 04:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P61063 and previous config saved to /var/cache/conftool/dbconfig/20240422-045952-ladsgroup.json
  • 04:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T352010)', diff saved to https://phabricator.wikimedia.org/P61062 and previous config saved to /var/cache/conftool/dbconfig/20240422-044444-ladsgroup.json
  • 01:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T352010)', diff saved to https://phabricator.wikimedia.org/P61061 and previous config saved to /var/cache/conftool/dbconfig/20240422-010520-ladsgroup.json
  • 01:05 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 01:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 01:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T352010)', diff saved to https://phabricator.wikimedia.org/P61060 and previous config saved to /var/cache/conftool/dbconfig/20240422-010457-ladsgroup.json
  • 00:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P61059 and previous config saved to /var/cache/conftool/dbconfig/20240422-004950-ladsgroup.json
  • 00:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P61058 and previous config saved to /var/cache/conftool/dbconfig/20240422-003442-ladsgroup.json
  • 00:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T352010)', diff saved to https://phabricator.wikimedia.org/P61057 and previous config saved to /var/cache/conftool/dbconfig/20240422-001933-ladsgroup.json

2024-04-21

  • 19:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T352010)', diff saved to https://phabricator.wikimedia.org/P61056 and previous config saved to /var/cache/conftool/dbconfig/20240421-193927-ladsgroup.json
  • 19:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 19:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 19:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P61055 and previous config saved to /var/cache/conftool/dbconfig/20240421-193904-ladsgroup.json
  • 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P61054 and previous config saved to /var/cache/conftool/dbconfig/20240421-192356-ladsgroup.json
  • 19:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P61053 and previous config saved to /var/cache/conftool/dbconfig/20240421-190849-ladsgroup.json
  • 18:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P61052 and previous config saved to /var/cache/conftool/dbconfig/20240421-185342-ladsgroup.json
  • 14:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 (T352010)', diff saved to https://phabricator.wikimedia.org/P61050 and previous config saved to /var/cache/conftool/dbconfig/20240421-142433-ladsgroup.json
  • 14:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 14:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 14:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T352010)', diff saved to https://phabricator.wikimedia.org/P61049 and previous config saved to /var/cache/conftool/dbconfig/20240421-142411-ladsgroup.json
  • 14:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P61048 and previous config saved to /var/cache/conftool/dbconfig/20240421-140904-ladsgroup.json
  • 13:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P61047 and previous config saved to /var/cache/conftool/dbconfig/20240421-135356-ladsgroup.json
  • 13:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T352010)', diff saved to https://phabricator.wikimedia.org/P61046 and previous config saved to /var/cache/conftool/dbconfig/20240421-133848-ladsgroup.json
  • 09:19 topranks: putting eqiad <-> codfw traffic back on primary 100G transport link following service restoration and stability T362486
  • 09:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P61045 and previous config saved to /var/cache/conftool/dbconfig/20240421-091828-ladsgroup.json
  • 09:18 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 09:18 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 01:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T352010)', diff saved to https://phabricator.wikimedia.org/P61044 and previous config saved to /var/cache/conftool/dbconfig/20240421-015952-ladsgroup.json
  • 01:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 01:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 01:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T352010)', diff saved to https://phabricator.wikimedia.org/P61043 and previous config saved to /var/cache/conftool/dbconfig/20240421-015929-ladsgroup.json
  • 01:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P61042 and previous config saved to /var/cache/conftool/dbconfig/20240421-014422-ladsgroup.json
  • 01:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P61041 and previous config saved to /var/cache/conftool/dbconfig/20240421-012913-ladsgroup.json
  • 01:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T352010)', diff saved to https://phabricator.wikimedia.org/P61040 and previous config saved to /var/cache/conftool/dbconfig/20240421-011406-ladsgroup.json

2024-04-20

  • 21:15 taavi: restart gerrit
  • 14:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 14:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 13:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1193 (T352010)', diff saved to https://phabricator.wikimedia.org/P61039 and previous config saved to /var/cache/conftool/dbconfig/20240420-131519-ladsgroup.json
  • 13:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 13:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 13:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T352010)', diff saved to https://phabricator.wikimedia.org/P61038 and previous config saved to /var/cache/conftool/dbconfig/20240420-131455-ladsgroup.json
  • 12:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P61037 and previous config saved to /var/cache/conftool/dbconfig/20240420-125948-ladsgroup.json
  • 09:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1234.eqiad.wmnet with reason: Down
  • 09:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1234.eqiad.wmnet with reason: Down
  • 09:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Down', diff saved to https://phabricator.wikimedia.org/P61034 and previous config saved to /var/cache/conftool/dbconfig/20240420-092358-ladsgroup.json
  • 07:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 07:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 00:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T352010)', diff saved to https://phabricator.wikimedia.org/P61033 and previous config saved to /var/cache/conftool/dbconfig/20240420-003950-ladsgroup.json
  • 00:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 00:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 00:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T352010)', diff saved to https://phabricator.wikimedia.org/P61032 and previous config saved to /var/cache/conftool/dbconfig/20240420-003927-ladsgroup.json
  • 00:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P61031 and previous config saved to /var/cache/conftool/dbconfig/20240420-002420-ladsgroup.json
  • 00:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P61030 and previous config saved to /var/cache/conftool/dbconfig/20240420-000912-ladsgroup.json

2024-04-19

  • 23:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T352010)', diff saved to https://phabricator.wikimedia.org/P61029 and previous config saved to /var/cache/conftool/dbconfig/20240419-235405-ladsgroup.json
  • 22:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 22:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 21:03 taavi: taavi@mwmaint1002 ~ $ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=metawiki --logwiki=metawiki Kou.i5h 'Renamed user 8356771833137' # T362942
  • 21:02 taavi: taavi@mwmaint1002 ~ $ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=eowiki --logwiki=metawiki 'Gzsimonfbi' 'Renamed user 2409354752759' # T362941
  • 20:22 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 20:21 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 20:12 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 20:12 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 19:56 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T362508, journal in uncertain state) xfer wikidata from wdqs2022.codfw.wmnet -> wdqs2023.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 19:51 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 19:49 jforrester@deploy1002: Finished deploy [integration/docroot@c090350]: I1c1c25 trivial CI fix (duration: 00m 06s)
  • 19:49 jforrester@deploy1002: Started deploy [integration/docroot@c090350]: I1c1c25 trivial CI fix
  • 19:41 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 19:16 ryankemper: [WDQS] T363004 Restarted wdqs2012 to clear out its in-application-memory ban lists (it had pybal's twisted user agent banned)
  • 18:50 ryankemper: [WDQS] T363004 Restarted wdqs2010 and wdqs2024 to clear out their in-application-memory ban lists
  • 18:34 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T362508, journal in uncertain state) xfer wikidata from wdqs2022.codfw.wmnet -> wdqs2023.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 18:33 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:33 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding more reverse v6 INCLUDES into dns for magru transport links - cmooney@cumin1002"
  • 18:32 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding more reverse v6 INCLUDES into dns for magru transport links - cmooney@cumin1002"
  • 18:24 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 18:08 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on mr1-ulsfo,mr1-ulsfo IPv6,mr1-ulsfo.oob,mr1-ulsfo.oob IPv6 with reason: disabling oob link on mr1-ulsfo to stop the SSH attempts long enough to get a homer run in
  • 18:07 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on mr1-ulsfo,mr1-ulsfo IPv6,mr1-ulsfo.oob,mr1-ulsfo.oob IPv6 with reason: disabling oob link on mr1-ulsfo to stop the SSH attempts long enough to get a homer run in
  • 17:58 sukhe: sudo cookbook -d sre.dns.netbox "test"
  • 17:49 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 17:48 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 16:02 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 16:01 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 16:01 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 16:01 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 16:01 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 16:01 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 16:01 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 16:00 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 15:48 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 15:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 15:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 15:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T352010)', diff saved to https://phabricator.wikimedia.org/P61028 and previous config saved to /var/cache/conftool/dbconfig/20240419-154430-ladsgroup.json
  • 15:35 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 15:35 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 15:35 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 15:35 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 15:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P61027 and previous config saved to /var/cache/conftool/dbconfig/20240419-152922-ladsgroup.json
  • 15:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P61026 and previous config saved to /var/cache/conftool/dbconfig/20240419-151415-ladsgroup.json
  • 15:11 vgutierrez: repool ncredir2001
  • 14:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T352010)', diff saved to https://phabricator.wikimedia.org/P61025 and previous config saved to /var/cache/conftool/dbconfig/20240419-145907-ladsgroup.json
  • 14:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T352010)', diff saved to https://phabricator.wikimedia.org/P61023 and previous config saved to /var/cache/conftool/dbconfig/20240419-142726-ladsgroup.json
  • 14:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P61022 and previous config saved to /var/cache/conftool/dbconfig/20240419-141218-ladsgroup.json
  • 13:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P61021 and previous config saved to /var/cache/conftool/dbconfig/20240419-135711-ladsgroup.json
  • 13:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T352010)', diff saved to https://phabricator.wikimedia.org/P61020 and previous config saved to /var/cache/conftool/dbconfig/20240419-134204-ladsgroup.json
  • 13:28 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 13:27 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.copy (exit_code=0) Will create a clone of db1178.eqiad.wmnet onto db1178.eqiad.wmnet
  • 13:27 arnaudb@cumin1002: START - Cookbook sre.mysql.copy Will create a clone of db1178.eqiad.wmnet onto db1178.eqiad.wmnet
  • 13:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.copy (exit_code=0) Will create a clone of db1178.eqiad.wmnet onto db1178.eqiad.wmnet
  • 13:22 arnaudb@cumin1002: START - Cookbook sre.mysql.copy Will create a clone of db1178.eqiad.wmnet onto db1178.eqiad.wmnet
  • 13:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.copy (exit_code=0) Will create a clone of db1178.eqiad.wmnet onto db1178.eqiad.wmnet
  • 13:22 arnaudb@cumin1002: START - Cookbook sre.mysql.copy Will create a clone of db1178.eqiad.wmnet onto db1178.eqiad.wmnet
  • 12:19 vgutierrez: depool ncredir2001
  • 12:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T352010)', diff saved to https://phabricator.wikimedia.org/P61018 and previous config saved to /var/cache/conftool/dbconfig/20240419-120853-ladsgroup.json
  • 12:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 12:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 12:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T352010)', diff saved to https://phabricator.wikimedia.org/P61017 and previous config saved to /var/cache/conftool/dbconfig/20240419-120831-ladsgroup.json
  • 12:01 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:00 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding network interface DNS magru. - cmooney@cumin1002"
  • 12:00 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding network interface DNS magru. - cmooney@cumin1002"
  • 11:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P61015 and previous config saved to /var/cache/conftool/dbconfig/20240419-115323-ladsgroup.json
  • 11:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P61014 and previous config saved to /var/cache/conftool/dbconfig/20240419-113816-ladsgroup.json
  • 11:28 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:28 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts matomo1002.eqiad.wmnet
  • 11:28 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:28 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: matomo1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
  • 11:27 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 11:27 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: matomo1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
  • 11:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T352010)', diff saved to https://phabricator.wikimedia.org/P61013 and previous config saved to /var/cache/conftool/dbconfig/20240419-112309-ladsgroup.json
  • 11:21 btullis@cumin1002: START - Cookbook sre.dns.netbox
  • 11:15 btullis@cumin1002: START - Cookbook sre.hosts.decommission for hosts matomo1002.eqiad.wmnet
  • 11:05 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet2007-dev.codfw.wmnet
  • 11:00 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudnet2007-dev.codfw.wmnet
  • 10:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1183 (T352010)', diff saved to https://phabricator.wikimedia.org/P61010 and previous config saved to /var/cache/conftool/dbconfig/20240419-104144-ladsgroup.json
  • 10:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 10:41 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 10:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P61009 and previous config saved to /var/cache/conftool/dbconfig/20240419-102438-root.json
  • 10:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P61008 and previous config saved to /var/cache/conftool/dbconfig/20240419-100933-root.json
  • 09:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P61007 and previous config saved to /var/cache/conftool/dbconfig/20240419-095427-root.json
  • 09:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P61006 and previous config saved to /var/cache/conftool/dbconfig/20240419-093921-root.json
  • 09:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P61005 and previous config saved to /var/cache/conftool/dbconfig/20240419-092415-root.json
  • 09:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P61004 and previous config saved to /var/cache/conftool/dbconfig/20240419-090910-root.json
  • 09:03 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
  • 09:02 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
  • 08:55 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 08:54 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 08:54 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 08:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P61003 and previous config saved to /var/cache/conftool/dbconfig/20240419-085404-root.json
  • 08:40 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 08:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS bookworm
  • 07:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage
  • 07:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage
  • 07:38 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS bookworm
  • 07:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1194', diff saved to https://phabricator.wikimedia.org/P61001 and previous config saved to /var/cache/conftool/dbconfig/20240419-073638-root.json
  • 07:24 moritzm: installing Linux 6.1.85 on Bookworm hosts
  • 07:15 moritzm: installing PHP 7.4 security updates on cloudweb and bullseye snapshot hosts
  • 07:03 moritzm: imported PHP 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf11u2 to component/php74 (backport of latest PHP security fixes)
  • 06:51 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P60999 and previous config saved to /var/cache/conftool/dbconfig/20240419-065142-root.json
  • 06:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T352010)', diff saved to https://phabricator.wikimedia.org/P60998 and previous config saved to /var/cache/conftool/dbconfig/20240419-063847-ladsgroup.json
  • 06:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 06:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 06:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T352010)', diff saved to https://phabricator.wikimedia.org/P60997 and previous config saved to /var/cache/conftool/dbconfig/20240419-063825-ladsgroup.json
  • 06:36 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P60996 and previous config saved to /var/cache/conftool/dbconfig/20240419-063636-root.json
  • 06:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P60995 and previous config saved to /var/cache/conftool/dbconfig/20240419-062317-ladsgroup.json
  • 06:21 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P60994 and previous config saved to /var/cache/conftool/dbconfig/20240419-062130-root.json
  • 06:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P60993 and previous config saved to /var/cache/conftool/dbconfig/20240419-060810-ladsgroup.json
  • 06:06 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P60992 and previous config saved to /var/cache/conftool/dbconfig/20240419-060625-root.json
  • 05:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T352010)', diff saved to https://phabricator.wikimedia.org/P60991 and previous config saved to /var/cache/conftool/dbconfig/20240419-055303-ladsgroup.json
  • 05:51 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P60990 and previous config saved to /var/cache/conftool/dbconfig/20240419-055118-root.json
  • 05:36 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P60989 and previous config saved to /var/cache/conftool/dbconfig/20240419-053612-root.json
  • 05:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS bookworm
  • 05:21 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P60988 and previous config saved to /var/cache/conftool/dbconfig/20240419-052107-root.json
  • 05:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage
  • 05:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage
  • 05:02 marostegui: dbmaint Upgrade s7 eqiad to Bookworm and MariaDB 10.6 T362745
  • 05:02 marostegui: dbmaint Upgrade s7 codfw to Bookworm and MariaDB 10.6 T362745
  • 04:50 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS bookworm
  • 04:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1202', diff saved to https://phabricator.wikimedia.org/P60987 and previous config saved to /var/cache/conftool/dbconfig/20240419-044906-root.json
  • 04:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 04:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 04:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 04:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2114.codfw.wmnet with reason: Maintenance

2024-04-18

  • 23:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T352010)', diff saved to https://phabricator.wikimedia.org/P60986 and previous config saved to /var/cache/conftool/dbconfig/20240418-234247-ladsgroup.json
  • 23:42 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 23:42 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 23:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T352010)', diff saved to https://phabricator.wikimedia.org/P60985 and previous config saved to /var/cache/conftool/dbconfig/20240418-234225-ladsgroup.json
  • 23:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P60984 and previous config saved to /var/cache/conftool/dbconfig/20240418-232717-ladsgroup.json
  • 23:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P60983 and previous config saved to /var/cache/conftool/dbconfig/20240418-231210-ladsgroup.json
  • 23:06 mutante: graphite - switched SSL cert provider from cergen to cfssl - restarted envoyproxy
  • 22:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T352010)', diff saved to https://phabricator.wikimedia.org/P60982 and previous config saved to /var/cache/conftool/dbconfig/20240418-225702-ladsgroup.json
  • 22:31 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T362508, excessive lag) xfer wikidata from wdqs2022.codfw.wmnet -> wdqs2023.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 21:34 damilare: civicrm upgraded from 28adb4da to e95e03d9
  • 21:11 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T362508, excessive lag) xfer wikidata from wdqs2022.codfw.wmnet -> wdqs2023.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 21:01 cjming: end of UTC late backport window
  • 21:00 cjming@deploy1002: Finished scap: Backport for Add templateeditor right to sysops in dawiki and fix typo in group name (T361461) (duration: 16m 24s)
  • 20:48 cjming@deploy1002: cjming and nmw03: Continuing with sync
  • 20:46 cjming@deploy1002: cjming and nmw03: Backport for Add templateeditor right to sysops in dawiki and fix typo in group name (T361461) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:43 cjming@deploy1002: Started scap: Backport for Add templateeditor right to sysops in dawiki and fix typo in group name (T361461)
  • 20:42 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on wdqs2023.codfw.wmnet with reason: T362508
  • 20:42 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on wdqs2023.codfw.wmnet with reason: T362508
  • 20:42 cjming@deploy1002: Finished scap: Backport for Temporarily restore wgMinervaApplyKnownTemplateHacks for cached HTML (T362747) (duration: 17m 14s)
  • 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T352010)', diff saved to https://phabricator.wikimedia.org/P60980 and previous config saved to /var/cache/conftool/dbconfig/20240418-203256-ladsgroup.json
  • 20:32 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 20:32 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T352010)', diff saved to https://phabricator.wikimedia.org/P60979 and previous config saved to /var/cache/conftool/dbconfig/20240418-203234-ladsgroup.json
  • 20:30 cjming@deploy1002: jdlrobson and cjming: Continuing with sync
  • 20:27 cjming@deploy1002: jdlrobson and cjming: Backport for Temporarily restore wgMinervaApplyKnownTemplateHacks for cached HTML (T362747) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:25 cjming@deploy1002: Started scap: Backport for Temporarily restore wgMinervaApplyKnownTemplateHacks for cached HTML (T362747)
  • 20:23 cjming@deploy1002: Finished scap: Backport for Revert "WikimediaEvents: Set IPoid URL and enable ip_reputation/score (2nd attempt)", Revert "ext-EventLogging: Add mediawiki.ip_reputation.score" (duration: 14m 56s)
  • 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P60978 and previous config saved to /var/cache/conftool/dbconfig/20240418-201727-ladsgroup.json
  • 20:11 cjming@deploy1002: cjming and phuedx: Continuing with sync
  • 20:11 cjming@deploy1002: cjming and phuedx: Backport for Revert "WikimediaEvents: Set IPoid URL and enable ip_reputation/score (2nd attempt)", Revert "ext-EventLogging: Add mediawiki.ip_reputation.score" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:09 cjming@deploy1002: Started scap: Backport for Revert "WikimediaEvents: Set IPoid URL and enable ip_reputation/score (2nd attempt)", Revert "ext-EventLogging: Add mediawiki.ip_reputation.score"
  • 20:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P60977 and previous config saved to /var/cache/conftool/dbconfig/20240418-200218-ladsgroup.json
  • 20:00 ejegg: donorwiki upgraded from 5e39bdc5 to b005071a
  • 19:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T361627)', diff saved to https://phabricator.wikimedia.org/P60976 and previous config saved to /var/cache/conftool/dbconfig/20240418-195244-marostegui.json
  • 19:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T352010)', diff saved to https://phabricator.wikimedia.org/P60975 and previous config saved to /var/cache/conftool/dbconfig/20240418-194711-ladsgroup.json
  • 19:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P60974 and previous config saved to /var/cache/conftool/dbconfig/20240418-193737-marostegui.json
  • 19:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P60973 and previous config saved to /var/cache/conftool/dbconfig/20240418-192229-marostegui.json
  • 19:21 Amir1: dropping wikiadmin user on 10.64.% on RW es sections
  • 19:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T361627)', diff saved to https://phabricator.wikimedia.org/P60972 and previous config saved to /var/cache/conftool/dbconfig/20240418-190722-marostegui.json
  • 19:05 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=codfw
  • 19:03 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=yes; selector: name=elastic2088\.codfw\.wmnet
  • 19:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T361627)', diff saved to https://phabricator.wikimedia.org/P60971 and previous config saved to /var/cache/conftool/dbconfig/20240418-190249-marostegui.json
  • 19:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 19:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 19:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T361627)', diff saved to https://phabricator.wikimedia.org/P60970 and previous config saved to /var/cache/conftool/dbconfig/20240418-190226-marostegui.json
  • 18:51 dancy@deploy1002: Installation of scap version "4.78.0" completed for 330 hosts
  • 18:50 dancy@deploy1002: Installing scap version "4.78.0" for 330 hosts
  • 18:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P60969 and previous config saved to /var/cache/conftool/dbconfig/20240418-184718-marostegui.json
  • 18:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T352010)', diff saved to https://phabricator.wikimedia.org/P60968 and previous config saved to /var/cache/conftool/dbconfig/20240418-184645-ladsgroup.json
  • 18:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 18:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 18:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228 (T352010)', diff saved to https://phabricator.wikimedia.org/P60967 and previous config saved to /var/cache/conftool/dbconfig/20240418-184623-ladsgroup.json
  • 18:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P60966 and previous config saved to /var/cache/conftool/dbconfig/20240418-183211-marostegui.json
  • 18:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228', diff saved to https://phabricator.wikimedia.org/P60965 and previous config saved to /var/cache/conftool/dbconfig/20240418-183116-ladsgroup.json
  • 18:27 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.1 refs T361395
  • 18:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T361627)', diff saved to https://phabricator.wikimedia.org/P60964 and previous config saved to /var/cache/conftool/dbconfig/20240418-181704-marostegui.json
  • 18:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228', diff saved to https://phabricator.wikimedia.org/P60963 and previous config saved to /var/cache/conftool/dbconfig/20240418-181606-ladsgroup.json
  • 18:14 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2214 (T361627)', diff saved to https://phabricator.wikimedia.org/P60962 and previous config saved to /var/cache/conftool/dbconfig/20240418-181450-marostegui.json
  • 18:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2214.codfw.wmnet with reason: Maintenance
  • 18:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2214.codfw.wmnet with reason: Maintenance
  • 18:11 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 18:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 18:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T361627)', diff saved to https://phabricator.wikimedia.org/P60961 and previous config saved to /var/cache/conftool/dbconfig/20240418-181048-marostegui.json
  • 18:09 joal@deploy1002: Finished deploy [airflow-dags/analytics@980dc72]: Deploy of Analytics airflow dags for canary-events job [airflow-dags/analytics@980dc725] (duration: 00m 31s)
  • 18:09 joal@deploy1002: Started deploy [airflow-dags/analytics@980dc72]: Deploy of Analytics airflow dags for canary-events job [airflow-dags/analytics@980dc725]
  • 18:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228 (T352010)', diff saved to https://phabricator.wikimedia.org/P60960 and previous config saved to /var/cache/conftool/dbconfig/20240418-180059-ladsgroup.json
  • 17:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P60959 and previous config saved to /var/cache/conftool/dbconfig/20240418-175541-marostegui.json
  • 17:41 joal@deploy1002: Finished deploy [airflow-dags/analytics@0a13b42]: Deploy of Analytics airflow dags for canary-events job [airflow-dags/analytics@0a13b420] (duration: 00m 28s)
  • 17:41 joal@deploy1002: Started deploy [airflow-dags/analytics@0a13b42]: Deploy of Analytics airflow dags for canary-events job [airflow-dags/analytics@0a13b420]
  • 17:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P60958 and previous config saved to /var/cache/conftool/dbconfig/20240418-174033-marostegui.json
  • 17:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T361627)', diff saved to https://phabricator.wikimedia.org/P60957 and previous config saved to /var/cache/conftool/dbconfig/20240418-172525-marostegui.json
  • 17:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T361627)', diff saved to https://phabricator.wikimedia.org/P60956 and previous config saved to /var/cache/conftool/dbconfig/20240418-172412-marostegui.json
  • 17:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 17:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 17:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T361627)', diff saved to https://phabricator.wikimedia.org/P60955 and previous config saved to /var/cache/conftool/dbconfig/20240418-172349-marostegui.json
  • 17:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P60954 and previous config saved to /var/cache/conftool/dbconfig/20240418-170842-marostegui.json
  • 16:57 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 16:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P60952 and previous config saved to /var/cache/conftool/dbconfig/20240418-165334-marostegui.json
  • 16:45 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on matomo1002.eqiad.wmnet with reason: Migrating to new version
  • 16:44 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on matomo1002.eqiad.wmnet with reason: Migrating to new version
  • 16:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T361627)', diff saved to https://phabricator.wikimedia.org/P60951 and previous config saved to /var/cache/conftool/dbconfig/20240418-163827-marostegui.json
  • 16:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T361627)', diff saved to https://phabricator.wikimedia.org/P60950 and previous config saved to /var/cache/conftool/dbconfig/20240418-163612-marostegui.json
  • 16:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 16:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 16:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T361627)', diff saved to https://phabricator.wikimedia.org/P60949 and previous config saved to /var/cache/conftool/dbconfig/20240418-163600-marostegui.json
  • 16:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P60948 and previous config saved to /var/cache/conftool/dbconfig/20240418-162053-marostegui.json
  • 16:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P60947 and previous config saved to /var/cache/conftool/dbconfig/20240418-160546-marostegui.json
  • 16:03 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:03 vgutierrez: repool ncredir2001
  • 16:02 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 16:01 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:01 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:01 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 16:01 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 16:01 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:59 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 15:59 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:58 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:57 elukey@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad
  • 15:54 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 15:53 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 15:53 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 15:50 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 15:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T361627)', diff saved to https://phabricator.wikimedia.org/P60946 and previous config saved to /var/cache/conftool/dbconfig/20240418-155038-marostegui.json
  • 15:50 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:49 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:45 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 15:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T361627)', diff saved to https://phabricator.wikimedia.org/P60945 and previous config saved to /var/cache/conftool/dbconfig/20240418-154547-marostegui.json
  • 15:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 15:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 15:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T361627)', diff saved to https://phabricator.wikimedia.org/P60944 and previous config saved to /var/cache/conftool/dbconfig/20240418-154524-marostegui.json
  • 15:44 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 15:44 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 15:43 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 15:43 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:42 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:32 moritzm: installing util-linux security updates on buster
  • 15:31 elukey@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad
  • 15:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P60943 and previous config saved to /var/cache/conftool/dbconfig/20240418-153017-marostegui.json
  • 15:26 volans: rolling python3-wmflib upgrade to 1.2.5 across the fleet
  • 15:19 mforns@deploy1002: Finished deploy [airflow-dags/analytics@5fb4f99]: (no justification provided) (duration: 00m 32s)
  • 15:18 mforns@deploy1002: Started deploy [airflow-dags/analytics@5fb4f99]: (no justification provided)
  • 15:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P60942 and previous config saved to /var/cache/conftool/dbconfig/20240418-151510-marostegui.json
  • 15:13 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(mw1355.eqiad.wmnet|mw1480.eqiad.wmnet|mw1481.eqiad.wmnet|mw1487.eqiad.wmnet),cluster=kubernetes,service=kubesvc
  • 15:12 claime: Pooling and uncordoning mw1355.eqiad.wmnet,mw1480.eqiad.wmnet,mw1481.eqiad.wmnet,mw1487.eqiad.wmnet - T351074
  • 15:09 elukey@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs2012.codfw.wmnet*: Deploy new TLS Keystore - PKI - elukey@cumin2002
  • 15:04 claime: Running homer 'cr*eqiad*' commit 'T351074'
  • 15:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1480.eqiad.wmnet with OS bullseye
  • 15:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1355.eqiad.wmnet with OS bullseye
  • 15:02 elukey@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs2012.codfw.wmnet*: Deploy new TLS Keystore - PKI - elukey@cumin2002
  • 15:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T361627)', diff saved to https://phabricator.wikimedia.org/P60941 and previous config saved to /var/cache/conftool/dbconfig/20240418-150003-marostegui.json
  • 14:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1487.eqiad.wmnet with OS bullseye
  • 14:56 elukey@cumin2002: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching aqs20[09-12]*: Deploy new TLS Keystore - PKI - elukey@cumin2002
  • 14:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1481.eqiad.wmnet with OS bullseye
  • 14:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T361627)', diff saved to https://phabricator.wikimedia.org/P60940 and previous config saved to /var/cache/conftool/dbconfig/20240418-145512-marostegui.json
  • 14:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 14:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 14:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 14:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 14:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T361627)', diff saved to https://phabricator.wikimedia.org/P60939 and previous config saved to /var/cache/conftool/dbconfig/20240418-145435-marostegui.json
  • 14:49 volans: uploaded python3-wmflib_1.2.5 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia,bookworm-wikimedia
  • 14:48 moritzm: installing PHP 7.4 security updates (as packaged in Debian, not the WMF-internal build)
  • 14:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1480.eqiad.wmnet with reason: host reimage
  • 14:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1355.eqiad.wmnet with reason: host reimage
  • 14:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1487.eqiad.wmnet with reason: host reimage
  • 14:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P60938 and previous config saved to /var/cache/conftool/dbconfig/20240418-143928-marostegui.json
  • 14:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1481.eqiad.wmnet with reason: host reimage
  • 14:36 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1487.eqiad.wmnet with reason: host reimage
  • 14:35 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1355.eqiad.wmnet with reason: host reimage
  • 14:35 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1480.eqiad.wmnet with reason: host reimage
  • 14:34 elukey@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs20[09-12]*: Deploy new TLS Keystore - PKI - elukey@cumin2002
  • 14:34 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1481.eqiad.wmnet with reason: host reimage
  • 14:28 elukey@cumin1002: END (ERROR) - Cookbook sre.cassandra.roll-restart (exit_code=97) for nodes matching aqs20[9-12]*: Deploy new TLS Keystore - PKI - elukey@cumin1002
  • 14:28 moritzm: installing cryptsetup bugfix updates from bookworm point release
  • 14:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P60937 and previous config saved to /var/cache/conftool/dbconfig/20240418-142420-marostegui.json
  • 14:22 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1487.eqiad.wmnet with OS bullseye
  • 14:22 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1481.eqiad.wmnet with OS bullseye
  • 14:21 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1480.eqiad.wmnet with OS bullseye
  • 14:21 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1355.eqiad.wmnet with OS bullseye
  • 14:19 moritzm: installing usrmerge bugfix updates from bookworm point release
  • 14:12 claime: Depooling mw1355.eqiad.wmnet,mw1480.eqiad.wmnet,mw1481.eqiad.wmnet,mw1487.eqiad.wmnet - T351074
  • 14:12 stevemunene@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 14:11 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:11 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Added extendedconfirmed and templateeditor rights to dawiki (T361461) (duration: 16m 51s)
  • 14:09 elukey@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs20[9-12]*: Deploy new TLS Keystore - PKI - elukey@cumin1002
  • 14:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T361627)', diff saved to https://phabricator.wikimedia.org/P60936 and previous config saved to /var/cache/conftool/dbconfig/20240418-140913-marostegui.json
  • 14:08 stevemunene@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
  • 14:08 moritzm: installing postgresql-15 security updates
  • 14:06 stevemunene@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 14:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T361627)', diff saved to https://phabricator.wikimedia.org/P60935 and previous config saved to /var/cache/conftool/dbconfig/20240418-140421-marostegui.json
  • 14:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 14:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 14:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T361627)', diff saved to https://phabricator.wikimedia.org/P60934 and previous config saved to /var/cache/conftool/dbconfig/20240418-140359-marostegui.json
  • 14:00 stevemunene@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 13:59 logmsgbot: lucaswerkmeister-wmde@deploy1002 nmw03 and lucaswerkmeister-wmde: Continuing with sync
  • 13:58 elukey@cumin1002: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching aqs20[02-12]*: Deploy new TLS Keystore - PKI - elukey@cumin1002
  • 13:57 logmsgbot: lucaswerkmeister-wmde@deploy1002 nmw03 and lucaswerkmeister-wmde: Backport for Added extendedconfirmed and templateeditor rights to dawiki (T361461) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:56 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 13:54 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Added extendedconfirmed and templateeditor rights to dawiki (T361461)
  • 13:52 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: sync on main
  • 13:51 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Add 'mainpage-title-loggedin' to $wgForceUIMsgAsContentMsg (T361171) (duration: 19m 37s)
  • 13:51 stevemunene@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 13:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P60932 and previous config saved to /var/cache/conftool/dbconfig/20240418-134852-marostegui.json
  • 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es1024.eqiad.wmnet
  • 13:47 jynus: add grants for dbprov1005 at dbbackups (m1) T362509
  • 13:40 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host es1024.eqiad.wmnet
  • 13:40 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es1023.eqiad.wmnet
  • 13:39 moritzm: installing Linux 6.1.85 on Bookworm hosts
  • 13:38 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde and jhsoby: Continuing with sync
  • 13:37 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde and jhsoby: Backport for Add 'mainpage-title-loggedin' to $wgForceUIMsgAsContentMsg (T361171) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:37 stevemunene@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 13:34 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host es1023.eqiad.wmnet
  • 13:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es2025.codfw.wmnet
  • 13:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P60930 and previous config saved to /var/cache/conftool/dbconfig/20240418-133344-marostegui.json
  • 13:32 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Add 'mainpage-title-loggedin' to $wgForceUIMsgAsContentMsg (T361171)
  • 13:28 moritzm: installing apache2 security updates
  • 13:28 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host es2025.codfw.wmnet
  • 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es2024.codfw.wmnet
  • 13:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2103.codfw.wmnet
  • 13:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2103.codfw.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 13:19 arnaudb@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2103.codfw.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 13:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T361627)', diff saved to https://phabricator.wikimedia.org/P60928 and previous config saved to /var/cache/conftool/dbconfig/20240418-131836-marostegui.json
  • 13:17 arnaudb@cumin1002: START - Cookbook sre.dns.netbox
  • 13:14 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host es2024.codfw.wmnet
  • 13:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2129 (T361627)', diff saved to https://phabricator.wikimedia.org/P60927 and previous config saved to /var/cache/conftool/dbconfig/20240418-131311-marostegui.json
  • 13:13 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 13:12 arnaudb@cumin1002: START - Cookbook sre.hosts.decommission for hosts db2103.codfw.wmnet
  • 13:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 13:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T361627)', diff saved to https://phabricator.wikimedia.org/P60926 and previous config saved to /var/cache/conftool/dbconfig/20240418-131248-marostegui.json
  • 13:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db2103 depool', diff saved to https://phabricator.wikimedia.org/P60925 and previous config saved to /var/cache/conftool/dbconfig/20240418-131027-arnaudb.json
  • 13:07 elukey@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs20[02-12]*: Deploy new TLS Keystore - PKI - elukey@cumin1002
  • 13:06 elukey: aqs2001's Cassandra instances moved to PKI TLS certs
  • 13:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2105.codfw.wmnet
  • 13:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2105.codfw.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 13:00 arnaudb@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2105.codfw.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 12:58 arnaudb@cumin1002: START - Cookbook sre.dns.netbox
  • 12:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P60923 and previous config saved to /var/cache/conftool/dbconfig/20240418-125739-marostegui.json
  • 12:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es2023.codfw.wmnet
  • 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.decommission for hosts db2105.codfw.wmnet
  • 12:54 sukhe: sudo cumin -b1 -s600 "A:dnsbox" "systemctl restart ntp.service" to pick up magru /24: T346722
  • 12:53 arnaudb@cumin1002: dbctl commit (dc=all): 'db2105 depool', diff saved to https://phabricator.wikimedia.org/P60922 and previous config saved to /var/cache/conftool/dbconfig/20240418-125338-arnaudb.json
  • 12:49 elukey: move aqs codfw cassandra instances to PKI TLS certs - T352647
  • 12:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2106.codfw.wmnet
  • 12:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2106.codfw.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 12:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T361627)', diff saved to https://phabricator.wikimedia.org/P60919 and previous config saved to /var/cache/conftool/dbconfig/20240418-122721-marostegui.json
  • 12:26 arnaudb@cumin1002: START - Cookbook sre.dns.netbox
  • 12:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T361627)', diff saved to https://phabricator.wikimedia.org/P60918 and previous config saved to /var/cache/conftool/dbconfig/20240418-122227-marostegui.json
  • 12:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 12:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 12:21 arnaudb@cumin1002: START - Cookbook sre.hosts.decommission for hosts db2107.codfw.wmnet
  • 12:21 arnaudb@cumin1002: dbctl commit (dc=all): 'db2107 depool', diff saved to https://phabricator.wikimedia.org/P60917 and previous config saved to /var/cache/conftool/dbconfig/20240418-122122-arnaudb.json
  • 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 12:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 12:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 12:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T361627)', diff saved to https://phabricator.wikimedia.org/P60916 and previous config saved to /var/cache/conftool/dbconfig/20240418-121559-marostegui.json
  • 12:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host crm2001.codfw.wmnet
  • 12:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host crm2001.codfw.wmnet
  • 12:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host matomo1003.eqiad.wmnet
  • 12:13 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 12:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host matomo1003.eqiad.wmnet
  • 12:08 vgutierrez: depool ncredir2001
  • 12:06 eoghan: Switching phab1004 to use cfssl issued ssl cert https://gerrit.wikimedia.org/r/c/operations/puppet/+/1020190
  • 12:02 moritzm: installing PHP 8.2 security updates
  • 12:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P60915 and previous config saved to /var/cache/conftool/dbconfig/20240418-120051-marostegui.json
  • 12:00 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 11:56 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 11:54 moritzm: upgrading PHP security updates on eqiad baremetal servers T362511
  • 11:52 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(mw2302.codfw.wmnet|mw2303.codfw.wmnet|mw2304.codfw.wmnet|mw2332.codfw.wmnet|mw2333.codfw.wmnet|mw2334.codfw.wmnet),cluster=kubernetes,service=kubesvc
  • 11:52 claime: Pooling and uncordoning mw2302.codfw.wmnet,mw2303.codfw.wmnet,mw2304.codfw.wmnet,mw2332.codfw.wmnet,mw2333.codfw.wmnet,mw2334.codfw.wmnet - T351074
  • 11:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P60914 and previous config saved to /var/cache/conftool/dbconfig/20240418-114544-marostegui.json
  • 11:42 claime: Running homer 'cr*codfw*' commit 'T351074'
  • 11:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2333.codfw.wmnet with OS bullseye
  • 11:33 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:32 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:32 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:31 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T361627)', diff saved to https://phabricator.wikimedia.org/P60913 and previous config saved to /var/cache/conftool/dbconfig/20240418-113037-marostegui.json
  • 11:30 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2302.codfw.wmnet with OS bullseye
  • 11:29 cgoubert@deploy1002: Finished scap: Redeploy mw-on-k8s with full rebuild - Fix setting php.timeout - T358308 (duration: 37m 04s)
  • 11:28 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T361627)', diff saved to https://phabricator.wikimedia.org/P60912 and previous config saved to /var/cache/conftool/dbconfig/20240418-112827-marostegui.json
  • 11:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 (T352010)', diff saved to https://phabricator.wikimedia.org/P60911 and previous config saved to /var/cache/conftool/dbconfig/20240418-112816-ladsgroup.json
  • 11:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 11:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 11:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 11:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es1021.eqiad.wmnet
  • 11:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 11:25 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2304.codfw.wmnet with OS bullseye
  • 11:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 11:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T361627)', diff saved to https://phabricator.wikimedia.org/P60910 and previous config saved to /var/cache/conftool/dbconfig/20240418-112459-marostegui.json
  • 11:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2334.codfw.wmnet with OS bullseye
  • 11:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2332.codfw.wmnet with OS bullseye
  • 11:18 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host es1021.eqiad.wmnet
  • 11:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2303.codfw.wmnet with OS bullseye
  • 11:13 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es1020.eqiad.wmnet
  • 11:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T355609)', diff saved to https://phabricator.wikimedia.org/P60909 and previous config saved to /var/cache/conftool/dbconfig/20240418-111132-marostegui.json
  • 11:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2302.codfw.wmnet with reason: host reimage
  • 11:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P60908 and previous config saved to /var/cache/conftool/dbconfig/20240418-110950-marostegui.json
  • 11:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2333.codfw.wmnet with reason: host reimage
  • 11:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2304.codfw.wmnet with reason: host reimage
  • 11:03 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host es1020.eqiad.wmnet
  • 11:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2334.codfw.wmnet with reason: host reimage
  • 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es2022.codfw.wmnet
  • 11:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2332.codfw.wmnet with reason: host reimage
  • 10:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2303.codfw.wmnet with reason: host reimage
  • 10:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P60907 and previous config saved to /var/cache/conftool/dbconfig/20240418-105624-marostegui.json
  • 10:56 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2334.codfw.wmnet with reason: host reimage
  • 10:55 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2333.codfw.wmnet with reason: host reimage
  • 10:55 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2304.codfw.wmnet with reason: host reimage
  • 10:54 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2332.codfw.wmnet with reason: host reimage
  • 10:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P60906 and previous config saved to /var/cache/conftool/dbconfig/20240418-105441-marostegui.json
  • 10:54 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2302.codfw.wmnet with reason: host reimage
  • 10:54 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2303.codfw.wmnet with reason: host reimage
  • 10:52 cgoubert@deploy1002: Started scap: Redeploy mw-on-k8s with full rebuild - Fix setting php.timeout - T358308
  • 10:52 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host es2022.codfw.wmnet
  • 10:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es2021.codfw.wmnet
  • 10:45 claime: Rebuild php7.4-fpm production images - T358308
  • 10:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P60905 and previous config saved to /var/cache/conftool/dbconfig/20240418-104117-marostegui.json
  • 10:40 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host es2021.codfw.wmnet
  • 10:39 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw2334.codfw.wmnet with OS bullseye
  • 10:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T361627)', diff saved to https://phabricator.wikimedia.org/P60904 and previous config saved to /var/cache/conftool/dbconfig/20240418-103933-marostegui.json
  • 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es2020.codfw.wmnet
  • 10:39 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw2333.codfw.wmnet with OS bullseye
  • 10:38 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw2332.codfw.wmnet with OS bullseye
  • 10:38 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw2304.codfw.wmnet with OS bullseye
  • 10:38 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw2303.codfw.wmnet with OS bullseye
  • 10:37 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw2302.codfw.wmnet with OS bullseye
  • 10:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1224 (T361627)', diff saved to https://phabricator.wikimedia.org/P60903 and previous config saved to /var/cache/conftool/dbconfig/20240418-103422-marostegui.json
  • 10:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 10:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 10:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T361627)', diff saved to https://phabricator.wikimedia.org/P60902 and previous config saved to /var/cache/conftool/dbconfig/20240418-103359-marostegui.json
  • 10:30 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host es2020.codfw.wmnet
  • 10:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T355609)', diff saved to https://phabricator.wikimedia.org/P60901 and previous config saved to /var/cache/conftool/dbconfig/20240418-102609-marostegui.json
  • 10:25 claime: Depooling mw2302.codfw.wmnet,mw2303.codfw.wmnet,mw2304.codfw.wmnet,mw2332.codfw.wmnet,mw2333.codfw.wmnet,mw2334.codfw.wmnet for reimage - T351074
  • 10:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P60900 and previous config saved to /var/cache/conftool/dbconfig/20240418-101852-marostegui.json
  • 10:18 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1183 (T355609)', diff saved to https://phabricator.wikimedia.org/P60899 and previous config saved to /var/cache/conftool/dbconfig/20240418-101841-marostegui.json
  • 10:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 10:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 10:08 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host contint1002.wikimedia.org
  • 10:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P60898 and previous config saved to /var/cache/conftool/dbconfig/20240418-100338-marostegui.json
  • 09:57 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host contint1002.wikimedia.org
  • 09:54 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1087.eqiad.wmnet
  • 09:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1228 (T352010)', diff saved to https://phabricator.wikimedia.org/P60897 and previous config saved to /var/cache/conftool/dbconfig/20240418-095331-ladsgroup.json
  • 09:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1228.eqiad.wmnet with reason: Maintenance
  • 09:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1228.eqiad.wmnet with reason: Maintenance
  • 09:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T352010)', diff saved to https://phabricator.wikimedia.org/P60896 and previous config saved to /var/cache/conftool/dbconfig/20240418-095308-ladsgroup.json
  • 09:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T361627)', diff saved to https://phabricator.wikimedia.org/P60895 and previous config saved to /var/cache/conftool/dbconfig/20240418-094830-marostegui.json
  • 09:46 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1087.eqiad.wmnet
  • 09:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T361627)', diff saved to https://phabricator.wikimedia.org/P60894 and previous config saved to /var/cache/conftool/dbconfig/20240418-094619-marostegui.json
  • 09:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 09:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 09:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T361627)', diff saved to https://phabricator.wikimedia.org/P60893 and previous config saved to /var/cache/conftool/dbconfig/20240418-094556-marostegui.json
  • 09:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T356166)', diff saved to https://phabricator.wikimedia.org/P60892 and previous config saved to /var/cache/conftool/dbconfig/20240418-094235-marostegui.json
  • 09:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es1034.eqiad.wmnet
  • 09:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P60891 and previous config saved to /var/cache/conftool/dbconfig/20240418-093759-ladsgroup.json
  • 09:35 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2108.codfw.wmnet
  • 09:35 arnaudb@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:35 arnaudb@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2108.codfw.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 09:34 arnaudb@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2108.codfw.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 09:32 arnaudb@cumin1002: START - Cookbook sre.dns.netbox
  • 09:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P60890 and previous config saved to /var/cache/conftool/dbconfig/20240418-093049-marostegui.json
  • 09:27 arnaudb@cumin1002: START - Cookbook sre.hosts.decommission for hosts db2108.codfw.wmnet
  • 09:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P60889 and previous config saved to /var/cache/conftool/dbconfig/20240418-092728-marostegui.json
  • 09:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2108', diff saved to https://phabricator.wikimedia.org/P60888 and previous config saved to /var/cache/conftool/dbconfig/20240418-092718-arnaudb.json
  • 09:25 mforns@deploy1002: Finished deploy [analytics/refinery@be07da9]: Regular analytics weekly train [analytics/refinery@be07da9e] (duration: 00m 20s)
  • 09:25 mforns@deploy1002: Started deploy [analytics/refinery@be07da9]: Regular analytics weekly train [analytics/refinery@be07da9e]
  • 09:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2108', diff saved to https://phabricator.wikimedia.org/P60887 and previous config saved to /var/cache/conftool/dbconfig/20240418-092504-arnaudb.json
  • 09:24 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host es1034.eqiad.wmnet
  • 09:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P60886 and previous config saved to /var/cache/conftool/dbconfig/20240418-092252-ladsgroup.json
  • 09:22 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es1028.eqiad.wmnet
  • 09:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2109.codfw.wmnet
  • 09:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2109.codfw.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 09:19 arnaudb@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2109.codfw.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 09:17 arnaudb@cumin1002: START - Cookbook sre.dns.netbox
  • 09:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P60885 and previous config saved to /var/cache/conftool/dbconfig/20240418-091541-marostegui.json
  • 09:13 arnaudb@cumin1002: START - Cookbook sre.hosts.decommission for hosts db2109.codfw.wmnet
  • 09:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2109', diff saved to https://phabricator.wikimedia.org/P60884 and previous config saved to /var/cache/conftool/dbconfig/20240418-091251-arnaudb.json
  • 09:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P60883 and previous config saved to /var/cache/conftool/dbconfig/20240418-091126-marostegui.json
  • 09:09 mforns@deploy1002: Finished deploy [analytics/refinery@be07da9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@be07da9e] (duration: 02m 46s)
  • 09:08 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host es1028.eqiad.wmnet
  • 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es2034.codfw.wmnet
  • 09:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T352010)', diff saved to https://phabricator.wikimedia.org/P60882 and previous config saved to /var/cache/conftool/dbconfig/20240418-090744-ladsgroup.json
  • 09:06 mforns@deploy1002: Started deploy [analytics/refinery@be07da9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@be07da9e]
  • 09:06 mforns@deploy1002: Finished deploy [analytics/refinery@be07da9] (thin): Regular analytics weekly train THIN [analytics/refinery@be07da9e] (duration: 03m 45s)
  • 09:04 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2110.codfw.wmnet
  • 09:04 arnaudb@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:04 arnaudb@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2110.codfw.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 09:03 arnaudb@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2110.codfw.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 09:02 mforns@deploy1002: Started deploy [analytics/refinery@be07da9] (thin): Regular analytics weekly train THIN [analytics/refinery@be07da9e]
  • 09:01 arnaudb@cumin1002: START - Cookbook sre.dns.netbox
  • 09:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T361627)', diff saved to https://phabricator.wikimedia.org/P60881 and previous config saved to /var/cache/conftool/dbconfig/20240418-090032-marostegui.json
  • 08:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T361627)', diff saved to https://phabricator.wikimedia.org/P60880 and previous config saved to /var/cache/conftool/dbconfig/20240418-085922-marostegui.json
  • 08:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 08:59 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 08:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T361627)', diff saved to https://phabricator.wikimedia.org/P60879 and previous config saved to /var/cache/conftool/dbconfig/20240418-085900-marostegui.json
  • 08:57 arnaudb@cumin1002: START - Cookbook sre.hosts.decommission for hosts db2110.codfw.wmnet
  • 08:57 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host es2034.codfw.wmnet
  • 08:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T356166)', diff saved to https://phabricator.wikimedia.org/P60878 and previous config saved to /var/cache/conftool/dbconfig/20240418-085619-marostegui.json
  • 08:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2110', diff saved to https://phabricator.wikimedia.org/P60877 and previous config saved to /var/cache/conftool/dbconfig/20240418-085608-arnaudb.json
  • 08:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db2110 depool', diff saved to https://phabricator.wikimedia.org/P60876 and previous config saved to /var/cache/conftool/dbconfig/20240418-085235-arnaudb.json
  • 08:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1183 (T356166)', diff saved to https://phabricator.wikimedia.org/P60875 and previous config saved to /var/cache/conftool/dbconfig/20240418-084510-marostegui.json
  • 08:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 08:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 08:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P60874 and previous config saved to /var/cache/conftool/dbconfig/20240418-084353-marostegui.json
  • 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es2029.codfw.wmnet
  • 08:42 arnaudb@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 100%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P60873 and previous config saved to /var/cache/conftool/dbconfig/20240418-084223-arnaudb.json
  • 08:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2111.codfw.wmnet
  • 08:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2111.codfw.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 08:41 mforns@deploy1002: Finished deploy [analytics/refinery@be07da9]: Regular analytics weekly train [analytics/refinery@be07da9e] (duration: 00m 15s)
  • 08:41 mforns@deploy1002: Started deploy [analytics/refinery@be07da9]: Regular analytics weekly train [analytics/refinery@be07da9e]
  • 08:40 arnaudb@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2111.codfw.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 08:38 arnaudb@cumin1002: START - Cookbook sre.dns.netbox
  • 08:34 arnaudb@cumin1002: START - Cookbook sre.hosts.decommission for hosts db2111.codfw.wmnet
  • 08:34 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host es2029.codfw.wmnet
  • 08:34 arnaudb@cumin1002: dbctl commit (dc=all): 'db2111 depool', diff saved to https://phabricator.wikimedia.org/P60872 and previous config saved to /var/cache/conftool/dbconfig/20240418-083422-arnaudb.json
  • 08:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es2027.codfw.wmnet
  • 08:32 arnaudb@cumin1002: dbctl commit (dc=all): 'db2111 depool', diff saved to https://phabricator.wikimedia.org/P60871 and previous config saved to /var/cache/conftool/dbconfig/20240418-083245-arnaudb.json
  • 08:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P60870 and previous config saved to /var/cache/conftool/dbconfig/20240418-082845-marostegui.json
  • 08:27 arnaudb@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 75%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P60869 and previous config saved to /var/cache/conftool/dbconfig/20240418-082717-arnaudb.json
  • 08:27 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2112.codfw.wmnet
  • 08:27 arnaudb@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2112.codfw.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 08:25 arnaudb@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2112.codfw.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 08:25 mforns@deploy1002: Finished deploy [analytics/refinery@be07da9]: Regular analytics weekly train [analytics/refinery@be07da9e] (duration: 14m 07s)
  • 08:24 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host es2027.codfw.wmnet
  • 08:23 arnaudb@cumin1002: START - Cookbook sre.dns.netbox
  • 08:15 arnaudb@cumin1002: START - Cookbook sre.hosts.decommission for hosts db2112.codfw.wmnet
  • 08:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db2112 depool', diff saved to https://phabricator.wikimedia.org/P60867 and previous config saved to /var/cache/conftool/dbconfig/20240418-081439-arnaudb.json
  • 08:13 kostajh: UTC morning deploys done
  • 08:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T361627)', diff saved to https://phabricator.wikimedia.org/P60866 and previous config saved to /var/cache/conftool/dbconfig/20240418-081338-marostegui.json
  • 08:12 arnaudb@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 50%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P60865 and previous config saved to /var/cache/conftool/dbconfig/20240418-081210-arnaudb.json
  • 08:11 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T361627)', diff saved to https://phabricator.wikimedia.org/P60864 and previous config saved to /var/cache/conftool/dbconfig/20240418-081127-marostegui.json
  • 08:11 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 08:11 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 08:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T361627)', diff saved to https://phabricator.wikimedia.org/P60863 and previous config saved to /var/cache/conftool/dbconfig/20240418-081104-marostegui.json
  • 08:11 mforns@deploy1002: Started deploy [analytics/refinery@be07da9]: Regular analytics weekly train [analytics/refinery@be07da9e]
  • 08:10 kharlan@deploy1002: Finished scap: Backport for EventStreamConfig: Fix stream title for mediawiki.ip_reputation.score (T354597) (duration: 19m 36s)
  • 08:07 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main'.
  • 08:02 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2113.codfw.wmnet
  • 08:02 arnaudb@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:02 arnaudb@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2113.codfw.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 08:00 arnaudb@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2113.codfw.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 07:58 kharlan@deploy1002: urbanecm and kharlan: Continuing with sync
  • 07:57 arnaudb@cumin1002: START - Cookbook sre.dns.netbox
  • 07:57 arnaudb@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 25%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P60862 and previous config saved to /var/cache/conftool/dbconfig/20240418-075704-arnaudb.json
  • 07:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P60861 and previous config saved to /var/cache/conftool/dbconfig/20240418-075557-marostegui.json
  • 07:54 kharlan@deploy1002: urbanecm and kharlan: Backport for EventStreamConfig: Fix stream title for mediawiki.ip_reputation.score (T354597) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:52 arnaudb@cumin1002: START - Cookbook sre.hosts.decommission for hosts db2113.codfw.wmnet
  • 07:51 arnaudb@cumin1002: dbctl commit (dc=all): 'db2113 depool', diff saved to https://phabricator.wikimedia.org/P60860 and previous config saved to /var/cache/conftool/dbconfig/20240418-075154-arnaudb.json
  • 07:51 kharlan@deploy1002: Started scap: Backport for EventStreamConfig: Fix stream title for mediawiki.ip_reputation.score (T354597)
  • 07:47 urbanecm@deploy1002: Finished scap: Backport for WikimediaEvents: Set IPoid URL and enable ip_reputation/score (2nd attempt) (T354597), ext-EventLogging: Add mediawiki.ip_reputation.score (T354597) (duration: 22m 27s)
  • 07:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 15%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P60859 and previous config saved to /var/cache/conftool/dbconfig/20240418-074158-arnaudb.json
  • 07:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P60858 and previous config saved to /var/cache/conftool/dbconfig/20240418-074049-marostegui.json
  • 07:34 urbanecm@deploy1002: kharlan and urbanecm: Continuing with sync
  • 07:31 moritzm: installing PHP security updates on codfw baremetal servers T362511
  • 07:28 urbanecm@deploy1002: kharlan and urbanecm: Backport for WikimediaEvents: Set IPoid URL and enable ip_reputation/score (2nd attempt) (T354597), ext-EventLogging: Add mediawiki.ip_reputation.score (T354597) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 10%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P60857 and previous config saved to /var/cache/conftool/dbconfig/20240418-072653-arnaudb.json
  • 07:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T361627)', diff saved to https://phabricator.wikimedia.org/P60856 and previous config saved to /var/cache/conftool/dbconfig/20240418-072542-marostegui.json
  • 07:25 urbanecm@deploy1002: Started scap: Backport for WikimediaEvents: Set IPoid URL and enable ip_reputation/score (2nd attempt) (T354597), ext-EventLogging: Add mediawiki.ip_reputation.score (T354597)
  • 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2108 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P60855 and previous config saved to /var/cache/conftool/dbconfig/20240418-072410-root.json
  • 07:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T361627)', diff saved to https://phabricator.wikimedia.org/P60854 and previous config saved to /var/cache/conftool/dbconfig/20240418-072331-marostegui.json
  • 07:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 07:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 07:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T361627)', diff saved to https://phabricator.wikimedia.org/P60853 and previous config saved to /var/cache/conftool/dbconfig/20240418-072309-marostegui.json
  • 07:21 urbanecm@deploy1002: Finished scap: Backport for [plwiki] Limit Content Translation publishing to mainspace for non-editors (T362756) (duration: 17m 15s)
  • 07:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 5%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P60852 and previous config saved to /var/cache/conftool/dbconfig/20240418-071147-arnaudb.json
  • 07:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2108 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P60851 and previous config saved to /var/cache/conftool/dbconfig/20240418-070904-root.json
  • 07:08 urbanecm@deploy1002: msz2001 and urbanecm: Continuing with sync
  • 07:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P60850 and previous config saved to /var/cache/conftool/dbconfig/20240418-070801-marostegui.json
  • 07:07 urbanecm@deploy1002: msz2001 and urbanecm: Backport for [plwiki] Limit Content Translation publishing to mainspace for non-editors (T362756) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:04 urbanecm@deploy1002: Started scap: Backport for [plwiki] Limit Content Translation publishing to mainspace for non-editors (T362756)
  • 06:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 2%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P60849 and previous config saved to /var/cache/conftool/dbconfig/20240418-065641-arnaudb.json
  • 06:53 marostegui@cumin1002: dbctl commit (dc=all): 'db2108 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P60848 and previous config saved to /var/cache/conftool/dbconfig/20240418-065358-root.json
  • 06:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P60847 and previous config saved to /var/cache/conftool/dbconfig/20240418-065254-marostegui.json
  • 06:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 1%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P60846 and previous config saved to /var/cache/conftool/dbconfig/20240418-064135-arnaudb.json
  • 06:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 06:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 06:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2108 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P60845 and previous config saved to /var/cache/conftool/dbconfig/20240418-063852-root.json
  • 06:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T361627)', diff saved to https://phabricator.wikimedia.org/P60844 and previous config saved to /var/cache/conftool/dbconfig/20240418-063746-marostegui.json
  • 06:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1183.eqiad.wmnet with OS bookworm
  • 06:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T361627)', diff saved to https://phabricator.wikimedia.org/P60843 and previous config saved to /var/cache/conftool/dbconfig/20240418-063536-marostegui.json
  • 06:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 06:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 06:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 06:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 06:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 06:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 06:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2108 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P60842 and previous config saved to /var/cache/conftool/dbconfig/20240418-062346-root.json
  • 06:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1183.eqiad.wmnet with reason: host reimage
  • 06:13 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1183.eqiad.wmnet with reason: host reimage
  • 06:08 marostegui@cumin1002: dbctl commit (dc=all): 'db2108 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P60841 and previous config saved to /var/cache/conftool/dbconfig/20240418-060841-root.json
  • 06:02 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db1183.eqiad.wmnet with OS bookworm
  • 06:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2205.codfw.wmnet with reason: Maintenance
  • 06:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2205.codfw.wmnet with reason: Maintenance
  • 05:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1183.eqiad.wmnet with reason: upgrade db1183 T360116
  • 05:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1183.eqiad.wmnet with reason: upgrade db1183 T360116
  • 05:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2108.codfw.wmnet with OS bookworm
  • 05:53 marostegui@cumin1002: dbctl commit (dc=all): 'db2108 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P60840 and previous config saved to /var/cache/conftool/dbconfig/20240418-055335-root.json
  • 05:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 05:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 05:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1183 T362668', diff saved to https://phabricator.wikimedia.org/P60838 and previous config saved to /var/cache/conftool/dbconfig/20240418-054247-arnaudb.json
  • 05:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote db1230 to s5 primary and set section read-write T362668', diff saved to https://phabricator.wikimedia.org/P60837 and previous config saved to /var/cache/conftool/dbconfig/20240418-053852-arnaudb.json
  • 05:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Set s5 eqiad as read-only for maintenance - T362668', diff saved to https://phabricator.wikimedia.org/P60836 and previous config saved to /var/cache/conftool/dbconfig/20240418-053657-arnaudb.json
  • 05:35 arnaudb: Starting s5 eqiad failover from db1183 to db1230 - T362668
  • 05:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2108.codfw.wmnet with reason: host reimage
  • 05:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2108.codfw.wmnet with reason: host reimage
  • 05:20 marostegui: dbmaint Upgrade s7 codfw to Bookworm and MariaDB 10.6 T362745
  • 05:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db1230 with weight 0 T362668', diff saved to https://phabricator.wikimedia.org/P60835 and previous config saved to /var/cache/conftool/dbconfig/20240418-051639-arnaudb.json
  • 05:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s5 T362668
  • 05:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s5 T362668
  • 05:13 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2108.codfw.wmnet with OS bookworm
  • 05:11 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2108', diff saved to https://phabricator.wikimedia.org/P60834 and previous config saved to /var/cache/conftool/dbconfig/20240418-051129-root.json
  • 00:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T352010)', diff saved to https://phabricator.wikimedia.org/P60833 and previous config saved to /var/cache/conftool/dbconfig/20240418-000639-ladsgroup.json
  • 00:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 00:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 00:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T352010)', diff saved to https://phabricator.wikimedia.org/P60832 and previous config saved to /var/cache/conftool/dbconfig/20240418-000616-ladsgroup.json

2024-04-17

  • 23:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P60831 and previous config saved to /var/cache/conftool/dbconfig/20240417-235105-ladsgroup.json
  • 23:48 amastilovic@deploy1002: Finished deploy [airflow-dags/analytics@c9d6969]: (no justification provided) (duration: 00m 37s)
  • 23:47 amastilovic@deploy1002: Started deploy [airflow-dags/analytics@c9d6969]: (no justification provided)
  • 23:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 23:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 23:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T352010)', diff saved to https://phabricator.wikimedia.org/P60830 and previous config saved to /var/cache/conftool/dbconfig/20240417-233731-ladsgroup.json
  • 23:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P60829 and previous config saved to /var/cache/conftool/dbconfig/20240417-233557-ladsgroup.json
  • 23:22 sukhe: sukhe@cp1114:~$ sudo -i haproxy-restart
  • 23:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P60828 and previous config saved to /var/cache/conftool/dbconfig/20240417-232221-ladsgroup.json
  • 23:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T352010)', diff saved to https://phabricator.wikimedia.org/P60827 and previous config saved to /var/cache/conftool/dbconfig/20240417-232050-ladsgroup.json
  • 23:14 mutante: rsyncing jenkins data from contint2002 to contint1002, pre-sync in preparation for migration next week - /srv/jenkins (291G) and much smaller zuul and jenkins data dirs T334517
  • 23:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P60826 and previous config saved to /var/cache/conftool/dbconfig/20240417-230714-ladsgroup.json
  • 22:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T352010)', diff saved to https://phabricator.wikimedia.org/P60825 and previous config saved to /var/cache/conftool/dbconfig/20240417-225206-ladsgroup.json
  • 22:42 zabe@deploy1002: Finished scap: Backport for Revert "REST: Deprecate using "post" as the parameter source" (T362817) (duration: 17m 14s)
  • 22:29 zabe@deploy1002: jforrester and zabe: Continuing with sync
  • 22:28 zabe@deploy1002: jforrester and zabe: Backport for Revert "REST: Deprecate using "post" as the parameter source" (T362817) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:24 zabe@deploy1002: Started scap: Backport for Revert "REST: Deprecate using "post" as the parameter source" (T362817)
  • 22:11 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 19 hosts with reason: T362508
  • 22:10 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 19 hosts with reason: T362508
  • 21:50 mutante: deploying scap config change (gerrit:1020321) - [cumin2002:~] $ sudo cumin -b 4 -s 40 'C:scap AND mw*' 'run-puppet-agent' T359643
  • 21:09 mutante: DNS - created ae.wikimedia.org for United Arab Emirates User Group wiki - T362529
  • 21:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T361627)', diff saved to https://phabricator.wikimedia.org/P60824 and previous config saved to /var/cache/conftool/dbconfig/20240417-210256-marostegui.json
  • 20:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P60823 and previous config saved to /var/cache/conftool/dbconfig/20240417-204748-marostegui.json
  • 20:44 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 20:44 eevans@deploy1002: helmfile [staging] START helmfile.d/services/echostore: apply
  • 20:44 cjming: end of UTC late backport window
  • 20:44 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 20:43 eevans@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 20:43 cjming@deploy1002: Finished scap: Backport for Upstream tablet infobox styles (T3603861), Upstream tablet infobox styles (T3603861) (duration: 17m 30s)
  • 20:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P60822 and previous config saved to /var/cache/conftool/dbconfig/20240417-203241-marostegui.json
  • 20:30 cjming@deploy1002: cjming and jdlrobson: Continuing with sync
  • 20:29 cjming@deploy1002: cjming and jdlrobson: Backport for Upstream tablet infobox styles (T3603861), Upstream tablet infobox styles (T3603861) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:26 cjming@deploy1002: Started scap: Backport for Upstream tablet infobox styles (T3603861), Upstream tablet infobox styles (T3603861)
  • 20:25 cjming@deploy1002: Finished scap: Backport for Enable WikimediaSkinStyles on English Wikipedia Vector 2022 skin (T362726), Enable night mode in AMC for all projects (T361555) (duration: 18m 13s)
  • 20:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T361627)', diff saved to https://phabricator.wikimedia.org/P60821 and previous config saved to /var/cache/conftool/dbconfig/20240417-201733-marostegui.json
  • 20:11 cjming@deploy1002: cjming and jdlrobson: Continuing with sync
  • 20:09 cjming@deploy1002: cjming and jdlrobson: Backport for Enable WikimediaSkinStyles on English Wikipedia Vector 2022 skin (T362726), Enable night mode in AMC for all projects (T361555) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:06 cjming@deploy1002: Started scap: Backport for Enable WikimediaSkinStyles on English Wikipedia Vector 2022 skin (T362726), Enable night mode in AMC for all projects (T361555)
  • 19:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2209 (T361627)', diff saved to https://phabricator.wikimedia.org/P60820 and previous config saved to /var/cache/conftool/dbconfig/20240417-195628-marostegui.json
  • 19:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2209.codfw.wmnet with reason: Maintenance
  • 19:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2209.codfw.wmnet with reason: Maintenance
  • 19:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T361627)', diff saved to https://phabricator.wikimedia.org/P60819 and previous config saved to /var/cache/conftool/dbconfig/20240417-195605-marostegui.json
  • 19:46 eileen: civicrm upgraded from fdd12ed1 to 28adb4da
  • 19:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P60818 and previous config saved to /var/cache/conftool/dbconfig/20240417-194058-marostegui.json
  • 19:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P60817 and previous config saved to /var/cache/conftool/dbconfig/20240417-192551-marostegui.json
  • 19:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T361627)', diff saved to https://phabricator.wikimedia.org/P60816 and previous config saved to /var/cache/conftool/dbconfig/20240417-191043-marostegui.json
  • 18:56 ejegg: payments-wiki upgraded from 72e3bf19 to fb0367a4
  • 18:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2194 (T361627)', diff saved to https://phabricator.wikimedia.org/P60815 and previous config saved to /var/cache/conftool/dbconfig/20240417-184931-marostegui.json
  • 18:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2194.codfw.wmnet with reason: Maintenance
  • 18:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2194.codfw.wmnet with reason: Maintenance
  • 18:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T361627)', diff saved to https://phabricator.wikimedia.org/P60814 and previous config saved to /var/cache/conftool/dbconfig/20240417-184908-marostegui.json
  • 18:35 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.1 refs T361395
  • 18:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P60813 and previous config saved to /var/cache/conftool/dbconfig/20240417-183401-marostegui.json
  • 18:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P60812 and previous config saved to /var/cache/conftool/dbconfig/20240417-181854-marostegui.json
  • 18:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T361627)', diff saved to https://phabricator.wikimedia.org/P60810 and previous config saved to /var/cache/conftool/dbconfig/20240417-180346-marostegui.json
  • 17:59 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:59 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:57 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:57 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T361627)', diff saved to https://phabricator.wikimedia.org/P60809 and previous config saved to /var/cache/conftool/dbconfig/20240417-174233-marostegui.json
  • 17:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 17:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 17:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T361627)', diff saved to https://phabricator.wikimedia.org/P60808 and previous config saved to /var/cache/conftool/dbconfig/20240417-174210-marostegui.json
  • 17:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P60807 and previous config saved to /var/cache/conftool/dbconfig/20240417-172702-marostegui.json
  • 17:14 sukhe: running authdns-update for adding magru geo-resources/IPs: T346722
  • 17:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P60805 and previous config saved to /var/cache/conftool/dbconfig/20240417-171154-marostegui.json
  • 16:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T361627)', diff saved to https://phabricator.wikimedia.org/P60804 and previous config saved to /var/cache/conftool/dbconfig/20240417-165647-marostegui.json
  • 16:56 topranks: running authdns-update to make magru dns records live T362421
  • 16:47 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
  • 16:46 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
  • 16:45 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
  • 16:45 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
  • 16:45 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
  • 16:44 btullis@deploy1002: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 16:39 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
  • 16:39 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
  • 16:39 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
  • 16:39 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
  • 16:38 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
  • 16:38 btullis@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
  • 16:36 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:36 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding first entries for magru IPs - cmooney@cumin1002"
  • 16:35 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding first entries for magru IPs - cmooney@cumin1002"
  • 16:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2177 (T361627)', diff saved to https://phabricator.wikimedia.org/P60803 and previous config saved to /var/cache/conftool/dbconfig/20240417-163532-marostegui.json
  • 16:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 16:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 16:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T361627)', diff saved to https://phabricator.wikimedia.org/P60802 and previous config saved to /var/cache/conftool/dbconfig/20240417-163518-marostegui.json
  • 16:35 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T360332)', diff saved to https://phabricator.wikimedia.org/P60801 and previous config saved to /var/cache/conftool/dbconfig/20240417-163506-arnaudb.json
  • 16:30 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 16:29 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:29 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding first entries for magru IPs - cmooney@cumin1002"
  • 16:29 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding first entries for magru IPs - cmooney@cumin1002"
  • 16:27 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 16:25 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:24 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 16:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P60800 and previous config saved to /var/cache/conftool/dbconfig/20240417-162008-marostegui.json
  • 16:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P60799 and previous config saved to /var/cache/conftool/dbconfig/20240417-161958-arnaudb.json
  • 16:18 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:18 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding first entries for magru IPs - cmooney@cumin1002"
  • 16:17 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding first entries for magru IPs - cmooney@cumin1002"
  • 16:14 claime: restarted rsyslog on mw2412 - T357616
  • 16:13 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2119.codfw.wmnet
  • 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:09 cdanis: above conftool actions had no impact on production, no dbctl config commit was performed.
  • 16:09 arnaudb@cumin1002: START - Cookbook sre.dns.netbox
  • 16:08 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:08 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding first entries for magru IPs - cmooney@cumin1002"
  • 16:07 cdanis@cumin1002: conftool action : set/host_ip=10.64.16.8; selector: name=db1211
  • 16:07 cdanis@cumin1002: conftool action : set/host_ip=1.1.1.1; selector: name=db1211
  • 16:06 cdanis@cumin1002: conftool action : set/host_ip=10.64.16.8; selector: name=db1211
  • 16:06 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding first entries for magru IPs - cmooney@cumin1002"
  • 16:05 cdanis@cumin1002: conftool action : set/host_ip=69.69.69.69; selector: name=db1211
  • 16:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P60798 and previous config saved to /var/cache/conftool/dbconfig/20240417-160501-marostegui.json
  • 16:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P60797 and previous config saved to /var/cache/conftool/dbconfig/20240417-160451-arnaudb.json
  • 16:04 arnaudb@cumin1002: dbctl commit (dc=all): 'db2119 depool T358741', diff saved to https://phabricator.wikimedia.org/P60796 and previous config saved to /var/cache/conftool/dbconfig/20240417-160443-arnaudb.json
  • 16:04 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
  • 16:04 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1114.eqiad.wmnet,service=(cdn|ats-be)
  • 16:04 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
  • 16:04 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
  • 16:03 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 16:03 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
  • 16:03 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 16:03 btullis@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 16:02 arnaudb@cumin1002: START - Cookbook sre.hosts.decommission for hosts db2119.codfw.wmnet
  • 16:00 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:00 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding first entries for magru IPs - cmooney@cumin1002"
  • 15:59 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding first entries for magru IPs - cmooney@cumin1002"
  • 15:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 15:53 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:53 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding first entries for magru IPs - cmooney@cumin1002"
  • 15:52 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding first entries for magru IPs - cmooney@cumin1002"
  • 15:51 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1114.eqiad.wmnet with OS bullseye
  • 15:50 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 15:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2120.codfw.wmnet
  • 15:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2120.codfw.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 15:44 arnaudb@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2120.codfw.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 15:42 arnaudb@cumin1002: START - Cookbook sre.dns.netbox
  • 15:40 topranks: merging patch and updating dns servers with new magru ranges T362421
  • 15:35 arnaudb@cumin1002: START - Cookbook sre.hosts.decommission for hosts db2120.codfw.wmnet
  • 15:34 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:34 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding first entries for magru IPs - cmooney@cumin1002"
  • 15:33 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding first entries for magru IPs - cmooney@cumin1002"
  • 15:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T360332)', diff saved to https://phabricator.wikimedia.org/P60795 and previous config saved to /var/cache/conftool/dbconfig/20240417-153238-arnaudb.json
  • 15:31 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1114.eqiad.wmnet with reason: host reimage
  • 15:31 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 15:30 topranks: making magru IPs live in netbox and generating DNS records with cookbook T362421
  • 15:27 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1114.eqiad.wmnet with reason: host reimage
  • 15:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T361627)', diff saved to https://phabricator.wikimedia.org/P60794 and previous config saved to /var/cache/conftool/dbconfig/20240417-152023-marostegui.json
  • 15:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db2120 depool T358741', diff saved to https://phabricator.wikimedia.org/P60793 and previous config saved to /var/cache/conftool/dbconfig/20240417-151811-arnaudb.json
  • 15:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2127 (T360332)', diff saved to https://phabricator.wikimedia.org/P60792 and previous config saved to /var/cache/conftool/dbconfig/20240417-151653-arnaudb.json
  • 15:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 15:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 15:13 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:12 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:09 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cp1115.eqiad.wmnet
  • 15:07 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp1114.eqiad.wmnet with OS bullseye
  • 15:06 vgutierrez: repool ncredir2001
  • 15:05 Lucas_WMDE: UTC afternoon backport+config window (belatedly) done
  • 15:04 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes mlwiki --fix # T362653: 0 pages to fix, 0 were resolvable; 82 links to fix, 82 were resolvable, 0 were deleted.
  • 15:03 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for mlwiki: create draft namespace (T362653) (duration: 32m 43s)
  • 14:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T361627)', diff saved to https://phabricator.wikimedia.org/P60790 and previous config saved to /var/cache/conftool/dbconfig/20240417-145916-marostegui.json
  • 14:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 14:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 14:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 14:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 14:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T361627)', diff saved to https://phabricator.wikimedia.org/P60789 and previous config saved to /var/cache/conftool/dbconfig/20240417-145838-marostegui.json
  • 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T352010)', diff saved to https://phabricator.wikimedia.org/P60788 and previous config saved to /var/cache/conftool/dbconfig/20240417-145136-ladsgroup.json
  • 14:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 14:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T352010)', diff saved to https://phabricator.wikimedia.org/P60787 and previous config saved to /var/cache/conftool/dbconfig/20240417-145113-ladsgroup.json
  • 14:50 logmsgbot: lucaswerkmeister-wmde@deploy1002 anzx and lucaswerkmeister-wmde: Continuing with sync
  • 14:44 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:44 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P60786 and previous config saved to /var/cache/conftool/dbconfig/20240417-144330-marostegui.json
  • 14:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P60785 and previous config saved to /var/cache/conftool/dbconfig/20240417-143606-ladsgroup.json
  • 14:34 logmsgbot: lucaswerkmeister-wmde@deploy1002 anzx and lucaswerkmeister-wmde: Backport for mlwiki: create draft namespace (T362653) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:33 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: insetup::data_persistence
  • 14:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2120 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P60784 and previous config saved to /var/cache/conftool/dbconfig/20240417-143103-root.json
  • 14:31 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for mlwiki: create draft namespace (T362653)
  • 14:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P60783 and previous config saved to /var/cache/conftool/dbconfig/20240417-142823-marostegui.json
  • 14:22 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:22 sukhe@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp1114.eqiad.wmnet
  • 14:21 sukhe@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp1114.eqiad.wmnet
  • 14:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P60782 and previous config saved to /var/cache/conftool/dbconfig/20240417-142057-ladsgroup.json
  • 14:20 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:20 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:20 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1114.eqiad.wmnet,service=(cdn|ats-be)
  • 14:20 sukhe: depool cp1114.eqiad.wmnet for PXE boot testing issues and downgrade NIC firmware: T350179
  • 14:19 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:19 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:18 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:15 marostegui@cumin1002: dbctl commit (dc=all): 'db2120 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P60781 and previous config saved to /var/cache/conftool/dbconfig/20240417-141557-root.json
  • 14:15 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:13 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T361627)', diff saved to https://phabricator.wikimedia.org/P60780 and previous config saved to /var/cache/conftool/dbconfig/20240417-141314-marostegui.json
  • 14:13 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:10 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:10 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:09 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:09 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:09 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:08 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: insetup::data_persistence
  • 14:08 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Revert "WikimediaEvents: Set IPoid URL and enable ip_reputation/score" (duration: 16m 49s)
  • 14:06 vgutierrez: depool ncredir2001
  • 14:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T352010)', diff saved to https://phabricator.wikimedia.org/P60779 and previous config saved to /var/cache/conftool/dbconfig/20240417-140549-ladsgroup.json
  • 14:02 sukhe: running authdns-update for adding magru to geo-maps: T346722
  • 14:00 marostegui@cumin1002: dbctl commit (dc=all): 'db2120 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P60778 and previous config saved to /var/cache/conftool/dbconfig/20240417-140051-root.json
  • 13:55 logmsgbot: lucaswerkmeister-wmde@deploy1002 trainbranchbot and lucaswerkmeister-wmde: Continuing with sync
  • 13:55 logmsgbot: lucaswerkmeister-wmde@deploy1002 trainbranchbot and lucaswerkmeister-wmde: Backport for Revert "WikimediaEvents: Set IPoid URL and enable ip_reputation/score" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T361627)', diff saved to https://phabricator.wikimedia.org/P60777 and previous config saved to /var/cache/conftool/dbconfig/20240417-135253-marostegui.json
  • 13:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 13:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 13:51 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Revert "WikimediaEvents: Set IPoid URL and enable ip_reputation/score"
  • 13:49 logmsgbot: lucaswerkmeister-wmde@deploy1002 Sync cancelled.
  • 13:45 marostegui@cumin1002: dbctl commit (dc=all): 'db2120 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P60776 and previous config saved to /var/cache/conftool/dbconfig/20240417-134545-root.json
  • 13:40 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es1033.eqiad.wmnet
  • 13:36 sukhe: running authdns-update for CR 1020823
  • 13:36 logmsgbot: lucaswerkmeister-wmde@deploy1002 kharlan and lucaswerkmeister-wmde: Backport for WikimediaEvents: Set IPoid URL and enable ip_reputation/score (T354597) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:33 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for WikimediaEvents: Set IPoid URL and enable ip_reputation/score (T354597)
  • 13:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 13:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 13:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T361627)', diff saved to https://phabricator.wikimedia.org/P60775 and previous config saved to /var/cache/conftool/dbconfig/20240417-133318-marostegui.json
  • 13:32 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1115.eqiad.wmnet,service=(cdn|ats-be)
  • 13:30 marostegui@cumin1002: dbctl commit (dc=all): 'db2120 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P60774 and previous config saved to /var/cache/conftool/dbconfig/20240417-133040-root.json
  • 13:29 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host es1033.eqiad.wmnet
  • 13:29 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es1026.eqiad.wmnet
  • 13:23 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp1115.eqiad.wmnet
  • 13:23 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for cp1115.eqiad.wmnet
  • 13:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P60773 and previous config saved to /var/cache/conftool/dbconfig/20240417-131811-marostegui.json
  • 13:18 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host es1026.eqiad.wmnet
  • 13:17 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es2033.codfw.wmnet
  • 13:15 marostegui@cumin1002: dbctl commit (dc=all): 'db2120 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P60772 and previous config saved to /var/cache/conftool/dbconfig/20240417-131533-root.json
  • 13:11 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host es2033.codfw.wmnet
  • 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es2031.codfw.wmnet
  • 13:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2120.codfw.wmnet with OS bookworm
  • 13:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P60771 and previous config saved to /var/cache/conftool/dbconfig/20240417-130303-marostegui.json
  • 13:01 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host es2031.codfw.wmnet
  • 13:00 marostegui@cumin1002: dbctl commit (dc=all): 'db2120 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P60770 and previous config saved to /var/cache/conftool/dbconfig/20240417-130027-root.json
  • 12:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es2026.codfw.wmnet
  • 12:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T361627)', diff saved to https://phabricator.wikimedia.org/P60769 and previous config saved to /var/cache/conftool/dbconfig/20240417-124756-marostegui.json
  • 12:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2127 (T361627)', diff saved to https://phabricator.wikimedia.org/P60768 and previous config saved to /var/cache/conftool/dbconfig/20240417-122748-marostegui.json
  • 12:27 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 12:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 12:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T361627)', diff saved to https://phabricator.wikimedia.org/P60767 and previous config saved to /var/cache/conftool/dbconfig/20240417-122725-marostegui.json
  • 12:25 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2120.codfw.wmnet with OS bookworm
  • 12:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2120', diff saved to https://phabricator.wikimedia.org/P60766 and previous config saved to /var/cache/conftool/dbconfig/20240417-122150-root.json
  • 12:12 vgutierrez: repool ncredir2001
  • 12:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P60765 and previous config saved to /var/cache/conftool/dbconfig/20240417-121218-marostegui.json
  • 12:06 moritzm: upgrading PHP on mediawiki baremetal canaries servers T362511
  • 11:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P60763 and previous config saved to /var/cache/conftool/dbconfig/20240417-115709-marostegui.json
  • 11:57 stevemunene@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 11:46 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es2032.codfw.wmnet
  • 11:44 vgutierrez: depool ncredir2001
  • 11:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T361627)', diff saved to https://phabricator.wikimedia.org/P60762 and previous config saved to /var/cache/conftool/dbconfig/20240417-114201-marostegui.json
  • 11:36 stevemunene@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 11:33 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host es2032.codfw.wmnet
  • 11:30 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 11:30 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 11:29 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es1032.eqiad.wmnet
  • 11:29 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:24 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 (T352010)', diff saved to https://phabricator.wikimedia.org/P60761 and previous config saved to /var/cache/conftool/dbconfig/20240417-112418-ladsgroup.json
  • 11:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 11:23 jiji@deploy1002: Finished scap: NoOp (duration: 09m 38s)
  • 11:22 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host es1032.eqiad.wmnet
  • 11:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2109 (T361627)', diff saved to https://phabricator.wikimedia.org/P60760 and previous config saved to /var/cache/conftool/dbconfig/20240417-112040-marostegui.json
  • 11:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 11:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 11:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T361627)', diff saved to https://phabricator.wikimedia.org/P60759 and previous config saved to /var/cache/conftool/dbconfig/20240417-112017-marostegui.json
  • 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es2030.codfw.wmnet
  • 11:13 jiji@deploy1002: Started scap: NoOp
  • 11:06 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host es2030.codfw.wmnet
  • 11:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P60758 and previous config saved to /var/cache/conftool/dbconfig/20240417-110510-marostegui.json
  • 11:04 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 10:54 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 10:53 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 10:53 jiji@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=mw-web-ro,name=eqiad
  • 10:53 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 10:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P60757 and previous config saved to /var/cache/conftool/dbconfig/20240417-105002-marostegui.json
  • 10:46 jiji@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=mw-api-ext-ro,name=eqiad
  • 10:45 effie: pool eqiad back for mw-web-ro, mw-api-int-ro and mw-api-ext-ro
  • 10:44 jiji@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=mw-api-int-ro,name=eqiad
  • 10:42 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 10:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es1027.eqiad.wmnet
  • 10:42 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 10:41 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 10:41 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 10:40 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 10:38 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 10:37 jiji@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 10:36 jiji@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 10:36 jiji@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 10:36 jiji@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 10:35 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 10:35 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 10:35 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 10:35 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 10:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T361627)', diff saved to https://phabricator.wikimedia.org/P60756 and previous config saved to /var/cache/conftool/dbconfig/20240417-103455-marostegui.json
  • 10:34 akosiaris: apply the coredns patches for bumping instances from 4 to 6. They are noop, I am applying them to update helm's state.
  • 10:34 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 10:34 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 10:34 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 10:34 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host es1027.eqiad.wmnet
  • 10:33 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 10:33 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es2028.codfw.wmnet
  • 10:22 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host es2028.codfw.wmnet
  • 10:14 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2105 (T361627)', diff saved to https://phabricator.wikimedia.org/P60755 and previous config saved to /var/cache/conftool/dbconfig/20240417-101446-marostegui.json
  • 10:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 10:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 10:08 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: sync
  • 10:08 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: sync
  • 10:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 10:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 10:00 akosiaris: manually bump coredns in eqiad to 6
  • 09:59 akosiaris: manually bump coredns in codfw to 6
  • 09:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 09:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 09:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T361627)', diff saved to https://phabricator.wikimedia.org/P60753 and previous config saved to /var/cache/conftool/dbconfig/20240417-095731-marostegui.json
  • 09:44 cgoubert@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=mw-api-ext-ro,name=eqiad
  • 09:44 cgoubert@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=mw-api-int-ro,name=eqiad
  • 09:44 cgoubert@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=mw-web-ro,name=eqiad
  • 09:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P60750 and previous config saved to /var/cache/conftool/dbconfig/20240417-094223-marostegui.json
  • 09:31 jiji@deploy1002: scap failed: KeyError 'production' (duration: 22m 21s)
  • 09:29 marostegui@cumin1002: dbctl commit (dc=all): 'db2150 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P60749 and previous config saved to /var/cache/conftool/dbconfig/20240417-092923-root.json
  • 09:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P60748 and previous config saved to /var/cache/conftool/dbconfig/20240417-092714-marostegui.json
  • 09:14 marostegui@cumin1002: dbctl commit (dc=all): 'db2150 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P60747 and previous config saved to /var/cache/conftool/dbconfig/20240417-091418-root.json
  • 09:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T361627)', diff saved to https://phabricator.wikimedia.org/P60746 and previous config saved to /var/cache/conftool/dbconfig/20240417-091203-marostegui.json
  • 09:08 jiji@deploy1002: Started scap: Switch mediawiki in eqiad to use node-local mcrouter ds - T346690
  • 09:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1223 (T361627)', diff saved to https://phabricator.wikimedia.org/P60745 and previous config saved to /var/cache/conftool/dbconfig/20240417-090539-marostegui.json
  • 09:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 09:05 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 09:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 09:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T361627)', diff saved to https://phabricator.wikimedia.org/P60744 and previous config saved to /var/cache/conftool/dbconfig/20240417-090516-marostegui.json
  • 09:03 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 08:59 marostegui@cumin1002: dbctl commit (dc=all): 'db2150 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P60743 and previous config saved to /var/cache/conftool/dbconfig/20240417-085912-root.json
  • 08:57 hashar@deploy1002: Finished scap: Backport for logging: pluralize $wmgDefaultMonologHandler (T238838) (duration: 16m 37s)
  • 08:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P60742 and previous config saved to /var/cache/conftool/dbconfig/20240417-085009-marostegui.json
  • 08:44 hashar@deploy1002: hashar: Continuing with sync
  • 08:44 hashar@deploy1002: hashar: Backport for logging: pluralize $wmgDefaultMonologHandler (T238838) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:44 marostegui@cumin1002: dbctl commit (dc=all): 'db2150 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P60741 and previous config saved to /var/cache/conftool/dbconfig/20240417-084407-root.json
  • 08:41 hashar@deploy1002: Started scap: Backport for logging: pluralize $wmgDefaultMonologHandler (T238838)
  • 08:40 aqu: Deployed refinery using scap, then deployed onto hdfs
  • 08:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P60739 and previous config saved to /var/cache/conftool/dbconfig/20240417-083501-marostegui.json
  • 08:29 marostegui@cumin1002: dbctl commit (dc=all): 'db2150 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P60738 and previous config saved to /var/cache/conftool/dbconfig/20240417-082901-root.json
  • 08:26 aqu@deploy1002: Finished deploy [analytics/refinery@c4e197f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@c4e197fa] (duration: 02m 23s)
  • 08:24 aqu@deploy1002: Started deploy [analytics/refinery@c4e197f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@c4e197fa]
  • 08:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T361627)', diff saved to https://phabricator.wikimedia.org/P60737 and previous config saved to /var/cache/conftool/dbconfig/20240417-081953-marostegui.json
  • 08:16 aqu@deploy1002: Finished deploy [analytics/refinery@c4e197f] (thin): Regular analytics weekly train THIN [analytics/refinery@c4e197fa] (duration: 03m 39s)
  • 08:13 marostegui@cumin1002: dbctl commit (dc=all): 'db2150 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P60736 and previous config saved to /var/cache/conftool/dbconfig/20240417-081356-root.json
  • 08:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T361627)', diff saved to https://phabricator.wikimedia.org/P60735 and previous config saved to /var/cache/conftool/dbconfig/20240417-081326-marostegui.json
  • 08:13 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:13 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:13 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 08:13 aqu@deploy1002: Started deploy [analytics/refinery@c4e197f] (thin): Regular analytics weekly train THIN [analytics/refinery@c4e197fa]
  • 08:13 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 08:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T361627)', diff saved to https://phabricator.wikimedia.org/P60734 and previous config saved to /var/cache/conftool/dbconfig/20240417-081256-marostegui.json
  • 08:10 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage2002.codfw.wmnet
  • 08:07 aqu@deploy1002: Finished deploy [analytics/refinery@c4e197f]: Regular analytics weekly train [analytics/refinery@c4e197fa] (duration: 27m 57s)
  • 08:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS bookworm
  • 08:00 jayme@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubestage2002.codfw.wmnet
  • 07:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2150 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P60733 and previous config saved to /var/cache/conftool/dbconfig/20240417-075850-root.json
  • 07:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P60732 and previous config saved to /var/cache/conftool/dbconfig/20240417-075748-marostegui.json
  • 07:49 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db1173.eqiad.wmnet
  • 07:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage
  • 07:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P60731 and previous config saved to /var/cache/conftool/dbconfig/20240417-074241-marostegui.json
  • 07:40 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db1173.eqiad.wmnet
  • 07:39 aqu@deploy1002: Started deploy [analytics/refinery@c4e197f]: Regular analytics weekly train [analytics/refinery@c4e197fa]
  • 07:39 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage
  • 07:39 aqu: analytics/refinery deploy begin (added source jars 0.2.35)
  • 07:38 jynus: restart db1216 database for mariadb upgrade
  • 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2214.codfw.wmnet
  • 07:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T361627)', diff saved to https://phabricator.wikimedia.org/P60730 and previous config saved to /var/cache/conftool/dbconfig/20240417-072733-marostegui.json
  • 07:27 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db2214.codfw.wmnet
  • 07:26 jynus: restart db1240 database for mariadb upgrade
  • 07:22 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS bookworm
  • 07:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T361627)', diff saved to https://phabricator.wikimedia.org/P60729 and previous config saved to /var/cache/conftool/dbconfig/20240417-072122-marostegui.json
  • 07:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2150', diff saved to https://phabricator.wikimedia.org/P60728 and previous config saved to /var/cache/conftool/dbconfig/20240417-072115-root.json
  • 07:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 07:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 07:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T361627)', diff saved to https://phabricator.wikimedia.org/P60727 and previous config saved to /var/cache/conftool/dbconfig/20240417-072059-marostegui.json
  • 07:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P60726 and previous config saved to /var/cache/conftool/dbconfig/20240417-070552-marostegui.json
  • 07:02 marostegui@cumin1002: dbctl commit (dc=all): 'db2182 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P60725 and previous config saved to /var/cache/conftool/dbconfig/20240417-070206-root.json
  • 06:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P60724 and previous config saved to /var/cache/conftool/dbconfig/20240417-065044-marostegui.json
  • 06:47 marostegui@cumin1002: dbctl commit (dc=all): 'db2182 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P60723 and previous config saved to /var/cache/conftool/dbconfig/20240417-064700-root.json
  • 06:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T361627)', diff saved to https://phabricator.wikimedia.org/P60722 and previous config saved to /var/cache/conftool/dbconfig/20240417-063537-marostegui.json
  • 06:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2182 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P60721 and previous config saved to /var/cache/conftool/dbconfig/20240417-063155-root.json
  • 06:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1189 (T361627)', diff saved to https://phabricator.wikimedia.org/P60720 and previous config saved to /var/cache/conftool/dbconfig/20240417-062918-marostegui.json
  • 06:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 06:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 06:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T361627)', diff saved to https://phabricator.wikimedia.org/P60719 and previous config saved to /var/cache/conftool/dbconfig/20240417-062856-marostegui.json
  • 06:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2182 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P60718 and previous config saved to /var/cache/conftool/dbconfig/20240417-061649-root.json
  • 06:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P60717 and previous config saved to /var/cache/conftool/dbconfig/20240417-061349-marostegui.json
  • 06:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2182 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P60716 and previous config saved to /var/cache/conftool/dbconfig/20240417-060143-root.json
  • 05:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P60715 and previous config saved to /var/cache/conftool/dbconfig/20240417-055841-marostegui.json
  • 05:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2182 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P60714 and previous config saved to /var/cache/conftool/dbconfig/20240417-054637-root.json
  • 05:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T361627)', diff saved to https://phabricator.wikimedia.org/P60713 and previous config saved to /var/cache/conftool/dbconfig/20240417-054333-marostegui.json
  • 05:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T361627)', diff saved to https://phabricator.wikimedia.org/P60712 and previous config saved to /var/cache/conftool/dbconfig/20240417-053716-marostegui.json
  • 05:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 05:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 05:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T361627)', diff saved to https://phabricator.wikimedia.org/P60711 and previous config saved to /var/cache/conftool/dbconfig/20240417-053653-marostegui.json
  • 05:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2182.codfw.wmnet with OS bookworm
  • 05:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2182 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P60710 and previous config saved to /var/cache/conftool/dbconfig/20240417-053131-root.json
  • 05:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T352010)', diff saved to https://phabricator.wikimedia.org/P60709 and previous config saved to /var/cache/conftool/dbconfig/20240417-052600-ladsgroup.json
  • 05:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 05:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 05:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P60708 and previous config saved to /var/cache/conftool/dbconfig/20240417-052537-ladsgroup.json
  • 05:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P60707 and previous config saved to /var/cache/conftool/dbconfig/20240417-052145-marostegui.json
  • 05:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage
  • 05:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage
  • 05:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P60706 and previous config saved to /var/cache/conftool/dbconfig/20240417-051029-ladsgroup.json
  • 05:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P60705 and previous config saved to /var/cache/conftool/dbconfig/20240417-050638-marostegui.json
  • 05:05 marostegui: Rename machine_vision tables on db1249 eqiad dbmaint s4 T362229
  • 05:00 marostegui: dbmaint Upgrade s7 codfw to Bookworm and MariaDB 10.6 T362745
  • 04:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P60704 and previous config saved to /var/cache/conftool/dbconfig/20240417-045522-ladsgroup.json
  • 04:55 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS bookworm
  • 04:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2182', diff saved to https://phabricator.wikimedia.org/P60703 and previous config saved to /var/cache/conftool/dbconfig/20240417-045353-root.json
  • 04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T361627)', diff saved to https://phabricator.wikimedia.org/P60702 and previous config saved to /var/cache/conftool/dbconfig/20240417-045130-marostegui.json
  • 04:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T361627)', diff saved to https://phabricator.wikimedia.org/P60701 and previous config saved to /var/cache/conftool/dbconfig/20240417-044517-marostegui.json
  • 04:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 04:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 04:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P60700 and previous config saved to /var/cache/conftool/dbconfig/20240417-044015-ladsgroup.json
  • 04:39 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 04:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 03:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P60699 and previous config saved to /var/cache/conftool/dbconfig/20240417-033948-ladsgroup.json
  • 03:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 03:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 03:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T352010)', diff saved to https://phabricator.wikimedia.org/P60698 and previous config saved to /var/cache/conftool/dbconfig/20240417-033926-ladsgroup.json
  • 03:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P60697 and previous config saved to /var/cache/conftool/dbconfig/20240417-032418-ladsgroup.json
  • 03:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P60696 and previous config saved to /var/cache/conftool/dbconfig/20240417-030911-ladsgroup.json
  • 02:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T352010)', diff saved to https://phabricator.wikimedia.org/P60695 and previous config saved to /var/cache/conftool/dbconfig/20240417-025403-ladsgroup.json
  • 02:48 ryankemper: T361525 Trying to powercycle `elastic2088` thru mgmt port (host not responding to ssh)
  • 02:43 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 02:43 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 02:43 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 02:43 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 02:43 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 02:42 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
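
The entries above share a regular shape: a UTC time, an actor (optionally `user@host`), a colon, and a free-text message. The `dbctl commit` messages additionally carry a quoted summary, a paste URL for the diff, and the path of the saved previous config. The following is a minimal Python sketch for pulling those fields out of a line, assuming that shape holds. The names `Entry`, `DbctlCommit`, `parse_entry` and `parse_dbctl` are hypothetical helpers written for illustration and are not part of any Wikimedia tooling.

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    time: str             # "HH:MM" as written in the log
    user: str             # actor name before any "@"
    host: Optional[str]   # host after "@", or None for manual entries
    message: str          # everything after the first ": "


class DbctlCommit(NamedTuple):
    dc: str
    summary: str
    diff_url: str
    backup_path: str


# Assumed line shape: optional bullet, time, user[@host], colon, message.
ENTRY_RE = re.compile(
    r"^\s*(?:•\s*)?(?P<time>\d{2}:\d{2})\s+"
    r"(?P<user>[^@:\s]+)(?:@(?P<host>[^:\s]+))?:\s+(?P<message>.*)$"
)

# Assumed message shape for dbctl commits, as seen in the entries above.
DBCTL_RE = re.compile(
    r"^dbctl commit \(dc=(?P<dc>[^)]+)\): '(?P<summary>.*)', "
    r"diff saved to (?P<diff_url>\S+) "
    r"and previous config saved to (?P<backup_path>\S+)$"
)


def parse_entry(line: str) -> Optional[Entry]:
    """Split one log line into time, user, host and message."""
    m = ENTRY_RE.match(line)
    if m is None:
        return None
    return Entry(m["time"], m["user"], m["host"], m["message"].strip())


def parse_dbctl(message: str) -> Optional[DbctlCommit]:
    """Extract the fields of a dbctl commit message, if it is one."""
    m = DBCTL_RE.match(message)
    if m is None:
        return None
    return DbctlCommit(m["dc"], m["summary"], m["diff_url"], m["backup_path"])


if __name__ == "__main__":
    sample = (
        "  • 05:10 ladsgroup@cumin1002: dbctl commit (dc=all): "
        "'Repooling after maintenance db1206', diff saved to "
        "https://phabricator.wikimedia.org/P60706 and previous config saved to "
        "/var/cache/conftool/dbconfig/20240417-051029-ladsgroup.json"
    )
    entry = parse_entry(sample)
    print(entry)
    print(parse_dbctl(entry.message))
```

Lines that do not match the assumed shape return `None` rather than raising, so the helpers can be run over a whole day of entries and the non-matching ones inspected separately.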

2024-04-16

  • 23:25 hmonroy@deploy1002: Finished scap: Backport for [mediawikiwiki] enable CodeMirror V6 (T357795) (duration: 17m 29s)
  • 23:12 hmonroy@deploy1002: musikanimal and hmonroy: Continuing with sync
  • 23:11 hmonroy@deploy1002: musikanimal and hmonroy: Backport for [mediawikiwiki] enable CodeMirror V6 (T357795) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:08 hmonroy@deploy1002: Started scap: Backport for [mediawikiwiki] enable CodeMirror V6 (T357795)
  • 23:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2009-dev.codfw.wmnet with OS bookworm
  • 23:06 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:03 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2009-dev.codfw.wmnet with reason: host reimage
  • 22:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2009-dev.codfw.wmnet with reason: host reimage
  • 22:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2009-dev.codfw.wmnet with OS bookworm
  • 21:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2009-dev.codfw.wmnet with OS bookworm
  • 21:48 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:47 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:47 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:47 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:46 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:46 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:46 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:45 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:45 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:45 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:44 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:42 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:38 cjming: end of UTC late backport window
  • 21:38 cjming@deploy1002: Finished scap: Backport for Use WikimediaMessages for template overrides (T361589) (duration: 19m 30s)
  • 21:25 cjming@deploy1002: jdlrobson and cjming: Continuing with sync
  • 21:21 cjming@deploy1002: jdlrobson and cjming: Backport for Use WikimediaMessages for template overrides (T361589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:18 cjming@deploy1002: Started scap: Backport for Use WikimediaMessages for template overrides (T361589)
  • 21:16 cjming@deploy1002: Finished scap: Backport for [phase 4] Vector-2022.js should no longer load legacy Vector site and user scripts/styles (T301212) (duration: 18m 26s)
  • 21:02 cjming@deploy1002: cjming and jdlrobson: Continuing with sync
  • 21:01 cjming@deploy1002: cjming and jdlrobson: Backport for [phase 4] Vector-2022.js should no longer load legacy Vector site and user scripts/styles (T301212) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:57 cjming@deploy1002: Started scap: Backport for [phase 4] Vector-2022.js should no longer load legacy Vector site and user scripts/styles (T301212)
  • 20:56 cjming@deploy1002: Finished scap: Backport for Thumbnail styles generalized and moved to core (T360388) (duration: 22m 48s)
  • 20:42 cjming@deploy1002: cjming and jdlrobson: Continuing with sync
  • 20:36 cjming@deploy1002: cjming and jdlrobson: Backport for Thumbnail styles generalized and moved to core (T360388) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:33 cjming@deploy1002: Started scap: Backport for Thumbnail styles generalized and moved to core (T360388)
  • 20:30 mutante: CI - jenkins and zuul-merger are re-enabled on contint1002 after distro upgrade to bullseye - T334517
  • 20:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2009-dev.codfw.wmnet with OS bookworm
  • 20:22 mutante: CI - re-enabled jenkins and zuul-merger on contint1002 after distro upgrade - T360964
  • 20:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 20:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 20:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T361627)', diff saved to https://phabricator.wikimedia.org/P60693 and previous config saved to /var/cache/conftool/dbconfig/20240416-202206-marostegui.json
  • 20:08 aqu: Weekly deploy of refinery using scap, then deployed onto hdfs
  • 20:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P60691 and previous config saved to /var/cache/conftool/dbconfig/20240416-200659-marostegui.json
  • 19:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2009-dev']
  • 19:53 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2009-dev']
  • 19:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P60690 and previous config saved to /var/cache/conftool/dbconfig/20240416-195151-marostegui.json
  • 19:47 hashar@deploy1002: Finished deploy [zuul/deploy@efce3ee]: Redeploy Zuul following host reimaging - T334517 (duration: 00m 08s)
  • 19:47 hashar@deploy1002: Started deploy [zuul/deploy@efce3ee]: Redeploy Zuul following host reimaging - T334517
  • 19:42 aqu@deploy1002: Finished deploy [analytics/refinery@59f7d09] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@59f7d091] (duration: 02m 24s)
  • 19:40 aqu@deploy1002: Started deploy [analytics/refinery@59f7d09] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@59f7d091]
  • 19:38 aqu@deploy1002: Finished deploy [analytics/refinery@59f7d09] (thin): Regular analytics weekly train THIN [analytics/refinery@59f7d091] (duration: 04m 10s)
  • 19:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T361627)', diff saved to https://phabricator.wikimedia.org/P60689 and previous config saved to /var/cache/conftool/dbconfig/20240416-193643-marostegui.json
  • 19:33 aqu@deploy1002: Started deploy [analytics/refinery@59f7d09] (thin): Regular analytics weekly train THIN [analytics/refinery@59f7d091]
  • 19:31 aqu@deploy1002: Finished deploy [analytics/refinery@59f7d09]: Regular analytics weekly train [analytics/refinery@59f7d091] (duration: 13m 08s)
  • 19:24 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@9208108]: Regular analytics weekly train [airflow-dags/analytics_test@9208108e] (duration: 00m 10s)
  • 19:23 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@9208108]: Regular analytics weekly train [airflow-dags/analytics_test@9208108e]
  • 19:18 aqu@deploy1002: Started deploy [analytics/refinery@59f7d09]: Regular analytics weekly train [analytics/refinery@59f7d091]
  • 19:17 aqu: Deployment train for analytics/refinery
  • 19:15 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T361627)', diff saved to https://phabricator.wikimedia.org/P60687 and previous config saved to /var/cache/conftool/dbconfig/20240416-191522-marostegui.json
  • 19:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 19:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 19:14 hashar@deploy1002: Finished deploy [zuul/deploy@efce3ee]: Redeploy Zuul following host reimaging - T334517 (duration: 00m 13s)
  • 19:14 hashar@deploy1002: Started deploy [zuul/deploy@efce3ee]: Redeploy Zuul following host reimaging - T334517
  • 19:12 hashar@deploy1002: Finished deploy [zuul/deploy@efce3ee]: Redeploy Zuul following host reimaging - T334517 (duration: 00m 03s)
  • 19:12 hashar@deploy1002: Started deploy [zuul/deploy@efce3ee]: Redeploy Zuul following host reimaging - T334517
  • 19:11 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 19:11 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 19:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T361627)', diff saved to https://phabricator.wikimedia.org/P60686 and previous config saved to /var/cache/conftool/dbconfig/20240416-191128-marostegui.json
  • 19:08 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.1 refs T361395
  • 18:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P60685 and previous config saved to /var/cache/conftool/dbconfig/20240416-185621-marostegui.json
  • 18:52 aqu@deploy1002: Finished deploy [airflow-dags/analytics@9208108]: Regular analytics weekly train [airflow-dags/analytics@9208108e] (duration: 00m 26s)
  • 18:52 aqu@deploy1002: Started deploy [airflow-dags/analytics@9208108]: Regular analytics weekly train [airflow-dags/analytics@9208108e]
  • 18:50 dancy@deploy1002: Installation of scap version "4.77.0" completed for 340 hosts
  • 18:49 dancy@deploy1002: Installing scap version "4.77.0" for 340 hosts
  • 18:48 dancy@deploy1002: Finished scap: Backport for [Parser] Temporarily disable deprecation warnings for dynamic properties (T362692) (duration: 22m 56s)
  • 18:44 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host contint1002.wikimedia.org with OS bullseye
  • 18:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P60684 and previous config saved to /var/cache/conftool/dbconfig/20240416-184113-marostegui.json
  • 18:40 mutante: contint1002 - sudo a2dismod mpm_event to work around known race condition and fix failed initial puppet run - T334517
  • 18:35 dancy@deploy1002: cscott and dancy: Continuing with sync
  • 18:29 bearloga@deploy1002: Finished deploy [airflow-dags/analytics_product@77af7cb]: (no justification provided) (duration: 00m 07s)
  • 18:29 bearloga@deploy1002: Started deploy [airflow-dags/analytics_product@77af7cb]: (no justification provided)
  • 18:29 dancy@deploy1002: cscott and dancy: Backport for [Parser] Temporarily disable deprecation warnings for dynamic properties (T362692) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T361627)', diff saved to https://phabricator.wikimedia.org/P60683 and previous config saved to /var/cache/conftool/dbconfig/20240416-182606-marostegui.json
  • 18:26 dancy@deploy1002: Started scap: Backport for [Parser] Temporarily disable deprecation warnings for dynamic properties (T362692)
  • 18:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T361627)', diff saved to https://phabricator.wikimedia.org/P60682 and previous config saved to /var/cache/conftool/dbconfig/20240416-181001-marostegui.json
  • 18:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 18:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 18:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T361627)', diff saved to https://phabricator.wikimedia.org/P60681 and previous config saved to /var/cache/conftool/dbconfig/20240416-180938-marostegui.json
  • 18:07 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on contint1002.wikimedia.org with reason: host reimage
  • 18:04 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on contint1002.wikimedia.org with reason: host reimage
  • 17:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P60680 and previous config saved to /var/cache/conftool/dbconfig/20240416-175431-marostegui.json
  • 17:52 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host contint1002.wikimedia.org with OS bullseye
  • 17:51 mutante: CI - jenkins on contint1002 disabled - reimaging in progress
  • 17:50 bearloga@deploy1002: Finished deploy [airflow-dags/analytics_product@bb33843]: (no justification provided) (duration: 00m 06s)
  • 17:50 bearloga@deploy1002: Started deploy [airflow-dags/analytics_product@bb33843]: (no justification provided)
  • 17:49 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on contint1002.wikimedia.org with reason: reimage https://phabricator.wikimedia.org/T334517
  • 17:48 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on contint1002.wikimedia.org with reason: reimage https://phabricator.wikimedia.org/T334517
  • 17:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P60679 and previous config saved to /var/cache/conftool/dbconfig/20240416-173923-marostegui.json
  • 17:37 mutante: CI - disabling zuul-merger on contint1002 - there is another on contint2002
  • 17:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol2009-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:25 marostegui@cumin1002: dbctl commit (dc=all): 'db2127 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P60678 and previous config saved to /var/cache/conftool/dbconfig/20240416-172515-root.json
  • 17:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T361627)', diff saved to https://phabricator.wikimedia.org/P60677 and previous config saved to /var/cache/conftool/dbconfig/20240416-172415-marostegui.json
  • 17:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T361627)', diff saved to https://phabricator.wikimedia.org/P60676 and previous config saved to /var/cache/conftool/dbconfig/20240416-172201-marostegui.json
  • 17:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1229.eqiad.wmnet with reason: Maintenance
  • 17:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1229.eqiad.wmnet with reason: Maintenance
  • 17:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 17:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 17:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T361627)', diff saved to https://phabricator.wikimedia.org/P60675 and previous config saved to /var/cache/conftool/dbconfig/20240416-171738-marostegui.json
  • 17:16 btullis@cumin1002: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 17:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T352010)', diff saved to https://phabricator.wikimedia.org/P60674 and previous config saved to /var/cache/conftool/dbconfig/20240416-171047-ladsgroup.json
  • 17:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 17:10 marostegui@cumin1002: dbctl commit (dc=all): 'db2127 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P60673 and previous config saved to /var/cache/conftool/dbconfig/20240416-171010-root.json
  • 17:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 17:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T352010)', diff saved to https://phabricator.wikimedia.org/P60672 and previous config saved to /var/cache/conftool/dbconfig/20240416-171006-ladsgroup.json
  • 17:04 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudcontrol2009-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:02 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:02 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2009 DNS add - pt1979@cumin2002"
  • 17:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P60671 and previous config saved to /var/cache/conftool/dbconfig/20240416-170231-marostegui.json
  • 17:01 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2009 DNS add - pt1979@cumin2002"
  • 16:59 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:55 marostegui@cumin1002: dbctl commit (dc=all): 'db2127 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P60670 and previous config saved to /var/cache/conftool/dbconfig/20240416-165504-root.json
  • 16:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P60669 and previous config saved to /var/cache/conftool/dbconfig/20240416-165458-ladsgroup.json
  • 16:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P60668 and previous config saved to /var/cache/conftool/dbconfig/20240416-164722-marostegui.json
  • 16:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2127 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P60666 and previous config saved to /var/cache/conftool/dbconfig/20240416-163958-root.json
  • 16:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P60665 and previous config saved to /var/cache/conftool/dbconfig/20240416-163951-ladsgroup.json
  • 16:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T361627)', diff saved to https://phabricator.wikimedia.org/P60664 and previous config saved to /var/cache/conftool/dbconfig/20240416-163215-marostegui.json
  • 16:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 100%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P60663 and previous config saved to /var/cache/conftool/dbconfig/20240416-162926-arnaudb.json
  • 16:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T361627)', diff saved to https://phabricator.wikimedia.org/P60662 and previous config saved to /var/cache/conftool/dbconfig/20240416-162900-marostegui.json
  • 16:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 16:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 16:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T361627)', diff saved to https://phabricator.wikimedia.org/P60661 and previous config saved to /var/cache/conftool/dbconfig/20240416-162838-marostegui.json
  • 16:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2127 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P60660 and previous config saved to /var/cache/conftool/dbconfig/20240416-162452-root.json
  • 16:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T352010)', diff saved to https://phabricator.wikimedia.org/P60659 and previous config saved to /var/cache/conftool/dbconfig/20240416-162443-ladsgroup.json
  • 16:16 brennen: finished phabricator deploy for T362689 - believe things are currently stable
  • 16:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 75%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P60657 and previous config saved to /var/cache/conftool/dbconfig/20240416-161420-arnaudb.json
  • 16:14 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 16:13 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 16:13 brennen@deploy1002: Finished deploy [phabricator/deployment@098b9c2]: deploy phab1004 for T362689 (duration: 00m 42s)
  • 16:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P60656 and previous config saved to /var/cache/conftool/dbconfig/20240416-161330-marostegui.json
  • 16:13 brennen@deploy1002: Started deploy [phabricator/deployment@098b9c2]: deploy phab1004 for T362689
  • 16:13 rzl: rzl@mwmaint1002:~$ sudo systemctl start mediawiki_job_globalblocking-fixGlobalBlockWhitelist.service # T360516
  • 16:12 brennen@deploy1002: Finished deploy [phabricator/deployment@098b9c2]: test deploy phab2002 for T362689 (duration: 00m 32s)
  • 16:12 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2217.codfw.wmnet
  • 16:12 brennen@deploy1002: Started deploy [phabricator/deployment@098b9c2]: test deploy phab2002 for T362689
  • 16:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2127 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P60655 and previous config saved to /var/cache/conftool/dbconfig/20240416-160946-root.json
  • 16:07 brennen: starting phabricator deploy for T362689
  • 16:06 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 16:05 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 16:01 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on cp1115.eqiad.wmnet with reason: testing PXE boot issues
  • 16:00 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on cp1115.eqiad.wmnet with reason: testing PXE boot issues
  • 15:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 50%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P60654 and previous config saved to /var/cache/conftool/dbconfig/20240416-155914-arnaudb.json
  • 15:58 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db2217.codfw.wmnet
  • 15:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P60653 and previous config saved to /var/cache/conftool/dbconfig/20240416-155823-marostegui.json
  • 15:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2127.codfw.wmnet with OS bookworm
  • 15:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db1231.eqiad.wmnet
  • 15:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2127 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P60652 and previous config saved to /var/cache/conftool/dbconfig/20240416-155440-root.json
  • 15:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 100%: Post clone', diff saved to https://phabricator.wikimedia.org/P60651 and previous config saved to /var/cache/conftool/dbconfig/20240416-154915-arnaudb.json
  • 15:48 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:47 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:46 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db1231.eqiad.wmnet
  • 15:46 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db1168.eqiad.wmnet
  • 15:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 25%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P60650 and previous config saved to /var/cache/conftool/dbconfig/20240416-154408-arnaudb.json
  • 15:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T361627)', diff saved to https://phabricator.wikimedia.org/P60649 and previous config saved to /var/cache/conftool/dbconfig/20240416-154316-marostegui.json
  • 15:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T361627)', diff saved to https://phabricator.wikimedia.org/P60648 and previous config saved to /var/cache/conftool/dbconfig/20240416-153902-marostegui.json
  • 15:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 15:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 15:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T361627)', diff saved to https://phabricator.wikimedia.org/P60647 and previous config saved to /var/cache/conftool/dbconfig/20240416-153839-marostegui.json
  • 15:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2127.codfw.wmnet with reason: host reimage
  • 15:34 arnaudb@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 75%: Post clone', diff saved to https://phabricator.wikimedia.org/P60646 and previous config saved to /var/cache/conftool/dbconfig/20240416-153408-arnaudb.json
  • 15:32 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db1168.eqiad.wmnet
  • 15:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2127.codfw.wmnet with reason: host reimage
  • 15:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2129.codfw.wmnet
  • 15:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 15%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P60645 and previous config saved to /var/cache/conftool/dbconfig/20240416-152902-arnaudb.json
  • 15:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P60644 and previous config saved to /var/cache/conftool/dbconfig/20240416-152331-marostegui.json
  • 15:19 arnaudb@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 50%: Post clone', diff saved to https://phabricator.wikimedia.org/P60643 and previous config saved to /var/cache/conftool/dbconfig/20240416-151902-arnaudb.json
  • 15:17 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db2129.codfw.wmnet
  • 15:17 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2199.codfw.wmnet with reason: Maintenance
  • 15:17 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2199.codfw.wmnet with reason: Maintenance
  • 15:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T360332)', diff saved to https://phabricator.wikimedia.org/P60642 and previous config saved to /var/cache/conftool/dbconfig/20240416-151649-arnaudb.json
  • 15:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db1224.eqiad.wmnet
  • 15:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 10%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P60641 and previous config saved to /var/cache/conftool/dbconfig/20240416-151357-arnaudb.json
  • 15:13 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2127.codfw.wmnet with OS bookworm
  • 15:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2127 T362616', diff saved to https://phabricator.wikimedia.org/P60640 and previous config saved to /var/cache/conftool/dbconfig/20240416-151032-root.json
  • 15:09 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2205 to s3 primary T362616', diff saved to https://phabricator.wikimedia.org/P60639 and previous config saved to /var/cache/conftool/dbconfig/20240416-150933-root.json
  • 15:08 marostegui: Starting s3 codfw failover from db2127 to db2205 - T362616
  • 15:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P60638 and previous config saved to /var/cache/conftool/dbconfig/20240416-150824-marostegui.json
  • 15:07 brennen@deploy1002: Finished deploy [phabricator/deployment@7773191]: deploy phab1004 for T362666 (duration: 00m 30s)
  • 15:06 brennen@deploy1002: Started deploy [phabricator/deployment@7773191]: deploy phab1004 for T362666
  • 15:06 brennen@deploy1002: Finished deploy [phabricator/deployment@7773191]: test deploy phab2002 for T362666 (duration: 00m 32s)
  • 15:05 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db1224.eqiad.wmnet
  • 15:05 brennen@deploy1002: Started deploy [phabricator/deployment@7773191]: test deploy phab2002 for T362666
  • 15:03 arnaudb@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 20%: Post clone', diff saved to https://phabricator.wikimedia.org/P60637 and previous config saved to /var/cache/conftool/dbconfig/20240416-150356-arnaudb.json
  • 15:03 samtar@deploy1002: Finished scap: Backport for IS: Set Phonos to Inline Audio Player mode on test.wiki (duration: 17m 17s)
  • 15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
  • 15:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
  • 15:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P60636 and previous config saved to /var/cache/conftool/dbconfig/20240416-150141-arnaudb.json
  • 15:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db1201.eqiad.wmnet
  • 14:58 arnaudb@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 5%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P60635 and previous config saved to /var/cache/conftool/dbconfig/20240416-145851-arnaudb.json
  • 14:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T361627)', diff saved to https://phabricator.wikimedia.org/P60634 and previous config saved to /var/cache/conftool/dbconfig/20240416-145316-marostegui.json
  • 14:50 samtar@deploy1002: samtar: Continuing with sync
  • 14:50 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2205 with weight 0 T362616', diff saved to https://phabricator.wikimedia.org/P60633 and previous config saved to /var/cache/conftool/dbconfig/20240416-144957-root.json
  • 14:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s3 T362616
  • 14:49 samtar@deploy1002: samtar: Backport for IS: Set Phonos to Inline Audio Player mode on test.wiki synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s3 T362616
  • 14:48 arnaudb@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 10%: Post clone', diff saved to https://phabricator.wikimedia.org/P60632 and previous config saved to /var/cache/conftool/dbconfig/20240416-144850-arnaudb.json
  • 14:48 btullis@cumin1002: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 14:47 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db1201.eqiad.wmnet
  • 14:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T361627)', diff saved to https://phabricator.wikimedia.org/P60631 and previous config saved to /var/cache/conftool/dbconfig/20240416-144727-marostegui.json
  • 14:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 14:47 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 14:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T361627)', diff saved to https://phabricator.wikimedia.org/P60630 and previous config saved to /var/cache/conftool/dbconfig/20240416-144704-marostegui.json
  • 14:46 samtar@deploy1002: Started scap: Backport for IS: Set Phonos to Inline Audio Player mode on test.wiki
  • 14:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P60629 and previous config saved to /var/cache/conftool/dbconfig/20240416-144634-arnaudb.json
  • 14:45 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2193.codfw.wmnet
  • 14:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 2%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P60628 and previous config saved to /var/cache/conftool/dbconfig/20240416-144346-arnaudb.json
  • 14:43 taavi@deploy1002: Finished scap: Backport for Disallow changing email on Wikitech directly (T360883) (duration: 16m 24s)
  • 14:36 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db2193.codfw.wmnet
  • 14:36 vgutierrez: pool ncredir2002
  • 14:33 vgutierrez: depool ncredir2002
  • 14:32 vgutierrez: pool ncredir2001
  • 14:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db1187.eqiad.wmnet
  • 14:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P60627 and previous config saved to /var/cache/conftool/dbconfig/20240416-143157-marostegui.json
  • 14:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T360332)', diff saved to https://phabricator.wikimedia.org/P60626 and previous config saved to /var/cache/conftool/dbconfig/20240416-143126-arnaudb.json
  • 14:30 taavi@deploy1002: taavi: Continuing with sync
  • 14:29 taavi@deploy1002: taavi: Backport for Disallow changing email on Wikitech directly (T360883) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 1%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P60625 and previous config saved to /var/cache/conftool/dbconfig/20240416-142840-arnaudb.json
  • 14:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1244 (T360332)', diff saved to https://phabricator.wikimedia.org/P60624 and previous config saved to /var/cache/conftool/dbconfig/20240416-142808-arnaudb.json
  • 14:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 14:27 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 14:26 taavi@deploy1002: Started scap: Backport for Disallow changing email on Wikitech directly (T360883)
  • 14:26 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
  • 14:26 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
  • 14:23 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db1187.eqiad.wmnet
  • 14:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 14:22 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 14:21 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2114.codfw.wmnet
  • 14:19 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2123.codfw.wmnet with OS bookworm
  • 14:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P60623 and previous config saved to /var/cache/conftool/dbconfig/20240416-141649-marostegui.json
  • 14:06 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Restrict local uploads to uploader user group in azwiki (T360847) (duration: 35m 04s)
  • 14:02 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db2114.codfw.wmnet
  • 14:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T361627)', diff saved to https://phabricator.wikimedia.org/P60622 and previous config saved to /var/cache/conftool/dbconfig/20240416-140142-marostegui.json
  • 13:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1162 (T361627)', diff saved to https://phabricator.wikimedia.org/P60621 and previous config saved to /var/cache/conftool/dbconfig/20240416-135928-marostegui.json
  • 13:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 13:59 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 13:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T361627)', diff saved to https://phabricator.wikimedia.org/P60620 and previous config saved to /var/cache/conftool/dbconfig/20240416-135906-marostegui.json
  • 13:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2123.codfw.wmnet with reason: host reimage
  • 13:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2123.codfw.wmnet with reason: host reimage
  • 13:53 logmsgbot: lucaswerkmeister-wmde@deploy1002 nmw03 and lucaswerkmeister-wmde: Continuing with sync
  • 13:44 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
  • 13:44 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
  • 13:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P60619 and previous config saved to /var/cache/conftool/dbconfig/20240416-134358-marostegui.json
  • 13:43 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
  • 13:43 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
  • 13:38 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db2123.codfw.wmnet with OS bookworm
  • 13:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2123.codfw.wmnet with reason: T360116
  • 13:36 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2123.codfw.wmnet with reason: T360116
  • 13:34 logmsgbot: lucaswerkmeister-wmde@deploy1002 nmw03 and lucaswerkmeister-wmde: Backport for Restrict local uploads to uploader user group in azwiki (T360847) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:33 vgutierrez: depool ncredir2001
  • 13:31 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Restrict local uploads to uploader user group in azwiki (T360847)
  • 13:29 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Remove 'obsolete-tag' from $wgSignatureAllowedLintErrors on Polish Wikipedia (T362414) (duration: 18m 39s)
  • 13:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P60618 and previous config saved to /var/cache/conftool/dbconfig/20240416-132851-marostegui.json
  • 13:23 vgutierrez: pool ncredir2001
  • 13:20 vgutierrez: depool ncredir2001
  • 13:20 vgutierrez: pool ncredir1001
  • 13:18 vgutierrez: depool ncredir1001
  • 13:17 vgutierrez: pool ncredir2001
  • 13:16 logmsgbot: lucaswerkmeister-wmde@deploy1002 msz2001 and lucaswerkmeister-wmde: Continuing with sync
  • 13:14 logmsgbot: lucaswerkmeister-wmde@deploy1002 msz2001 and lucaswerkmeister-wmde: Backport for Remove 'obsolete-tag' from $wgSignatureAllowedLintErrors on Polish Wikipedia (T362414) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T361627)', diff saved to https://phabricator.wikimedia.org/P60617 and previous config saved to /var/cache/conftool/dbconfig/20240416-131344-marostegui.json
  • 13:11 vgutierrez: depool ncredir2001
  • 13:11 vgutierrez: pool ncredir1001
  • 13:11 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Remove 'obsolete-tag' from $wgSignatureAllowedLintErrors on Polish Wikipedia (T362414)
  • 13:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T361627)', diff saved to https://phabricator.wikimedia.org/P60616 and previous config saved to /var/cache/conftool/dbconfig/20240416-130710-marostegui.json
  • 13:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 13:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 13:06 vgutierrez: depool ncredir1001
  • 13:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 13:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 13:01 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
  • 13:01 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
  • 12:57 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 12:55 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 12:21 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2124.codfw.wmnet
  • 12:13 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db2124.codfw.wmnet
  • 12:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T361627)', diff saved to https://phabricator.wikimedia.org/P60615 and previous config saved to /var/cache/conftool/dbconfig/20240416-121211-marostegui.json
  • 12:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2180.codfw.wmnet
  • 11:58 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db2180.codfw.wmnet
  • 11:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2151.codfw.wmnet
  • 11:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P60613 and previous config saved to /var/cache/conftool/dbconfig/20240416-115703-marostegui.json
  • 11:50 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db2151.codfw.wmnet
  • 11:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2169.codfw.wmnet
  • 11:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P60612 and previous config saved to /var/cache/conftool/dbconfig/20240416-114155-marostegui.json
  • 11:30 stevemunene@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 11:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T361627)', diff saved to https://phabricator.wikimedia.org/P60611 and previous config saved to /var/cache/conftool/dbconfig/20240416-112648-marostegui.json
  • 11:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2207 (T361627)', diff saved to https://phabricator.wikimedia.org/P60610 and previous config saved to /var/cache/conftool/dbconfig/20240416-112134-marostegui.json
  • 11:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2207.codfw.wmnet with reason: Maintenance
  • 11:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2207.codfw.wmnet with reason: Maintenance
  • 11:19 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db2169.codfw.wmnet
  • 11:16 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:16 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 11:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 11:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T361627)', diff saved to https://phabricator.wikimedia.org/P60609 and previous config saved to /var/cache/conftool/dbconfig/20240416-111602-marostegui.json
  • 11:08 stevemunene@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 11:07 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1042.eqiad.wmnet
  • 11:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P60608 and previous config saved to /var/cache/conftool/dbconfig/20240416-110055-marostegui.json
  • 10:58 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1042.eqiad.wmnet
  • 10:56 hnowlan: disabling puppet on A:restbase before switching to cfssl
  • 10:55 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 10:55 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 10:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P60607 and previous config saved to /var/cache/conftool/dbconfig/20240416-104547-marostegui.json
  • 10:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T361627)', diff saved to https://phabricator.wikimedia.org/P60605 and previous config saved to /var/cache/conftool/dbconfig/20240416-103040-marostegui.json
  • 10:25 marostegui@cumin1002: dbctl commit (dc=all): 'db2105 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P60604 and previous config saved to /var/cache/conftool/dbconfig/20240416-102540-root.json
  • 10:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T361627)', diff saved to https://phabricator.wikimedia.org/P60603 and previous config saved to /var/cache/conftool/dbconfig/20240416-102510-marostegui.json
  • 10:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 10:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 10:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T361627)', diff saved to https://phabricator.wikimedia.org/P60602 and previous config saved to /var/cache/conftool/dbconfig/20240416-102447-marostegui.json
  • 10:20 moritzm: upgrading PHP on remaining mwdebug servers T362511
  • 10:19 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 10:15 jayme: updated rsyslog to 8.2404.0-1~bpo11+1 on all k8s nodes - T357616
  • 10:13 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 10:13 moritzm: uploaded PHP 7.4 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u2 to buster-wikimedia/component/icu67 T362511
  • 10:12 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 10:12 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 10:10 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 10:10 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:10 marostegui@cumin1002: dbctl commit (dc=all): 'db2105 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P60601 and previous config saved to /var/cache/conftool/dbconfig/20240416-101034-root.json
  • 10:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P60600 and previous config saved to /var/cache/conftool/dbconfig/20240416-100939-marostegui.json
  • 10:09 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 10:09 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 10:08 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 10:08 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:08 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:08 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:08 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:08 jayme@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:07 jayme@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 09:56 hnowlan@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 09:55 hnowlan@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 09:55 marostegui@cumin1002: dbctl commit (dc=all): 'db2105 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P60599 and previous config saved to /var/cache/conftool/dbconfig/20240416-095528-root.json
  • 09:54 hnowlan@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 09:54 hnowlan@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 09:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P60598 and previous config saved to /var/cache/conftool/dbconfig/20240416-095432-marostegui.json
  • 09:49 hnowlan@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 09:48 hnowlan@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 09:48 hnowlan@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 09:48 hnowlan@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 09:40 marostegui@cumin1002: dbctl commit (dc=all): 'db2105 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P60597 and previous config saved to /var/cache/conftool/dbconfig/20240416-094023-root.json
  • 09:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T361627)', diff saved to https://phabricator.wikimedia.org/P60596 and previous config saved to /var/cache/conftool/dbconfig/20240416-093924-marostegui.json
  • 09:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T361627)', diff saved to https://phabricator.wikimedia.org/P60595 and previous config saved to /var/cache/conftool/dbconfig/20240416-093318-marostegui.json
  • 09:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 09:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 09:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T361627)', diff saved to https://phabricator.wikimedia.org/P60594 and previous config saved to /var/cache/conftool/dbconfig/20240416-093255-marostegui.json
  • 09:31 arnaudb: Starting s5 codfw failover from db2123 to db2213 - T362614 (forgot to send it)
  • 09:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2123 T362614', diff saved to https://phabricator.wikimedia.org/P60593 and previous config saved to /var/cache/conftool/dbconfig/20240416-093041-arnaudb.json
  • 09:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote db2213 to s5 primary T362614', diff saved to https://phabricator.wikimedia.org/P60592 and previous config saved to /var/cache/conftool/dbconfig/20240416-092800-arnaudb.json
  • 09:25 marostegui@cumin1002: dbctl commit (dc=all): 'db2105 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P60591 and previous config saved to /var/cache/conftool/dbconfig/20240416-092517-root.json
  • 09:21 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host cloudidm2001-dev.codfw.wmnet
  • 09:20 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudidm2001-dev.codfw.wmnet with OS bookworm
  • 09:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P60590 and previous config saved to /var/cache/conftool/dbconfig/20240416-091747-marostegui.json
  • 09:10 marostegui@cumin1002: dbctl commit (dc=all): 'db2105 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P60588 and previous config saved to /var/cache/conftool/dbconfig/20240416-091009-root.json
  • 09:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db2213 with weight 0 T362614', diff saved to https://phabricator.wikimedia.org/P60587 and previous config saved to /var/cache/conftool/dbconfig/20240416-090755-arnaudb.json
  • 09:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s5 T362614
  • 09:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s5 T362614
  • 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: post reimage repool', diff saved to https://phabricator.wikimedia.org/P60586 and previous config saved to /var/cache/conftool/dbconfig/20240416-090625-arnaudb.json
  • 09:05 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudidm2001-dev.codfw.wmnet with reason: host reimage
  • 09:02 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudidm2001-dev.codfw.wmnet with reason: host reimage
  • 09:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P60585 and previous config saved to /var/cache/conftool/dbconfig/20240416-090240-marostegui.json
  • 08:59 jayme: updated rsyslog to 8.2404.0-1~bpo11+1 on wikikube eqiad - T357616
  • 08:55 marostegui@cumin1002: dbctl commit (dc=all): 'db2105 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P60584 and previous config saved to /var/cache/conftool/dbconfig/20240416-085503-root.json
  • 08:51 arnaudb@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: post reimage repool', diff saved to https://phabricator.wikimedia.org/P60583 and previous config saved to /var/cache/conftool/dbconfig/20240416-085120-arnaudb.json
  • 08:48 slyngshede@cumin1002: START - Cookbook sre.hosts.reimage for host cloudidm2001-dev.codfw.wmnet with OS bookworm
  • 08:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T361627)', diff saved to https://phabricator.wikimedia.org/P60582 and previous config saved to /var/cache/conftool/dbconfig/20240416-084733-marostegui.json
  • 08:47 jayme: updated rsyslog to 8.2404.0-1~bpo11+1 on wikikube codfw - T357616
  • 08:46 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM cloudidm2001-dev.codfw.wmnet - slyngshede@cumin1002"
  • 08:45 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM cloudidm2001-dev.codfw.wmnet - slyngshede@cumin1002"
  • 08:45 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudidm2001-dev.codfw.wmnet on all recursors
  • 08:45 slyngshede@cumin1002: START - Cookbook sre.dns.wipe-cache cloudidm2001-dev.codfw.wmnet on all recursors
  • 08:45 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:45 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM cloudidm2001-dev.codfw.wmnet - slyngshede@cumin1002"
  • 08:44 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM cloudidm2001-dev.codfw.wmnet - slyngshede@cumin1002"
  • 08:42 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
  • 08:42 slyngshede@cumin1002: START - Cookbook sre.ganeti.makevm for new host cloudidm2001-dev.codfw.wmnet
  • 08:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T361627)', diff saved to https://phabricator.wikimedia.org/P60581 and previous config saved to /var/cache/conftool/dbconfig/20240416-084118-marostegui.json
  • 08:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 08:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 08:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T361627)', diff saved to https://phabricator.wikimedia.org/P60580 and previous config saved to /var/cache/conftool/dbconfig/20240416-084055-marostegui.json
  • 08:36 arnaudb@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: post reimage repool', diff saved to https://phabricator.wikimedia.org/P60579 and previous config saved to /var/cache/conftool/dbconfig/20240416-083614-arnaudb.json
  • 08:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2105.codfw.wmnet with OS bookworm
  • 08:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P60578 and previous config saved to /var/cache/conftool/dbconfig/20240416-082548-marostegui.json
  • 08:21 arnaudb@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: post reimage repool', diff saved to https://phabricator.wikimedia.org/P60577 and previous config saved to /var/cache/conftool/dbconfig/20240416-082108-arnaudb.json
  • 08:19 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1161.eqiad.wmnet with OS bookworm
  • 08:13 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2105.codfw.wmnet with reason: host reimage
  • 08:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P60576 and previous config saved to /var/cache/conftool/dbconfig/20240416-081040-marostegui.json
  • 08:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2105.codfw.wmnet with reason: host reimage
  • 07:56 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 07:56 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1161.eqiad.wmnet with reason: host reimage
  • 07:56 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 07:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T361627)', diff saved to https://phabricator.wikimedia.org/P60575 and previous config saved to /var/cache/conftool/dbconfig/20240416-075533-marostegui.json
  • 07:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1161.eqiad.wmnet with reason: host reimage
  • 07:52 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2105.codfw.wmnet with OS bookworm
  • 07:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2105', diff saved to https://phabricator.wikimedia.org/P60574 and previous config saved to /var/cache/conftool/dbconfig/20240416-075056-root.json
  • 07:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2138 (T361627)', diff saved to https://phabricator.wikimedia.org/P60573 and previous config saved to /var/cache/conftool/dbconfig/20240416-074952-marostegui.json
  • 07:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 07:49 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 07:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 07:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T361627)', diff saved to https://phabricator.wikimedia.org/P60572 and previous config saved to /var/cache/conftool/dbconfig/20240416-074928-marostegui.json
  • 07:43 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 07:43 volans@cumin1002: END (FAIL) - Cookbook sre.netbox.update-extras (exit_code=1) rolling restart_daemons on A:netbox-canary
  • 07:40 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 07:40 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db1161.eqiad.wmnet with OS bookworm
  • 07:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db[1154,1161].eqiad.wmnet with reason: T360116
  • 07:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db[1154,1161].eqiad.wmnet with reason: T360116
  • 07:38 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db1161.eqiad.wmnet with reason: T360116
  • 07:38 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db1161.eqiad.wmnet with reason: T360116
  • 07:35 arnaudb@cumin1002: dbctl commit (dc=all): 'db1161 depool T360116', diff saved to https://phabricator.wikimedia.org/P60571 and previous config saved to /var/cache/conftool/dbconfig/20240416-073521-arnaudb.json
  • 07:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P60570 and previous config saved to /var/cache/conftool/dbconfig/20240416-073420-marostegui.json
  • 07:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P60569 and previous config saved to /var/cache/conftool/dbconfig/20240416-071913-marostegui.json
  • 07:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2156 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P60568 and previous config saved to /var/cache/conftool/dbconfig/20240416-071611-root.json
  • 07:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T361627)', diff saved to https://phabricator.wikimedia.org/P60567 and previous config saved to /var/cache/conftool/dbconfig/20240416-070405-marostegui.json
  • 07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2126 (T361627)', diff saved to https://phabricator.wikimedia.org/P60566 and previous config saved to /var/cache/conftool/dbconfig/20240416-070139-marostegui.json
  • 07:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 07:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 07:01 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2156 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P60565 and previous config saved to /var/cache/conftool/dbconfig/20240416-070105-root.json
  • 07:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T361627)', diff saved to https://phabricator.wikimedia.org/P60564 and previous config saved to /var/cache/conftool/dbconfig/20240416-070100-marostegui.json
  • 06:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2156 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P60563 and previous config saved to /var/cache/conftool/dbconfig/20240416-064559-root.json
  • 06:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P60562 and previous config saved to /var/cache/conftool/dbconfig/20240416-064552-marostegui.json
  • 06:37 volans: upgraded spicerack to v8.5.0 on cumin1002
  • 06:36 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin2002.codfw.wmnet with reason: test spicerack v8.5.0
  • 06:36 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin2002.codfw.wmnet with reason: test spicerack v8.5.0
  • 06:30 marostegui@cumin1002: dbctl commit (dc=all): 'db2156 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P60561 and previous config saved to /var/cache/conftool/dbconfig/20240416-063053-root.json
  • 06:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P60560 and previous config saved to /var/cache/conftool/dbconfig/20240416-063045-marostegui.json
  • 06:15 marostegui@cumin1002: dbctl commit (dc=all): 'db2156 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P60559 and previous config saved to /var/cache/conftool/dbconfig/20240416-061546-root.json
  • 06:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T361627)', diff saved to https://phabricator.wikimedia.org/P60558 and previous config saved to /var/cache/conftool/dbconfig/20240416-061536-marostegui.json
  • 06:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2125 (T361627)', diff saved to https://phabricator.wikimedia.org/P60557 and previous config saved to /var/cache/conftool/dbconfig/20240416-060826-marostegui.json
  • 06:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 06:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 06:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2107 (T361627)', diff saved to https://phabricator.wikimedia.org/P60556 and previous config saved to /var/cache/conftool/dbconfig/20240416-060803-marostegui.json
  • 06:00 marostegui@cumin1002: dbctl commit (dc=all): 'db2156 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P60555 and previous config saved to /var/cache/conftool/dbconfig/20240416-060034-root.json
  • 05:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P60554 and previous config saved to /var/cache/conftool/dbconfig/20240416-055256-marostegui.json
  • 05:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T352010)', diff saved to https://phabricator.wikimedia.org/P60553 and previous config saved to /var/cache/conftool/dbconfig/20240416-055237-ladsgroup.json
  • 05:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 05:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 05:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T352010)', diff saved to https://phabricator.wikimedia.org/P60552 and previous config saved to /var/cache/conftool/dbconfig/20240416-055215-ladsgroup.json
  • 05:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2156.codfw.wmnet with OS bookworm
  • 05:45 marostegui@cumin1002: dbctl commit (dc=all): 'db2156 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P60551 and previous config saved to /var/cache/conftool/dbconfig/20240416-054528-root.json
  • 05:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2107', diff saved to https://phabricator.wikimedia.org/P60550 and previous config saved to /var/cache/conftool/dbconfig/20240416-053749-marostegui.json
  • 05:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P60549 and previous config saved to /var/cache/conftool/dbconfig/20240416-053706-ladsgroup.json
  • 05:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2156.codfw.wmnet with reason: host reimage
  • 05:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2156.codfw.wmnet with reason: host reimage
  • 05:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2107 (T361627)', diff saved to https://phabricator.wikimedia.org/P60548 and previous config saved to /var/cache/conftool/dbconfig/20240416-052241-marostegui.json
  • 05:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P60547 and previous config saved to /var/cache/conftool/dbconfig/20240416-052158-ladsgroup.json
  • 05:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2107 (T361627)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20240416-051623-marostegui.json
  • 05:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 05:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 05:11 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 05:11 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 05:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T352010)', diff saved to https://phabricator.wikimedia.org/P60546 and previous config saved to /var/cache/conftool/dbconfig/20240416-050651-ladsgroup.json
  • 05:04 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2156.codfw.wmnet with OS bookworm
  • 05:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2156', diff saved to https://phabricator.wikimedia.org/P60545 and previous config saved to /var/cache/conftool/dbconfig/20240416-050315-root.json
  • 04:03 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.1 refs T361395 (duration: 57m 31s)
  • 03:05 mwpresync@deploy1002: Started scap: testwikis wikis to 1.43.0-wmf.1 refs T361395
  • 03:03 mwpresync@deploy1002: Pruned MediaWiki: 1.42.0-wmf.24 (duration: 03m 11s)
  • 02:51 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:51 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:42 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:42 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:38 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:38 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply

2024-04-15

  • 23:35 eileen: civicrm upgraded from 0445bfaa to fdd12ed1
  • 23:17 eileen: civicrm upgraded from 4d5a4fc3 to 0445bfaa
  • 22:57 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:57 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:44 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:44 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:09 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 19 hosts with reason: T362508
  • 22:09 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 19 hosts with reason: T362508
  • 21:48 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:48 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:45 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:45 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:44 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:44 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:38 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:37 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:30 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:30 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:14 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:14 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:01 kindrobot: closing the UTC late backport window
  • 21:00 kindrobot@deploy1002: Finished scap: Backport for zhwikivoyage: Make RelatedArticles extension usable on zhwikivoyage (T361427) (duration: 18m 30s)
  • 20:48 kindrobot@deploy1002: s8321414 and kindrobot: Continuing with sync
  • 20:44 kindrobot@deploy1002: s8321414 and kindrobot: Backport for zhwikivoyage: Make RelatedArticles extension usable on zhwikivoyage (T361427) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:42 kindrobot@deploy1002: Started scap: Backport for zhwikivoyage: Make RelatedArticles extension usable on zhwikivoyage (T361427)
  • 20:37 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:37 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:36 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:36 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:36 kindrobot@deploy1002: Finished scap: Backport for Enable desktop watchlist on beta cluster, clean up old references (T109277), Enable night mode on template namespace (duration: 17m 06s)
  • 20:35 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:34 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:34 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:34 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 20:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 20:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T356166)', diff saved to https://phabricator.wikimedia.org/P60539 and previous config saved to /var/cache/conftool/dbconfig/20240415-202943-marostegui.json
  • 20:24 kindrobot@deploy1002: jdlrobson and kindrobot: Continuing with sync
  • 20:21 kindrobot@deploy1002: jdlrobson and kindrobot: Backport for Enable desktop watchlist on beta cluster, clean up old references (T109277), Enable night mode on template namespace synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:19 kindrobot@deploy1002: Started scap: Backport for Enable desktop watchlist on beta cluster, clean up old references (T109277), Enable night mode on template namespace
  • 20:19 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:19 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P60538 and previous config saved to /var/cache/conftool/dbconfig/20240415-201436-marostegui.json
  • 20:13 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:13 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:12 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:12 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:06 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:06 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:05 kindrobot: starting UTC late backport window
  • 19:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P60537 and previous config saved to /var/cache/conftool/dbconfig/20240415-195928-marostegui.json
  • 19:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T356166)', diff saved to https://phabricator.wikimedia.org/P60536 and previous config saved to /var/cache/conftool/dbconfig/20240415-194420-marostegui.json
  • 19:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T352010)', diff saved to https://phabricator.wikimedia.org/P60535 and previous config saved to /var/cache/conftool/dbconfig/20240415-193921-ladsgroup.json
  • 19:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 19:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 19:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T352010)', diff saved to https://phabricator.wikimedia.org/P60534 and previous config saved to /var/cache/conftool/dbconfig/20240415-193858-ladsgroup.json
  • 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P60533 and previous config saved to /var/cache/conftool/dbconfig/20240415-192350-ladsgroup.json
  • 19:12 mutante: deleting unused kibana-next.svc records from DNS - T234854
  • 19:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P60532 and previous config saved to /var/cache/conftool/dbconfig/20240415-190842-ladsgroup.json
  • 19:01 mutante: deleting unused cas-logstash.wikimedia.org from DNS
  • 18:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T352010)', diff saved to https://phabricator.wikimedia.org/P60531 and previous config saved to /var/cache/conftool/dbconfig/20240415-185334-ladsgroup.json
  • 18:51 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ncredir1001.eqiad.wmnet,service=nginx
  • 18:51 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=ncredir1001.eqiad.wmnet,service=nginx
  • 18:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1236 (T356166)', diff saved to https://phabricator.wikimedia.org/P60530 and previous config saved to /var/cache/conftool/dbconfig/20240415-185008-marostegui.json
  • 18:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 18:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 18:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T356166)', diff saved to https://phabricator.wikimedia.org/P60529 and previous config saved to /var/cache/conftool/dbconfig/20240415-184945-marostegui.json
  • 18:45 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=ncredir1002.eqiad.wmnet,service=nginx
  • 18:45 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=ncredir1001.eqiad.wmnet,service=nginx
  • 18:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbprov1006.eqiad.wmnet with OS bullseye
  • 18:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P60528 and previous config saved to /var/cache/conftool/dbconfig/20240415-183437-marostegui.json
  • 18:34 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 18:24 eevans@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 18:22 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir1002.eqiad.wmnet with OS bullseye
  • 18:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P60527 and previous config saved to /var/cache/conftool/dbconfig/20240415-181930-marostegui.json
  • 18:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov1006.eqiad.wmnet with reason: host reimage
  • 18:12 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov1006.eqiad.wmnet with reason: host reimage
  • 18:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T356166)', diff saved to https://phabricator.wikimedia.org/P60526 and previous config saved to /var/cache/conftool/dbconfig/20240415-180422-marostegui.json
  • 18:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir1002.eqiad.wmnet with reason: host reimage
  • 18:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:59 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dbprov1006.eqiad.wmnet with OS bullseye
  • 17:58 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir1002.eqiad.wmnet with reason: host reimage
  • 17:43 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir1002.eqiad.wmnet with OS bullseye
  • 17:42 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: T361647 - bking@cumin2002
  • 17:38 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir1001.eqiad.wmnet with OS bullseye
  • 17:23 taavi@deploy1002: Finished scap: Backport for wmf-config: add private subnets for magru (T346722) (duration: 17m 21s)
  • 17:13 jynus: stop db2139 dbs for upgrade T360751
  • 17:10 taavi@deploy1002: taavi and sukhe: Continuing with sync
  • 17:08 taavi@deploy1002: taavi and sukhe: Backport for wmf-config: add private subnets for magru (T346722) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:06 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2101.codfw.wmnet
  • 17:06 jynus@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:06 jynus@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2101.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin2002"
  • 17:06 taavi@deploy1002: Started scap: Backport for wmf-config: add private subnets for magru (T346722)
  • 17:05 jynus@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2101.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin2002"
  • 17:03 jynus@cumin2002: START - Cookbook sre.dns.netbox
  • 16:57 jynus@cumin2002: START - Cookbook sre.hosts.decommission for hosts db2101.codfw.wmnet
  • 16:32 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:32 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:30 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T356166)', diff saved to https://phabricator.wikimedia.org/P60524 and previous config saved to /var/cache/conftool/dbconfig/20240415-163011-marostegui.json
  • 16:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 16:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 16:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T356166)', diff saved to https://phabricator.wikimedia.org/P60523 and previous config saved to /var/cache/conftool/dbconfig/20240415-162949-marostegui.json
  • 16:28 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:28 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:23 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: T361647 - bking@cumin2002
  • 16:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:14 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir1001.eqiad.wmnet with reason: host reimage
  • 16:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P60522 and previous config saved to /var/cache/conftool/dbconfig/20240415-161441-marostegui.json
  • 16:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:11 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir1001.eqiad.wmnet with reason: host reimage
  • 16:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:02 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir1001.eqiad.wmnet with OS bullseye
  • 16:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P60521 and previous config saved to /var/cache/conftool/dbconfig/20240415-155932-marostegui.json
  • 15:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:55 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ncredir1001.eqiad.wmnet with OS bullseye
  • 15:55 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir1001.eqiad.wmnet with OS bullseye
  • 15:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:50 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 15:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T356166)', diff saved to https://phabricator.wikimedia.org/P60520 and previous config saved to /var/cache/conftool/dbconfig/20240415-154422-marostegui.json
  • 15:40 eevans@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 15:21 arnaudb@cumin1002: dbctl commit (dc=all): 'db2111 (re)pooling @ 100%: post reimage repool', diff saved to https://phabricator.wikimedia.org/P60519 and previous config saved to /var/cache/conftool/dbconfig/20240415-152132-arnaudb.json
  • 15:19 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:18 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:15 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbprov1006.eqiad.wmnet with OS bullseye
  • 15:11 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: T361647 - bking@cumin2002
  • 15:06 arnaudb@cumin1002: dbctl commit (dc=all): 'db2111 (re)pooling @ 75%: post reimage repool', diff saved to https://phabricator.wikimedia.org/P60518 and previous config saved to /var/cache/conftool/dbconfig/20240415-150626-arnaudb.json
  • 15:04 dancy@deploy1002: Installation of scap version "4.76.0" completed for 340 hosts
  • 15:04 Daimona: Running query for T362365#9710047
  • 15:03 dancy@deploy1002: Installing scap version "4.76.0" for 340 hosts
  • 15:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T356166)', diff saved to https://phabricator.wikimedia.org/P60517 and previous config saved to /var/cache/conftool/dbconfig/20240415-150257-marostegui.json
  • 15:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 15:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 15:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T356166)', diff saved to https://phabricator.wikimedia.org/P60516 and previous config saved to /var/cache/conftool/dbconfig/20240415-150235-marostegui.json
  • 14:52 Dreamy_Jazz: Afternoon backport window done
  • 14:51 arnaudb@cumin1002: dbctl commit (dc=all): 'db2111 (re)pooling @ 50%: post reimage repool', diff saved to https://phabricator.wikimedia.org/P60515 and previous config saved to /var/cache/conftool/dbconfig/20240415-145120-arnaudb.json
  • 14:50 dreamyjazz@deploy1002: Finished scap: Backport for Define 'useYear' as true for temp user serial mapping config (T349506) (duration: 16m 16s)
  • 14:48 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: T361647 - bking@cumin2002
  • 14:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P60514 and previous config saved to /var/cache/conftool/dbconfig/20240415-144725-marostegui.json
  • 14:41 jynus: fixed grants for db2098
  • 14:37 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
  • 14:36 dreamyjazz@deploy1002: dreamyjazz: Backport for Define 'useYear' as true for temp user serial mapping config (T349506) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:36 arnaudb@cumin1002: dbctl commit (dc=all): 'db2111 (re)pooling @ 25%: post reimage repool', diff saved to https://phabricator.wikimedia.org/P60513 and previous config saved to /var/cache/conftool/dbconfig/20240415-143614-arnaudb.json
  • 14:34 dreamyjazz@deploy1002: Started scap: Backport for Define 'useYear' as true for temp user serial mapping config (T349506)
  • 14:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P60512 and previous config saved to /var/cache/conftool/dbconfig/20240415-143217-marostegui.json
  • 14:31 urbanecm@deploy1002: Finished scap: Backport for Add wgAutoCreateTempUser configuration for production (T349506 T337090), Change mul deployment on beta to limited version (T356169) (duration: 30m 12s)
  • 14:31 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 14:30 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp4051.ulsfo.wmnet,cp5030.eqsin.wmnet,cp5032.eqsin.wmnet} and A:cp
  • 14:27 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2111.codfw.wmnet with OS bookworm
  • 14:23 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:21 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp4051.ulsfo.wmnet,cp5030.eqsin.wmnet,cp5032.eqsin.wmnet} and A:cp
  • 14:18 urbanecm@deploy1002: urbanecm and dreamyjazz and arthurtaylor: Continuing with sync
  • 14:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T356166)', diff saved to https://phabricator.wikimedia.org/P60511 and previous config saved to /var/cache/conftool/dbconfig/20240415-141710-marostegui.json
  • 14:16 elukey: move cassandra instances on cassandra-dev to pki - T352647
  • 14:14 urbanecm@deploy1002: urbanecm and dreamyjazz and arthurtaylor: Backport for Add wgAutoCreateTempUser configuration for production (T349506 T337090), Change mul deployment on beta to limited version (T356169) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:13 vgutierrez: uploaded tcp-mss-clamper 0.4+deb11u2 to bullseye-wikimedia (apt.wm.o)
  • 14:09 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: T361647 - bking@cumin2002
  • 14:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2111.codfw.wmnet with reason: host reimage
  • 14:04 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: T361647 - bking@cumin2002
  • 14:04 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2111.codfw.wmnet with reason: host reimage
  • 14:04 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster relforge: T361647 - bking@cumin2002
  • 14:04 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster relforge: T361647 - bking@cumin2002
  • 14:01 urbanecm@deploy1002: Started scap: Backport for Add wgAutoCreateTempUser configuration for production (T349506 T337090), Change mul deployment on beta to limited version (T356169)
  • 13:59 jynus: update dbprov2005 dbbackups password T362509
  • 13:58 urbanecm@deploy1002: sync-world aborted: Backport for Add wgAutoCreateTempUser configuration for production (T349506 T337090), Change mul deployment on beta to limited version (T356169) (duration: 52m 11s)
  • 13:54 TheresNoTime: `[samtar@mwmaint1002 ~]$ mwscript extensions/Flow/maintenance/FlowFixInconsistentBoards.php --wiki=zhwiki --namespaceName User_talk` T362530
  • 13:54 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dbprov1006.eqiad.wmnet with OS bullseye
  • 13:48 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db2111.codfw.wmnet with OS bookworm
  • 13:47 arnaudb@cumin1002: dbctl commit (dc=all): 'db2111 depool', diff saved to https://phabricator.wikimedia.org/P60510 and previous config saved to /var/cache/conftool/dbconfig/20240415-134710-arnaudb.json
  • 13:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2111.codfw.wmnet with reason: reboot multiinstance replica
  • 13:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2111.codfw.wmnet with reason: reboot multiinstance replica
  • 13:45 arnaudb@cumin1002: dbctl commit (dc=all): 'db2128 (re)pooling @ 100%: post reimage repool', diff saved to https://phabricator.wikimedia.org/P60509 and previous config saved to /var/cache/conftool/dbconfig/20240415-134522-arnaudb.json
  • 13:45 vgutierrez: update thirdparty/haproxy28 to 2.8.9 for bullseye-wikimedia (apt.wm.o)
  • 13:37 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-esams and not P{cp3066.esams.wmnet,cp3069.esams.wmnet,cp3070.esams.wmnet,cp3071.esams.wmnet,cp3072.esams.wmnet,cp3073.esams.wmnet} and A:cp
  • 13:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T356166)', diff saved to https://phabricator.wikimedia.org/P60508 and previous config saved to /var/cache/conftool/dbconfig/20240415-133433-marostegui.json
  • 13:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 13:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 13:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T356166)', diff saved to https://phabricator.wikimedia.org/P60507 and previous config saved to /var/cache/conftool/dbconfig/20240415-133410-marostegui.json
  • 13:30 arnaudb@cumin1002: dbctl commit (dc=all): 'db2128 (re)pooling @ 75%: post reimage repool', diff saved to https://phabricator.wikimedia.org/P60506 and previous config saved to /var/cache/conftool/dbconfig/20240415-133016-arnaudb.json
  • 13:19 volans: upgraded spicerack to v8.5.0 on cumin2002
  • 13:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P60505 and previous config saved to /var/cache/conftool/dbconfig/20240415-131902-marostegui.json
  • 13:16 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cp1115.eqiad.wmnet
  • 13:15 arnaudb@cumin1002: dbctl commit (dc=all): 'db2128 (re)pooling @ 50%: post reimage repool', diff saved to https://phabricator.wikimedia.org/P60504 and previous config saved to /var/cache/conftool/dbconfig/20240415-131510-arnaudb.json
  • 13:08 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-esams and not P{cp3066.esams.wmnet,cp3069.esams.wmnet,cp3070.esams.wmnet,cp3071.esams.wmnet,cp3072.esams.wmnet,cp3073.esams.wmnet} and A:cp
  • 13:08 urbanecm@deploy1002: urbanecm and arthurtaylor and dreamyjazz: Backport for Add wgAutoCreateTempUser configuration for production (T349506 T337090), Change mul deployment on beta to limited version (T356169) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:06 urbanecm@deploy1002: Started scap: Backport for Add wgAutoCreateTempUser configuration for production (T349506 T337090), Change mul deployment on beta to limited version (T356169)
  • 13:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P60503 and previous config saved to /var/cache/conftool/dbconfig/20240415-130355-marostegui.json
  • 13:00 arnaudb@cumin1002: dbctl commit (dc=all): 'db2128 (re)pooling @ 25%: post reimage repool', diff saved to https://phabricator.wikimedia.org/P60502 and previous config saved to /var/cache/conftool/dbconfig/20240415-130005-arnaudb.json
  • 12:56 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2128.codfw.wmnet with OS bookworm
  • 12:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T356166)', diff saved to https://phabricator.wikimedia.org/P60501 and previous config saved to /var/cache/conftool/dbconfig/20240415-124848-marostegui.json
  • 12:12 jynus: deploy new database grants for m1 <- dbbprov1005
  • 12:09 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db2128.codfw.wmnet with OS bookworm
  • 12:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T356166)', diff saved to https://phabricator.wikimedia.org/P60500 and previous config saved to /var/cache/conftool/dbconfig/20240415-120650-marostegui.json
  • 12:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 12:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 12:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T356166)', diff saved to https://phabricator.wikimedia.org/P60499 and previous config saved to /var/cache/conftool/dbconfig/20240415-120627-marostegui.json
  • 12:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2128.codfw.wmnet
  • 12:04 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-eqiad and not P{cp1112.eqiad.wmnet,cp1113.eqiad.wmnet,cp1115.eqiad.wmnet} and A:cp
  • 12:00 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db2128.codfw.wmnet
  • 11:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2128,2186].codfw.wmnet with reason: upgrade db2128 T360116
  • 11:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2128,2186].codfw.wmnet with reason: upgrade db2128 T360116
  • 11:57 arnaudb@cumin1002: dbctl commit (dc=all): 'db2128 depool T360116', diff saved to https://phabricator.wikimedia.org/P60498 and previous config saved to /var/cache/conftool/dbconfig/20240415-115708-arnaudb.json
  • 11:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P60497 and previous config saved to /var/cache/conftool/dbconfig/20240415-115118-marostegui.json
  • 11:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P60496 and previous config saved to /var/cache/conftool/dbconfig/20240415-113610-marostegui.json
  • 11:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T356166)', diff saved to https://phabricator.wikimedia.org/P60495 and previous config saved to /var/cache/conftool/dbconfig/20240415-112102-marostegui.json
  • 11:13 volans: uploaded spicerack_8.5.0 to apt.wikimedia.org bullseye-wikimedia
  • 11:07 moritzm: imported shellcheck 0.7.1-1~bpo10+1 to component/shellcheck T362518
  • 11:03 btullis@cumin1002: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 10:46 btullis@cumin1002: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 10:38 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T356166)', diff saved to https://phabricator.wikimedia.org/P60494 and previous config saved to /var/cache/conftool/dbconfig/20240415-103853-marostegui.json
  • 10:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 10:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 10:33 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 10:31 godog: bounce prometheus@k8s-staging in eqiad - T343529
  • 10:31 moritzm: imported lilypond/lilypond-data 2.22.0-10~bpo10+1 to component/lilypond T362518
  • 10:22 claime: Launching build-base-images on build2001 - T362518
  • 10:10 hashar@deploy1002: Finished deploy [gerrit/gerrit@47eacb9]: Update Javascript plugins for Gerrit 3.8 - T354886 (duration: 00m 07s)
  • 10:10 hashar@deploy1002: Started deploy [gerrit/gerrit@47eacb9]: Update Javascript plugins for Gerrit 3.8 - T354886
  • 09:57 hashar@deploy1002: Finished deploy [gerrit/gerrit@2f3d3d4]: Gerrit to 3.8.5 on gerrit1003 - T354886 (duration: 00m 06s)
  • 09:56 hashar@deploy1002: Started deploy [gerrit/gerrit@2f3d3d4]: Gerrit to 3.8.5 on gerrit1003 - T354886
  • 09:53 hashar@deploy1002: Finished deploy [gerrit/gerrit@2f3d3d4]: Gerrit to 3.8.5 on gerrit2002 - T354886 (duration: 00m 08s)
  • 09:52 hashar@deploy1002: Started deploy [gerrit/gerrit@2f3d3d4]: Gerrit to 3.8.5 on gerrit2002 - T354886
  • 09:50 btullis@cumin1002: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 09:26 cgoubert@deploy1002: Finished scap: T351237 (duration: 11m 43s)
  • 09:14 cgoubert@deploy1002: Started scap: T351237
  • 09:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 09:11 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 09:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T356166)', diff saved to https://phabricator.wikimedia.org/P60493 and previous config saved to /var/cache/conftool/dbconfig/20240415-091145-marostegui.json
  • 09:09 ladsgroup@deploy1002: Finished scap: Backport for Set all wikis to read new for pagelinks migration except trwiki, zhwiki (T351237) (duration: 08m 51s)
  • 09:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1163 (T352010)', diff saved to https://phabricator.wikimedia.org/P60492 and previous config saved to /var/cache/conftool/dbconfig/20240415-090834-ladsgroup.json
  • 09:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 09:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 09:03 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 09:02 ladsgroup@deploy1002: ladsgroup: Backport for Set all wikis to read new for pagelinks migration except trwiki, zhwiki (T351237) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:00 ladsgroup@deploy1002: Started scap: Backport for Set all wikis to read new for pagelinks migration except trwiki, zhwiki (T351237)
  • 08:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P60491 and previous config saved to /var/cache/conftool/dbconfig/20240415-085638-marostegui.json
  • 08:54 btullis@cumin1002: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 08:53 jynus: restart dbprov2005
  • 08:46 godog: logstash.w.o now uses sso - T246998
  • 08:45 btullis@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid jvm daemons.
  • 08:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P60490 and previous config saved to /var/cache/conftool/dbconfig/20240415-084130-marostegui.json
  • 08:35 btullis@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons.
  • 08:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T356166)', diff saved to https://phabricator.wikimedia.org/P60488 and previous config saved to /var/cache/conftool/dbconfig/20240415-082623-marostegui.json
  • 08:01 Emperor: depool wdqs in codfw T362508
  • 08:01 mvernon@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=codfw
  • 07:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: reboot multiinstance replica
  • 07:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1217.eqiad.wmnet with reason: reboot multiinstance replica
  • 07:48 jayme: restarting k8s-mlstaging and k8s-staging prometheus instances - T343529
  • 07:11 dcausse: restarting blazegraph on wdqs1020 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 06:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T356166)', diff saved to https://phabricator.wikimedia.org/P60487 and previous config saved to /var/cache/conftool/dbconfig/20240415-065659-marostegui.json
  • 06:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 06:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 06:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T356166)', diff saved to https://phabricator.wikimedia.org/P60486 and previous config saved to /var/cache/conftool/dbconfig/20240415-065636-marostegui.json
  • 06:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P60485 and previous config saved to /var/cache/conftool/dbconfig/20240415-064129-marostegui.json
  • 06:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P60484 and previous config saved to /var/cache/conftool/dbconfig/20240415-062621-marostegui.json
  • 06:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T356166)', diff saved to https://phabricator.wikimedia.org/P60483 and previous config saved to /var/cache/conftool/dbconfig/20240415-061114-marostegui.json
  • 05:30 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T356166)', diff saved to https://phabricator.wikimedia.org/P60482 and previous config saved to /var/cache/conftool/dbconfig/20240415-053001-marostegui.json
  • 05:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 05:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance

2024-04-14

  • 16:00 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2142 as x2 codfw master', diff saved to https://phabricator.wikimedia.org/P60481 and previous config saved to /var/cache/conftool/dbconfig/20240414-160016-marostegui.json
  • 11:22 marostegui: Restart x2 codfw master
  • 11:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: Investigating
  • 11:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: Investigating

2024-04-13

  • 23:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T356166)', diff saved to https://phabricator.wikimedia.org/P60479 and previous config saved to /var/cache/conftool/dbconfig/20240413-233953-marostegui.json
  • 23:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P60478 and previous config saved to /var/cache/conftool/dbconfig/20240413-232443-marostegui.json
  • 23:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P60477 and previous config saved to /var/cache/conftool/dbconfig/20240413-230935-marostegui.json
  • 22:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T356166)', diff saved to https://phabricator.wikimedia.org/P60476 and previous config saved to /var/cache/conftool/dbconfig/20240413-225428-marostegui.json
  • 15:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1244 (T356166)', diff saved to https://phabricator.wikimedia.org/P60475 and previous config saved to /var/cache/conftool/dbconfig/20240413-154240-marostegui.json
  • 15:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 15:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 15:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T356166)', diff saved to https://phabricator.wikimedia.org/P60474 and previous config saved to /var/cache/conftool/dbconfig/20240413-154217-marostegui.json
  • 15:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P60473 and previous config saved to /var/cache/conftool/dbconfig/20240413-152709-marostegui.json
  • 15:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P60472 and previous config saved to /var/cache/conftool/dbconfig/20240413-151201-marostegui.json
  • 14:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T356166)', diff saved to https://phabricator.wikimedia.org/P60471 and previous config saved to /var/cache/conftool/dbconfig/20240413-145653-marostegui.json
  • 06:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T356166)', diff saved to https://phabricator.wikimedia.org/P60470 and previous config saved to /var/cache/conftool/dbconfig/20240413-060646-marostegui.json
  • 06:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 06:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 00:52 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cp1115.eqiad.wmnet

2024-04-12

  • 21:03 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 21:03 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 20:43 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 20:43 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 19:36 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_codfw
  • 19:36 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_codfw
  • 18:56 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudbackup2002.codfw.wmnet
  • 18:56 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:56 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudbackup2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 18:55 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudbackup2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 18:52 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 18:47 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudbackup2002.codfw.wmnet
  • 18:46 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudbackup2001.codfw.wmnet
  • 18:46 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:46 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudbackup2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 18:44 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudbackup2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 18:40 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 18:35 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudbackup2001.codfw.wmnet
  • 17:00 mutante: crm2001 - on the initial puppet run that added envoy, build-envoy-config failed to build the config and the service failed due to a dependency issue. Manually ran "sudo /usr/local/sbin/build-envoy-config -c /etc/envoy/" and restarted envoyproxy.service
  • 16:19 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host matomo1003.eqiad.wmnet with OS bookworm
  • 16:16 elukey: move cassandra instances on cassandra-dev to the new truststore (allowing PKI certs) - T352647
  • 15:59 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 15:56 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on cp1115.eqiad.wmnet with reason: testing PXE boot issues
  • 15:56 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on cp1115.eqiad.wmnet with reason: testing PXE boot issues
  • 15:55 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 15:53 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:52 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on elastic2090.codfw.wmnet with reason: T353878
  • 15:51 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on elastic2090.codfw.wmnet with reason: T353878
  • 15:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 15:50 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 15:50 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic2090 for reboot to get rid of broken systemd units - bking@cumin2002 - T353878
  • 15:50 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic2090 for reboot to get rid of broken systemd units - bking@cumin2002 - T353878
  • 15:50 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on matomo1003.eqiad.wmnet with reason: host reimage
  • 15:49 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 15:49 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 15:48 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cp1115.eqiad.wmnet
  • 15:47 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 15:46 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on matomo1003.eqiad.wmnet with reason: host reimage
  • 15:46 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 15:32 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host matomo1003.eqiad.wmnet with OS bookworm
  • 15:31 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host matomo1003.eqiad.wmnet with OS bookworm
  • 15:23 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "magru - ayounsi@cumin1002"
  • 15:22 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 15:21 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "magru - ayounsi@cumin1002"
  • 15:07 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 15:03 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "magru - ayounsi@cumin1002"
  • 15:03 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:02 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host matomo1003.eqiad.wmnet with OS bookworm
  • 15:01 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "magru - ayounsi@cumin1002"
  • 14:59 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host matomo1003.eqiad.wmnet with OS bookworm
  • 14:22 hashar@deploy1002: Finished scap: Backport for Parser::statelessFetchTemplate: don't add interwiki redirects to dependencies (T362221) (duration: 16m 29s)
  • 14:19 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 14:18 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:18 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:17 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host matomo1003.eqiad.wmnet with OS bookworm
  • 14:09 hashar@deploy1002: hashar and jforrester: Continuing with sync
  • 14:08 hashar@deploy1002: hashar and jforrester: Backport for Parser::statelessFetchTemplate: don't add interwiki redirects to dependencies (T362221) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:08 sukhe: depool cp1115 for PXE boot issue testing: T350179
  • 14:07 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1115.eqiad.wmnet,service=(cdn|ats-be)
  • 14:05 hashar@deploy1002: Started scap: Backport for Parser::statelessFetchTemplate: don't add interwiki redirects to dependencies (T362221)
  • 12:53 jayme: updated rsyslog to 8.2404.0-1~bpo11+1 on staging-codfw and staging-eqiad k8s clusters - T357616
  • 12:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P60466 and previous config saved to /var/cache/conftool/dbconfig/20240412-122045-marostegui.json
  • 12:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P60464 and previous config saved to /var/cache/conftool/dbconfig/20240412-120537-marostegui.json
  • 12:02 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host matomo1003.eqiad.wmnet with OS bookworm
  • 11:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T356166)', diff saved to https://phabricator.wikimedia.org/P60463 and previous config saved to /var/cache/conftool/dbconfig/20240412-115029-marostegui.json
  • 11:33 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host matomo1003.eqiad.wmnet with OS bookworm
  • 11:06 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
  • 10:55 urbanecm: mwmaint1002: mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=frwiki --search-index (T362367)
  • 09:58 urbanecm: mwmaint1002: mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=eswiki --search-index (T362367)
  • 09:36 moritzm: installing postgresql-common bugfix updates from Bullseye point release
  • 09:26 moritzm: installing debootstrap bugfix updates from Bullseye point release
  • 09:25 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on matomo1003.eqiad.wmnet with reason: Still in setup
  • 09:25 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on matomo1003.eqiad.wmnet with reason: Still in setup
  • 08:56 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
  • 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2109 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P60461 and previous config saved to /var/cache/conftool/dbconfig/20240412-072435-root.json
  • 07:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2109 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P60460 and previous config saved to /var/cache/conftool/dbconfig/20240412-070930-root.json
  • 06:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2109 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P60459 and previous config saved to /var/cache/conftool/dbconfig/20240412-065424-root.json
  • 06:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2109 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P60458 and previous config saved to /var/cache/conftool/dbconfig/20240412-063918-root.json
  • 06:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2109 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P60457 and previous config saved to /var/cache/conftool/dbconfig/20240412-062412-root.json
  • 06:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2109 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P60456 and previous config saved to /var/cache/conftool/dbconfig/20240412-060907-root.json
  • 05:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2109.codfw.wmnet with OS bookworm
  • 05:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2109 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P60455 and previous config saved to /var/cache/conftool/dbconfig/20240412-055401-root.json
  • 05:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2109.codfw.wmnet with reason: host reimage
  • 05:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2109.codfw.wmnet with reason: host reimage
  • 05:23 moritzm: prune obsolete nginx debs on apt-staging after switch to new nginx provider scheme T329529
  • 05:17 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2109.codfw.wmnet with OS bookworm
  • 05:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2109', diff saved to https://phabricator.wikimedia.org/P60454 and previous config saved to /var/cache/conftool/dbconfig/20240412-051606-root.json
  • 03:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T356166)', diff saved to https://phabricator.wikimedia.org/P60453 and previous config saved to /var/cache/conftool/dbconfig/20240412-033317-marostegui.json
  • 03:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 03:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 03:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T356166)', diff saved to https://phabricator.wikimedia.org/P60452 and previous config saved to /var/cache/conftool/dbconfig/20240412-033254-marostegui.json
  • 03:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P60451 and previous config saved to /var/cache/conftool/dbconfig/20240412-031744-marostegui.json
  • 03:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P60450 and previous config saved to /var/cache/conftool/dbconfig/20240412-030237-marostegui.json
  • 02:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T356166)', diff saved to https://phabricator.wikimedia.org/P60449 and previous config saved to /var/cache/conftool/dbconfig/20240412-024729-marostegui.json
  • 01:05 denisse: Manually deleting /srv/syslog/.linux.dhcp.DictModel/syslog.log from November 30 on centrallog1002 and centrallog2002 after the prune_old_srv_syslog_directories.service failed to delete the non-empty directory - T362376
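
Note on reading the entries above: most lines appear to follow the shape `HH:MM actor: message`, and finished cookbook runs are logged as `END (PASS|FAIL|ERROR) - Cookbook NAME (exit_code=N) ...`. The sketch below is one possible way to pull those fields out, for example to list the failed reimage attempts. The regular expressions and the function names `parse_entry` and `cookbook_result` are hypothetical. They are inferred from the lines in this archive, not taken from any Wikimedia tool.

```python
import re

# Assumed entry shape, inferred from this log: optional bullet, "HH:MM actor: message".
ENTRY_RE = re.compile(
    r"^\s*(?:•\s*)?(?P<time>\d{2}:\d{2}) (?P<actor>[^\s:]+): (?P<message>.*)$"
)

# Assumed shape of a finished cookbook run:
# "END (STATUS) - Cookbook NAME (exit_code=N) optional detail".
END_RE = re.compile(
    r"^END \((?P<status>PASS|FAIL|ERROR)\) - Cookbook (?P<cookbook>\S+) "
    r"\(exit_code=(?P<exit_code>\d+)\)\s*(?P<detail>.*)$"
)


def parse_entry(line):
    """Split one log line into time, actor and message; None if it does not match."""
    match = ENTRY_RE.match(line)
    return match.groupdict() if match else None


def cookbook_result(message):
    """Return status, cookbook name, exit code and detail for an END message, else None."""
    match = END_RE.match(message)
    if match is None:
        return None
    result = match.groupdict()
    result["exit_code"] = int(result["exit_code"])
    return result


if __name__ == "__main__":
    sample = (
        "  • 14:06 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage "
        "(exit_code=99) for host matomo1003.eqiad.wmnet with OS bookworm"
    )
    entry = parse_entry(sample)
    print(entry["time"], entry["actor"], cookbook_result(entry["message"]))
```

Lines that are not cookbook END records (for example `dbctl commit` or `helmfile` entries) still parse with `parse_entry`, but `cookbook_result` returns `None` for them.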

2024-04-11

  • 23:04 cstone: civicrm upgraded from c2569254 to 4d5a4fc3
  • 20:20 urbanecm@deploy1002: Finished scap: Backport for ext-EventLogging: Add mediawiki.product_metrics.wikifunctions_ui to $wgEventLoggingStreamNames (duration: 17m 38s)
  • 20:08 urbanecm@deploy1002: urbanecm and phuedx: Continuing with sync
  • 20:05 urbanecm@deploy1002: urbanecm and phuedx: Backport for ext-EventLogging: Add mediawiki.product_metrics.wikifunctions_ui to $wgEventLoggingStreamNames synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:03 urbanecm@deploy1002: Started scap: Backport for ext-EventLogging: Add mediawiki.product_metrics.wikifunctions_ui to $wgEventLoggingStreamNames
  • 19:41 eevans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 19:40 eevans@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 19:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T356166)', diff saved to https://phabricator.wikimedia.org/P60448 and previous config saved to /var/cache/conftool/dbconfig/20240411-193537-marostegui.json
  • 19:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 19:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 19:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T356166)', diff saved to https://phabricator.wikimedia.org/P60447 and previous config saved to /var/cache/conftool/dbconfig/20240411-193514-marostegui.json
  • 19:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P60446 and previous config saved to /var/cache/conftool/dbconfig/20240411-192006-marostegui.json
  • 19:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P60445 and previous config saved to /var/cache/conftool/dbconfig/20240411-190459-marostegui.json
  • 18:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T356166)', diff saved to https://phabricator.wikimedia.org/P60443 and previous config saved to /var/cache/conftool/dbconfig/20240411-184951-marostegui.json
  • 17:50 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.26 refs T360158
  • 17:27 swfrench@deploy1002: Finished scap: (no justification provided) (duration: 07m 57s)
  • 17:20 swfrench@deploy1002: Started scap: (no justification provided)
  • 17:12 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:12 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:11 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:11 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:11 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:10 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 16:45 hashar@deploy1002: Finished scap: Backport for Revert "Update mobile search for dark mode, remove unused functions in MobilePage.php" (T362297) (duration: 16m 47s)
  • 16:33 hashar@deploy1002: hashar: Continuing with sync
  • 16:31 hashar@deploy1002: hashar: Backport for Revert "Update mobile search for dark mode, remove unused functions in MobilePage.php" (T362297) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:29 hashar@deploy1002: Started scap: Backport for Revert "Update mobile search for dark mode, remove unused functions in MobilePage.php" (T362297)
  • 16:27 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host matomo1003.eqiad.wmnet with OS bookworm
  • 16:15 arnaudb@cumin1002: dbctl commit (dc=all): 'db2111 (re)pooling @ 100%: repool', diff saved to https://phabricator.wikimedia.org/P60442 and previous config saved to /var/cache/conftool/dbconfig/20240411-161536-arnaudb.json
  • 16:15 arnaudb@cumin1002: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: repool', diff saved to https://phabricator.wikimedia.org/P60441 and previous config saved to /var/cache/conftool/dbconfig/20240411-161522-arnaudb.json
  • 16:03 herron: beginning rolling hardware upgrades for titan100[12] T361251
  • 16:00 arnaudb@cumin1002: dbctl commit (dc=all): 'db2111 (re)pooling @ 75%: repool', diff saved to https://phabricator.wikimedia.org/P60440 and previous config saved to /var/cache/conftool/dbconfig/20240411-160030-arnaudb.json
  • 16:00 arnaudb@cumin1002: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: repool', diff saved to https://phabricator.wikimedia.org/P60439 and previous config saved to /var/cache/conftool/dbconfig/20240411-160016-arnaudb.json
  • 15:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P60438 and previous config saved to /var/cache/conftool/dbconfig/20240411-155836-root.json
  • 15:56 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe2002.codfw.wmnet with OS bookworm
  • 15:51 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe1002.eqiad.wmnet with OS bookworm
  • 15:47 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-drmrs and A:cp
  • 15:45 btullis@cumin1002: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 15:45 arnaudb@cumin1002: dbctl commit (dc=all): 'db2111 (re)pooling @ 50%: repool', diff saved to https://phabricator.wikimedia.org/P60437 and previous config saved to /var/cache/conftool/dbconfig/20240411-154524-arnaudb.json
  • 15:45 arnaudb@cumin1002: dbctl commit (dc=all): 'db2110 (re)pooling @ 50%: repool', diff saved to https://phabricator.wikimedia.org/P60436 and previous config saved to /var/cache/conftool/dbconfig/20240411-154510-arnaudb.json
  • 15:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P60435 and previous config saved to /var/cache/conftool/dbconfig/20240411-154330-root.json
  • 15:39 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe2002.codfw.wmnet with reason: host reimage
  • 15:36 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2002.codfw.wmnet with reason: host reimage
  • 15:35 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe1002.eqiad.wmnet with reason: host reimage
  • 15:33 btullis@cumin1002: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 15:31 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe1002.eqiad.wmnet with reason: host reimage
  • 15:30 arnaudb@cumin1002: dbctl commit (dc=all): 'db2111 (re)pooling @ 25%: repool', diff saved to https://phabricator.wikimedia.org/P60434 and previous config saved to /var/cache/conftool/dbconfig/20240411-153019-arnaudb.json
  • 15:30 arnaudb@cumin1002: dbctl commit (dc=all): 'db2110 (re)pooling @ 25%: repool', diff saved to https://phabricator.wikimedia.org/P60433 and previous config saved to /var/cache/conftool/dbconfig/20240411-153003-arnaudb.json
  • 15:28 marostegui@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P60432 and previous config saved to /var/cache/conftool/dbconfig/20240411-152825-root.json
  • 15:24 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:24 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:24 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:23 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:20 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2002.codfw.wmnet with OS bookworm
  • 15:18 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host moss-fe1002.eqiad.wmnet with OS bookworm
  • 15:14 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on matomo1003.eqiad.wmnet with reason: host reimage
  • 15:13 marostegui@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P60431 and previous config saved to /var/cache/conftool/dbconfig/20240411-151319-root.json
  • 15:12 dancy@deploy1002: Finished scap: Backport for static.php: Handle mediawiki.org/ontology/ontology.owl (T171807 T359643) (duration: 17m 41s)
  • 15:11 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on matomo1003.eqiad.wmnet with reason: host reimage
  • 15:00 dancy@deploy1002: dancy: Continuing with sync
  • 14:58 arnaudb@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 100%: Post upgrade', diff saved to https://phabricator.wikimedia.org/P60430 and previous config saved to /var/cache/conftool/dbconfig/20240411-145841-arnaudb.json
  • 14:58 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host matomo1003.eqiad.wmnet with OS bookworm
  • 14:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P60429 and previous config saved to /var/cache/conftool/dbconfig/20240411-145813-root.json
  • 14:57 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-drmrs and A:cp
  • 14:57 dancy@deploy1002: dancy: Backport for static.php: Handle mediawiki.org/ontology/ontology.owl (T171807 T359643) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db2115 (re)pooling @ 100%: Repool', diff saved to https://phabricator.wikimedia.org/P60428 and previous config saved to /var/cache/conftool/dbconfig/20240411-145658-arnaudb.json
  • 14:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-codfw and not P{cp2042.codfw.wmnet} and A:cp
  • 14:54 dancy@deploy1002: Started scap: Backport for static.php: Handle mediawiki.org/ontology/ontology.owl (T171807 T359643)
  • 14:52 sukhe: sudo cumin "A:cp and A:esams" "run-puppet-agent --enable 'merging CR 1014571'"
  • 14:52 dreamyjazz@deploy1002: Finished scap: Backport for Set wgMFFallbackEditor to visual for most VE wikis (T361134) (duration: 24m 11s)
  • 14:47 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:47 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:47 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:45 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:44 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot-master (exit_code=0) rolling restart_daemons on A:maps-master
  • 14:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db2103 (re)pooling @ 100%: reool', diff saved to https://phabricator.wikimedia.org/P60427 and previous config saved to /var/cache/conftool/dbconfig/20240411-144416-arnaudb.json
  • 14:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 75%: Post upgrade', diff saved to https://phabricator.wikimedia.org/P60426 and previous config saved to /var/cache/conftool/dbconfig/20240411-144336-arnaudb.json
  • 14:43 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot-master rolling restart_daemons on A:maps-master
  • 14:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db2109 (re)pooling @ 100%: repool', diff saved to https://phabricator.wikimedia.org/P60425 and previous config saved to /var/cache/conftool/dbconfig/20240411-144311-arnaudb.json
  • 14:43 sukhe: sudo cumin "A:cp and A:esams" "disable-puppet 'merging CR 1014571'"
  • 14:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P60424 and previous config saved to /var/cache/conftool/dbconfig/20240411-144307-root.json
  • 14:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db2115 (re)pooling @ 75%: Repool', diff saved to https://phabricator.wikimedia.org/P60423 and previous config saved to /var/cache/conftool/dbconfig/20240411-144152-arnaudb.json
  • 14:39 dreamyjazz@deploy1002: dreamyjazz and esanders: Continuing with sync
  • 14:36 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
  • 14:34 moritzm: installing distro-info-data updates from Bullseye point release
  • 14:31 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-eqiad
  • 14:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS bookworm
  • 14:30 dreamyjazz@deploy1002: dreamyjazz and esanders: Backport for Set wgMFFallbackEditor to visual for most VE wikis (T361134) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db2103 (re)pooling @ 75%: reool', diff saved to https://phabricator.wikimedia.org/P60422 and previous config saved to /var/cache/conftool/dbconfig/20240411-142910-arnaudb.json
  • 14:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 50%: Post upgrade', diff saved to https://phabricator.wikimedia.org/P60421 and previous config saved to /var/cache/conftool/dbconfig/20240411-142830-arnaudb.json
  • 14:28 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=(cdn|ats-be)
  • 14:28 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3073.esams.wmnet,service=(cdn|ats-be)
  • 14:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2109 (re)pooling @ 75%: repool', diff saved to https://phabricator.wikimedia.org/P60420 and previous config saved to /var/cache/conftool/dbconfig/20240411-142806-arnaudb.json
  • 14:28 marostegui@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P60419 and previous config saved to /var/cache/conftool/dbconfig/20240411-142801-root.json
  • 14:27 dreamyjazz@deploy1002: Started scap: Backport for Set wgMFFallbackEditor to visual for most VE wikis (T361134)
  • 14:27 Dreamy_Jazz: Extending UTC Afternoon backport window
  • 14:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db2115 (re)pooling @ 50%: Repool', diff saved to https://phabricator.wikimedia.org/P60418 and previous config saved to /var/cache/conftool/dbconfig/20240411-142645-arnaudb.json
  • 14:26 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2042.codfw.wmnet with OS bullseye
  • 14:24 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3073.esams.wmnet with OS bullseye
  • 14:19 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:maps-replica-codfw
  • 14:18 elukey: drain and restart cassandra-b on aqs2007 - didn't pick up the new truststore during the past roll restart - T352647
  • 14:15 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-codfw
  • 14:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db2103 (re)pooling @ 50%: reool', diff saved to https://phabricator.wikimedia.org/P60417 and previous config saved to /var/cache/conftool/dbconfig/20240411-141404-arnaudb.json
  • 14:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 25%: Post upgrade', diff saved to https://phabricator.wikimedia.org/P60416 and previous config saved to /var/cache/conftool/dbconfig/20240411-141324-arnaudb.json
  • 14:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2109 (re)pooling @ 50%: repool', diff saved to https://phabricator.wikimedia.org/P60415 and previous config saved to /var/cache/conftool/dbconfig/20240411-141300-arnaudb.json
  • 14:12 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db2115 (re)pooling @ 25%: Repool', diff saved to https://phabricator.wikimedia.org/P60414 and previous config saved to /var/cache/conftool/dbconfig/20240411-141139-arnaudb.json
  • 14:11 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:10 elukey: move cassandra instances on aqs1010 to PKI TLS certs - T352647
  • 14:10 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage
  • 14:09 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:09 Dreamy_Jazz: Afternoon UTC backport window finished
  • 14:09 moritzm: installing NSS security updates
  • 14:08 dreamyjazz@deploy1002: Finished scap: Backport for Ignore missing title/page in CheckUserLookupUtils::getManualLogEntryFromRow (T362284) (duration: 17m 42s)
  • 14:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage
  • 14:06 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2042.codfw.wmnet with reason: host reimage
  • 14:06 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host matomo1003.eqiad.wmnet with OS bookworm
  • 14:03 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2042.codfw.wmnet with reason: host reimage
  • 14:01 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3073.esams.wmnet with reason: host reimage
  • 13:59 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aqs1010.eqiad.wmnet with reason: Upgrade to PKI
  • 13:59 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on aqs1010.eqiad.wmnet with reason: Upgrade to PKI
  • 13:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db2103 (re)pooling @ 25%: reool', diff saved to https://phabricator.wikimedia.org/P60413 and previous config saved to /var/cache/conftool/dbconfig/20240411-135858-arnaudb.json
  • 13:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2177 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P60412 and previous config saved to /var/cache/conftool/dbconfig/20240411-135846-root.json
  • 13:58 arnaudb@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 20%: Post upgrade', diff saved to https://phabricator.wikimedia.org/P60411 and previous config saved to /var/cache/conftool/dbconfig/20240411-135819-arnaudb.json
  • 13:58 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3073.esams.wmnet with reason: host reimage
  • 13:57 arnaudb@cumin1002: dbctl commit (dc=all): 'db2109 (re)pooling @ 25%: repool', diff saved to https://phabricator.wikimedia.org/P60410 and previous config saved to /var/cache/conftool/dbconfig/20240411-135754-arnaudb.json
  • 13:57 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-codfw and not P{cp2042.codfw.wmnet} and A:cp
  • 13:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db2115 (re)pooling @ 10%: Repool', diff saved to https://phabricator.wikimedia.org/P60409 and previous config saved to /var/cache/conftool/dbconfig/20240411-135634-arnaudb.json
  • 13:55 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
  • 13:55 dreamyjazz@deploy1002: dreamyjazz: Backport for Ignore missing title/page in CheckUserLookupUtils::getManualLogEntryFromRow (T362284) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2008.wikimedia.org
  • 13:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm2008.wikimedia.org decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002"
  • 13:53 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm2008.wikimedia.org decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002"
  • 13:51 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS bookworm
  • 13:50 dreamyjazz@deploy1002: Started scap: Backport for Ignore missing title/page in CheckUserLookupUtils::getManualLogEntryFromRow (T362284)
  • 13:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2149', diff saved to https://phabricator.wikimedia.org/P60408 and previous config saved to /var/cache/conftool/dbconfig/20240411-134932-root.json
  • 13:49 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host matomo1003.eqiad.wmnet with OS bookworm
  • 13:46 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host matomo1003.eqiad.wmnet with OS bookworm
  • 13:46 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp2042.codfw.wmnet with OS bullseye
  • 13:45 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet,service=(cdn|ats-be)
  • 13:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2177 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P60407 and previous config saved to /var/cache/conftool/dbconfig/20240411-134341-root.json
  • 13:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 10%: Post upgrade', diff saved to https://phabricator.wikimedia.org/P60406 and previous config saved to /var/cache/conftool/dbconfig/20240411-134312-arnaudb.json
  • 13:41 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
  • 13:36 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-eqiad
  • 13:34 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp3073.esams.wmnet with OS bullseye
  • 13:32 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3073.esams.wmnet,service=(cdn|ats-be)
  • 13:32 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db2160.codfw.wmnet with reason: reboot multiinstance replica
  • 13:32 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on db2160.codfw.wmnet with reason: reboot multiinstance replica
  • 13:32 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host matomo1003.eqiad.wmnet with OS bookworm
  • 13:31 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host matomo1003.eqiad.wmnet with OS bookworm
  • 13:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2133.codfw.wmnet
  • 13:28 marostegui@cumin1002: dbctl commit (dc=all): 'db2177 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P60405 and previous config saved to /var/cache/conftool/dbconfig/20240411-132834-root.json
  • 13:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 5%: Post upgrade', diff saved to https://phabricator.wikimedia.org/P60404 and previous config saved to /var/cache/conftool/dbconfig/20240411-132807-arnaudb.json
  • 13:27 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-eqsin and not P{cp[5030,5032].eqsin.wmnet} and A:cp
  • 13:26 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db2133.codfw.wmnet
  • 13:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2133,2160].codfw.wmnet with reason: reboot
  • 13:25 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2133,2160].codfw.wmnet with reason: reboot
  • 13:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2135.codfw.wmnet
  • 13:18 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db2135.codfw.wmnet
  • 13:17 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2135,2160].codfw.wmnet with reason: reboot
  • 13:17 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2135,2160].codfw.wmnet with reason: reboot
  • 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2134.codfw.wmnet
  • 13:13 marostegui@cumin1002: dbctl commit (dc=all): 'db2177 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P60403 and previous config saved to /var/cache/conftool/dbconfig/20240411-131327-root.json
  • 13:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 4%: Post upgrade', diff saved to https://phabricator.wikimedia.org/P60402 and previous config saved to /var/cache/conftool/dbconfig/20240411-131301-arnaudb.json
  • 13:12 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host matomo1003.eqiad.wmnet with OS bookworm
  • 13:12 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db2134.codfw.wmnet
  • 13:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2134,2160].codfw.wmnet with reason: reboot
  • 13:11 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2134,2160].codfw.wmnet with reason: reboot
  • 13:00 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host matomo1003.eqiad.wmnet with OS bookworm
  • 12:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2132.codfw.wmnet
  • 12:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2177 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P60401 and previous config saved to /var/cache/conftool/dbconfig/20240411-125821-root.json
  • 12:57 arnaudb@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 2%: Post upgrade', diff saved to https://phabricator.wikimedia.org/P60400 and previous config saved to /var/cache/conftool/dbconfig/20240411-125755-arnaudb.json
  • 12:54 akosiaris: lower weight of mw1437 back to 10 from the 30 I had upped it to yesterday. The backlog of videoscaling is apparently now served and CPU usage has reached "normal" levels
  • 12:54 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db2132.codfw.wmnet
  • 12:54 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:53 akosiaris@cumin1002: conftool action : set/weight=10; selector: name=mw1437.*.wmnet,dc=eqiad
  • 12:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db[2132,2160].codfw.wmnet with reason: reboot
  • 12:53 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:53 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db[2132,2160].codfw.wmnet with reason: reboot
  • 12:52 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:52 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:51 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 12:50 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 12:49 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:49 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 12:45 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 12:24 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 12:24 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 12:23 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
  • 12:22 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
  • 12:21 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host testvm2008.wikimedia.org with OS bookworm
  • 12:21 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2008.wikimedia.org - ayounsi@cumin1002"
  • 12:20 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 12:20 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2008.wikimedia.org - ayounsi@cumin1002"
  • 12:20 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2008.wikimedia.org on all recursors
  • 12:20 btullis@deploy1002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 12:19 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache testvm2008.wikimedia.org on all recursors
  • 12:19 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:19 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2008.wikimedia.org - ayounsi@cumin1002"
  • 12:18 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2008.wikimedia.org - ayounsi@cumin1002"
  • 12:16 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 12:16 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 12:16 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host testvm2008.wikimedia.org
  • 12:16 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host matomo1003.eqiad.wmnet with OS bookworm
  • 12:16 ayounsi@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2008.wikimedia.org
  • 12:16 ayounsi@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=97)
  • 12:16 ayounsi@cumin1002: END (ERROR) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=97) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2008.wikimedia.org - ayounsi@cumin1002"
  • 12:16 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2008.wikimedia.org - ayounsi@cumin1002"
  • 12:15 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 12:15 moritzm: installing gnutls28 security updates
  • 12:14 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 12:13 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 12:13 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host testvm2008.wikimedia.org
  • 12:13 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 12:13 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2008.wikimedia.org
  • 12:13 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:13 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm2008.wikimedia.org decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002"
  • 12:13 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host matomo1003.eqiad.wmnet with OS bullseye
  • 12:12 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm2008.wikimedia.org decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002"
  • 12:10 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 12:06 ayounsi@cumin1002: START - Cookbook sre.hosts.decommission for hosts testvm2008.wikimedia.org
  • 12:06 ayounsi@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host testvm2008.wikimedia.org
  • 12:06 ayounsi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host testvm2008.wikimedia.org with OS bookworm
  • 12:02 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 12:01 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 12:01 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 11:59 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 11:59 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 11:58 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 11:58 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host testvm2008.wikimedia.org with OS bookworm
  • 11:57 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 11:57 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 11:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2177.codfw.wmnet with OS bookworm
  • 11:50 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2008.wikimedia.org - ayounsi@cumin1002"
  • 11:50 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on matomo1003.eqiad.wmnet with reason: host reimage
  • 11:49 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2008.wikimedia.org - ayounsi@cumin1002"
  • 11:49 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2008.wikimedia.org on all recursors
  • 11:49 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache testvm2008.wikimedia.org on all recursors
  • 11:49 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:49 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2008.wikimedia.org - ayounsi@cumin1002"
  • 11:47 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2008.wikimedia.org - ayounsi@cumin1002"
  • 11:47 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on matomo1003.eqiad.wmnet with reason: host reimage
  • 11:45 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 11:45 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host testvm2008.wikimedia.org
  • 11:33 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host matomo1003.eqiad.wmnet with OS bullseye
  • 11:31 moritzm: installing postgresql-15 security updates
  • 11:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2177.codfw.wmnet with reason: host reimage
  • 11:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2177.codfw.wmnet with reason: host reimage
  • 11:24 effie: upload prometheus-memcached-exporter 0.14.2-1~wmf1 to bookworm-wikimedia main - T350807
  • 11:22 effie: upload memkeys 20181031-2-s1 to bookworm-wikimedia main - T362160
  • 11:22 effie: upload memkeys 20181031-2-s1 to bookworm-wikimedia main
  • 11:10 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2177.codfw.wmnet with OS bookworm
  • 11:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2177', diff saved to https://phabricator.wikimedia.org/P60394 and previous config saved to /var/cache/conftool/dbconfig/20240411-110938-root.json
  • 10:53 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(mw2412.codfw.wmnet|mw2413.codfw.wmnet|mw2414.codfw.wmnet|mw2415.codfw.wmnet|mw2416.codfw.wmnet|mw2417.codfw.wmnet|mw2418.codfw.wmnet),cluster=kubernetes,service=kubesvc
  • 10:52 claime: Pooling and uncordoning mw2412.codfw.wmnet,mw2413.codfw.wmnet,mw2414.codfw.wmnet,mw2415.codfw.wmnet,mw2416.codfw.wmnet,mw2417.codfw.wmnet,mw2418.codfw.wmnet - T351074
  • 10:43 moritzm: installing modsecurity-apache security updates
  • 10:37 claime: Running homer 'cr*codfw*' commit 'T351074'
  • 10:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2413.codfw.wmnet with OS bullseye
  • 10:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2418.codfw.wmnet with OS bullseye
  • 10:30 moritzm: installing xerces-c security updates
  • 10:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P60393 and previous config saved to /var/cache/conftool/dbconfig/20240411-103005-root.json
  • 10:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2412.codfw.wmnet with OS bullseye
  • 10:25 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2415.codfw.wmnet with OS bullseye
  • 10:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2416.codfw.wmnet with OS bullseye
  • 10:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T356166)', diff saved to https://phabricator.wikimedia.org/P60392 and previous config saved to /var/cache/conftool/dbconfig/20240411-102153-marostegui.json
  • 10:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 10:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 10:20 arnaudb@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 100%: post schema update', diff saved to https://phabricator.wikimedia.org/P60391 and previous config saved to /var/cache/conftool/dbconfig/20240411-102031-arnaudb.json
  • 10:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2417.codfw.wmnet with OS bullseye
  • 10:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2413.codfw.wmnet with reason: host reimage
  • 10:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2414.codfw.wmnet with OS bullseye
  • 10:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P60390 and previous config saved to /var/cache/conftool/dbconfig/20240411-101500-root.json
  • 10:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2418.codfw.wmnet with reason: host reimage
  • 10:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2412.codfw.wmnet with reason: host reimage
  • 10:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2415.codfw.wmnet with reason: host reimage
  • 10:05 arnaudb@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 75%: post schema update', diff saved to https://phabricator.wikimedia.org/P60389 and previous config saved to /var/cache/conftool/dbconfig/20240411-100525-arnaudb.json
  • 10:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2416.codfw.wmnet with reason: host reimage
  • 10:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2417.codfw.wmnet with reason: host reimage
  • 09:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P60388 and previous config saved to /var/cache/conftool/dbconfig/20240411-095954-root.json
  • 09:57 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2418.codfw.wmnet with reason: host reimage
  • 09:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2414.codfw.wmnet with reason: host reimage
  • 09:57 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "testvm2007 - ayounsi@cumin1002"
  • 09:57 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2417.codfw.wmnet with reason: host reimage
  • 09:56 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2416.codfw.wmnet with reason: host reimage
  • 09:56 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "testvm2007 - ayounsi@cumin1002"
  • 09:56 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2415.codfw.wmnet with reason: host reimage
  • 09:55 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2413.codfw.wmnet with reason: host reimage
  • 09:55 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2412.codfw.wmnet with reason: host reimage
  • 09:55 ayounsi@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2007.codfw.wmnet
  • 09:55 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2007.codfw.wmnet with OS bookworm
  • 09:54 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2414.codfw.wmnet with reason: host reimage
  • 09:50 arnaudb@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 50%: post schema update', diff saved to https://phabricator.wikimedia.org/P60387 and previous config saved to /var/cache/conftool/dbconfig/20240411-095019-arnaudb.json
  • 09:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P60386 and previous config saved to /var/cache/conftool/dbconfig/20240411-094448-root.json
  • 09:40 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw2418.codfw.wmnet with OS bullseye
  • 09:40 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2007.codfw.wmnet with reason: host reimage
  • 09:40 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw2417.codfw.wmnet with OS bullseye
  • 09:39 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw2416.codfw.wmnet with OS bullseye
  • 09:39 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw2415.codfw.wmnet with OS bullseye
  • 09:38 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw2414.codfw.wmnet with OS bullseye
  • 09:38 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw2413.codfw.wmnet with OS bullseye
  • 09:38 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw2412.codfw.wmnet with OS bullseye
  • 09:38 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2007.codfw.wmnet with reason: host reimage
  • 09:37 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp3072.esams.wmnet
  • 09:35 arnaudb@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 25%: post schema update', diff saved to https://phabricator.wikimedia.org/P60384 and previous config saved to /var/cache/conftool/dbconfig/20240411-093513-arnaudb.json
  • 09:32 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3072.esams.wmnet with OS bullseye
  • 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 09:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P60383 and previous config saved to /var/cache/conftool/dbconfig/20240411-092942-root.json
  • 09:27 arnaudb@cumin1002: dbctl restore of MediaWiki config (dc=all) from /var/cache/conftool/dbconfig/20240411-092622-arnaudb.json
  • 09:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T360332)', diff saved to https://phabricator.wikimedia.org/P60382 and previous config saved to /var/cache/conftool/dbconfig/20240411-092622-arnaudb.json
  • 09:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 09:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 09:25 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host testvm2007.codfw.wmnet with OS bookworm
  • 09:25 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2007.codfw.wmnet - ayounsi@cumin1002"
  • 09:25 arnaudb@cumin1002: dbctl commit (dc=all): 'db2129 depool', diff saved to https://phabricator.wikimedia.org/P60381 and previous config saved to /var/cache/conftool/dbconfig/20240411-092501-arnaudb.json
  • 09:24 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2007.codfw.wmnet - ayounsi@cumin1002"
  • 09:24 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2007.codfw.wmnet on all recursors
  • 09:24 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache testvm2007.codfw.wmnet on all recursors
  • 09:24 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:24 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2007.codfw.wmnet - ayounsi@cumin1002"
  • 09:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db2129 weight bump T362302', diff saved to https://phabricator.wikimedia.org/P60380 and previous config saved to /var/cache/conftool/dbconfig/20240411-092318-arnaudb.json
  • 09:20 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2007.codfw.wmnet - ayounsi@cumin1002"
  • 09:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote db2114 to s6 primary T362302', diff saved to https://phabricator.wikimedia.org/P60379 and previous config saved to /var/cache/conftool/dbconfig/20240411-092012-arnaudb.json
  • 09:19 arnaudb: Starting s6 codfw failover from db2129 to db2114 - T362302
  • 09:16 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 09:16 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host testvm2007.codfw.wmnet
  • 09:13 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 09:12 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P60378 and previous config saved to /var/cache/conftool/dbconfig/20240411-091255-root.json
  • 09:12 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 09:06 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3072.esams.wmnet with reason: host reimage
  • 09:03 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3072.esams.wmnet with reason: host reimage
  • 08:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db2114 with weight 0 T362302', diff saved to https://phabricator.wikimedia.org/P60377 and previous config saved to /var/cache/conftool/dbconfig/20240411-085926-arnaudb.json
  • 08:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s6 T362302
  • 08:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s6 T362302
  • 08:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2006.codfw.wmnet
  • 08:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2006.codfw.wmnet with OS bookworm
  • 08:57 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P60376 and previous config saved to /var/cache/conftool/dbconfig/20240411-085749-root.json
  • 08:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1198.eqiad.wmnet with OS bookworm
  • 08:50 jelto@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 08:45 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2006.codfw.wmnet with reason: host reimage
  • 08:45 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 08:45 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on matomo1003.eqiad.wmnet with reason: Adding disk
  • 08:45 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on matomo1003.eqiad.wmnet with reason: Adding disk
  • 08:42 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 08:42 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2006.codfw.wmnet with reason: host reimage
  • 08:40 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp3072.esams.wmnet with OS bullseye
  • 08:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1198.eqiad.wmnet with reason: host reimage
  • 08:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1198.eqiad.wmnet with reason: host reimage
  • 08:36 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 08:36 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1198.eqiad.wmnet with OS bookworm
  • 08:31 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host testvm2006.codfw.wmnet with OS bookworm
  • 08:29 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - ayounsi@cumin1002"
  • 08:29 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - ayounsi@cumin1002"
  • 08:28 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 08:28 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 08:28 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:28 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - ayounsi@cumin1002"
  • 08:27 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host db1198.eqiad.wmnet with OS bookworm
  • 08:27 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - ayounsi@cumin1002"
  • 08:26 fabfur@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp3072.esams.wmnet with OS bullseye
  • 08:25 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 08:25 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
  • 08:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1198.eqiad.wmnet with reason: host reimage
  • 08:20 hashar: MediaWiki train is blocked
  • 08:19 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1198.eqiad.wmnet with reason: host reimage
  • 08:13 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3072.esams.wmnet with reason: host reimage
  • 08:10 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3072.esams.wmnet with reason: host reimage
  • 08:06 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp2002.wikimedia.org
  • 08:06 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:06 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
  • 08:06 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1198.eqiad.wmnet with OS bookworm
  • 08:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1198', diff saved to https://phabricator.wikimedia.org/P60374 and previous config saved to /var/cache/conftool/dbconfig/20240411-080502-root.json
  • 08:03 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
  • 08:01 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
  • 07:56 slyngshede@cumin1002: START - Cookbook sre.hosts.decommission for hosts idp2002.wikimedia.org
  • 07:47 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp3072.esams.wmnet with OS bullseye
  • 07:44 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp3072.esams.wmnet
  • 07:39 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:39 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P60373 and previous config saved to /var/cache/conftool/dbconfig/20240411-072503-root.json
  • 07:10 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp1002.wikimedia.org
  • 07:10 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:10 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
  • 07:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P60372 and previous config saved to /var/cache/conftool/dbconfig/20240411-070958-root.json
  • 07:08 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
  • 07:05 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
  • 07:00 slyngshede@cumin1002: START - Cookbook sre.hosts.decommission for hosts idp1002.wikimedia.org
  • 06:57 slyngshede@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts idp1002.wikimedia.org
  • 06:56 slyngshede@cumin1002: START - Cookbook sre.hosts.decommission for hosts idp1002.wikimedia.org
  • 06:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P60371 and previous config saved to /var/cache/conftool/dbconfig/20240411-065452-root.json
  • 06:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P60370 and previous config saved to /var/cache/conftool/dbconfig/20240411-063946-root.json
  • 06:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T360332)', diff saved to https://phabricator.wikimedia.org/P60369 and previous config saved to /var/cache/conftool/dbconfig/20240411-062728-arnaudb.json
  • 06:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P60368 and previous config saved to /var/cache/conftool/dbconfig/20240411-062440-root.json
  • 06:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P60367 and previous config saved to /var/cache/conftool/dbconfig/20240411-061220-arnaudb.json
  • 06:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P60366 and previous config saved to /var/cache/conftool/dbconfig/20240411-060934-root.json
  • 05:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P60365 and previous config saved to /var/cache/conftool/dbconfig/20240411-055712-arnaudb.json
  • 05:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P60364 and previous config saved to /var/cache/conftool/dbconfig/20240411-055428-root.json
  • 05:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1189.eqiad.wmnet with OS bookworm
  • 05:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T360332)', diff saved to https://phabricator.wikimedia.org/P60363 and previous config saved to /var/cache/conftool/dbconfig/20240411-054205-arnaudb.json
  • 05:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T360332)', diff saved to https://phabricator.wikimedia.org/P60362 and previous config saved to /var/cache/conftool/dbconfig/20240411-053903-arnaudb.json
  • 05:38 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 05:38 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 05:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T360332)', diff saved to https://phabricator.wikimedia.org/P60361 and previous config saved to /var/cache/conftool/dbconfig/20240411-053840-arnaudb.json
  • 05:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage
  • 05:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage
  • 05:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P60360 and previous config saved to /var/cache/conftool/dbconfig/20240411-052333-arnaudb.json
  • 05:15 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS bookworm
  • 05:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1189', diff saved to https://phabricator.wikimedia.org/P60359 and previous config saved to /var/cache/conftool/dbconfig/20240411-051341-root.json
  • 05:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P60358 and previous config saved to /var/cache/conftool/dbconfig/20240411-050825-arnaudb.json
  • 04:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T360332)', diff saved to https://phabricator.wikimedia.org/P60357 and previous config saved to /var/cache/conftool/dbconfig/20240411-045317-arnaudb.json
  • 04:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2212 (T360332)', diff saved to https://phabricator.wikimedia.org/P60356 and previous config saved to /var/cache/conftool/dbconfig/20240411-045024-arnaudb.json
  • 04:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2212.codfw.wmnet with reason: Maintenance
  • 04:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2212.codfw.wmnet with reason: Maintenance
  • 04:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T360332)', diff saved to https://phabricator.wikimedia.org/P60355 and previous config saved to /var/cache/conftool/dbconfig/20240411-045011-arnaudb.json
  • 04:35 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P60354 and previous config saved to /var/cache/conftool/dbconfig/20240411-043502-arnaudb.json
  • 04:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P60353 and previous config saved to /var/cache/conftool/dbconfig/20240411-041954-arnaudb.json
  • 04:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T360332)', diff saved to https://phabricator.wikimedia.org/P60352 and previous config saved to /var/cache/conftool/dbconfig/20240411-040447-arnaudb.json
  • 04:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T360332)', diff saved to https://phabricator.wikimedia.org/P60351 and previous config saved to /var/cache/conftool/dbconfig/20240411-040147-arnaudb.json
  • 04:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 04:01 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 04:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T360332)', diff saved to https://phabricator.wikimedia.org/P60350 and previous config saved to /var/cache/conftool/dbconfig/20240411-040124-arnaudb.json
  • 03:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P60349 and previous config saved to /var/cache/conftool/dbconfig/20240411-034617-arnaudb.json
  • 03:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P60348 and previous config saved to /var/cache/conftool/dbconfig/20240411-033109-arnaudb.json
  • 03:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T360332)', diff saved to https://phabricator.wikimedia.org/P60347 and previous config saved to /var/cache/conftool/dbconfig/20240411-031602-arnaudb.json
  • 03:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T360332)', diff saved to https://phabricator.wikimedia.org/P60346 and previous config saved to /var/cache/conftool/dbconfig/20240411-031310-arnaudb.json
  • 03:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 03:12 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 03:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T360332)', diff saved to https://phabricator.wikimedia.org/P60345 and previous config saved to /var/cache/conftool/dbconfig/20240411-031247-arnaudb.json
  • 02:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P60344 and previous config saved to /var/cache/conftool/dbconfig/20240411-025740-arnaudb.json
  • 02:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P60343 and previous config saved to /var/cache/conftool/dbconfig/20240411-024232-arnaudb.json
  • 02:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 02:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 02:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T356166)', diff saved to https://phabricator.wikimedia.org/P60342 and previous config saved to /var/cache/conftool/dbconfig/20240411-023125-marostegui.json
  • 02:30 cstone: civicrm upgraded from a382a7b0 to c2569254
  • 02:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T360332)', diff saved to https://phabricator.wikimedia.org/P60341 and previous config saved to /var/cache/conftool/dbconfig/20240411-022725-arnaudb.json
  • 02:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T360332)', diff saved to https://phabricator.wikimedia.org/P60340 and previous config saved to /var/cache/conftool/dbconfig/20240411-022433-arnaudb.json
  • 02:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 02:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 02:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T360332)', diff saved to https://phabricator.wikimedia.org/P60339 and previous config saved to /var/cache/conftool/dbconfig/20240411-022410-arnaudb.json
  • 02:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P60338 and previous config saved to /var/cache/conftool/dbconfig/20240411-021617-marostegui.json
  • 02:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P60337 and previous config saved to /var/cache/conftool/dbconfig/20240411-020903-arnaudb.json
  • 02:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P60336 and previous config saved to /var/cache/conftool/dbconfig/20240411-020110-marostegui.json
  • 01:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P60335 and previous config saved to /var/cache/conftool/dbconfig/20240411-015355-arnaudb.json
  • 01:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T356166)', diff saved to https://phabricator.wikimedia.org/P60334 and previous config saved to /var/cache/conftool/dbconfig/20240411-014602-marostegui.json
  • 01:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T360332)', diff saved to https://phabricator.wikimedia.org/P60333 and previous config saved to /var/cache/conftool/dbconfig/20240411-013848-arnaudb.json
  • 01:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T360332)', diff saved to https://phabricator.wikimedia.org/P60332 and previous config saved to /var/cache/conftool/dbconfig/20240411-013657-arnaudb.json
  • 01:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 01:36 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 01:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 01:36 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 01:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T360332)', diff saved to https://phabricator.wikimedia.org/P60331 and previous config saved to /var/cache/conftool/dbconfig/20240411-013618-arnaudb.json
  • 01:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P60330 and previous config saved to /var/cache/conftool/dbconfig/20240411-012110-arnaudb.json
  • 01:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P60329 and previous config saved to /var/cache/conftool/dbconfig/20240411-010601-arnaudb.json
  • 00:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T360332)', diff saved to https://phabricator.wikimedia.org/P60328 and previous config saved to /var/cache/conftool/dbconfig/20240411-005054-arnaudb.json
  • 00:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T360332)', diff saved to https://phabricator.wikimedia.org/P60327 and previous config saved to /var/cache/conftool/dbconfig/20240411-004758-arnaudb.json
  • 00:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 00:47 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 00:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T360332)', diff saved to https://phabricator.wikimedia.org/P60326 and previous config saved to /var/cache/conftool/dbconfig/20240411-004735-arnaudb.json
  • 00:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1244 (T356166)', diff saved to https://phabricator.wikimedia.org/P60325 and previous config saved to /var/cache/conftool/dbconfig/20240411-004536-marostegui.json
  • 00:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 00:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 00:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T356166)', diff saved to https://phabricator.wikimedia.org/P60324 and previous config saved to /var/cache/conftool/dbconfig/20240411-004514-marostegui.json
  • 00:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P60323 and previous config saved to /var/cache/conftool/dbconfig/20240411-003226-arnaudb.json
  • 00:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P60322 and previous config saved to /var/cache/conftool/dbconfig/20240411-003005-marostegui.json
  • 00:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P60321 and previous config saved to /var/cache/conftool/dbconfig/20240411-001718-arnaudb.json
  • 00:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P60320 and previous config saved to /var/cache/conftool/dbconfig/20240411-001458-marostegui.json
  • 00:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T360332)', diff saved to https://phabricator.wikimedia.org/P60319 and previous config saved to /var/cache/conftool/dbconfig/20240411-000211-arnaudb.json

2024-04-10

  • 23:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T356166)', diff saved to https://phabricator.wikimedia.org/P60318 and previous config saved to /var/cache/conftool/dbconfig/20240410-235950-marostegui.json
  • 23:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T360332)', diff saved to https://phabricator.wikimedia.org/P60317 and previous config saved to /var/cache/conftool/dbconfig/20240410-235920-arnaudb.json
  • 23:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 23:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 23:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T360332)', diff saved to https://phabricator.wikimedia.org/P60316 and previous config saved to /var/cache/conftool/dbconfig/20240410-235857-arnaudb.json
  • 23:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P60315 and previous config saved to /var/cache/conftool/dbconfig/20240410-234350-arnaudb.json
  • 23:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P60314 and previous config saved to /var/cache/conftool/dbconfig/20240410-232842-arnaudb.json
  • 23:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T360332)', diff saved to https://phabricator.wikimedia.org/P60313 and previous config saved to /var/cache/conftool/dbconfig/20240410-231335-arnaudb.json
  • 23:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T360332)', diff saved to https://phabricator.wikimedia.org/P60312 and previous config saved to /var/cache/conftool/dbconfig/20240410-231032-arnaudb.json
  • 23:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 23:10 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 23:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T360332)', diff saved to https://phabricator.wikimedia.org/P60311 and previous config saved to /var/cache/conftool/dbconfig/20240410-231008-arnaudb.json
  • 22:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P60310 and previous config saved to /var/cache/conftool/dbconfig/20240410-225500-arnaudb.json
  • 22:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P60309 and previous config saved to /var/cache/conftool/dbconfig/20240410-223953-arnaudb.json
  • 22:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T360332)', diff saved to https://phabricator.wikimedia.org/P60308 and previous config saved to /var/cache/conftool/dbconfig/20240410-222445-arnaudb.json
  • 22:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T360332)', diff saved to https://phabricator.wikimedia.org/P60307 and previous config saved to /var/cache/conftool/dbconfig/20240410-222150-arnaudb.json
  • 22:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 22:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 22:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 22:20 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 22:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T360332)', diff saved to https://phabricator.wikimedia.org/P60306 and previous config saved to /var/cache/conftool/dbconfig/20240410-222028-arnaudb.json
  • 22:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P60305 and previous config saved to /var/cache/conftool/dbconfig/20240410-220521-arnaudb.json
  • 21:56 mutante: prometheus - recreating deleted TLS certs/keys in private repo
  • 21:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P60304 and previous config saved to /var/cache/conftool/dbconfig/20240410-215014-arnaudb.json
  • 21:35 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T360332)', diff saved to https://phabricator.wikimedia.org/P60303 and previous config saved to /var/cache/conftool/dbconfig/20240410-213506-arnaudb.json
  • 21:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2130 (T360332)', diff saved to https://phabricator.wikimedia.org/P60302 and previous config saved to /var/cache/conftool/dbconfig/20240410-213203-arnaudb.json
  • 21:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 21:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 21:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T360332)', diff saved to https://phabricator.wikimedia.org/P60301 and previous config saved to /var/cache/conftool/dbconfig/20240410-213140-arnaudb.json
  • 21:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P60300 and previous config saved to /var/cache/conftool/dbconfig/20240410-211632-arnaudb.json
  • 21:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P60298 and previous config saved to /var/cache/conftool/dbconfig/20240410-210125-arnaudb.json
  • 20:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T360332)', diff saved to https://phabricator.wikimedia.org/P60297 and previous config saved to /var/cache/conftool/dbconfig/20240410-204617-arnaudb.json
  • 20:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T360332)', diff saved to https://phabricator.wikimedia.org/P60296 and previous config saved to /var/cache/conftool/dbconfig/20240410-204316-arnaudb.json
  • 20:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 20:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 20:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2112 (T360332)', diff saved to https://phabricator.wikimedia.org/P60295 and previous config saved to /var/cache/conftool/dbconfig/20240410-204253-arnaudb.json
  • 20:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2112', diff saved to https://phabricator.wikimedia.org/P60294 and previous config saved to /var/cache/conftool/dbconfig/20240410-202745-arnaudb.json
  • 20:17 cjming: end of UTC late backport window
  • 20:15 cjming@deploy1002: Finished scap: Backport for LogStash: log HtmlOutputRendererHelper channel (T356157) (duration: 13m 51s)
  • 20:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2112', diff saved to https://phabricator.wikimedia.org/P60293 and previous config saved to /var/cache/conftool/dbconfig/20240410-201237-arnaudb.json
  • 20:04 cjming@deploy1002: cjming and daniel: Continuing with sync
  • 20:04 cjming@deploy1002: cjming and daniel: Backport for LogStash: log HtmlOutputRendererHelper channel (T356157) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:01 cjming@deploy1002: Started scap: Backport for LogStash: log HtmlOutputRendererHelper channel (T356157)
  • 19:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2112 (T360332)', diff saved to https://phabricator.wikimedia.org/P60292 and previous config saved to /var/cache/conftool/dbconfig/20240410-195730-arnaudb.json
  • 19:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2112 (T360332)', diff saved to https://phabricator.wikimedia.org/P60291 and previous config saved to /var/cache/conftool/dbconfig/20240410-195430-arnaudb.json
  • 19:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 19:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 19:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 19:53 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 19:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 19:51 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 19:51 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 19:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 19:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 19:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 19:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 19:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 19:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T360332)', diff saved to https://phabricator.wikimedia.org/P60290 and previous config saved to /var/cache/conftool/dbconfig/20240410-194909-arnaudb.json
  • 19:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P60289 and previous config saved to /var/cache/conftool/dbconfig/20240410-193402-arnaudb.json
  • 19:24 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3071.esams.wmnet,service=(cdn|ats-be)
  • 19:20 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3071.esams.wmnet with OS bullseye
  • 19:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P60288 and previous config saved to /var/cache/conftool/dbconfig/20240410-191854-arnaudb.json
  • 19:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T360332)', diff saved to https://phabricator.wikimedia.org/P60287 and previous config saved to /var/cache/conftool/dbconfig/20240410-190347-arnaudb.json
  • 18:54 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3071.esams.wmnet with reason: host reimage
  • 18:51 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3071.esams.wmnet with reason: host reimage
  • 18:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T360332)', diff saved to https://phabricator.wikimedia.org/P60285 and previous config saved to /var/cache/conftool/dbconfig/20240410-184656-arnaudb.json
  • 18:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 18:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 18:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T360332)', diff saved to https://phabricator.wikimedia.org/P60284 and previous config saved to /var/cache/conftool/dbconfig/20240410-184633-arnaudb.json
  • 18:34 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 18:34 eevans@deploy1002: helmfile [staging] START helmfile.d/services/echostore: apply
  • 18:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P60283 and previous config saved to /var/cache/conftool/dbconfig/20240410-183126-arnaudb.json
  • 18:30 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1115.eqiad.wmnet,service=(cdn|ats-be)
  • 18:28 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp3071.esams.wmnet with OS bullseye
  • 18:26 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1115.eqiad.wmnet with OS bullseye
  • 18:24 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp3071.esams.wmnet,service=(cdn|ats-be)
  • 18:17 eevans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 18:16 eevans@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 18:16 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 18:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P60282 and previous config saved to /var/cache/conftool/dbconfig/20240410-181618-arnaudb.json
  • 18:15 eevans@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 18:08 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1115.eqiad.wmnet with reason: host reimage
  • 18:05 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1115.eqiad.wmnet with reason: host reimage
  • 18:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T360332)', diff saved to https://phabricator.wikimedia.org/P60281 and previous config saved to /var/cache/conftool/dbconfig/20240410-180111-arnaudb.json
  • 17:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T360332)', diff saved to https://phabricator.wikimedia.org/P60280 and previous config saved to /var/cache/conftool/dbconfig/20240410-175816-arnaudb.json
  • 17:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 17:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 17:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T360332)', diff saved to https://phabricator.wikimedia.org/P60279 and previous config saved to /var/cache/conftool/dbconfig/20240410-175752-arnaudb.json
  • 17:48 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS bullseye
  • 17:48 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS bullseye
  • 17:46 swfrench-wmf: finished updating A:conf hosts to etcd-mirror 0.0.11-1 (T358636)
  • 17:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P60278 and previous config saved to /var/cache/conftool/dbconfig/20240410-174244-arnaudb.json
  • 17:37 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS bullseye
  • 17:37 swfrench-wmf: restarting etcd-mirror on conf2005.codfw.wmnet for T358636
  • 17:35 sukhe@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp1115.eqiad.wmnet
  • 17:34 sukhe@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp1115.eqiad.wmnet
  • 17:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P60277 and previous config saved to /var/cache/conftool/dbconfig/20240410-172736-arnaudb.json
  • 17:21 hashar@deploy1002: Finished scap: Backport for TitleLibrary: Don't register external titles as dependencies (T362222) (duration: 18m 53s)
  • 17:14 sukhe@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp1115.eqiad.wmnet
  • 17:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T360332)', diff saved to https://phabricator.wikimedia.org/P60276 and previous config saved to /var/cache/conftool/dbconfig/20240410-171229-arnaudb.json
  • 17:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T360332)', diff saved to https://phabricator.wikimedia.org/P60275 and previous config saved to /var/cache/conftool/dbconfig/20240410-170930-arnaudb.json
  • 17:09 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 17:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 17:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228 (T360332)', diff saved to https://phabricator.wikimedia.org/P60274 and previous config saved to /var/cache/conftool/dbconfig/20240410-170907-arnaudb.json
  • 17:07 hashar@deploy1002: hashar: Continuing with sync
  • 17:07 hashar@deploy1002: hashar: Backport for TitleLibrary: Don't register external titles as dependencies (T362222) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:06 sukhe@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp1115.eqiad.wmnet
  • 17:06 sukhe@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp1115.eqiad.wmnet
  • 17:05 sukhe@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp1115.eqiad.wmnet
  • 17:05 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1115.eqiad.wmnet,service=(cdn|ats-be)
  • 17:04 sukhe: depool cp1115 for firmware downgrade for PXE boot testing: T350179
  • 17:04 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:04 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:04 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:03 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:02 hnowlan: killing long-running videoscaler ffmpegs
  • 17:02 hashar@deploy1002: Started scap: Backport for TitleLibrary: Don't register external titles as dependencies (T362222)
  • 16:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228', diff saved to https://phabricator.wikimedia.org/P60272 and previous config saved to /var/cache/conftool/dbconfig/20240410-165359-arnaudb.json
  • 16:50 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:50 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:50 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:50 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228', diff saved to https://phabricator.wikimedia.org/P60270 and previous config saved to /var/cache/conftool/dbconfig/20240410-163851-arnaudb.json
  • 16:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228 (T360332)', diff saved to https://phabricator.wikimedia.org/P60269 and previous config saved to /var/cache/conftool/dbconfig/20240410-162344-arnaudb.json
  • 16:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1228 (T360332)', diff saved to https://phabricator.wikimedia.org/P60268 and previous config saved to /var/cache/conftool/dbconfig/20240410-162101-arnaudb.json
  • 16:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1228.eqiad.wmnet with reason: Maintenance
  • 16:20 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1228.eqiad.wmnet with reason: Maintenance
  • 16:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T360332)', diff saved to https://phabricator.wikimedia.org/P60267 and previous config saved to /var/cache/conftool/dbconfig/20240410-162039-arnaudb.json
  • 16:19 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
  • 16:19 elukey@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
  • 16:16 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [eqiad] DONE helmfile.d/services/termbox: apply
  • 16:16 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [eqiad] START helmfile.d/services/termbox: apply
  • 16:15 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [codfw] DONE helmfile.d/services/termbox: apply
  • 16:15 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [codfw] START helmfile.d/services/termbox: apply
  • 16:14 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 16:13 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [staging] START helmfile.d/services/termbox: apply
  • 16:12 swfrench-wmf: uploaded etcd-mirror 0.0.11-1 to apt.wikimedia.org (T358636)
  • 16:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P60265 and previous config saved to /var/cache/conftool/dbconfig/20240410-160531-arnaudb.json
  • 15:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P60264 and previous config saved to /var/cache/conftool/dbconfig/20240410-155024-arnaudb.json
  • 15:35 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T360332)', diff saved to https://phabricator.wikimedia.org/P60262 and previous config saved to /var/cache/conftool/dbconfig/20240410-153516-arnaudb.json
  • 15:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T360332)', diff saved to https://phabricator.wikimedia.org/P60261 and previous config saved to /var/cache/conftool/dbconfig/20240410-153229-arnaudb.json
  • 15:32 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 15:32 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 15:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T360332)', diff saved to https://phabricator.wikimedia.org/P60260 and previous config saved to /var/cache/conftool/dbconfig/20240410-153207-arnaudb.json
  • 15:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P60259 and previous config saved to /var/cache/conftool/dbconfig/20240410-151659-arnaudb.json
  • 15:14 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:14 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:14 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:13 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T356166)', diff saved to https://phabricator.wikimedia.org/P60258 and previous config saved to /var/cache/conftool/dbconfig/20240410-150327-marostegui.json
  • 15:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 15:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 15:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T356166)', diff saved to https://phabricator.wikimedia.org/P60257 and previous config saved to /var/cache/conftool/dbconfig/20240410-150304-marostegui.json
  • 15:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P60256 and previous config saved to /var/cache/conftool/dbconfig/20240410-150152-arnaudb.json
  • 14:58 moritzm: installing debian-archive-keyring updates on buster
  • 14:55 akosiaris: kill all ffmpegs on mw1437 and increase weight of mw1437 from 10 to 30 to direct most queries to it while the other 3 videoscalers serve the backlog
  • 14:54 akosiaris@cumin1002: conftool action : set/weight=30; selector: name=mw1437.*.wmnet,dc=eqiad
  • 14:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:51 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:50 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:50 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P60255 and previous config saved to /var/cache/conftool/dbconfig/20240410-144757-marostegui.json
  • 14:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T360332)', diff saved to https://phabricator.wikimedia.org/P60254 and previous config saved to /var/cache/conftool/dbconfig/20240410-144644-arnaudb.json
  • 14:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T360332)', diff saved to https://phabricator.wikimedia.org/P60253 and previous config saved to /var/cache/conftool/dbconfig/20240410-144400-arnaudb.json
  • 14:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 14:43 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 14:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T360332)', diff saved to https://phabricator.wikimedia.org/P60252 and previous config saved to /var/cache/conftool/dbconfig/20240410-144336-arnaudb.json
  • 14:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P60251 and previous config saved to /var/cache/conftool/dbconfig/20240410-143249-marostegui.json
  • 14:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P60250 and previous config saved to /var/cache/conftool/dbconfig/20240410-142829-arnaudb.json
  • 14:21 sukhe@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp4052.ulsfo.wmnet
  • 14:20 sukhe@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp4052.ulsfo.wmnet
  • 14:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T356166)', diff saved to https://phabricator.wikimedia.org/P60249 and previous config saved to /var/cache/conftool/dbconfig/20240410-141742-marostegui.json
  • 14:17 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1112.eqiad.wmnet,service=(cdn|ats-be)
  • 14:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P60248 and previous config saved to /var/cache/conftool/dbconfig/20240410-141322-arnaudb.json
  • 14:07 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1112.eqiad.wmnet with OS bullseye
  • 13:58 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4052.ulsfo.wmnet,service=(cdn|ats-be)
  • 13:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T360332)', diff saved to https://phabricator.wikimedia.org/P60246 and previous config saved to /var/cache/conftool/dbconfig/20240410-135814-arnaudb.json
  • 13:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T360332)', diff saved to https://phabricator.wikimedia.org/P60245 and previous config saved to /var/cache/conftool/dbconfig/20240410-135525-arnaudb.json
  • 13:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 13:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 13:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T360332)', diff saved to https://phabricator.wikimedia.org/P60244 and previous config saved to /var/cache/conftool/dbconfig/20240410-135502-arnaudb.json
  • 13:54 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS bullseye
  • 13:49 denisse: Delete unused Prometheus TLS certificates - T360414
  • 13:47 moritzm: installing unbound security updates
  • 13:46 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1112.eqiad.wmnet with reason: host reimage
  • 13:43 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1112.eqiad.wmnet with reason: host reimage
  • 13:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P60243 and previous config saved to /var/cache/conftool/dbconfig/20240410-133955-arnaudb.json
  • 13:39 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 13:33 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
  • 13:30 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on elastic2088.codfw.wmnet with reason: T361525
  • 13:30 bking@cumin2002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on elastic2088.codfw.wmnet with reason: T361525
  • 13:30 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
  • 13:28 eevans@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 13:27 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp1112.eqiad.wmnet with OS bullseye
  • 13:26 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1112.eqiad.wmnet with OS bullseye
  • 13:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P60242 and previous config saved to /var/cache/conftool/dbconfig/20240410-132447-arnaudb.json
  • 13:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T356166)', diff saved to https://phabricator.wikimedia.org/P60241 and previous config saved to /var/cache/conftool/dbconfig/20240410-131716-marostegui.json
  • 13:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 13:17 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp1112.eqiad.wmnet with OS bullseye
  • 13:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 13:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T356166)', diff saved to https://phabricator.wikimedia.org/P60240 and previous config saved to /var/cache/conftool/dbconfig/20240410-131653-marostegui.json
  • 13:16 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1112.eqiad.wmnet,service=(cdn|ats-be)
  • 13:09 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:09 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: test restoring dns entry - volans@cumin2002"
  • 13:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T360332)', diff saved to https://phabricator.wikimedia.org/P60239 and previous config saved to /var/cache/conftool/dbconfig/20240410-130940-arnaudb.json
  • 13:09 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: test restoring dns entry - volans@cumin2002"
  • 13:07 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bullseye
  • 13:07 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 13:07 sukhe: depool cp4052 for PXE boot issue testing
  • 13:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T360332)', diff saved to https://phabricator.wikimedia.org/P60238 and previous config saved to /var/cache/conftool/dbconfig/20240410-130650-arnaudb.json
  • 13:07 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4052.ulsfo.wmnet,service=(cdn|ats-be)
  • 13:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 13:06 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 13:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T360332)', diff saved to https://phabricator.wikimedia.org/P60237 and previous config saved to /var/cache/conftool/dbconfig/20240410-130626-arnaudb.json
  • 13:05 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:05 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: test removing dns entry - volans@cumin2002"
  • 13:05 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org,service=authdns-update
  • 13:04 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: test removing dns entry - volans@cumin2002"
  • 13:02 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 13:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P60236 and previous config saved to /var/cache/conftool/dbconfig/20240410-130145-marostegui.json
  • 12:59 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp-test2003.wikimedia.org
  • 12:59 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:59 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp-test2003.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
  • 12:56 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns6001.wikimedia.org,service=authdns-update
  • 12:56 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp-test2003.wikimedia.org decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
  • 12:53 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
  • 12:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P60235 and previous config saved to /var/cache/conftool/dbconfig/20240410-125119-arnaudb.json
  • 12:48 slyngshede@cumin1002: START - Cookbook sre.hosts.decommission for hosts idp-test2003.wikimedia.org
  • 12:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P60234 and previous config saved to /var/cache/conftool/dbconfig/20240410-124638-marostegui.json
  • 12:45 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "update for latest VMs - jmm@cumin2002"
  • 12:44 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update for latest VMs - jmm@cumin2002"
  • 12:25 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P60231 and previous config saved to /var/cache/conftool/dbconfig/20240410-122518-root.json
  • 12:25 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T360332)', diff saved to https://phabricator.wikimedia.org/P60230 and previous config saved to /var/cache/conftool/dbconfig/20240410-122104-arnaudb.json
  • 12:20 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp-test1002.wikimedia.org with reason: host reimage
  • 12:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T360332)', diff saved to https://phabricator.wikimedia.org/P60229 and previous config saved to /var/cache/conftool/dbconfig/20240410-121814-arnaudb.json
  • 12:18 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on idp-test1002.wikimedia.org with reason: host reimage
  • 12:18 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:18 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:18 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 12:17 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 12:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T360332)', diff saved to https://phabricator.wikimedia.org/P60228 and previous config saved to /var/cache/conftool/dbconfig/20240410-121743-arnaudb.json
  • 12:15 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/changePropertyDataType.php wikidatawiki --property-id P4496 --new-data-type external-id --summary 'T359297' # succeeded
  • 12:14 Lucas_WMDE: lucaswerkmeister-wmde@deploy1002 ~ $ mwscript-k8s extensions/Wikibase/repo/maintenance/changePropertyDataType.php wikidatawiki --property-id P4496 --new-data-type external-id --summary 'T359297' # failed, will retry with non-k8s mwscript
  • 12:12 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(mw1421.eqiad.wmnet|mw1422.eqiad.wmnet|mw1491.eqiad.wmnet|mw1492.eqiad.wmnet|mw1493.eqiad.wmnet),cluster=kubernetes,service=kubesvc
  • 12:11 claime: Pooling and uncordoning mw1421.eqiad.wmnet,mw1422.eqiad.wmnet,mw1491.eqiad.wmnet,mw1492.eqiad.wmnet,mw1493.eqiad.wmnet - T351074
  • 12:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P60227 and previous config saved to /var/cache/conftool/dbconfig/20240410-121012-root.json
  • 12:04 slyngshede@cumin1002: START - Cookbook sre.hosts.reimage for host idp-test1002.wikimedia.org with OS bookworm
  • 12:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P60226 and previous config saved to /var/cache/conftool/dbconfig/20240410-120235-arnaudb.json
  • 12:01 claime: Running homer 'cr*eqiad*' commit 'T351074' and homer 'lsw1-e3-eqiad*' commit 'T351074'
  • 11:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P60225 and previous config saved to /var/cache/conftool/dbconfig/20240410-115506-root.json
  • 11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1492.eqiad.wmnet with OS bullseye
  • 11:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1491.eqiad.wmnet with OS bullseye
  • 11:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1422.eqiad.wmnet with OS bullseye
  • 11:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P60224 and previous config saved to /var/cache/conftool/dbconfig/20240410-114728-arnaudb.json
  • 11:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1493.eqiad.wmnet with OS bullseye
  • 11:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1421.eqiad.wmnet with OS bullseye
  • 11:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P60223 and previous config saved to /var/cache/conftool/dbconfig/20240410-114001-root.json
  • 11:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1492.eqiad.wmnet with reason: host reimage
  • 11:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1491.eqiad.wmnet with reason: host reimage
  • 11:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T360332)', diff saved to https://phabricator.wikimedia.org/P60222 and previous config saved to /var/cache/conftool/dbconfig/20240410-113220-arnaudb.json
  • 11:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1422.eqiad.wmnet with reason: host reimage
  • 11:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T360332)', diff saved to https://phabricator.wikimedia.org/P60221 and previous config saved to /var/cache/conftool/dbconfig/20240410-112929-arnaudb.json
  • 11:29 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 11:29 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 11:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T360332)', diff saved to https://phabricator.wikimedia.org/P60220 and previous config saved to /var/cache/conftool/dbconfig/20240410-112907-arnaudb.json
  • 11:28 jiji@deploy1002: Finished scap: Deploy chart changes in gerrit:1015342 (duration: 08m 18s)
  • 11:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1493.eqiad.wmnet with reason: host reimage
  • 11:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1175 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P60219 and previous config saved to /var/cache/conftool/dbconfig/20240410-112455-root.json
  • 11:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1421.eqiad.wmnet with reason: host reimage
  • 11:23 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1493.eqiad.wmnet with reason: host reimage
  • 11:22 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1492.eqiad.wmnet with reason: host reimage
  • 11:22 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1491.eqiad.wmnet with reason: host reimage
  • 11:21 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1422.eqiad.wmnet with reason: host reimage
  • 11:21 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1421.eqiad.wmnet with reason: host reimage
  • 11:19 jiji@deploy1002: Started scap: Deploy chart changes in gerrit:1015342
  • 11:16 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:15 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:14 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P60218 and previous config saved to /var/cache/conftool/dbconfig/20240410-111400-arnaudb.json
  • 11:13 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:12 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:12 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:10 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1493.eqiad.wmnet with OS bullseye
  • 11:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1175 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P60217 and previous config saved to /var/cache/conftool/dbconfig/20240410-110949-root.json
  • 11:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1492.eqiad.wmnet with OS bullseye
  • 11:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1491.eqiad.wmnet with OS bullseye
  • 11:08 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:08 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1422.eqiad.wmnet with OS bullseye
  • 11:08 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1421.eqiad.wmnet with OS bullseye
  • 11:07 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:07 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:07 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:03 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:02 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:02 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:02 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 10:59 claime: Depooling mw1421.eqiad.wmnet,mw1422.eqiad.wmnet,mw1491.eqiad.wmnet,mw1492.eqiad.wmnet,mw1493.eqiad.wmnet - T351074
  • 10:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P60216 and previous config saved to /var/cache/conftool/dbconfig/20240410-105852-arnaudb.json
  • 10:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1175.eqiad.wmnet with OS bookworm
  • 10:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1175 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P60215 and previous config saved to /var/cache/conftool/dbconfig/20240410-105444-root.json
  • 10:53 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 10:53 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 10:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T360332)', diff saved to https://phabricator.wikimedia.org/P60214 and previous config saved to /var/cache/conftool/dbconfig/20240410-104345-arnaudb.json
  • 10:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T360332)', diff saved to https://phabricator.wikimedia.org/P60213 and previous config saved to /var/cache/conftool/dbconfig/20240410-104053-arnaudb.json
  • 10:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 10:40 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 10:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T360332)', diff saved to https://phabricator.wikimedia.org/P60212 and previous config saved to /var/cache/conftool/dbconfig/20240410-104030-arnaudb.json
  • 10:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1175.eqiad.wmnet with reason: host reimage
  • 10:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1175.eqiad.wmnet with reason: host reimage
  • 10:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P60211 and previous config saved to /var/cache/conftool/dbconfig/20240410-102523-arnaudb.json
  • 10:21 claime: Enabling and running puppet on O:docker_registry_ha::registry - T360636
  • 10:19 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1175.eqiad.wmnet with OS bookworm
  • 10:18 claime: Enabling and running puppet on registry1003.eqiad.wmnet - T360636
  • 10:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1175 T362036', diff saved to https://phabricator.wikimedia.org/P60210 and previous config saved to /var/cache/conftool/dbconfig/20240410-101746-root.json
  • 10:16 claime: Disabling puppet on O:docker_registry_ha::registry - T360636
  • 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: mariadb::sanitarium_master
  • 10:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P60209 and previous config saved to /var/cache/conftool/dbconfig/20240410-101015-arnaudb.json
  • 10:08 jiji@deploy1002: Finished scap: (no justification provided) (duration: 27m 59s)
  • 09:58 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: mariadb::sanitarium_master
  • 09:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T360332)', diff saved to https://phabricator.wikimedia.org/P60208 and previous config saved to /var/cache/conftool/dbconfig/20240410-095508-arnaudb.json
  • 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1163 (T360332)', diff saved to https://phabricator.wikimedia.org/P60207 and previous config saved to /var/cache/conftool/dbconfig/20240410-095214-arnaudb.json
  • 09:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 09:51 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 09:42 effie: running scap sync-world to rebuild mw image and pick up gerrit:1015338
  • 09:40 jiji@deploy1002: Started scap: (no justification provided)
  • 08:49 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp3070.esams.wmnet
  • 08:42 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3070.esams.wmnet with OS bullseye
  • 08:38 arnaudb@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 100%: Post clone repool (dst)', diff saved to https://phabricator.wikimedia.org/P60206 and previous config saved to /var/cache/conftool/dbconfig/20240410-083822-arnaudb.json
  • 08:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 08:34 gmodena@deploy1002: Finished deploy [airflow-dags/analytics@46818a3]: Deploying cassandra_load_pageview_top_articles changes MR#648 (duration: 00m 33s)
  • 08:34 hashar@deploy1002: Synchronized php: group1 wikis to 1.42.0-wmf.26 refs T360158 (duration: 13m 05s)
  • 08:34 gmodena@deploy1002: Started deploy [airflow-dags/analytics@46818a3]: Deploying cassandra_load_pageview_top_articles changes MR#648
  • 08:25 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:25 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:25 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:25 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:24 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:24 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 75%: Post clone repool (dst)', diff saved to https://phabricator.wikimedia.org/P60205 and previous config saved to /var/cache/conftool/dbconfig/20240410-082316-arnaudb.json
  • 08:21 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.26 refs T360158
  • 08:18 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3070.esams.wmnet with reason: host reimage
  • 08:15 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3070.esams.wmnet with reason: host reimage
  • 08:08 arnaudb@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 50%: Post clone repool (dst)', diff saved to https://phabricator.wikimedia.org/P60204 and previous config saved to /var/cache/conftool/dbconfig/20240410-080810-arnaudb.json
  • 07:56 moritzm: installing glibc security updates on bullseye
  • 07:53 arnaudb@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 25%: Post clone repool (dst)', diff saved to https://phabricator.wikimedia.org/P60203 and previous config saved to /var/cache/conftool/dbconfig/20240410-075304-arnaudb.json
  • 07:52 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp3070.esams.wmnet with OS bullseye
  • 07:51 arnaudb@cumin1002: dbctl commit (dc=all): 'db2112 (re)pooling @ 100%: Post clone (src)', diff saved to https://phabricator.wikimedia.org/P60202 and previous config saved to /var/cache/conftool/dbconfig/20240410-075150-arnaudb.json
  • 07:50 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp3070.esams.wmnet
  • 07:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 16%: Post clone repool (dst)', diff saved to https://phabricator.wikimedia.org/P60201 and previous config saved to /var/cache/conftool/dbconfig/20240410-073759-arnaudb.json
  • 07:36 arnaudb@cumin1002: dbctl commit (dc=all): 'db2112 (re)pooling @ 75%: Post clone (src)', diff saved to https://phabricator.wikimedia.org/P60200 and previous config saved to /var/cache/conftool/dbconfig/20240410-073644-arnaudb.json
  • 07:33 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dumps::generation::server::spare
  • 07:29 akosiaris@deploy1002: Synchronized wmf-config/mc.php: Dummy sync for https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1018332 (duration: 14m 03s)
  • 07:25 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: dumps::generation::server::spare
  • 07:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 8%: Post clone repool (dst)', diff saved to https://phabricator.wikimedia.org/P60199 and previous config saved to /var/cache/conftool/dbconfig/20240410-072253-arnaudb.json
  • 07:21 arnaudb@cumin1002: dbctl commit (dc=all): 'db2112 (re)pooling @ 50%: Post clone (src)', diff saved to https://phabricator.wikimedia.org/P60198 and previous config saved to /var/cache/conftool/dbconfig/20240410-072137-arnaudb.json
  • 07:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 4%: Post clone repool (dst)', diff saved to https://phabricator.wikimedia.org/P60197 and previous config saved to /var/cache/conftool/dbconfig/20240410-070745-arnaudb.json
  • 07:06 arnaudb@cumin1002: dbctl commit (dc=all): 'db2112 (re)pooling @ 25%: Post clone (src)', diff saved to https://phabricator.wikimedia.org/P60196 and previous config saved to /var/cache/conftool/dbconfig/20240410-070631-arnaudb.json
  • 06:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P60195 and previous config saved to /var/cache/conftool/dbconfig/20240410-065929-root.json
  • 06:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 2%: Post clone repool (dst)', diff saved to https://phabricator.wikimedia.org/P60194 and previous config saved to /var/cache/conftool/dbconfig/20240410-065239-arnaudb.json
  • 06:51 arnaudb@cumin1002: dbctl commit (dc=all): 'db2112 (re)pooling @ 20%: Post clone (src)', diff saved to https://phabricator.wikimedia.org/P60193 and previous config saved to /var/cache/conftool/dbconfig/20240410-065125-arnaudb.json
  • 06:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P60192 and previous config saved to /var/cache/conftool/dbconfig/20240410-064423-root.json
  • 06:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 1%: Post clone repool (dst)', diff saved to https://phabricator.wikimedia.org/P60191 and previous config saved to /var/cache/conftool/dbconfig/20240410-063734-arnaudb.json
  • 06:36 arnaudb@cumin1002: dbctl commit (dc=all): 'db2112 (re)pooling @ 10%: Post clone (src)', diff saved to https://phabricator.wikimedia.org/P60190 and previous config saved to /var/cache/conftool/dbconfig/20240410-063620-arnaudb.json
  • 06:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P60189 and previous config saved to /var/cache/conftool/dbconfig/20240410-062917-root.json
  • 06:21 arnaudb@cumin1002: dbctl commit (dc=all): 'db2112 (re)pooling @ 5%: Post clone (src)', diff saved to https://phabricator.wikimedia.org/P60188 and previous config saved to /var/cache/conftool/dbconfig/20240410-062114-arnaudb.json
  • 06:20 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P60187 and previous config saved to /var/cache/conftool/dbconfig/20240410-062003-root.json
  • 06:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P60186 and previous config saved to /var/cache/conftool/dbconfig/20240410-061411-root.json
  • 06:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P60185 and previous config saved to /var/cache/conftool/dbconfig/20240410-060457-root.json
  • 05:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P60184 and previous config saved to /var/cache/conftool/dbconfig/20240410-055906-root.json
  • 05:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P60183 and previous config saved to /var/cache/conftool/dbconfig/20240410-054952-root.json
  • 05:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P60182 and previous config saved to /var/cache/conftool/dbconfig/20240410-054400-root.json
  • 05:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P60181 and previous config saved to /var/cache/conftool/dbconfig/20240410-053445-root.json
  • 05:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1166.eqiad.wmnet with OS bookworm
  • 05:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1166 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P60180 and previous config saved to /var/cache/conftool/dbconfig/20240410-052854-root.json
  • 05:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P60179 and previous config saved to /var/cache/conftool/dbconfig/20240410-051939-root.json
  • 05:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1166.eqiad.wmnet with reason: host reimage
  • 05:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1166.eqiad.wmnet with reason: host reimage
  • 05:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P60178 and previous config saved to /var/cache/conftool/dbconfig/20240410-050434-root.json
  • 04:58 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1166.eqiad.wmnet with OS bookworm
  • 04:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1166 T362134', diff saved to https://phabricator.wikimedia.org/P60177 and previous config saved to /var/cache/conftool/dbconfig/20240410-045710-marostegui.json
  • 04:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repool db1223', diff saved to https://phabricator.wikimedia.org/P60176 and previous config saved to /var/cache/conftool/dbconfig/20240410-045632-marostegui.json
  • 04:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1223 T362134', diff saved to https://phabricator.wikimedia.org/P60175 and previous config saved to /var/cache/conftool/dbconfig/20240410-045534-marostegui.json
  • 04:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P60174 and previous config saved to /var/cache/conftool/dbconfig/20240410-044928-root.json
  • 04:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Kernel reboot
  • 04:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Kernel reboot
  • 04:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T356166)', diff saved to https://phabricator.wikimedia.org/P60173 and previous config saved to /var/cache/conftool/dbconfig/20240410-041604-marostegui.json
  • 04:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 04:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 04:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T356166)', diff saved to https://phabricator.wikimedia.org/P60172 and previous config saved to /var/cache/conftool/dbconfig/20240410-041541-marostegui.json
  • 04:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P60171 and previous config saved to /var/cache/conftool/dbconfig/20240410-040033-marostegui.json
  • 03:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P60170 and previous config saved to /var/cache/conftool/dbconfig/20240410-034526-marostegui.json
  • 03:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T356166)', diff saved to https://phabricator.wikimedia.org/P60169 and previous config saved to /var/cache/conftool/dbconfig/20240410-033019-marostegui.json

2024-04-09

  • 23:17 eileen: config revision changed from 7908b55e to 974afe9c
  • 23:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T360332)', diff saved to https://phabricator.wikimedia.org/P60168 and previous config saved to /var/cache/conftool/dbconfig/20240409-230828-arnaudb.json
  • 23:08 eileen: config revision changed from 064d18b0 to 7908b55e
  • 22:58 eileen: config revision changed from 8fb02f33 to 064d18b0
  • 22:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P60167 and previous config saved to /var/cache/conftool/dbconfig/20240409-225321-arnaudb.json
  • 22:51 eileen: config revision changed from df416a50 to 8fb02f33
  • 22:48 eileen: config revision changed from cea14e30 to df416a50
  • 22:42 eileen: config revision changed from 075ddd44 to cea14e30
  • 22:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P60166 and previous config saved to /var/cache/conftool/dbconfig/20240409-223813-arnaudb.json
  • 22:24 eileen: config revision changed from 3c1a0267 to 4638a4d2
  • 22:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T360332)', diff saved to https://phabricator.wikimedia.org/P60165 and previous config saved to /var/cache/conftool/dbconfig/20240409-222306-arnaudb.json
  • 22:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2209 (T360332)', diff saved to https://phabricator.wikimedia.org/P60164 and previous config saved to /var/cache/conftool/dbconfig/20240409-220755-arnaudb.json
  • 22:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance
  • 22:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance
  • 22:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T360332)', diff saved to https://phabricator.wikimedia.org/P60163 and previous config saved to /var/cache/conftool/dbconfig/20240409-220732-arnaudb.json
  • 22:03 eileen: civicrm upgraded from b05fd08f to a382a7b0
  • 21:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P60162 and previous config saved to /var/cache/conftool/dbconfig/20240409-215225-arnaudb.json
  • 21:38 eileen: civicrm upgraded from 8c7cc208 to b05fd08f
  • 21:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P60161 and previous config saved to /var/cache/conftool/dbconfig/20240409-213717-arnaudb.json
  • 21:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T360332)', diff saved to https://phabricator.wikimedia.org/P60160 and previous config saved to /var/cache/conftool/dbconfig/20240409-212210-arnaudb.json
  • 21:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2205 (T360332)', diff saved to https://phabricator.wikimedia.org/P60159 and previous config saved to /var/cache/conftool/dbconfig/20240409-210656-arnaudb.json
  • 21:07 arnaudb@cumin1002: END (PASS)