Server Admin Log

From Wikitech

2026-05-19

  • 12:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1280.eqiad.wmnet with OS trixie
  • 12:28 tappof@cumin1003: START - Cookbook sre.hosts.reboot-single for host prometheus2008.codfw.wmnet
  • 12:28 tappof@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
  • 12:27 tappof@cumin1003: START - Cookbook sre.hosts.reboot-single for host titan2001.codfw.wmnet
  • 12:27 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2063.codfw.wmnet
  • 12:26 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
  • 12:26 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
  • 12:26 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2062.codfw.wmnet
  • 12:25 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1278.eqiad.wmnet with OS trixie
  • 12:24 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage
  • 12:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1279.eqiad.wmnet with OS trixie
  • 12:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
  • 12:21 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1288.eqiad.wmnet with OS trixie
  • 12:20 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1279.eqiad.wmnet with OS trixie
  • 12:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3005.esams.wmnet
  • 12:19 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2062.codfw.wmnet
  • 12:18 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
  • 12:18 tappof@cumin1003: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
  • 12:18 tappof@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus7002.magru.wmnet
  • 12:17 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1281.eqiad.wmnet with OS trixie
  • 12:15 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage
  • 12:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5007.eqsin.wmnet
  • 12:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7004.magru.wmnet
  • 12:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5007.eqsin.wmnet
  • 12:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage
  • 12:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7004.magru.wmnet
  • 12:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1283.eqiad.wmnet with OS trixie
  • 12:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage
  • 12:12 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apus-be2004.codfw.wmnet
  • 12:12 tappof@cumin1003: START - Cookbook sre.hosts.reboot-single for host prometheus7002.magru.wmnet
  • 12:12 tappof@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus6002.drmrs.wmnet
  • 12:11 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage
  • 12:10 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1282.eqiad.wmnet with OS trixie
  • 12:09 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 12:08 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 12:08 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:08 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host apus-be2004.codfw.wmnet
  • 12:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage
  • 12:07 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apus-be2005.codfw.wmnet
  • 12:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7004.magru.wmnet
  • 12:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5007.eqsin.wmnet
  • 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1284.eqiad.wmnet with OS trixie
  • 12:06 tappof@cumin1003: START - Cookbook sre.hosts.reboot-single for host prometheus6002.drmrs.wmnet
  • 12:06 tappof@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2007.codfw.wmnet
  • 12:04 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1277.eqiad.wmnet with OS trixie
  • 12:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage
  • 12:01 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host apus-be2005.codfw.wmnet
  • 12:00 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apus-be2006.codfw.wmnet
  • 12:00 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage
  • 11:58 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7004.magru.wmnet
  • 11:58 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5007.eqsin.wmnet
  • 11:57 taavi@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudidp2001-dev.codfw.wmnet
  • 11:56 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage
  • 11:56 cgoubert@cumin1003: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-main-eqiad
  • 11:56 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host apus-be2006.codfw.wmnet
  • 11:56 tappof@cumin1003: START - Cookbook sre.hosts.reboot-single for host prometheus2007.codfw.wmnet
  • 11:56 tappof@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
  • 11:53 taavi@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudidp2001-dev.codfw.wmnet
  • 11:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage
  • 11:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 18 hosts with reason: restart
  • 11:49 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage
  • 11:49 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage
  • 11:49 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage
  • 11:48 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage
  • 11:48 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage
  • 11:47 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage
  • 11:47 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage
  • 11:46 tappof@cumin1003: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
  • 11:46 tappof@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus5003.eqsin.wmnet
  • 11:45 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage
  • 11:45 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage
  • 11:44 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage
  • 11:44 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage
  • 11:44 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage
  • 11:42 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2003.codfw.wmnet
  • 11:39 tappof@cumin1003: START - Cookbook sre.hosts.reboot-single for host prometheus5003.eqsin.wmnet
  • 11:39 tappof@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1008.eqiad.wmnet
  • 11:39 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup[1003-1004].eqiad.wmnet with reason: restart
  • 11:37 moritzm: failover Ganeti cluster in eqsin to ganeti5004
  • 11:37 moritzm: failover Ganeti cluster in magru to ganeti7001
  • 11:36 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1288.eqiad.wmnet with OS trixie
  • 11:35 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1287.eqiad.wmnet with OS trixie
  • 11:35 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1286.eqiad.wmnet with OS trixie
  • 11:35 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1284.eqiad.wmnet with OS trixie
  • 11:34 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be2003.codfw.wmnet
  • 11:34 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1283.eqiad.wmnet with OS trixie
  • 11:34 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1282.eqiad.wmnet with OS trixie
  • 11:33 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5006.eqsin.wmnet
  • 11:33 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7003.magru.wmnet
  • 11:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7003.magru.wmnet
  • 11:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5006.eqsin.wmnet
  • 11:32 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1281.eqiad.wmnet with OS trixie
  • 11:32 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1280.eqiad.wmnet with OS trixie
  • 11:31 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1279.eqiad.wmnet with OS trixie
  • 11:31 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1278.eqiad.wmnet with OS trixie
  • 11:31 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1277.eqiad.wmnet with OS trixie
  • 11:29 tappof@cumin1003: START - Cookbook sre.hosts.reboot-single for host prometheus1008.eqiad.wmnet
  • 11:29 tappof@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
  • 11:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7003.magru.wmnet
  • 11:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5006.eqsin.wmnet
  • 11:24 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1276.eqiad.wmnet with OS trixie
  • 11:21 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1269.eqiad.wmnet with OS trixie
  • 11:20 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7003.magru.wmnet
  • 11:20 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5006.eqsin.wmnet
  • 11:19 tappof@cumin1003: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
  • 11:19 tappof@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus4003.ulsfo.wmnet
  • 11:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7002.magru.wmnet
  • 11:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5005.eqsin.wmnet
  • 11:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7002.magru.wmnet
  • 11:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5005.eqsin.wmnet
  • 11:16 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1273.eqiad.wmnet with OS trixie
  • 11:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1272.eqiad.wmnet with OS trixie
  • 11:13 tappof@cumin1003: START - Cookbook sre.hosts.reboot-single for host prometheus4003.ulsfo.wmnet
  • 11:13 tappof@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus3004.esams.wmnet
  • 11:10 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage
  • 11:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
  • 11:10 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
  • 11:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7002.magru.wmnet
  • 11:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5005.eqsin.wmnet
  • 11:09 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1270.eqiad.wmnet with OS trixie
  • 11:07 tappof@cumin1003: START - Cookbook sre.hosts.reboot-single for host prometheus3004.esams.wmnet
  • 11:07 tappof@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1007.eqiad.wmnet
  • 11:06 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage
  • 11:06 cgoubert@cumin1003: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-main-eqiad
  • 11:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7002.magru.wmnet
  • 11:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5005.eqsin.wmnet
  • 11:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1275.eqiad.wmnet with OS trixie
  • 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5004.eqsin.wmnet
  • 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7001.magru.wmnet
  • 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7001.magru.wmnet
  • 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
  • 11:02 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage
  • 11:02 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1274.eqiad.wmnet with OS trixie
  • 11:00 tappof@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan1002.eqiad.wmnet
  • 10:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage
  • 10:59 tappof@cumin1003: START - Cookbook sre.hosts.reboot-single for host prometheus1007.eqiad.wmnet
  • 10:59 tappof@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1005.eqiad.wmnet
  • 10:57 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1268.eqiad.wmnet with OS trixie
  • 10:55 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage
  • 10:55 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1271.eqiad.wmnet with OS trixie
  • 10:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7001.magru.wmnet
  • 10:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
  • 10:53 tappof@cumin1003: START - Cookbook sre.hosts.reboot-single for host titan1002.eqiad.wmnet
  • 10:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7001.magru.wmnet
  • 10:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1267.eqiad.wmnet with OS trixie
  • 10:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5004.eqsin.wmnet
  • 10:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage
  • 10:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apus-be1005.eqiad.wmnet
  • 10:50 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on D{aux-k8s-worker100[2-5].eqiad.wmnet} and (A:aux-master-eqiad or A:aux-worker-eqiad)
  • 10:50 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host aux-k8s-worker1005.eqiad.wmnet
  • 10:50 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host aux-k8s-worker1005.eqiad.wmnet
  • 10:49 tappof@cumin1003: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
  • 10:49 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1266.eqiad.wmnet with OS trixie
  • 10:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage
  • 10:46 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host aux-k8s-worker1005.eqiad.wmnet
  • 10:45 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host aux-k8s-worker1005.eqiad.wmnet
  • 10:45 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host aux-k8s-worker1004.eqiad.wmnet
  • 10:45 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host aux-k8s-worker1004.eqiad.wmnet
  • 10:45 cgoubert@cumin1003: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-main-codfw
  • 10:45 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host apus-be1005.eqiad.wmnet
  • 10:44 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apus-be1004.eqiad.wmnet
  • 10:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage
  • 10:41 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host aux-k8s-worker1004.eqiad.wmnet
  • 10:41 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage
  • 10:40 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host aux-k8s-worker1004.eqiad.wmnet
  • 10:40 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host aux-k8s-worker1003.eqiad.wmnet
  • 10:40 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host aux-k8s-worker1003.eqiad.wmnet
  • 10:38 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host apus-be1004.eqiad.wmnet
  • 10:38 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apus-be1006.eqiad.wmnet
  • 10:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage
  • 10:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage
  • 10:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage
  • 10:36 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc2021.codfw.wmnet,pc[1011,1021].eqiad.wmnet with reason: Maintenance on pc1
  • 10:36 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host aux-k8s-worker1003.eqiad.wmnet
  • 10:36 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage
  • 10:36 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage
  • 10:36 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage
  • 10:36 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host aux-k8s-worker1003.eqiad.wmnet
  • 10:36 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host aux-k8s-worker1002.eqiad.wmnet
  • 10:36 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host aux-k8s-worker1002.eqiad.wmnet
  • 10:35 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage
  • 10:34 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage
  • 10:34 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage
  • 10:33 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage
  • 10:33 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage
  • 10:32 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage
  • 10:32 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host aux-k8s-worker1002.eqiad.wmnet
  • 10:32 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host apus-be1006.eqiad.wmnet
  • 10:31 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be1003.eqiad.wmnet
  • 10:29 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage
  • 10:26 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host aux-k8s-worker1002.eqiad.wmnet
  • 10:26 elukey@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on D{aux-k8s-worker100[2-5].eqiad.wmnet} and (A:aux-master-eqiad or A:aux-worker-eqiad)
  • 10:24 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be1003.eqiad.wmnet
  • 10:24 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1276.eqiad.wmnet with OS trixie
  • 10:24 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1275.eqiad.wmnet with OS trixie
  • 10:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS trixie
  • 10:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1273.eqiad.wmnet with OS trixie
  • 10:22 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1272.eqiad.wmnet with OS trixie
  • 10:22 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS trixie
  • 10:21 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS trixie
  • 10:20 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1269.eqiad.wmnet with OS trixie
  • 10:19 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS trixie
  • 10:19 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1267.eqiad.wmnet with OS trixie
  • 10:18 tappof@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan2002.codfw.wmnet
  • 10:15 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1266.eqiad.wmnet with OS trixie
  • 10:10 tappof@cumin1003: START - Cookbook sre.hosts.reboot-single for host titan2002.codfw.wmnet
  • 10:08 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
  • 10:08 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2244: Migration of db2244.codfw.wmnet completed
  • 10:07 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1008-dev.eqiad.wmnet
  • 10:05 tappof@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan1001.eqiad.wmnet
  • 10:00 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1008-dev.eqiad.wmnet
  • 10:00 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol2010-dev.codfw.wmnet
  • 09:58 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2011.codfw.wmnet
  • 09:58 tappof@cumin1003: START - Cookbook sre.hosts.reboot-single for host titan1001.eqiad.wmnet
  • 09:55 cgoubert@cumin1003: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-main-codfw
  • 09:53 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host rdb2011.codfw.wmnet
  • 09:51 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2010-dev.codfw.wmnet
  • 09:51 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol2006-dev.codfw.wmnet
  • 09:48 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2012.codfw.wmnet
  • 09:47 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revise-tone-task-generator' for release 'main' .
  • 09:43 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host rdb2012.codfw.wmnet
  • 09:42 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2006-dev.codfw.wmnet
  • 09:42 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol2005-dev.codfw.wmnet
  • 09:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 40401
  • 09:38 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 40401
  • 09:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps-test2001.codfw.wmnet
  • 09:34 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2005-dev.codfw.wmnet
  • 09:31 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet1006.eqiad.wmnet
  • 09:31 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:aux-worker-eqiad
  • 09:31 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host aux-k8s-worker1009.eqiad.wmnet
  • 09:31 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host aux-k8s-worker1009.eqiad.wmnet
  • 09:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4005.ulsfo.wmnet
  • 09:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4005.ulsfo.wmnet
  • 09:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps-test2001.codfw.wmnet
  • 09:26 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host aux-k8s-worker1009.eqiad.wmnet
  • 09:25 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host aux-k8s-worker1009.eqiad.wmnet
  • 09:25 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host aux-k8s-worker1008.eqiad.wmnet
  • 09:25 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host aux-k8s-worker1008.eqiad.wmnet
  • 09:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4005.ulsfo.wmnet
  • 09:23 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudnet1006.eqiad.wmnet
  • 09:22 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2244: Migration of db2244.codfw.wmnet completed
  • 09:20 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet1005.eqiad.wmnet
  • 09:20 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2244.codfw.wmnet with OS trixie
  • 09:20 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host aux-k8s-worker1008.eqiad.wmnet
  • 09:20 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4005.ulsfo.wmnet
  • 09:19 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host aux-k8s-worker1008.eqiad.wmnet
  • 09:19 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host aux-k8s-worker1007.eqiad.wmnet
  • 09:19 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host aux-k8s-worker1007.eqiad.wmnet
  • 09:18 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 09:18 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 09:18 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
  • 09:17 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
  • 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host aux-k8s-worker1007.eqiad.wmnet
  • 09:13 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host aux-k8s-worker1007.eqiad.wmnet
  • 09:13 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host aux-k8s-worker1006.eqiad.wmnet
  • 09:13 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host aux-k8s-worker1006.eqiad.wmnet
  • 09:13 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
  • 09:13 filippo@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudnet1006.eqiad.wmnet
  • 09:13 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudnet1006.eqiad.wmnet
  • 09:08 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host aux-k8s-worker1006.eqiad.wmnet
  • 09:07 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host aux-k8s-worker1006.eqiad.wmnet
  • 09:05 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet2006-dev.codfw.wmnet
  • 09:05 volans@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudrabbit1001.eqiad.wmnet
  • 09:04 jiji@cumin1003: END (PASS) - Cookbook sre.memcached.roll-reboot-restart (exit_code=0) rolling reboot on A:memcached-gutter-codfw
  • 09:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2244.codfw.wmnet with reason: host reimage
  • 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-rw2001.wikimedia.org
  • 09:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2244.codfw.wmnet with reason: host reimage
  • 08:59 volans@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudrabbit1001.eqiad.wmnet
  • 08:58 volans@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudrabbit1002.eqiad.wmnet
  • 08:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-rw2001.wikimedia.org
  • 08:57 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudnet2006-dev.codfw.wmnet
  • 08:57 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet2005-dev.codfw.wmnet
  • 08:55 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Remove unused $wgEnableUserEmailMuteList config (T413867) (duration: 07m 15s)
  • 08:52 volans@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudrabbit1002.eqiad.wmnet
  • 08:51 dreamyjazz@deploy1003: dreamyjazz: Continuing with deployment
  • 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-rw1001.wikimedia.org
  • 08:50 dreamyjazz@deploy1003: dreamyjazz: Backport for Remove unused $wgEnableUserEmailMuteList config (T413867) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:49 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudnet2005-dev.codfw.wmnet
  • 08:49 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet2007-dev.codfw.wmnet
  • 08:49 elukey@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:aux-worker-eqiad
  • 08:48 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1072.eqiad.wmnet
  • 08:48 dreamyjazz@deploy1003: Started scap sync-world: Backport for Remove unused $wgEnableUserEmailMuteList config (T413867)
  • 08:47 moritzm: failover Ganeti cluster in ulsfo to ganeti4008
  • 08:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-rw1001.wikimedia.org
  • 08:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping2004.codfw.wmnet
  • 08:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4006.ulsfo.wmnet
  • 08:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet
  • 08:44 kharlan@deploy1003: Finished scap sync-world: Backport for IPReputation: Route opensearch_ipoid through envoy service mesh (T421293) (duration: 09m 08s)
  • 08:44 jiji@cumin1003: START - Cookbook sre.memcached.roll-reboot-restart rolling reboot on A:memcached-gutter-codfw
  • 08:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2244.codfw.wmnet with OS trixie
  • 08:43 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 08:43 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc1072.eqiad.wmnet
  • 08:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1071.eqiad.wmnet
  • 08:42 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudnet2007-dev.codfw.wmnet
  • 08:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping2004.codfw.wmnet
  • 08:41 volans@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudrabbit1003.eqiad.wmnet
  • 08:40 kharlan@deploy1003: kharlan: Continuing with deployment
  • 08:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2244: Upgrading db2244.codfw.wmnet
  • 08:39 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2244: Upgrading db2244.codfw.wmnet
  • 08:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet
  • 08:39 marostegui@cumin1003: START - Cookbook sre.mysql.major-upgrade
  • 08:37 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc1071.eqiad.wmnet
  • 08:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping1004.eqiad.wmnet
  • 08:37 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1070.eqiad.wmnet
  • 08:37 kharlan@deploy1003: kharlan: Backport for IPReputation: Route opensearch_ipoid through envoy service mesh (T421293) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cuminunpriv1001.eqiad.wmnet
  • 08:37 jiji@cumin1003: END (PASS) - Cookbook sre.memcached.roll-reboot-restart (exit_code=0) rolling reboot on A:memcached-gutter-eqiad
  • 08:36 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4006.ulsfo.wmnet
  • 08:35 volans@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudrabbit1003.eqiad.wmnet
  • 08:35 kharlan@deploy1003: Started scap sync-world: Backport for IPReputation: Route opensearch_ipoid through envoy service mesh (T421293)
  • 08:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4007.ulsfo.wmnet
  • 08:35 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudrabbit2001-dev.codfw.wmnet
  • 08:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4007.ulsfo.wmnet
  • 08:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping1004.eqiad.wmnet
  • 08:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cuminunpriv1001.eqiad.wmnet
  • 08:33 cezmunsta: Removing db2151 from orchestrator T424343
  • 08:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-maint2001.codfw.wmnet
  • 08:32 cezmunsta: Removing db2151 from zarcillo T424343
  • 08:32 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc1070.eqiad.wmnet
  • 08:31 cwilliams@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2151.codfw.wmnet
  • 08:31 cwilliams@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:31 cwilliams@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2151.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwilliams@cumin1003"
  • 08:31 cwilliams@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2151.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwilliams@cumin1003"
  • 08:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-maint2001.codfw.wmnet
  • 08:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4007.ulsfo.wmnet
  • 08:28 volans@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudrabbit2001-dev.codfw.wmnet
  • 08:27 cwilliams@cumin1003: START - Cookbook sre.dns.netbox
  • 08:26 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4007.ulsfo.wmnet
  • 08:26 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudrabbit2002-dev.codfw.wmnet
  • 08:24 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-cluster
  • 08:24 Emperor: reboot apus codfw frontends (May reboots)
  • 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4008.ulsfo.wmnet
  • 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet
  • 08:22 cwilliams@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2151.codfw.wmnet
  • 08:19 hashar@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.3 refs T423912
  • 08:19 volans@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudrabbit2002-dev.codfw.wmnet
  • 08:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 08:17 jiji@cumin1003: START - Cookbook sre.memcached.roll-reboot-restart rolling reboot on A:memcached-gutter-eqiad
  • 08:17 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudrabbit2003-dev.codfw.wmnet
  • 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet
  • 08:13 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4008.ulsfo.wmnet
  • 08:10 volans@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudrabbit2003-dev.codfw.wmnet
  • 08:02 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2212: Repooling after switchover
  • 07:58 cezmunsta: Removing db2150 from orchestrator T424342
  • 07:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-cluster
  • 07:57 Emperor: reboot apus eqiad frontends (May reboots)
  • 07:52 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin1001.eqiad.wmnet
  • 07:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install2005.wikimedia.org
  • 07:50 cezmunsta: Removing db2150 from zarcillo T424342
  • 07:48 cwilliams@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2150.codfw.wmnet
  • 07:48 cwilliams@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:48 cwilliams@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2150.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwilliams@cumin1003"
  • 07:48 cwilliams@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2150.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwilliams@cumin1003"
  • 07:46 volans@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcumin1001.eqiad.wmnet
  • 07:45 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin2001.codfw.wmnet
  • 07:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install2005.wikimedia.org
  • 07:43 cwilliams@cumin1003: START - Cookbook sre.dns.netbox
  • 07:39 volans@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcumin2001.codfw.wmnet
  • 07:39 cwilliams@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2150.codfw.wmnet
  • 07:34 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1056.eqiad.wmnet with OS bookworm
  • 07:33 XioNoX: add gnmic 0.46.0 to reprepro
  • 07:20 mlitn@deploy1003: Finished scap sync-world: Backport for Squashed diff to master (duration: 13m 17s)
  • 07:18 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage
  • 07:17 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2212: Repooling after switchover
  • 07:14 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage
  • 07:14 mlitn@deploy1003: mlitn: Continuing with deployment
  • 07:14 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1210: Repooling after switchover
  • 07:13 mlitn@deploy1003: mlitn: Backport for Squashed diff to master synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
  • 07:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
  • 07:07 mlitn@deploy1003: Started scap sync-world: Backport for Squashed diff to master
  • 07:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-maint1001.eqiad.wmnet
  • 07:04 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1058.eqiad.wmnet with OS bookworm
  • 07:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-maint1001.eqiad.wmnet
  • 07:02 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1056.eqiad.wmnet with OS bookworm
  • 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1003.eqiad.wmnet
  • 06:59 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1057.eqiad.wmnet with OS bookworm
  • 06:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2212.codfw.wmnet with reason: Maintenance
  • 06:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1003.eqiad.wmnet
  • 06:56 fceratto@cumin1003: dbctl commit (dc=all): 'Depool db2212 T426703', diff saved to https://phabricator.wikimedia.org/P92584 and previous config saved to /var/cache/conftool/dbconfig/20260519-065637-fceratto.json
  • 06:54 moritzm: installing qemu security updates
  • 06:52 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2203 to s1 primary T426703', diff saved to https://phabricator.wikimedia.org/P92583 and previous config saved to /var/cache/conftool/dbconfig/20260519-065224-fceratto.json
  • 06:52 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc2021.codfw.wmnet,pc[1011,1021].eqiad.wmnet with reason: Maintenance on pc1
  • 06:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc1011.eqiad.wmnet: Maintenance on pc1
  • 06:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 06:51 federico3: Starting s1 codfw failover from db2212 to db2203 - T426703
  • 06:51 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache
  • 06:51 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc1011.eqiad.wmnet: Maintenance on pc1
  • 06:50 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99)
  • 06:50 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache
  • 06:48 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage
  • 06:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
  • 06:45 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2203 with weight 0 T426703', diff saved to https://phabricator.wikimedia.org/P92581 and previous config saved to /var/cache/conftool/dbconfig/20260519-064500-fceratto.json
  • 06:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 30 hosts with reason: Primary switchover s1 T426703
  • 06:44 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage
  • 06:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
  • 06:40 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage
  • 06:39 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage
  • 06:33 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc2014.codfw.wmnet
  • 06:33 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:33 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pc2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
  • 06:33 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pc2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
  • 06:29 marostegui@cumin1003: START - Cookbook sre.dns.netbox
  • 06:28 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1058.eqiad.wmnet with OS bookworm
  • 06:28 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1210: Repooling after switchover
  • 06:28 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1057.eqiad.wmnet with OS bookworm
  • 06:24 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts pc2014.codfw.wmnet
  • 06:22 marostegui@cumin1003: dbctl commit (dc=all): 'Remove pc2014 from dbctl T426595', diff saved to https://phabricator.wikimedia.org/P92578 and previous config saved to /var/cache/conftool/dbconfig/20260519-062227-marostegui.json
  • 06:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 06:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depool db1210 T426087', diff saved to https://phabricator.wikimedia.org/P92577 and previous config saved to /var/cache/conftool/dbconfig/20260519-062056-fceratto.json
  • 06:19 fceratto@dns1005: END - running authdns-update
  • 06:18 fceratto@dns1005: START - running authdns-update
  • 06:15 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db1230 to s5 primary and set section read-write T426087', diff saved to https://phabricator.wikimedia.org/P92576 and previous config saved to /var/cache/conftool/dbconfig/20260519-061524-fceratto.json
  • 06:14 fceratto@cumin1003: dbctl commit (dc=all): 'Set s5 eqiad as read-only for maintenance - T426087', diff saved to https://phabricator.wikimedia.org/P92575 and previous config saved to /var/cache/conftool/dbconfig/20260519-061435-fceratto.json
  • 06:14 federico3: Starting s5 eqiad failover from db1210 to db1230 - T426087
  • 06:09 fceratto@cumin1003: dbctl commit (dc=all): 'Set db1230 with weight 0 T426087', diff saved to https://phabricator.wikimedia.org/P92574 and previous config saved to /var/cache/conftool/dbconfig/20260519-060929-fceratto.json
  • 06:08 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s5 T426087
  • 05:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc2014.codfw.wmnet with reason: Maintenance on pc4
  • 04:02 mwpresync@deploy1003: Pruned MediaWiki: 1.46.0-wmf.26 (duration: 02m 40s)
  • 03:41 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.3 refs T423912 (duration: 38m 23s)
  • 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.3 refs T423912
  • 02:08 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 39s)
  • 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 02:00 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 02:00 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 02:00 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 02:00 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 02:00 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 02:00 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 00:58 ladsgroup@deploy1003: Finished scap sync-world: Backport for ThumbLimits: Harmonize svwiki large size with the rest of wikis (T376152) (duration: 06m 36s)
  • 00:54 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 00:54 ladsgroup@deploy1003: ladsgroup: Backport for ThumbLimits: Harmonize svwiki large size with the rest of wikis (T376152) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 00:52 ladsgroup@deploy1003: Started scap sync-world: Backport for ThumbLimits: Harmonize svwiki large size with the rest of wikis (T376152)
  • 00:32 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab2002.wikimedia.org
  • 00:30 ladsgroup@deploy1003: Finished scap sync-world: Backport for IS: Drop wgGraphDefaultVegaVer, never used any more, IS: Drop wgEnableSpecialMute, ignored since MW 1.46, IS: Drop wgDiscussionTools_visualenhancements_*, ignored since 2025 (duration: 07m 08s)
  • 00:26 ladsgroup@deploy1003: ladsgroup, jforrester: Continuing with deployment
  • 00:25 dzahn@cumin2002: START - Cookbook sre.hosts.reboot-single for host gitlab2002.wikimedia.org
  • 00:25 ladsgroup@deploy1003: ladsgroup, jforrester: Backport for IS: Drop wgGraphDefaultVegaVer, never used any more, IS: Drop wgEnableSpecialMute, ignored since MW 1.46, IS: Drop wgDiscussionTools_visualenhancements_*, ignored since 2025 synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 00:24 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gitlab2002.wikimedia.org with reason: T426563
  • 00:23 ladsgroup@deploy1003: Started scap sync-world: Backport for IS: Drop wgGraphDefaultVegaVer, never used any more, IS: Drop wgEnableSpecialMute, ignored since MW 1.46, IS: Drop wgDiscussionTools_visualenhancements_*, ignored since 2025

2026-05-18

  • 23:48 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1004.eqiad.wmnet
  • 23:44 dzahn@cumin2002: START - Cookbook sre.hosts.reboot-single for host vrts1004.eqiad.wmnet
  • 23:44 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on vrts1004.eqiad.wmnet with reason: T426563
  • 23:37 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on lists2001.wikimedia.org with reason: T426563
  • 23:23 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1060.eqiad.wmnet with OS bookworm
  • 23:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1059.eqiad.wmnet with OS bookworm
  • 23:18 ladsgroup@deploy1003: Finished scap sync-world: Backport for Remove wgThumbnailStepsRatio (duration: 06m 52s)
  • 23:13 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 23:12 ladsgroup@deploy1003: ladsgroup: Backport for Remove wgThumbnailStepsRatio synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:11 ladsgroup@deploy1003: Started scap sync-world: Backport for Remove wgThumbnailStepsRatio
  • 23:07 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage
  • 23:03 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage
  • 23:00 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage
  • 22:59 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage
  • 22:48 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1060.eqiad.wmnet with OS bookworm
  • 22:48 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1059.eqiad.wmnet with OS bookworm
  • 22:11 swfrench@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2331.codfw.wmnet
  • 22:06 swfrench@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2331.codfw.wmnet
  • 21:55 swfrench@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2149-2154].codfw.wmnet
  • 21:54 swfrench@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2149-2154].codfw.wmnet
  • 21:45 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx-out1001.wikimedia.org with reason: T426563
  • 21:44 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx-in1001.wikimedia.org with reason: T426563
  • 21:43 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx-out2001.wikimedia.org with reason: T426563
  • 21:43 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx-in2001.wikimedia.org with reason: T426563
  • 21:42 krinkle@deploy1003: Finished scap sync-world: Backport for Revert "Enable wgTrackMediaRequestProvenance on Commons" (T414338 T425580) (duration: 11m 29s)
  • 21:38 krinkle@deploy1003: seddon, krinkle: Continuing with deployment
  • 21:32 krinkle@deploy1003: seddon, krinkle: Backport for Revert "Enable wgTrackMediaRequestProvenance on Commons" (T414338 T425580) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:31 krinkle@deploy1003: Started scap sync-world: Backport for Revert "Enable wgTrackMediaRequestProvenance on Commons" (T414338 T425580)
  • 21:28 swfrench@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2190.codfw.wmnet
  • 21:28 swfrench@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2190.codfw.wmnet
  • 21:16 mutante: gerrit-replica.wikimedia.org back online
  • 21:14 dzahn@cumin2002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:15:00 on gerrit-replica.wikimedia.org with reason: T426563
  • 21:10 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gerrit2002.wikimedia.org with reason: T426563
  • 21:10 mutante: gerrit-replica.wikimedia.org, gerrit-spare.wikimedia.org - rebooting backends
  • 21:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gerrit1003.wikimedia.org with reason: T426563
  • 20:30 swfrench@cumin1003: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on P{wikikube-worker[2203-2331].codfw.wmnet} and (A:wikikube-master-codfw or A:wikikube-worker-codfw)
  • 20:30 swfrench@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2317-2330].codfw.wmnet
  • 20:30 swfrench@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2317-2330].codfw.wmnet
  • 20:22 swfrench@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2317-2330].codfw.wmnet
  • 20:14 swfrench@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2317-2330].codfw.wmnet
  • 20:13 swfrench@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2303-2316].codfw.wmnet
  • 20:13 swfrench@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2303-2316].codfw.wmnet
  • 20:05 swfrench@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2303-2316].codfw.wmnet
  • 19:56 swfrench@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2303-2316].codfw.wmnet
  • 19:56 swfrench@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2289-2302].codfw.wmnet
  • 19:56 swfrench@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2289-2302].codfw.wmnet
  • 19:48 swfrench@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2289-2302].codfw.wmnet
  • 19:40 swfrench@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2289-2302].codfw.wmnet
  • 19:39 swfrench@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2275-2288].codfw.wmnet
  • 19:39 swfrench@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2275-2288].codfw.wmnet
  • 19:29 swfrench@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2275-2288].codfw.wmnet
  • 19:21 swfrench@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2275-2288].codfw.wmnet
  • 19:20 swfrench@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2261-2274].codfw.wmnet
  • 19:20 swfrench@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2261-2274].codfw.wmnet
  • 19:12 swfrench@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2261-2274].codfw.wmnet
  • 19:03 swfrench@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2261-2274].codfw.wmnet
  • 19:02 swfrench@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2243,2248-2260].codfw.wmnet
  • 19:02 swfrench@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2243,2248-2260].codfw.wmnet
  • 18:54 swfrench@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2243,2248-2260].codfw.wmnet
  • 18:48 jhathaway@dns1004: END - running authdns-update
  • 18:46 jhathaway@dns1004: START - running authdns-update
  • 18:45 swfrench@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2243,2248-2260].codfw.wmnet
  • 18:45 swfrench@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2203-2215,2242].codfw.wmnet
  • 18:44 swfrench@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2203-2215,2242].codfw.wmnet
  • 18:38 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gitlab2003.wikimedia.org with reason: T426563
  • 18:35 swfrench@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2203-2215,2242].codfw.wmnet
  • 18:32 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host gitlab-runner2004.codfw.wmnet
  • 18:32 dzahn@cumin2002: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2004.codfw.wmnet
  • 18:31 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host gitlab-runner2004.codfw.wmnet
  • 18:31 dzahn@cumin2002: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2004.codfw.wmnet
  • 18:30 herron@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host centrallog1002.eqiad.wmnet
  • 18:30 herron@cumin1003: START - Cookbook sre.hosts.reboot-single for host centrallog1002.eqiad.wmnet
  • 18:26 herron: rebooting alert1002
  • 18:26 swfrench@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on P{wikikube-worker[2203-2331].codfw.wmnet} and (A:wikikube-master-codfw or A:wikikube-worker-codfw)
  • 18:22 swfrench@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2191-2202].codfw.wmnet
  • 18:22 swfrench@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2191-2202].codfw.wmnet
  • 18:20 dzahn@cumin2002: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2003.codfw.wmnet
  • 18:20 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner2002.codfw.wmnet
  • 18:19 swfrench@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2189.codfw.wmnet
  • 18:19 swfrench@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2189.codfw.wmnet
  • 18:16 mutante: releases.wikimedia.org - rebooting backends
  • 18:16 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on releases2003.codfw.wmnet with reason: T426563
  • 18:13 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon1003.eqiad.wmnet
  • 18:13 dzahn@cumin2002: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2002.codfw.wmnet
  • 18:13 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1004.eqiad.wmnet
  • 18:11 herron@cumin1003: START - Cookbook sre.hosts.reboot-single for host kafkamon1003.eqiad.wmnet
  • 18:11 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon2003.codfw.wmnet
  • 18:07 herron@cumin1003: START - Cookbook sre.hosts.reboot-single for host kafkamon2003.codfw.wmnet
  • 18:06 dzahn@cumin2002: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1004.eqiad.wmnet
  • 18:06 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1003.eqiad.wmnet
  • 18:04 blake@cumin1003: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on P{wikikube-worker[2155-2331].codfw.wmnet} and (A:wikikube-master-codfw or A:wikikube-worker-codfw)
  • 18:02 Reedy: Deployed patch for T426631
  • 17:59 dzahn@cumin2002: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1003.eqiad.wmnet
  • 17:56 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1002.eqiad.wmnet
  • 17:56 herron@cumin1003: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-logging-eqiad
  • 17:50 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet
  • 17:50 dzahn@cumin2002: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1002.eqiad.wmnet
  • 17:48 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1061.eqiad.wmnet with OS bookworm
  • 17:46 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on phab1005.eqiad.wmnet with reason: T426563
  • 17:46 herron: rebooting alert2002
  • 17:45 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on phab2003.codfw.wmnet with reason: T426563
  • 17:45 herron@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host alert2002.wikimedia.org
  • 17:45 herron@cumin1003: START - Cookbook sre.hosts.reboot-single for host alert2002.wikimedia.org
  • 17:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana1002.eqiad.wmnet
  • 17:44 herron@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host alert2002.wikimedia.org
  • 17:44 herron@cumin1003: START - Cookbook sre.hosts.reboot-single for host alert2002.wikimedia.org
  • 17:44 herron@cumin1003: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
  • 17:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1062.eqiad.wmnet with OS bookworm
  • 17:40 herron@cumin1003: START - Cookbook sre.hosts.reboot-single for host grafana1002.eqiad.wmnet
  • 17:38 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite1005.eqiad.wmnet
  • 17:37 mutante: stewards* - rebooting
  • 17:36 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana2001.codfw.wmnet
  • 17:33 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage
  • 17:32 herron@cumin1003: START - Cookbook sre.hosts.reboot-single for host grafana2001.codfw.wmnet
  • 17:31 herron@cumin1003: START - Cookbook sre.hosts.reboot-single for host graphite1005.eqiad.wmnet
  • 17:30 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite2004.codfw.wmnet
  • 17:28 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage
  • 17:25 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog1003.eqiad.wmnet
  • 17:23 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage
  • 17:23 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage
  • 17:23 herron@cumin1003: START - Cookbook sre.hosts.reboot-single for host graphite2004.codfw.wmnet
  • 17:22 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp1001.eqiad.wmnet
  • 17:21 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet
  • 17:18 herron@cumin1003: START - Cookbook sre.hosts.reboot-single for host mwlog1003.eqiad.wmnet
  • 17:16 herron@cumin1003: START - Cookbook sre.hosts.reboot-single for host arclamp1001.eqiad.wmnet
  • 17:16 herron@cumin1003: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet
  • 17:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp2001.codfw.wmnet
  • 17:15 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog2003.codfw.wmnet
  • 17:14 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on doc2003.codfw.wmnet with reason: T426563
  • 17:14 mutante: doc.wikimedia.org - rebooting backends
  • 17:13 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2003.codfw.wmnet
  • 17:13 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host o11ytest2001.codfw.wmnet
  • 17:13 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on doc1004.eqiad.wmnet with reason: T426563
  • 17:13 topranks: restarted gnmic on netflow3004 as series missing for cr2-esams
  • 17:12 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host o11ytest1001.eqiad.wmnet
  • 17:11 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1061.eqiad.wmnet with OS bookworm
  • 17:11 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1062.eqiad.wmnet with OS bookworm
  • 17:11 mutante: etherpad - rebooting backends
  • 17:10 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on etherpad1004.eqiad.wmnet with reason: T426563
  • 17:10 herron@cumin1003: START - Cookbook sre.hosts.reboot-single for host arclamp2001.codfw.wmnet
  • 17:10 herron@cumin1003: START - Cookbook sre.hosts.reboot-single for host webperf2003.codfw.wmnet
  • 17:08 herron@cumin1003: START - Cookbook sre.hosts.reboot-single for host mwlog2003.codfw.wmnet
  • 17:07 herron@cumin1003: START - Cookbook sre.hosts.reboot-single for host o11ytest2001.codfw.wmnet
  • 17:07 herron@cumin1003: START - Cookbook sre.hosts.reboot-single for host o11ytest1001.eqiad.wmnet
  • 17:05 herron@cumin1003: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-logging-eqiad
  • 17:04 mutante: contint2002, phab2002 - rebooting
  • 16:49 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on P{wikikube-worker[1328-1384].eqiad.wmnet} and (A:wikikube-master-eqiad or A:wikikube-worker-eqiad)
  • 16:49 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1373-1374].eqiad.wmnet
  • 16:49 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1373-1374].eqiad.wmnet
  • 16:42 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[1373-1374].eqiad.wmnet
  • 16:41 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[1373-1374].eqiad.wmnet
  • 16:41 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1370-1372].eqiad.wmnet
  • 16:41 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1370-1372].eqiad.wmnet
  • 16:40 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2189-2202].codfw.wmnet
  • 16:37 herron@cumin1003: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-logging-codfw
  • 16:34 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[1370-1372].eqiad.wmnet
  • 16:32 mutante: zuul[12]00[123] / zuul* - rebooting
  • 16:31 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2189-2202].codfw.wmnet
  • 16:31 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2170-2179,2184-2188].codfw.wmnet
  • 16:31 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2170-2179,2184-2188].codfw.wmnet
  • 16:28 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[1370-1372].eqiad.wmnet
  • 16:28 mutante: contint2003 - new jenkins - reboot for kernel upgrade
  • 16:28 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1367-1369].eqiad.wmnet
  • 16:28 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1367-1369].eqiad.wmnet
  • 16:27 mutante: people.wikimedia.org backend - rebooting
  • 16:22 mutante: contint1003 - rebooting
  • 16:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2009.codfw.wmnet
  • 16:22 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2170-2179,2184-2188].codfw.wmnet
  • 16:20 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[1367-1369].eqiad.wmnet
  • 16:19 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[1367-1369].eqiad.wmnet
  • 16:18 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1364-1366].eqiad.wmnet
  • 16:18 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1364-1366].eqiad.wmnet
  • 16:14 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2009.codfw.wmnet
  • 16:14 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1009.eqiad.wmnet
  • 16:13 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2170-2179,2184-2188].codfw.wmnet
  • 16:13 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2155-2169].codfw.wmnet
  • 16:13 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2155-2169].codfw.wmnet
  • 16:12 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1064.eqiad.wmnet with OS bookworm
  • 16:11 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[1364-1366].eqiad.wmnet
  • 16:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1063.eqiad.wmnet with OS bookworm
  • 16:09 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[1364-1366].eqiad.wmnet
  • 16:09 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1361-1363].eqiad.wmnet
  • 16:09 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1361-1363].eqiad.wmnet
  • 16:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1009.eqiad.wmnet
  • 16:05 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2008.codfw.wmnet
  • 16:04 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2155-2169].codfw.wmnet
  • 16:02 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[1361-1363].eqiad.wmnet
  • 16:01 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[1361-1363].eqiad.wmnet
  • 16:00 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1358-1360].eqiad.wmnet
  • 16:00 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1358-1360].eqiad.wmnet
  • 15:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage
  • 15:56 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2008.codfw.wmnet
  • 15:56 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1008.eqiad.wmnet
  • 15:55 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2155-2169].codfw.wmnet
  • 15:54 blake@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on P{wikikube-worker[2155-2331].codfw.wmnet} and (A:wikikube-master-codfw or A:wikikube-worker-codfw)
  • 15:54 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage
  • 15:53 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[1358-1360].eqiad.wmnet
  • 15:53 blake@cumin1003: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on P{wikikube-worker[2001-2331].codfw.wmnet} and (A:wikikube-master-codfw or A:wikikube-worker-codfw)
  • 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2004.codfw.wmnet
  • 15:52 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[1358-1360].eqiad.wmnet
  • 15:51 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1355-1357].eqiad.wmnet
  • 15:51 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1355-1357].eqiad.wmnet
  • 15:50 herron@cumin1003: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-logging-codfw
  • 15:50 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2152-2154].codfw.wmnet
  • 15:49 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage
  • 15:49 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage
  • 15:48 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1008.eqiad.wmnet
  • 15:48 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2007.codfw.wmnet
  • 15:48 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2152-2154].codfw.wmnet
  • 15:48 blake@cumin1003: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) pool for host wikikube-worker[2149-2151].codfw.wmnet
  • 15:47 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2149-2151].codfw.wmnet
  • 15:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2004.codfw.wmnet
  • 15:45 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[1355-1357].eqiad.wmnet
  • 15:43 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[1355-1357].eqiad.wmnet
  • 15:43 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1352-1354].eqiad.wmnet
  • 15:43 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1352-1354].eqiad.wmnet
  • 15:41 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2007.codfw.wmnet
  • 15:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1007.eqiad.wmnet
  • 15:40 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2149-2151].codfw.wmnet
  • 15:38 Amir1: re-mapping thumbsize of 1 to 2 in all group0 wikis (T376152)
  • 15:38 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2149-2151].codfw.wmnet
  • 15:38 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2146-2148].codfw.wmnet
  • 15:38 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2146-2148].codfw.wmnet
  • 15:37 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1064.eqiad.wmnet with OS bookworm
  • 15:37 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1063.eqiad.wmnet with OS bookworm
  • 15:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2003.codfw.wmnet
  • 15:36 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[1352-1354].eqiad.wmnet
  • 15:34 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[1352-1354].eqiad.wmnet
  • 15:34 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1349-1351].eqiad.wmnet
  • 15:34 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1349-1351].eqiad.wmnet
  • 15:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2003.codfw.wmnet
  • 15:32 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1007.eqiad.wmnet
  • 15:32 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2006.codfw.wmnet
  • 15:31 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2146-2148].codfw.wmnet
  • 15:29 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2146-2148].codfw.wmnet
  • 15:29 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2143-2145].codfw.wmnet
  • 15:29 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2143-2145].codfw.wmnet
  • 15:28 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[1349-1351].eqiad.wmnet
  • 15:26 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[1349-1351].eqiad.wmnet
  • 15:26 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1346-1348].eqiad.wmnet
  • 15:26 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1346-1348].eqiad.wmnet
  • 15:25 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2006.codfw.wmnet
  • 15:25 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1006.eqiad.wmnet
  • 15:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3004.esams.wmnet
  • 15:22 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2143-2145].codfw.wmnet
  • 15:20 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2143-2145].codfw.wmnet
  • 15:20 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2140-2142].codfw.wmnet
  • 15:20 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2140-2142].codfw.wmnet
  • 15:19 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[1346-1348].eqiad.wmnet
  • 15:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow3004.esams.wmnet
  • 15:18 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[1346-1348].eqiad.wmnet
  • 15:17 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1343-1345].eqiad.wmnet
  • 15:17 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1343-1345].eqiad.wmnet
  • 15:16 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1006.eqiad.wmnet
  • 15:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2005.codfw.wmnet
  • 15:12 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[1343-1345].eqiad.wmnet
  • 15:12 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2140-2142].codfw.wmnet
  • 15:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4003.ulsfo.wmnet
  • 15:11 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[1343-1345].eqiad.wmnet
  • 15:11 herron@cumin1003: END (FAIL) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=99) rolling reboot on A:kafka-logging-codfw
  • 15:11 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2140-2142].codfw.wmnet
  • 15:11 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1340-1342].eqiad.wmnet
  • 15:10 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1340-1342].eqiad.wmnet
  • 15:10 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2137-2139].codfw.wmnet
  • 15:10 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2137-2139].codfw.wmnet
  • 15:08 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2005.codfw.wmnet
  • 15:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1005.eqiad.wmnet
  • 15:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4003.ulsfo.wmnet
  • 15:05 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[1340-1342].eqiad.wmnet
  • 15:04 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[1340-1342].eqiad.wmnet
  • 15:04 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1337-1339].eqiad.wmnet
  • 15:04 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1337-1339].eqiad.wmnet
  • 15:03 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2137-2139].codfw.wmnet
  • 15:01 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1005.eqiad.wmnet
  • 15:01 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2137-2139].codfw.wmnet
  • 15:01 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2134-2136].codfw.wmnet
  • 15:01 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2134-2136].codfw.wmnet
  • 14:57 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[1337-1339].eqiad.wmnet
  • 14:55 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[1337-1339].eqiad.wmnet
  • 14:55 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1334-1336].eqiad.wmnet
  • 14:55 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1334-1336].eqiad.wmnet
  • 14:54 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2134-2136].codfw.wmnet
  • 14:52 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2134-2136].codfw.wmnet
  • 14:52 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2131-2133].codfw.wmnet
  • 14:52 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2131-2133].codfw.wmnet
  • 14:48 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[1334-1336].eqiad.wmnet
  • 14:48 herron@cumin1003: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-logging-codfw
  • 14:47 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[1334-1336].eqiad.wmnet
  • 14:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:sessionstore: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
  • 14:47 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1331-1333].eqiad.wmnet
  • 14:47 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1331-1333].eqiad.wmnet
  • 14:45 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2131-2133].codfw.wmnet
  • 14:43 Daimona: Running queries to fixup data for T426002
  • 14:43 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2131-2133].codfw.wmnet
  • 14:43 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2128-2130].codfw.wmnet
  • 14:43 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2128-2130].codfw.wmnet
  • 14:42 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling reboot on A:thanos-fe
  • 14:40 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[1331-1333].eqiad.wmnet
  • 14:38 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[1331-1333].eqiad.wmnet
  • 14:38 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1328-1330].eqiad.wmnet
  • 14:38 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1328-1330].eqiad.wmnet
  • 14:36 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2128-2130].codfw.wmnet
  • 14:34 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2128-2130].codfw.wmnet
  • 14:34 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2125-2127].codfw.wmnet
  • 14:34 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2125-2127].codfw.wmnet
  • 14:33 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[1328-1330].eqiad.wmnet
  • 14:33 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2192 weight', diff saved to https://phabricator.wikimedia.org/P92573 and previous config saved to /var/cache/conftool/dbconfig/20260518-143320-fceratto.json
  • 14:32 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2149.codfw.wmnet with reason: Depooled host, will be decommissioned
  • 14:32 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2143.codfw.wmnet with reason: Depooled host, will be decommissioned
  • 14:31 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[1328-1330].eqiad.wmnet
  • 14:31 jiji@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on P{wikikube-worker[1328-1384].eqiad.wmnet} and (A:wikikube-master-eqiad or A:wikikube-worker-eqiad)
  • 14:30 bking@deploy1003: Finished deploy [wdqs/wdqs@e8fb00c]: 0.3.163 (duration: 31m 47s)
  • 14:29 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS bookworm
  • 14:27 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2125-2127].codfw.wmnet
  • 14:25 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2125-2127].codfw.wmnet
  • 14:25 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2114-2115,2124].codfw.wmnet
  • 14:25 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2114-2115,2124].codfw.wmnet
  • 14:25 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS bookworm
  • 14:24 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 14:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 14:22 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:22 mlitn@deploy1003: Finished scap sync-world: Backport for Squashed diff to master (duration: 20m 05s)
  • 14:22 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:18 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2114-2115,2124].codfw.wmnet
  • 14:18 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2192: Repooling after switchover
  • 14:18 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2192: Repooling after switchover
  • 14:16 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2114-2115,2124].codfw.wmnet
  • 14:16 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling reboot on A:swift-fe
  • 14:16 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2111-2113].codfw.wmnet
  • 14:16 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2111-2113].codfw.wmnet
  • 14:16 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2179: Repooling after switchover
  • 14:16 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2179: Repooling after switchover
  • 14:13 mlitn@deploy1003: Rolling back deployment
  • 14:13 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage
  • 14:11 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:sessionstore: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
  • 14:10 Amir1: mapping thumbsize of 0 to 2 in all group1 wikis (T376152)
  • 14:09 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2111-2113].codfw.wmnet
  • 14:09 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage
  • 14:08 mlitn@deploy1003: mlitn: Continuing with deployment
  • 14:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage
  • 14:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage
  • 14:05 mlitn@deploy1003: mlitn: Backport for Squashed diff to master synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:03 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2111-2113].codfw.wmnet
  • 14:03 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2108-2110].codfw.wmnet
  • 14:03 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2108-2110].codfw.wmnet
  • 14:02 mlitn@deploy1003: Started scap sync-world: Backport for Squashed diff to master
  • 13:58 bking@deploy1003: Started deploy [wdqs/wdqs@e8fb00c]: 0.3.163
  • 13:56 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Store uncomputed references delta as null, not 0 (T426002), .gitignore: Add /static/hcaptcha/ (T403829) (duration: 09m 57s)
  • 13:55 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2108-2110].codfw.wmnet
  • 13:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1065.eqiad.wmnet with OS bookworm
  • 13:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1066.eqiad.wmnet with OS bookworm
  • 13:53 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2108-2110].codfw.wmnet
  • 13:53 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2105-2107].codfw.wmnet
  • 13:53 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2105-2107].codfw.wmnet
  • 13:51 lucaswerkmeister-wmde@deploy1003: daimona, lucaswerkmeister-wmde, dancy: Continuing with deployment
  • 13:50 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling reboot on A:thanos-fe
  • 13:48 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1067.eqiad.wmnet with OS bookworm
  • 13:47 lucaswerkmeister-wmde@deploy1003: daimona, lucaswerkmeister-wmde, dancy: Backport for Store uncomputed references delta as null, not 0 (T426002), .gitignore: Add /static/hcaptcha/ (T403829) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:46 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2105-2107].codfw.wmnet
  • 13:46 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Store uncomputed references delta as null, not 0 (T426002), .gitignore: Add /static/hcaptcha/ (T403829)
  • 13:44 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1068.eqiad.wmnet with OS bookworm
  • 13:44 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2105-2107].codfw.wmnet
  • 13:44 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2102-2104].codfw.wmnet
  • 13:44 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2102-2104].codfw.wmnet
  • 13:44 tchin@deploy1003: Finished deploy [analytics/refinery@ba10fca] (thin): Regular analytics weekly train THIN [analytics/refinery@ba10fcad] (duration: 01m 55s)
  • 13:42 tchin@deploy1003: Started deploy [analytics/refinery@ba10fca] (thin): Regular analytics weekly train THIN [analytics/refinery@ba10fcad]
  • 13:42 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 13:41 tchin@deploy1003: Finished deploy [analytics/refinery@ba10fca]: Regular analytics weekly train [analytics/refinery@ba10fcad] (duration: 05m 05s)
  • 13:41 Lucas_WMDE: updateCollation arwikisource for T426526 finished
  • 13:41 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 13:40 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:40 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:37 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2102-2104].codfw.wmnet
  • 13:36 tchin@deploy1003: Started deploy [analytics/refinery@ba10fca]: Regular analytics weekly train [analytics/refinery@ba10fcad]
  • 13:36 tchin@deploy1003: Finished deploy [analytics/refinery@ba10fca] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ba10fcad] (duration: 01m 54s)
  • 13:34 tchin@deploy1003: Started deploy [analytics/refinery@ba10fca] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ba10fcad]
  • 13:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2218']
  • 13:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2218']
  • 13:33 lucaswerkmeister-wmde@deploy1003: mwscript-k8s job started: updateCollation arwikisource --previous-collation=uppercase # T426526
  • 13:33 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage
  • 13:33 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [config] Set Category Collation for arwikisource (T426526) (duration: 11m 24s)
  • 13:31 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2102-2104].codfw.wmnet
  • 13:30 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2093-2095].codfw.wmnet
  • 13:30 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2093-2095].codfw.wmnet
  • 13:29 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage
  • 13:28 lucaswerkmeister-wmde@deploy1003: hubaishan, lucaswerkmeister-wmde: Continuing with deployment
  • 13:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage
  • 13:25 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage
  • 13:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5003.eqsin.wmnet
  • 13:23 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2093-2095].codfw.wmnet
  • 13:23 lucaswerkmeister-wmde@deploy1003: hubaishan, lucaswerkmeister-wmde: Backport for [config] Set Category Collation for arwikisource (T426526) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:21 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 13:21 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2093-2095].codfw.wmnet
  • 13:21 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [config] Set Category Collation for arwikisource (T426526)
  • 13:21 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2090-2092].codfw.wmnet
  • 13:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow5003.eqsin.wmnet
  • 13:21 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2090-2092].codfw.wmnet
  • 13:21 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 13:19 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:19 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for fix(signup.js): Do not warn about a username being available (T419401) (duration: 09m 18s)
  • 13:18 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:16 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS bookworm
  • 13:15 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, sgimeno: Continuing with deployment
  • 13:14 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2090-2092].codfw.wmnet
  • 13:13 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS bookworm
  • 13:13 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2090-2092].codfw.wmnet
  • 13:12 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2087-2089].codfw.wmnet
  • 13:12 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2087-2089].codfw.wmnet
  • 13:11 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, sgimeno: Backport for fix(signup.js): Do not warn about a username being available (T419401) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:10 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for fix(signup.js): Do not warn about a username being available (T419401)
  • 13:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
  • 13:07 kharlan@deploy1003: Finished scap sync-world: Backport for hCaptcha: Drop addurl trigger and 100% passive mode SiteKey (T426587) (duration: 11m 27s)
  • 13:05 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2087-2089].codfw.wmnet
  • 13:03 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2087-2089].codfw.wmnet
  • 13:03 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2076-2078].codfw.wmnet
  • 13:03 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2076-2078].codfw.wmnet
  • 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
  • 13:00 kharlan@deploy1003: kharlan: Continuing with deployment
  • 12:59 kharlan@deploy1003: kharlan: Backport for hCaptcha: Drop addurl trigger and 100% passive mode SiteKey (T426587) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:57 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2076-2078].codfw.wmnet
  • 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow7002.magru.wmnet
  • 12:56 kharlan@deploy1003: Started scap sync-world: Backport for hCaptcha: Drop addurl trigger and 100% passive mode SiteKey (T426587)
  • 12:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow7002.magru.wmnet
  • 12:50 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2076-2078].codfw.wmnet
  • 12:50 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2073-2075].codfw.wmnet
  • 12:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T419635)', diff saved to https://phabricator.wikimedia.org/P92570 and previous config saved to /var/cache/conftool/dbconfig/20260518-125038-fceratto.json
  • 12:50 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2073-2075].codfw.wmnet
  • 12:43 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2073-2075].codfw.wmnet
  • 12:41 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2073-2075].codfw.wmnet
  • 12:41 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2070-2072].codfw.wmnet
  • 12:41 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2070-2072].codfw.wmnet
  • 12:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast7002.wikimedia.org
  • 12:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P92569 and previous config saved to /var/cache/conftool/dbconfig/20260518-124030-fceratto.json
  • 12:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast7002.wikimedia.org
  • 12:34 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2070-2072].codfw.wmnet
  • 12:32 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2070-2072].codfw.wmnet
  • 12:31 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2067-2069].codfw.wmnet
  • 12:31 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2067-2069].codfw.wmnet
  • 12:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1003.eqiad.wmnet
  • 12:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P92568 and previous config saved to /var/cache/conftool/dbconfig/20260518-123022-fceratto.json
  • 12:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor1003.eqiad.wmnet
  • 12:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor2003.codfw.wmnet
  • 12:22 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2067-2069].codfw.wmnet
  • 12:22 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1069.eqiad.wmnet with OS bookworm
  • 12:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor2003.codfw.wmnet
  • 12:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor-dev2001.codfw.wmnet
  • 12:20 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2067-2069].codfw.wmnet
  • 12:20 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2062,2064-2065].codfw.wmnet
  • 12:20 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2062,2064-2065].codfw.wmnet
  • 12:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T419635)', diff saved to https://phabricator.wikimedia.org/P92567 and previous config saved to /var/cache/conftool/dbconfig/20260518-122014-fceratto.json
  • 12:18 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1071.eqiad.wmnet with OS bookworm
  • 12:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor-dev2001.codfw.wmnet
  • 12:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1070.eqiad.wmnet with OS bookworm
  • 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet
  • 12:13 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2062,2064-2065].codfw.wmnet
  • 12:11 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2062,2064-2065].codfw.wmnet
  • 12:11 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2059-2061].codfw.wmnet
  • 12:11 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2059-2061].codfw.wmnet
  • 12:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1072.eqiad.wmnet with OS bookworm
  • 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet
  • 12:07 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage
  • 12:04 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2059-2061].codfw.wmnet
  • 12:02 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2059-2061].codfw.wmnet
  • 12:02 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2056-2058].codfw.wmnet
  • 12:02 Dreamy_Jazz: Ran `scap remove-patch --message-body 'Dropping patch already made public' /srv/patches/next/extensions/ConfirmEdit/01-T423840.patch`
  • 12:02 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2056-2058].codfw.wmnet
  • 12:02 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1071.eqiad.wmnet with reason: host reimage
  • 12:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin2003.codfw.wmnet
  • 11:58 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1070.eqiad.wmnet with reason: host reimage
  • 11:55 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2056-2058].codfw.wmnet
  • 11:54 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1072.eqiad.wmnet with reason: host reimage
  • 11:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cumin2003.codfw.wmnet
  • 11:53 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2056-2058].codfw.wmnet
  • 11:53 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2050-2051,2055].codfw.wmnet
  • 11:53 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2050-2051,2055].codfw.wmnet
  • 11:53 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage
  • 11:53 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2213 to s5 primary T426600', diff saved to https://phabricator.wikimedia.org/P92566 and previous config saved to /var/cache/conftool/dbconfig/20260518-115304-fceratto.json
  • 11:52 federico3: Starting s5 codfw failover from db2192 to db2213 - T426600
  • 11:51 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1070.eqiad.wmnet with reason: host reimage
  • 11:50 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1071.eqiad.wmnet with reason: host reimage
  • 11:50 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1072.eqiad.wmnet with reason: host reimage
  • 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1003.eqiad.wmnet
  • 11:47 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2050-2051,2055].codfw.wmnet
  • 11:46 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2213 with weight 0 T426600', diff saved to https://phabricator.wikimedia.org/P92565 and previous config saved to /var/cache/conftool/dbconfig/20260518-114652-fceratto.json
  • 11:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s5 T426600
  • 11:45 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2050-2051,2055].codfw.wmnet
  • 11:45 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2044,2046,2049].codfw.wmnet
  • 11:45 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2044,2046,2049].codfw.wmnet
  • 11:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1003.eqiad.wmnet
  • 11:41 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1069.eqiad.wmnet with OS bookworm
  • 11:39 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1070.eqiad.wmnet with OS bookworm
  • 11:39 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1071.eqiad.wmnet with OS bookworm
  • 11:38 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1072.eqiad.wmnet with OS bookworm
  • 11:38 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2044,2046,2049].codfw.wmnet
  • 11:36 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2044,2046,2049].codfw.wmnet
  • 11:36 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2039,2041-2042].codfw.wmnet
  • 11:36 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2039,2041-2042].codfw.wmnet
  • 11:32 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1072.eqiad.wmnet with OS bullseye
  • 11:30 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1071.eqiad.wmnet with OS bullseye
  • 11:29 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2039,2041-2042].codfw.wmnet
  • 11:27 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idm1001.wikimedia.org
  • 11:26 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2039,2041-2042].codfw.wmnet
  • 11:26 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2036-2038].codfw.wmnet
  • 11:26 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2036-2038].codfw.wmnet
  • 11:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1070.eqiad.wmnet with OS bullseye
  • 11:24 slyngshede@cumin1003: START - Cookbook sre.hosts.reboot-single for host idm1001.wikimedia.org
  • 11:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2003.codfw.wmnet
  • 11:21 slyngshede@dns1004: END - running authdns-update
  • 11:19 slyngshede@dns1004: START - running authdns-update
  • 11:19 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2036-2038].codfw.wmnet
  • 11:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2003.codfw.wmnet
  • 11:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4006.wikimedia.org
  • 11:17 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1072.eqiad.wmnet with reason: host reimage
  • 11:17 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2036-2038].codfw.wmnet
  • 11:17 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2033-2035].codfw.wmnet
  • 11:17 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2033-2035].codfw.wmnet
  • 11:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast4006.wikimedia.org
  • 11:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1071.eqiad.wmnet with reason: host reimage
  • 11:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idm2001.wikimedia.org
  • 11:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5005.wikimedia.org
  • 11:11 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2033-2035].codfw.wmnet
  • 11:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1070.eqiad.wmnet with reason: host reimage
  • 11:09 slyngshede@cumin1003: START - Cookbook sre.hosts.reboot-single for host idm2001.wikimedia.org
  • 11:09 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idm-test1001.wikimedia.org
  • 11:09 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2033-2035].codfw.wmnet
  • 11:09 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2016-2018].codfw.wmnet
  • 11:09 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2016-2018].codfw.wmnet
  • 11:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1072.eqiad.wmnet with reason: host reimage
  • 11:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1071.eqiad.wmnet with reason: host reimage
  • 11:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1070.eqiad.wmnet with reason: host reimage
  • 11:05 slyngshede@cumin1003: START - Cookbook sre.hosts.reboot-single for host idm-test1001.wikimedia.org
  • 11:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5005.wikimedia.org
  • 11:04 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp1005.wikimedia.org
  • 11:02 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2016-2018].codfw.wmnet
  • 11:00 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2016-2018].codfw.wmnet
  • 11:00 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2013-2015].codfw.wmnet
  • 11:00 slyngshede@cumin1003: START - Cookbook sre.hosts.reboot-single for host idp1005.wikimedia.org
  • 11:00 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2013-2015].codfw.wmnet
  • 10:56 slyngshede@dns1004: END - running authdns-update
  • 10:55 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1072.eqiad.wmnet with OS bullseye
  • 10:54 slyngshede@dns1004: START - running authdns-update
  • 10:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1071.eqiad.wmnet with OS bullseye
  • 10:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1070.eqiad.wmnet with OS bullseye
  • 10:53 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2013-2015].codfw.wmnet
  • 10:52 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2013-2015].codfw.wmnet
  • 10:51 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2006,2011-2012].codfw.wmnet
  • 10:51 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2006,2011-2012].codfw.wmnet
  • 10:50 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling reboot on A:swift-fe
  • 10:48 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp2005.wikimedia.org
  • 10:46 slyngshede@cumin1003: START - Cookbook sre.hosts.reboot-single for host idp2005.wikimedia.org
  • 10:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org
  • 10:45 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2006,2011-2012].codfw.wmnet
  • 10:42 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2006,2011-2012].codfw.wmnet
  • 10:42 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2001-2002,2005].codfw.wmnet
  • 10:42 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2001-2002,2005].codfw.wmnet
  • 10:41 slyngshede@cumin1003: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org
  • 10:41 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2005.wikimedia.org
  • 10:37 slyngshede@cumin1003: START - Cookbook sre.hosts.reboot-single for host idp-test2005.wikimedia.org
  • 10:37 slyngshede@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host idp-test2005.wikimedia.org
  • 10:37 slyngshede@cumin1003: START - Cookbook sre.hosts.reboot-single for host idp-test2005.wikimedia.org
  • 10:35 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2001-2002,2005].codfw.wmnet
  • 10:33 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2001-2002,2005].codfw.wmnet
  • 10:33 blake@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on P{wikikube-worker[2001-2331].codfw.wmnet} and (A:wikikube-master-codfw or A:wikikube-worker-codfw)
  • 10:27 blake@cumin1003: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on P{wikikube-worker[2332-2374].codfw.wmnet} and (A:wikikube-master-codfw or A:wikikube-worker-codfw)
  • 10:27 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2352-2356].codfw.wmnet
  • 10:27 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2352-2356].codfw.wmnet
  • 10:21 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2352-2356].codfw.wmnet
  • 10:19 marostegui@cumin1003: dbctl commit (dc=all): 'Repool pc4', diff saved to https://phabricator.wikimedia.org/P92564 and previous config saved to /var/cache/conftool/dbconfig/20260518-101917-marostegui.json
  • 10:18 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool pc2024: replacing hw
  • 10:18 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99)
  • 10:18 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache
  • 10:18 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool pc2024: replacing hw
  • 10:17 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2352-2356].codfw.wmnet
  • 10:17 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2024 to pc4 master T418973', diff saved to https://phabricator.wikimedia.org/P92563 and previous config saved to /var/cache/conftool/dbconfig/20260518-101749-marostegui.json
  • 10:17 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2347-2351].codfw.wmnet
  • 10:17 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2347-2351].codfw.wmnet
  • 10:17 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2024 to dbctl T418973', diff saved to https://phabricator.wikimedia.org/P92562 and previous config saved to /var/cache/conftool/dbconfig/20260518-101714-marostegui.json
  • 10:11 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2347-2351].codfw.wmnet
  • 10:09 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply
  • 10:09 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply
  • 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2179 (T419635)', diff saved to https://phabricator.wikimedia.org/P92561 and previous config saved to /var/cache/conftool/dbconfig/20260518-100831-fceratto.json
  • 10:08 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 10:07 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2347-2351].codfw.wmnet
  • 10:07 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2342-2346].codfw.wmnet
  • 10:07 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2342-2346].codfw.wmnet
  • 10:07 fceratto@cumin1003: dbctl commit (dc=all): 'Set correct weight T426590', diff saved to https://phabricator.wikimedia.org/P92560 and previous config saved to /var/cache/conftool/dbconfig/20260518-100710-fceratto.json
  • 10:02 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2240 to s4 primary T426590', diff saved to https://phabricator.wikimedia.org/P92559 and previous config saved to /var/cache/conftool/dbconfig/20260518-100203-fceratto.json
  • 10:00 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2342-2346].codfw.wmnet
  • 10:00 federico3: Starting s4 codfw failover from db2179 to db2240 - T426590
  • 09:59 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1086.eqiad.wmnet with OS bullseye
  • 09:57 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2342-2346].codfw.wmnet
  • 09:57 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2337-2341].codfw.wmnet
  • 09:57 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2337-2341].codfw.wmnet
  • 09:57 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply
  • 09:57 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply
  • 09:52 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2240 with weight 0 T426590', diff saved to https://phabricator.wikimedia.org/P92558 and previous config saved to /var/cache/conftool/dbconfig/20260518-095218-fceratto.json
  • 09:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 40 hosts with reason: Primary switchover s4 T426590
  • 09:50 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2337-2341].codfw.wmnet
  • 09:47 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2337-2341].codfw.wmnet
  • 09:47 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2332-2336].codfw.wmnet
  • 09:47 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2332-2336].codfw.wmnet
  • 09:46 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1082.eqiad.wmnet with OS bullseye
  • 09:43 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1086.eqiad.wmnet with reason: host reimage
  • 09:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki-root1002.eqiad.wmnet
  • 09:41 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2332-2336].codfw.wmnet
  • 09:38 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2332-2336].codfw.wmnet
  • 09:37 blake@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on P{wikikube-worker[2332-2374].codfw.wmnet} and (A:wikikube-master-codfw or A:wikikube-worker-codfw)
  • 09:37 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1086.eqiad.wmnet with reason: host reimage
  • 09:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki-root1002.eqiad.wmnet
  • 09:29 cezmunsta: Removing db2152.codfw.wmnet from orchestrator T424344
  • 09:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1082.eqiad.wmnet with reason: host reimage
  • 09:25 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be1086
  • 09:25 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be1086
  • 09:25 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be1086
  • 09:25 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be1086.eqiad.wmnet 18.32.64.10.in-addr.arpa 8.1.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 09:25 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be1086.eqiad.wmnet 18.32.64.10.in-addr.arpa 8.1.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 09:25 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:25 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be1086 - mvernon@cumin2002"
  • 09:25 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1082.eqiad.wmnet with reason: host reimage
  • 09:25 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be1086 - mvernon@cumin2002"
  • 09:21 mvernon@cumin2002: START - Cookbook sre.dns.netbox
  • 09:20 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be1086
  • 09:20 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1086.eqiad.wmnet with OS bullseye
  • 09:18 moritzm: installing Java 21 security updates
  • 09:13 cezmunsta: Removing db2152.codfw.wmnet from zarcillo T424344
  • 09:11 jnuche@deploy1003: Installation of scap version "4.265.2" completed for 1 hosts
  • 09:10 jnuche@deploy1003: Installing scap version "4.265.2" for 1 host(s)
  • 09:08 cwilliams@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2152.codfw.wmnet
  • 09:08 cwilliams@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:08 cwilliams@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2152.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwilliams@cumin1003"
  • 09:08 cwilliams@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2152.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwilliams@cumin1003"
  • 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be1082
  • 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be1082
  • 09:07 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be1082
  • 09:07 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be1082.eqiad.wmnet 52.32.64.10.in-addr.arpa 2.5.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 09:07 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be1082.eqiad.wmnet 52.32.64.10.in-addr.arpa 2.5.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 09:07 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:07 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be1082 - mvernon@cumin2002"
  • 09:06 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be1082 - mvernon@cumin2002"
  • 09:05 jnuche@deploy1003: Installing scap version "4.265.2" for 163 host(s)
  • 09:04 cwilliams@cumin1003: START - Cookbook sre.dns.netbox
  • 09:03 ayounsi@dns1004: END - running authdns-update
  • 09:02 javiermonton@deploy1003: Finished scap sync-world: Backport for stream: mediawiki.page_html_content_change (T423920) (duration: 31m 35s)
  • 09:01 ayounsi@dns1004: START - running authdns-update
  • 08:59 cwilliams@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2152.codfw.wmnet
  • 08:50 javiermonton@deploy1003: javiermonton: Continuing with deployment
  • 08:50 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
  • 08:50 javiermonton@deploy1003: javiermonton: Backport for stream: mediawiki.page_html_content_change (T423920) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:50 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
  • 08:50 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
  • 08:50 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
  • 08:44 mvernon@cumin2002: START - Cookbook sre.dns.netbox
  • 08:41 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be1082
  • 08:40 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1082.eqiad.wmnet with OS bullseye
  • 08:31 javiermonton@deploy1003: Started scap sync-world: Backport for stream: mediawiki.page_html_content_change (T423920)
  • 08:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
  • 08:15 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook-next: apply
  • 08:14 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook-next: apply
  • 08:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
  • 08:12 moritzm: installing glibc bugfix updates from bookworm point release
  • 07:46 moritzm: installing systemd bugfix updates from bookworm point release
  • 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
  • 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
  • 07:35 moritzm: installing openssl bugfix updates from bookworm point release
  • 07:25 godog: clean up space on cloudcumin1001: apt archives and older kernels
  • 07:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc2013.codfw.wmnet
  • 07:14 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:14 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pc2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
  • 07:14 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pc2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
  • 07:09 marostegui@cumin1003: START - Cookbook sre.dns.netbox
  • 07:04 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts pc2013.codfw.wmnet
  • 07:03 marostegui@cumin1003: dbctl commit (dc=all): 'Remove pc2013 from dbctl T426555', diff saved to https://phabricator.wikimedia.org/P92557 and previous config saved to /var/cache/conftool/dbconfig/20260518-070322-marostegui.json
  • 06:59 moritzm: installing systemd bugfix updates from trixie point release
  • 06:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Kgraessle out of all services on: 2468 hosts
  • 06:54 moritzm: installing Linux 6.1.172 on bookworm hosts
  • 06:49 moritzm: installing glibc bugfix updates from trixie point release
  • 06:44 moritzm: installing openssl bugfix updates from trixie point release
  • 06:33 moritzm: installing Linux 6.12.88 on trixie hosts
  • 05:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99)
  • 05:14 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache
  • 05:12 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 05:11 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache
  • 05:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc[2014,2024].codfw.wmnet,pc1014.eqiad.wmnet with reason: Maintenance on pc4

2026-05-15

  • 21:03 jforrester@deploy1003: Finished scap sync-world: Backport for Revert "Enable wgTrackMediaRequestProvenance on remaining Wikipedias" (T425580) (duration: 07m 43s)
  • 20:59 jforrester@deploy1003: jforrester, seddon: Continuing with deployment
  • 20:57 jforrester@deploy1003: jforrester, seddon: Backport for Revert "Enable wgTrackMediaRequestProvenance on remaining Wikipedias" (T425580) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:55 jforrester@deploy1003: Started scap sync-world: Backport for Revert "Enable wgTrackMediaRequestProvenance on remaining Wikipedias" (T425580)
  • 20:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1290.eqiad.wmnet with OS bookworm
  • 20:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 20:09 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 19:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1290.eqiad.wmnet with reason: host reimage
  • 19:47 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1290.eqiad.wmnet with reason: host reimage
  • 19:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm
  • 19:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:23 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290
  • 19:21 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290
  • 19:21 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 16:02 dancy@deploy1003: Installation of scap version "4.265.1" completed for 2 hosts
  • 16:00 dancy@deploy1003: Installing scap version "4.265.1" for 2 host(s)
  • 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003"
  • 12:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003"
  • 12:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet
  • 11:59 Emperor: depool / restart swift / repool on ms-fe2010 ms-fe2012
  • 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet
  • 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
  • 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
  • 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye
  • 11:14 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 11:10 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release - T426298
  • 11:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage
  • 10:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage
  • 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye
  • 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
  • 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
  • 10:46 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie
  • 10:43 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
  • 10:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
  • 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2065
  • 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2065
  • 10:40 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2065
  • 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 10:40 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002"
  • 10:40 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002"
  • 10:36 mvernon@cumin2002: START - Cookbook sre.dns.netbox
  • 10:36 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2065
  • 10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye
  • 10:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage
  • 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
  • 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
  • 10:28 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage
  • 10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage
  • 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
  • 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
  • 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
  • 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
  • 10:20 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage
  • 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003"
  • 10:12 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003"
  • 10:10 topranks: Migrate ulsfo cr<->cr traffic to use path via switches not direct link T424611
  • 10:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 10:04 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye
  • 10:01 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
  • 10:01 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release - T426298
  • 10:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
  • 09:56 topranks: Migrate cr3-ulsfo link to asw1-22-ulsfo to tagged interface T424611
  • 09:49 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 09:48 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
  • 09:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
  • 09:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
  • 09:32 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2064.codfw.wmnet with OS bullseye
  • 09:32 topranks: Migrate cr4-ulsfo link to asw1-23-ulsfo to tagged interface T424611
  • 09:30 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 09:30 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
  • 09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2065
  • 09:30 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
  • 09:10 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
  • 09:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on db2218.codfw.wmnet with reason: Host crashed T426383
  • 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2064
  • 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2064
  • 09:06 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2064
  • 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 09:06 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002"
  • 09:06 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002"
  • 09:03 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
  • 09:02 mvernon@cumin2002: START - Cookbook sre.dns.netbox
  • 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2064
  • 09:01 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye
  • 09:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2218 T426380', diff saved to https://phabricator.wikimedia.org/P92553 and previous config saved to /var/cache/conftool/dbconfig/20260515-090000-marostegui.json
  • 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2220 to s7 primary T426380', diff saved to https://phabricator.wikimedia.org/P92552 and previous config saved to /var/cache/conftool/dbconfig/20260515-085836-marostegui.json
  • 08:56 marostegui: Starting s7 codfw failover from db2218 to db2220 - T426380
  • 08:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 T426380
  • 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2220 with weight 0 T426380', diff saved to https://phabricator.wikimedia.org/P92551 and previous config saved to /var/cache/conftool/dbconfig/20260515-085420-marostegui.json
  • 08:41 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2065
  • 08:41 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2064
  • 08:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie
  • 08:17 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie
  • 08:16 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 08:05 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 08:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 08:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 07:58 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 07:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 07:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 07:55 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 07:55 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2064
  • 07:54 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 07:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 07:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
  • 07:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010
  • 07:39 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host sretest2010
  • 07:10 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
  • 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 02:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:31 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 50s)
  • 02:06 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1290.eqiad.wmnet with OS bookworm
  • 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 01:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm
  • 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1289.eqiad.wmnet with OS bookworm
  • 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage
  • 00:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage
  • 00:43 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:42 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1289.eqiad.wmnet with OS bookworm
  • 00:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:01 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED

2026-05-14

  • 23:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:57 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290
  • 23:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290
  • 23:54 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:49 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:39 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:30 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:26 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:24 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:13 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:12 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289
  • 23:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289
  • 23:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:07 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 21:47 egardner@deploy1003: Finished scap sync-world: Backport for Share Highlight: overdraw photo on share card canvas (T426344) (duration: 07m 14s)
  • 21:43 egardner@deploy1003: egardner: Continuing with deployment
  • 21:41 egardner@deploy1003: egardner: Backport for Share Highlight: overdraw photo on share card canvas (T426344) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:40 egardner@deploy1003: Started scap sync-world: Backport for Share Highlight: overdraw photo on share card canvas (T426344)
  • 21:33 jdrewniak@deploy1003: Finished scap sync-world: Backport for Disable Reading Lists survey for Wikipedias (T421776) (duration: 09m 15s)
  • 21:29 jdrewniak@deploy1003: jdrewniak: Continuing with deployment
  • 21:26 jdrewniak@deploy1003: jdrewniak: Backport for Disable Reading Lists survey for Wikipedias (T421776) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:24 jdrewniak@deploy1003: Started scap sync-world: Backport for Disable Reading Lists survey for Wikipedias (T421776)
  • 21:16 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Enable hCaptcha for account creation API on group 0 wiki's, Remove DynamicPageList from legalteamwiki as unused (duration: 06m 33s)
  • 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1286.eqiad.wmnet with OS bookworm
  • 21:15 vriley@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 21:12 dreamyjazz@deploy1003: dreamyjazz, seddon: Continuing with deployment
  • 21:11 dreamyjazz@deploy1003: dreamyjazz, seddon: Backport for Enable hCaptcha for account creation API on group 0 wiki's, Remove DynamicPageList from legalteamwiki as unused synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:10 dreamyjazz@deploy1003: Started scap sync-world: Backport for Enable hCaptcha for account creation API on group 0 wiki's, Remove DynamicPageList from legalteamwiki as unused
  • 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1287.eqiad.wmnet with OS bookworm
  • 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 20:55 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 20:50 sbisson@deploy1003: Finished scap sync-world: Backport for Simplewiki: include article wizard in AG experiment (T426278) (duration: 07m 03s)
  • 20:46 sbisson@deploy1003: sbisson: Continuing with deployment
  • 20:45 sbisson@deploy1003: sbisson: Backport for Simplewiki: include article wizard in AG experiment (T426278) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:43 sbisson@deploy1003: Started scap sync-world: Backport for Simplewiki: include article wizard in AG experiment (T426278)
  • 20:43 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 20:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 20:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage
  • 20:35 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage
  • 20:35 cjming@deploy1003: Finished scap sync-world: Backport for Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206) (duration: 10m 18s)
  • 20:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:31 cjming@deploy1003: cjming, neriah: Continuing with deployment
  • 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:29 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1289.eqiad.wmnet with OS bookworm
  • 20:28 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1289.eqiad.wmnet with OS bookworm
  • 20:27 cjming@deploy1003: cjming, neriah: Backport for Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:25 cjming@deploy1003: Started scap sync-world: Backport for Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)
  • 20:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage
  • 20:19 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1287.eqiad.wmnet with OS bookworm
  • 20:19 jsn@deploy1003: Finished scap sync-world: Backport for Enable AutoModerator on Italian Wikipedia (T405152), Enable AutoModerator on Albanian Wikipedia (T420450), Enable AutoModerator on Dutch Wikipedia (T425509) (duration: 07m 48s)
  • 20:18 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage
  • 20:14 jsn@deploy1003: kgraessle, jsn: Continuing with deployment
  • 20:13 jsn@deploy1003: kgraessle, jsn: Backport for Enable AutoModerator on Italian Wikipedia (T405152), Enable AutoModerator on Albanian Wikipedia (T420450), Enable AutoModerator on Dutch Wikipedia (T425509) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:11 jsn@deploy1003: Started scap sync-world: Backport for Enable AutoModerator on Italian Wikipedia (T405152), Enable AutoModerator on Albanian Wikipedia (T420450), Enable AutoModerator on Dutch Wikipedia (T425509)
  • 20:03 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 20:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1286.eqiad.wmnet with OS bookworm
  • 19:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1281.eqiad.wmnet with OS bookworm
  • 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 19:46 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 19:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage
  • 19:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1286
  • 19:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1286
  • 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003"
  • 19:26 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003"
  • 19:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage
  • 19:22 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 19:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1274.eqiad.wmnet with OS bookworm
  • 19:14 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 19:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1281.eqiad.wmnet with OS bookworm
  • 18:58 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 18:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage
  • 18:25 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage
  • 18:17 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:16 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:14 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 18:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm
  • 17:32 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 17:31 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 17:23 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - T426298
  • 17:17 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:17 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:16 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:16 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:16 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:15 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:14 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - T426298
  • 17:10 cmooney@dns2005: END - running authdns-update
  • 17:09 cmooney@dns2005: START - running authdns-update
  • 17:06 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - T426298
  • 16:58 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - T426298
  • 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
  • 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
  • 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
  • 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
  • 16:35 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie
  • 16:31 topranks: disable core router direct link at esams now that traffic is flowing via switches T424611
  • 16:25 topranks: disable core router direct link at drmrs now that traffic is flowing via switches T424611
  • 16:21 topranks: disable core router direct link at magru now that traffic is flowing via switches T424611
  • 16:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
  • 16:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
  • 16:19 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 16:17 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 16:16 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:15 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:14 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1288.eqiad.wmnet with OS bookworm
  • 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 16:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 16:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003"
  • 16:07 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003"
  • 16:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 16:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 15:59 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 15:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:56 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290
  • 15:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290
  • 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003"
  • 15:54 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003"
  • 15:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage
  • 15:51 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003
  • 15:50 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 15:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage
  • 15:49 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003
  • 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1285.eqiad.wmnet with OS bookworm
  • 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 15:46 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie
  • 15:45 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289
  • 15:41 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289
  • 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003"
  • 15:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003"
  • 15:35 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 15:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1288.eqiad.wmnet with OS bookworm
  • 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1284.eqiad.wmnet with OS bookworm
  • 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 15:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 15:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage
  • 15:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
  • 15:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage
  • 15:16 bearloga@deploy1003: Finished scap sync-world: Backport for EventStreamConfig: fix product_metrics.web_base (T426209) (duration: 06m 20s)
  • 15:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage
  • 15:12 bearloga@deploy1003: bearloga: Continuing with deployment
  • 15:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:12 bearloga@deploy1003: bearloga: Backport for EventStreamConfig: fix product_metrics.web_base (T426209) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:10 bearloga@deploy1003: Started scap sync-world: Backport for EventStreamConfig: fix product_metrics.web_base (T426209)
  • 15:08 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage
  • 15:08 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1285.eqiad.wmnet with OS bookworm
  • 14:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage
  • 14:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T419961)', diff saved to https://phabricator.wikimedia.org/P92544 and previous config saved to /var/cache/conftool/dbconfig/20260514-145715-fceratto.json
  • 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1288
  • 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1283.eqiad.wmnet with OS bookworm
  • 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 14:54 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage
  • 14:54 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1288
  • 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003"
  • 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003"
  • 14:52 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1284.eqiad.wmnet with OS bookworm
  • 14:49 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 14:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92542 and previous config saved to /var/cache/conftool/dbconfig/20260514-144707-fceratto.json
  • 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:44 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage
  • 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003"
  • 14:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003"
  • 14:37 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1289
  • 14:37 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289
  • 14:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92541 and previous config saved to /var/cache/conftool/dbconfig/20260514-143659-fceratto.json
  • 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1282.eqiad.wmnet with OS bookworm
  • 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 14:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 14:34 phuedx@deploy1003: Finished scap sync-world: Backport for ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514) (duration: 11m 14s)
  • 14:33 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 14:33 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage
  • 14:33 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1285
  • 14:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1285
  • 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003"
  • 14:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003"
  • 14:29 phuedx@deploy1003: phuedx: Continuing with deployment
  • 14:27 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 14:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T419961)', diff saved to https://phabricator.wikimedia.org/P92540 and previous config saved to /var/cache/conftool/dbconfig/20260514-142650-fceratto.json
  • 14:26 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
  • 14:24 phuedx@deploy1003: phuedx: Backport for ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1280.eqiad.wmnet with OS bookworm
  • 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 14:22 phuedx@deploy1003: Started scap sync-world: Backport for ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)
  • 14:21 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 14:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1284
  • 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 (T419961)', diff saved to https://phabricator.wikimedia.org/P92539 and previous config saved to /var/cache/conftool/dbconfig/20260514-141922-fceratto.json
  • 14:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 14:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage
  • 14:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1283.eqiad.wmnet with OS bookworm
  • 14:18 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2150 from dbctl T424342', diff saved to https://phabricator.wikimedia.org/P92538 and previous config saved to /var/cache/conftool/dbconfig/20260514-141812-cwilliams.json
  • 14:17 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1284
  • 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003"
  • 14:17 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003"
  • 14:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92537 and previous config saved to /var/cache/conftool/dbconfig/20260514-141644-fceratto.json
  • 14:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage
  • 14:14 krinkle@deploy1003: Finished scap sync-world: Backport for throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295) (duration: 08m 00s)
  • 14:13 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 14:09 krinkle@deploy1003: krinkle, robertsky: Continuing with deployment
  • 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
  • 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
  • 14:08 krinkle@deploy1003: krinkle, robertsky: Backport for throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1279.eqiad.wmnet with OS bookworm
  • 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 14:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92536 and previous config saved to /var/cache/conftool/dbconfig/20260514-140635-fceratto.json
  • 14:06 krinkle@deploy1003: Started scap sync-world: Backport for throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)
  • 14:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
  • 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage
  • 14:01 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2151 from dbctl T424343', diff saved to https://phabricator.wikimedia.org/P92535 and previous config saved to /var/cache/conftool/dbconfig/20260514-140110-cwilliams.json
  • 14:00 mfossati@deploy1003: Finished scap sync-world: Backport for Scale share-highlight card to fit small viewports (T426247) (duration: 07m 09s)
  • 13:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1282.eqiad.wmnet with OS bookworm
  • 13:58 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage
  • 13:57 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 13:56 mfossati@deploy1003: mfossati: Continuing with deployment
  • 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T419635)', diff saved to https://phabricator.wikimedia.org/P92534 and previous config saved to /var/cache/conftool/dbconfig/20260514-135626-fceratto.json
  • 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:56 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
  • 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 13:55 mfossati@deploy1003: mfossati: Backport for Scale share-highlight card to fit small viewports (T426247) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 13:53 mfossati@deploy1003: Started scap sync-world: Backport for Scale share-highlight card to fit small viewports (T426247)
  • 13:53 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2152.codfw.wmnet with reason: Depooled host, will be decommissioned
  • 13:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:53 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 (T419635)', diff saved to https://phabricator.wikimedia.org/P92533 and previous config saved to /var/cache/conftool/dbconfig/20260514-135315-fceratto.json
  • 13:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Depooled host, will be decommissioned
  • 13:52 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2150.codfw.wmnet with reason: Depooled host, will be decommissioned
  • 13:49 krinkle@deploy1003: Finished scap sync-world: Backport for Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338) (duration: 07m 03s)
  • 13:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage
  • 13:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:45 krinkle@deploy1003: krinkle: Continuing with deployment
  • 13:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage
  • 13:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:44 krinkle@deploy1003: krinkle: Backport for Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:42 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1280.eqiad.wmnet with OS bookworm
  • 13:42 krinkle@deploy1003: Started scap sync-world: Backport for Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)
  • 13:42 krinkle@deploy1003: Finished scap sync-world: Backport for Add ReadingLists Account Creation CTA campaign (T422169), WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169) (duration: 12m 33s)
  • 13:40 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:37 krinkle@deploy1003: krinkle, annet: Continuing with deployment
  • 13:33 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2151: Host will be decommissioned
  • 13:33 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2151: Host will be decommissioned
  • 13:32 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Host will be decommissioned
  • 13:31 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Host will be decommissioned
  • 13:31 krinkle@deploy1003: krinkle, annet: Backport for Add ReadingLists Account Creation CTA campaign (T422169), WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1279.eqiad.wmnet with OS bookworm
  • 13:29 krinkle@deploy1003: Started scap sync-world: Backport for Add ReadingLists Account Creation CTA campaign (T422169), WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)
  • 13:22 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1283
  • 13:19 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:18 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1283
  • 13:16 sbisson@deploy1003: Finished scap sync-world: Backport for Enable the Article Guidance experiment on simplewiki (T426278) (duration: 08m 10s)
  • 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003"
  • 13:15 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003"
  • 13:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:12 sbisson@deploy1003: sbisson: Continuing with deployment
  • 13:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:10 sbisson@deploy1003: sbisson: Backport for Enable the Article Guidance experiment on simplewiki (T426278) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:10 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 13:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2165: Repooling after switchover
  • 13:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1282
  • 13:08 sbisson@deploy1003: Started scap sync-world: Backport for Enable the Article Guidance experiment on simplewiki (T426278)
  • 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:08 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2165: Repooling after switchover
  • 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Set correct weight T426291', diff saved to https://phabricator.wikimedia.org/P92529 and previous config saved to /var/cache/conftool/dbconfig/20260514-130743-fceratto.json
  • 13:07 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1282
  • 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003"
  • 13:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003"
  • 13:05 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:02 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2161 to s8 primary T426291', diff saved to https://phabricator.wikimedia.org/P92528 and previous config saved to /var/cache/conftool/dbconfig/20260514-130213-fceratto.json
  • 13:01 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 13:01 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1281
  • 13:00 federico3: Starting s8 codfw failover from db2165 to db2161 - T426291
  • 13:00 kart_: Updated cxserver to 2026-05-14-123010-production (T426174, T404298)
  • 12:59 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1281
  • 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003"
  • 12:59 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003"
  • 12:58 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 12:57 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 12:56 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 12:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:55 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 12:55 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 12:54 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1280
  • 12:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1280
  • 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003"
  • 12:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003"
  • 12:50 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:50 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2161 with weight 0 T426291', diff saved to https://phabricator.wikimedia.org/P92527 and previous config saved to /var/cache/conftool/dbconfig/20260514-125014-fceratto.json
  • 12:49 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1279
  • 12:49 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s8 T426291
  • 12:49 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 12:47 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1279
  • 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003"
  • 12:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003"
  • 12:47 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 12:46 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 12:42 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 12:42 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003
  • 12:40 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003
  • 12:31 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28458
  • 12:27 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 28458
  • 12:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repool pc3 with pc2023 as codfw master T418973', diff saved to https://phabricator.wikimedia.org/P92526 and previous config saved to /var/cache/conftool/dbconfig/20260514-122707-marostegui.json
  • 12:21 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 12:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 12:20 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 codfw master T418973', diff saved to https://phabricator.wikimedia.org/P92525 and previous config saved to /var/cache/conftool/dbconfig/20260514-121958-marostegui.json
  • 12:18 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 T418973', diff saved to https://phabricator.wikimedia.org/P92524 and previous config saved to /var/cache/conftool/dbconfig/20260514-121839-marostegui.json
  • 11:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
  • 11:31 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
  • 11:08 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 11:08 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 11:02 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm
  • 11:01 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: sync
  • 11:00 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: sync
  • 11:00 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 11:00 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply
  • 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply
  • 10:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1063.eqiad.wmnet with OS bullseye
  • 10:49 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1069.eqiad.wmnet with OS bullseye
  • 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2152 from dbctl T424344', diff saved to https://phabricator.wikimedia.org/P92523 and previous config saved to /var/cache/conftool/dbconfig/20260514-104521-marostegui.json
  • 10:41 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 10:40 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
  • 10:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage
  • 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply
  • 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply
  • 10:34 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage
  • 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage
  • 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage
  • 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply
  • 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply
  • 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply
  • 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply
  • 10:17 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm
  • 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1063.eqiad.wmnet with OS bullseye
  • 10:14 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1069.eqiad.wmnet with OS bullseye
  • 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm
  • 10:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm
  • 10:02 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2152: Host will be decommissioned
  • 10:02 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152: Host will be decommissioned
  • 09:54 cwilliams@cumin1003: END (ERROR) - Cookbook sre.mysql.depool (exit_code=97) depool db2152.codfw.wmnet: Host will be decommissioned
  • 09:51 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 09:51 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 09:49 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152.codfw.wmnet: Host will be decommissioned
  • 09:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1067.eqiad.wmnet with OS bullseye
  • 09:33 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS bullseye
  • 09:30 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1068.eqiad.wmnet with OS bullseye
  • 09:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS bullseye
  • 09:23 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage
  • 09:20 Emperor: rebalance codfw swift rings T354872
  • 09:18 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage
  • 09:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage
  • 09:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage
  • 09:06 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage
  • 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage
  • 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage
  • 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage
  • 08:55 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply
  • 08:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply
  • 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS bullseye
  • 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS bullseye
  • 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1066.eqiad.wmnet with OS bullseye
  • 08:54 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1065.eqiad.wmnet with OS bullseye
  • 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2149 T424341', diff saved to https://phabricator.wikimedia.org/P92520 and previous config saved to /var/cache/conftool/dbconfig/20260514-083916-marostegui.json
  • 08:08 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.2 refs T423911
  • 07:01 kart_: Update cxserver to 2026-04-23-114216-production (T423002)
  • 07:00 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 07:00 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 06:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc[2013,2023].codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance on pc3
  • 06:40 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:40 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2013: Replacing HW T418973
  • 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache
  • 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc2013: Replacing HW T418973
  • 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1158: after reimage to trixie
  • 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1158: after reimage to trixie
  • 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1158.eqiad.wmnet with OS trixie
  • 05:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage
  • 05:25 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage
  • 05:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1158.eqiad.wmnet with OS trixie
  • 05:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1158: Reimage to Trixie
  • 05:05 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1158: Reimage to Trixie
  • 05:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Reimage to Trixie
  • 05:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s7 master: reimage to Debian Trixie
  • 05:04 marostegui@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie
  • 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 49s)
  • 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 00:07 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003

2026-05-13

  • 21:12 Amir1: remapping thumbsize of 0 to 2 in all group0 wikis (T376152)
  • 21:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
  • 20:55 jdlrobson@deploy1003: Finished scap sync-world: Backport for wgThumbLimits: Remove the exception for itwikiquote (T376152) (duration: 07m 48s)
  • 20:51 jdlrobson@deploy1003: ladsgroup, jdlrobson: Continuing with deployment
  • 20:49 jdlrobson@deploy1003: ladsgroup, jdlrobson: Backport for wgThumbLimits: Remove the exception for itwikiquote (T376152) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:47 jdlrobson@deploy1003: Started scap sync-world: Backport for wgThumbLimits: Remove the exception for itwikiquote (T376152)
  • 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
  • 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
  • 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
  • 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
  • 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
  • 20:43 jdlrobson@deploy1003: Finished scap sync-world: Backport for Handle share-highlight images w/o resizeUrl (T426215) (duration: 07m 32s)
  • 20:42 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
  • 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 20:38 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
  • 20:37 jdlrobson@deploy1003: jdlrobson: Backport for Handle share-highlight images w/o resizeUrl (T426215) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:35 jdlrobson@deploy1003: Started scap sync-world: Backport for Handle share-highlight images w/o resizeUrl (T426215)
  • 20:33 jdlrobson@deploy1003: Finished scap sync-world: Backport for Update small size for Swedish Wikipedia (T424910) (duration: 07m 26s)
  • 20:28 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
  • 20:27 jdlrobson@deploy1003: jdlrobson: Backport for Update small size for Swedish Wikipedia (T424910) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:25 jdlrobson@deploy1003: Started scap sync-world: Backport for Update small size for Swedish Wikipedia (T424910)
  • 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
  • 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
  • 20:23 ebernhardson@deploy1003: Finished scap sync-world: Backport for Revert "cirrus: AB test query suggester variants" (T407432) (duration: 07m 06s)
  • 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
  • 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
  • 20:19 ebernhardson@deploy1003: ebernhardson: Continuing with deployment
  • 20:18 ebernhardson@deploy1003: ebernhardson: Backport for Revert "cirrus: AB test query suggester variants" (T407432) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 20:16 ebernhardson@deploy1003: Started scap sync-world: Backport for Revert "cirrus: AB test query suggester variants" (T407432)
  • 20:13 cjming@deploy1003: Finished scap sync-world: Backport for Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403) (duration: 06m 47s)
  • 20:13 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
  • 20:09 cjming@deploy1003: bpirkle, cjming: Continuing with deployment
  • 20:09 cjming@deploy1003: bpirkle, cjming: Backport for Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:07 cjming@deploy1003: Started scap sync-world: Backport for Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)
  • 19:23 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 19:23 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 19:09 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 19:09 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 18:38 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 18:37 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 18:27 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 18:26 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 18:25 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 18:25 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 18:20 cmooney@dns2005: END - running authdns-update
  • 18:19 cmooney@dns2005: START - running authdns-update
  • 18:14 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 18:13 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003"
  • 18:13 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003"
  • 18:09 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 18:05 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
  • 18:01 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 18:00 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 17:50 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 17:50 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 17:47 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 17:47 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 17:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
  • 17:43 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 17:42 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 17:36 topranks: update OSPF config on magru core routers to shift traffic to switch links T424611
  • 17:34 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 17:33 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 17:28 mutante: zuul1001 systemctl start zuul-scheduler ; /usr/bin/docker exec zuul-scheduler zuul-scheduler smart-reconfigure
  • 17:26 mutante: zuul1001 - stopping zuul-web; then manually running: /usr/sbin/usermod -u 923 zuul
  • 17:26 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 17:26 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 17:24 topranks: update OSPF config on esams core routers to shift traffic to switch links T424611
  • 17:20 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 17:19 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 17:05 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet
  • 17:03 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet
  • 16:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet
  • 16:55 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet
  • 16:43 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm
  • 16:29 topranks: update OSPF config on drmrs core routers to shift traffic to switch links T424611
  • 16:20 topranks: update OSPF config on eqsin core routers to shift traffic to switch links T424611
  • 16:10 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm
  • 16:10 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm
  • 16:10 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm
  • 15:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm
  • 15:53 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm
  • 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 15:44 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm
  • 15:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm
  • 15:42 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
  • 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 15:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm
  • 15:37 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm
  • 15:37 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7009.*
  • 15:36 fabfur: repooling cp7009 to test haproxy-awslc behavior (T419825)
  • 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm
  • 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm
  • 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm
  • 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm
  • 15:27 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7009.*
  • 15:27 fabfur: depooling cp7009 to install haproxy-awslc (T419825)
  • 15:18 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
  • 15:16 cmooney@dns2005: END - running authdns-update
  • 15:15 cmooney@dns2005: START - running authdns-update
  • 15:11 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
  • 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 15:04 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003"
  • 15:04 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003"
  • 15:01 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 15:00 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 15:00 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 14:57 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 14:54 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:53 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003"
  • 14:53 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003"
  • 14:51 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003
  • 14:50 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply
  • 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply
  • 14:49 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 14:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 14:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS trixie
  • 14:42 kharlan@deploy1003: Finished scap sync-world: Backport for WikiEditor: Populate user_groups in EditAttemptStep events (T424010) (duration: 07m 17s)
  • 14:37 kharlan@deploy1003: kharlan: Continuing with deployment
  • 14:36 kharlan@deploy1003: kharlan: Backport for WikiEditor: Populate user_groups in EditAttemptStep events (T424010) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 14:34 kharlan@deploy1003: Started scap sync-world: Backport for WikiEditor: Populate user_groups in EditAttemptStep events (T424010)
  • 14:33 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003
  • 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002"
  • 14:33 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003
  • 14:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002"
  • 14:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 14:25 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage
  • 14:19 jforrester@deploy1003: Finished scap sync-world: Backport for Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647) (duration: 06m 35s)
  • 14:17 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:17 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage
  • 14:16 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:15 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003
  • 14:15 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:15 jforrester@deploy1003: jforrester: Continuing with deployment
  • 14:15 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:14 jforrester@deploy1003: jforrester: Backport for Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:12 jforrester@deploy1003: Started scap sync-world: Backport for Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)
  • 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:10 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:08 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
  • 14:08 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033), ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033), Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972), [[gerrit:1286892|Add 'Promise-Non-Write-API-Action' to $wgAl
  • 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 14:03 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Continuing with deployment
  • 14:03 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7001.*
  • 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3004.wikimedia.org
  • 14:02 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 14:01 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 14:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 14:01 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Backport for ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033), ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033), Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972), [[gerrit:1286892|Add 'Promise-Non-Write-AP
  • 14:01 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 14:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 14:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS trixie
  • 13:59 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 13:59 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
  • 13:59 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033), ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033), Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972), [[gerrit:1286892|Add 'Promise-Non-Write-API-Action' to $wgAll
  • 13:58 fabfur: repooling cp7001 to test haproxy-awslc behavior (T419825)
  • 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3004.wikimedia.org
  • 13:50 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393) (duration: 07m 36s)
  • 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002
  • 13:45 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Continuing with deployment
  • 13:44 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Backport for Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)
  • 13:40 mfossati@deploy1003: Finished scap sync-world: Backport for [Share Highlight] Exclude section edit links, footnotes from selection (T423658), Add robust color fallbacks for QuoteCard average-color styling (T425358), Fixed card width (T425710), Adjust image size to match fixed width (T425710), [[gerrit:1286846|ShareHighlight: exclude browsers t
  • 13:36 mfossati@deploy1003: jdlrobson, mfossati: Continuing with deployment
  • 13:29 mfossati@deploy1003: jdlrobson, mfossati: Backport for [Share Highlight] Exclude section edit links, footnotes from selection (T423658), Add robust color fallbacks for QuoteCard average-color styling (T425358), Fixed card width (T425710), Adjust image size to match fixed width (T425710), [[gerrit:1286846|ShareHighlight: exclude browsers that d
  • 13:28 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002
  • 13:27 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 13:27 mfossati@deploy1003: Started scap sync-world: Backport for [Share Highlight] Exclude section edit links, footnotes from selection (T423658), Add robust color fallbacks for QuoteCard average-color styling (T425358), Fixed card width (T425710), Adjust image size to match fixed width (T425710), [[gerrit:1286846|ShareHighlight: exclude browsers th
  • 13:25 moritzm: installing openjdk-11 security updates
  • 13:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 13:12 sbisson@deploy1003: Finished scap sync-world: Backport for Add configurable user-agent and sparql endpoint url (T425389) (duration: 08m 18s)
  • 13:07 sbisson@deploy1003: sbisson: Continuing with deployment
  • 13:05 sbisson@deploy1003: sbisson: Backport for Add configurable user-agent and sparql endpoint url (T425389) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:04 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=codfw
  • 13:03 sbisson@deploy1003: Started scap sync-world: Backport for Add configurable user-agent and sparql endpoint url (T425389)
  • 12:50 mszwarc@deploy1003: Finished scap sync-world: Backport for Fix TypeError on saving userrights interwiki (T426185) (duration: 06m 42s)
  • 12:46 mszwarc@deploy1003: mszwarc: Continuing with deployment
  • 12:45 mszwarc@deploy1003: mszwarc: Backport for Fix TypeError on saving userrights interwiki (T426185) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:43 mszwarc@deploy1003: Started scap sync-world: Backport for Fix TypeError on saving userrights interwiki (T426185)
  • 12:41 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7001.*
  • 12:40 fabfur: depool cp7001 to test haproxy-awslc (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1286526) (T419825)
  • 12:38 topranks: add ibgp peering between cr1-magru and cr2-magru over loopback IPs T424611
  • 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
  • 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1236: Migration of db1236.eqiad.wmnet completed
  • 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet
  • 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet
  • 12:02 topranks: add ibgp peering between cr1-esams and cr2-esams over loopback IPs T424611
  • 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003"
  • 11:57 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003"
  • 11:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2220: after reimage to trixie
  • 11:52 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 11:51 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1236: Migration of db1236.eqiad.wmnet completed
  • 11:44 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 11:43 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 11:43 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS trixie
  • 11:40 topranks: delete old direct ibgp peering between cr1-drmrs and cr2-drmrs T424611
  • 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 11:27 topranks: add ibgp peering between cr1-drmrs and cr2-drmrs over loopback IPs T424611
  • 11:25 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage
  • 11:24 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply
  • 11:24 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply
  • 11:21 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage
  • 11:19 moritzm: installing Linux 6.1.170-3 on all Bookworm hosts
  • 11:12 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS trixie
  • 11:10 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2220: after reimage to trixie
  • 11:06 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS trixie
  • 11:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1236: Upgrading db1236.eqiad.wmnet
  • 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.depool depool db1236: Upgrading db1236.eqiad.wmnet
  • 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.major-upgrade
  • 10:58 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2220.codfw.wmnet with OS trixie
  • 10:55 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply
  • 10:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply
  • 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6003.wikimedia.org
  • 10:52 moritzm: installing Linux 5.10.251-4 on all Bullseye hosts
  • 10:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage
  • 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6003.wikimedia.org
  • 10:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage
  • 10:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 10:39 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 10:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
  • 10:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
  • 10:35 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: host reimage
  • 10:33 topranks: switch eqsin core router ibgp path to route via switches T424611
  • 10:26 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2220.codfw.wmnet with reason: host reimage
  • 10:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS trixie
  • 10:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 10:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts pki2002.codfw.wmnet
  • 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply
  • 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply
  • 10:16 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply
  • 10:15 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 10:15 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply
  • 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
  • 10:14 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply
  • 10:12 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
  • 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply
  • 10:10 moritzm: installing Apache security updates on Bullseye
  • 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply
  • 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply
  • 10:06 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2220.codfw.wmnet with OS trixie
  • 10:05 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 10:05 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1064.eqiad.wmnet with OS bullseye
  • 10:04 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply
  • 10:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2220: Reimage to Trixie
  • 10:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2220: Reimage to Trixie
  • 10:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2220.codfw.wmnet with reason: Reimage to Trixie
  • 10:02 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 10:01 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply
  • 09:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2220 T426142', diff saved to https://phabricator.wikimedia.org/P92500 and previous config saved to /var/cache/conftool/dbconfig/20260513-095934-marostegui.json
  • 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2218 to s7 primary T426142', diff saved to https://phabricator.wikimedia.org/P92499 and previous config saved to /var/cache/conftool/dbconfig/20260513-095814-marostegui.json
  • 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply
  • 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply
  • 09:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1062.eqiad.wmnet with OS bullseye
  • 09:56 moritzm: installing distro-info-data updates from Bookworm point release
  • 09:54 marostegui: Starting s7 codfw failover from db2220 to db2218 - T426142
  • 09:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 T426142
  • 09:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1061.eqiad.wmnet with OS bullseye
  • 09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2218 with weight 0 T426142', diff saved to https://phabricator.wikimedia.org/P92498 and previous config saved to /var/cache/conftool/dbconfig/20260513-095337-marostegui.json
  • 09:51 moritzm: installing ca-certificates update from Bookworm point release
  • 09:50 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1060.eqiad.wmnet with OS bullseye
  • 09:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage
  • 09:45 kharlan@deploy1003: Finished scap sync-world: Backport for EventStreamConfig: Register special_user_login event stream (T425631) (duration: 09m 01s)
  • 09:42 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage
  • 09:41 kharlan@deploy1003: kharlan: Continuing with deployment
  • 09:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage
  • 09:38 kharlan@deploy1003: kharlan: Backport for EventStreamConfig: Register special_user_login event stream (T425631) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:36 kharlan@deploy1003: Started scap sync-world: Backport for EventStreamConfig: Register special_user_login event stream (T425631)
  • 09:34 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage
  • 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage
  • 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage
  • 09:29 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage
  • 09:29 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage
  • 09:28 cmooney@dns2005: END - running authdns-update
  • 09:27 cmooney@dns2005: START - running authdns-update
  • 09:27 logmsgbot: dreamyjazz Deployed security patch for T423840
  • 09:25 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki2002.codfw.wmnet
  • 09:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 09:22 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: reimage
  • 09:21 logmsgbot: dreamyjazz Deployed security patch for T423840
  • 09:17 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1064.eqiad.wmnet with OS bullseye
  • 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1062.eqiad.wmnet with OS bullseye
  • 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1061.eqiad.wmnet with OS bullseye
  • 09:17 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1060.eqiad.wmnet with OS bullseye
  • 09:14 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=pki,name=codfw
  • 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003"
  • 09:10 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003"
  • 09:07 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 08:45 moritzm: installing dnsmasq security updates
  • 08:40 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - cmooney@cumin1003"
  • 08:38 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 08:38 cmooney@dns2005: END - running authdns-update
  • 08:37 cmooney@dns2005: START - running authdns-update
  • 08:36 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 08:35 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.2 refs T423911
  • 08:32 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - cmooney@cumin1003"
  • 08:32 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 08:28 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 08:25 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 08:25 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 08:24 kharlan@deploy1003: Finished scap sync-world: Backport for WikimediaEvents: Enable Special:UserLogin instrumentation (T425631) (duration: 09m 18s)
  • 08:20 kharlan@deploy1003: kharlan: Continuing with deployment
  • 08:16 kharlan@deploy1003: kharlan: Backport for WikimediaEvents: Enable Special:UserLogin instrumentation (T425631) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:14 kharlan@deploy1003: Started scap sync-world: Backport for WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)
  • 08:11 moritzm: imported dnsmasq 2.92-1~wmf13u2 to trixie-wikimedia/main (backport of latest dnsmasq security fixes to our internal build)
  • 08:08 topranks: reconfigure link from cr4-ulsfo to asw1-22-ulsfo as 802.1q tagged T424611
  • 07:56 moritzm: imported dnsmasq 2.92-1~wmf12u2 to bookworm-wikimedia/main (backport of latest dnsmasq security fixes to our internal build)
  • 07:47 dcausse@deploy1003: Finished scap sync-world: Backport for translate: add opensearch-ttmserver-test (T425377) (duration: 09m 09s)
  • 07:43 dcausse@deploy1003: atsuko, dcausse: Continuing with deployment
  • 07:40 dcausse@deploy1003: atsuko, dcausse: Backport for translate: add opensearch-ttmserver-test (T425377) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:39 gkyziridis@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 07:39 gkyziridis@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 07:38 dcausse@deploy1003: Started scap sync-world: Backport for translate: add opensearch-ttmserver-test (T425377)
  • 07:37 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 07:37 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 07:34 dcausse@deploy1003: Finished scap sync-world: Backport for testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967), Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete" (duration: 09m 32s)
  • 07:30 dcausse@deploy1003: dcausse, wmde-fisch: Continuing with deployment
  • 07:27 dcausse@deploy1003: dcausse, wmde-fisch: Backport for testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967), Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:25 dcausse@deploy1003: Started scap sync-world: Backport for testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967), Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"
  • 07:18 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 07:18 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 07:17 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
  • 07:17 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
  • 07:15 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2218: after reimage to trixie
  • 07:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1253: after reimage to trixie
  • 06:39 moritzm: installing Exim security updates on the hosts where Exim is used as a local mail relay
  • 06:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2218: after reimage to trixie
  • 06:27 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2218.codfw.wmnet with OS trixie
  • 06:26 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1253: after reimage to trixie
  • 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1253.eqiad.wmnet with OS trixie
  • 06:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2218.codfw.wmnet with reason: host reimage
  • 05:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage
  • 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2218.codfw.wmnet with reason: host reimage
  • 05:54 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage
  • 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1253.eqiad.wmnet with OS trixie
  • 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2218.codfw.wmnet with OS trixie
  • 05:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1253: Reimage to Trixie
  • 05:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2218: Reimage to Trixie
  • 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1253: Reimage to Trixie
  • 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1253.eqiad.wmnet with reason: Reimage to Trixie
  • 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2218: Reimage to Trixie
  • 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2218.codfw.wmnet with reason: Reimage to Trixie
  • 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1278.eqiad.wmnet with OS bookworm
  • 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 04:20 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 04:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage
  • 03:57 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage
  • 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1277.eqiad.wmnet with OS bookworm
  • 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 03:42 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 03:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1278.eqiad.wmnet with OS bookworm
  • 03:28 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1276.eqiad.wmnet with OS bookworm
  • 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 03:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 03:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage
  • 03:17 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage
  • 03:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1278
  • 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage
  • 03:08 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1278
  • 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1278] - vriley@cumin1003"
  • 03:07 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1278] - vriley@cumin1003"
  • 03:04 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage
  • 03:03 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 03:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1277.eqiad.wmnet with OS bookworm
  • 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 02:49 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1276.eqiad.wmnet with OS bookworm
  • 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1275.eqiad.wmnet with OS bookworm
  • 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 02:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 02:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 02:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1277
  • 02:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1277
  • 02:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003"
  • 02:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003"
  • 02:21 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 02:19 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1274.eqiad.wmnet with OS bookworm
  • 02:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage
  • 02:16 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 02:15 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1276
  • 02:13 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage
  • 02:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1276
  • 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003"
  • 02:10 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003"
  • 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 44s)
  • 02:07 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 01:58 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1275.eqiad.wmnet with OS bookworm
  • 01:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 01:32 zabe@deploy1003: Finished scap sync-world: Backport for Start reading from new tables everywhere except commons (2nd try) (T416548) (duration: 06m 35s)
  • 01:28 zabe@deploy1003: zabe: Continuing with deployment
  • 01:27 zabe@deploy1003: zabe: Backport for Start reading from new tables everywhere except commons (2nd try) (T416548) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 01:27 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm
  • 01:26 zabe@deploy1003: Started scap sync-world: Backport for Start reading from new tables everywhere except commons (2nd try) (T416548)
  • 01:18 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 01:14 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1275
  • 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 01:12 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1275
  • 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003"
  • 01:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003"
  • 01:08 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 00:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:58 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1274
  • 00:57 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1274
  • 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003"
  • 00:56 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003"
  • 00:52 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1273.eqiad.wmnet with OS bookworm
  • 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 00:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"

2026-05-12

  • 23:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage
  • 23:48 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage
  • 23:46 cscott@deploy1003: Finished scap sync-world: Backport for Re-enable unit tests with updated output, Re-enable ContentHolderTest with updated output, Revert "Remove File::getHandler language fallback" (T425988) (duration: 12m 45s)
  • 23:40 cscott@deploy1003: cscott: Continuing with deployment
  • 23:39 cscott@deploy1003: cscott: Backport for Re-enable unit tests with updated output, Re-enable ContentHolderTest with updated output, Revert "Remove File::getHandler language fallback" (T425988) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:33 cscott@deploy1003: Started scap sync-world: Backport for Re-enable unit tests with updated output, Re-enable ContentHolderTest with updated output, Revert "Remove File::getHandler language fallback" (T425988)
  • 23:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for Also merge views overflow into array-items (T426115), Also merge views overflow into array-items (T426115), Special:Preferences: Display three options for thumbsizes (T424910) (duration: 33m 28s)
  • 23:01 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1273.eqiad.wmnet with OS bookworm
  • 22:53 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
  • 22:49 jdlrobson@deploy1003: jdlrobson: Backport for Also merge views overflow into array-items (T426115), Also merge views overflow into array-items (T426115), Special:Preferences: Display three options for thumbsizes (T424910) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1272.eqiad.wmnet with OS bookworm
  • 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 22:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 22:32 jdlrobson@deploy1003: Started scap sync-world: Backport for Also merge views overflow into array-items (T426115), Also merge views overflow into array-items (T426115), Special:Preferences: Display three options for thumbsizes (T424910)
  • 22:21 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage
  • 22:21 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage
  • 22:18 jdlrobson@deploy1003: Finished scap sync-world: Backport for Disable interactions until load is complete (T422968 T424787) (duration: 34m 01s)
  • 22:05 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
  • 22:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:01 jdlrobson@deploy1003: jdlrobson: Backport for Disable interactions until load is complete (T422968 T424787) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:59 dwisehaupt@dns1004: END - running authdns-update
  • 21:57 dwisehaupt@dns1004: START - running authdns-update
  • 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1271.eqiad.wmnet with OS bookworm
  • 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 21:46 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 21:43 jdlrobson@deploy1003: Started scap sync-world: Backport for Disable interactions until load is complete (T422968 T424787)
  • 21:42 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:41 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1273
  • 21:40 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1272.eqiad.wmnet with OS bookworm
  • 21:39 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1273
  • 21:38 cscott@deploy1003: Finished scap sync-world: Backport for Enabling RSS extension for cowikimedia chapter (T425440), Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332) (duration: 11m 56s)
  • 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003"
  • 21:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003"
  • 21:32 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 21:31 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Continuing with deployment
  • 21:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage
  • 21:29 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Backport for Enabling RSS extension for cowikimedia chapter (T425440), Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003"
  • 21:27 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003"
  • 21:26 cscott@deploy1003: Started scap sync-world: Backport for Enabling RSS extension for cowikimedia chapter (T425440), Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)
  • 21:23 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 21:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage
  • 21:19 cscott@deploy1003: Finished scap sync-world: Backport for Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981), Bump wikimedia/parsoid to 0.24.0-a3 (T425981), Disable unit tests that fail with new vendor release, Skip ContentHolderTest that fails with new vendor release (duration: 14m 51s)
  • 21:15 cscott@deploy1003: cscott: Continuing with deployment
  • 21:15 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side T424611
  • 21:07 cscott@deploy1003: cscott: Backport for Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981), Bump wikimedia/parsoid to 0.24.0-a3 (T425981), Disable unit tests that fail with new vendor release, Skip ContentHolderTest that fails with new vendor release synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Change
  • 21:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm
  • 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1270.eqiad.wmnet with OS bookworm
  • 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 21:05 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003"
  • 21:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003"
  • 21:05 cscott@deploy1003: Started scap sync-world: Backport for Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981), Bump wikimedia/parsoid to 0.24.0-a3 (T425981), Disable unit tests that fail with new vendor release, Skip ContentHolderTest that fails with new vendor release
  • 21:03 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side T424611
  • 21:01 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003"
  • 20:54 topranks: migrate link from cr2-drmrs to asw1-b12-drmrs to L2 trunk on the switch side T424611
  • 20:51 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1271.eqiad.wmnet with OS bookworm
  • 20:50 samtar@deploy1003: Finished scap sync-world: Backport for Allow svwiki bureaucrats to remove sysop rights (T425806) (duration: 09m 03s)
  • 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage
  • 20:46 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003"
  • 20:46 samtar@deploy1003: samtar, dreamrimmer: Continuing with deployment
  • 20:44 topranks: migrate link from cr1-drmrs to asw1-b12-drmrs to L2 trunk on the switch side T424611
  • 20:43 samtar@deploy1003: samtar, dreamrimmer: Backport for Allow svwiki bureaucrats to remove sysop rights (T425806) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:42 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage
  • 20:41 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 20:41 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage
  • 20:41 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage
  • 20:41 samtar@deploy1003: Started scap sync-world: Backport for Allow svwiki bureaucrats to remove sysop rights (T425806)
  • 20:35 topranks: migrate link from cr2-esams to asw1-by27-esams to L2 trunk on the switch side T424611
  • 20:26 dbrant@deploy1003: Finished scap sync-world: Backport for docroot: Add "get_login_creds" permission to Android app. (T426010) (duration: 08m 27s)
  • 20:25 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm
  • 20:23 topranks: migrate link from cr1-esams to asw1-by27-esams to L2 trunk on the switch side T424611
  • 20:20 dbrant@deploy1003: dbrant: Continuing with deployment
  • 20:20 dbrant@deploy1003: dbrant: Backport for docroot: Add "get_login_creds" permission to Android app. (T426010) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:18 dbrant@deploy1003: Started scap sync-world: Backport for docroot: Add "get_login_creds" permission to Android app. (T426010)
  • 20:16 topranks: migrate link from cr2-esams to asw1-bw27-esams to L2 trunk on the switch side T424611
  • 20:15 alexsanford@deploy1003: Finished scap sync-world: Backport for Enforce 2FA requirements for phase 2 groups (T423119), Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120) (duration: 11m 47s)
  • 20:11 alexsanford@deploy1003: alexsanford: Continuing with deployment
  • 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm
  • 20:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:05 alexsanford@deploy1003: alexsanford: Backport for Enforce 2FA requirements for phase 2 groups (T423119), Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:05 topranks: migrate link from cr1-esams to asw1-bw27-esams to L2 trunk on the switch side T424611
  • 20:03 alexsanford@deploy1003: Started scap sync-world: Backport for Enforce 2FA requirements for phase 2 groups (T423119), Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)
  • 20:00 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 19:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 19:52 topranks: migrate link from cr2-magru to asw1-b4-magru to L2 trunk on the switch side T424611
  • 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bullseye
  • 19:43 topranks: migrate link from cr1-magru to asw1-b4-magru to L2 trunk on the switch side T424611
  • 19:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage
  • 19:34 dancy@deploy1003: Finished scap sync-world: Backport for Fix MediaHandler caching to not preserve language (T425988 T425740 T425782) (duration: 07m 07s)
  • 19:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage
  • 19:30 dancy@deploy1003: jforrester, dancy: Continuing with deployment
  • 19:30 dancy@deploy1003: jforrester, dancy: Backport for Fix MediaHandler caching to not preserve language (T425988 T425740 T425782) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 19:27 dancy@deploy1003: Started scap sync-world: Backport for Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)
  • 19:26 topranks: migrate link from cr2-magru to asw1-b3-magru to L2 trunk on the switch side T424611
  • 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye
  • 19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 19:06 topranks: migrate link from cr1-magru to asw1-b3-magru to L2 trunk on the switch side T424611
  • 19:05 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 18:42 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 18:35 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 18:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye
  • 18:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 18:08 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 18:08 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye
  • 17:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye
  • 17:56 otto@deploy1003: Finished scap sync-world: Backport for EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952) (duration: 16m 08s)
  • 17:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 17:53 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 17:53 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 17:53 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 17:53 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 17:52 otto@deploy1003: otto: Continuing with deployment
  • 17:52 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 17:52 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 17:51 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 17:51 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 17:50 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 17:50 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 17:50 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 17:50 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 17:48 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 17:48 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 17:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 17:48 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 17:46 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 17:46 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 17:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 17:45 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 17:42 otto@deploy1003: otto: Backport for EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 17:40 otto@deploy1003: Started scap sync-world: Backport for EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)
  • 17:39 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 17:39 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 17:39 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 17:39 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 17:38 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 17:38 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 17:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 17:37 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 17:37 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 17:37 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 17:36 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 17:36 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub: apply
  • 17:35 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub: apply
  • 16:46 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm
  • 16:25 moritzm: installing Exim security updates on lists/vrts hosts
  • 16:00 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply
  • 15:57 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply
  • 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for wikinews: Remove unnecessary settings (T421796) (duration: 07m 22s)
  • 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply
  • 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply
  • 15:48 ladsgroup@deploy1003: ladsgroup, neriah: Continuing with deployment
  • 15:47 ladsgroup@deploy1003: ladsgroup, neriah: Backport for wikinews: Remove unnecessary settings (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:45 ladsgroup@deploy1003: Started scap sync-world: Backport for wikinews: Remove unnecessary settings (T421796)
  • 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:37 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 15:35 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 15:34 jelto: helm uninstall -n miscweb design-strategy - T329991
  • 15:33 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 15:31 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 15:30 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 15:30 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 15:29 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 15:28 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 15:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm
  • 15:25 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 15:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 15:24 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts
  • 15:23 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s)
  • 15:23 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts
  • 15:22 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s)
  • 15:17 dancy@deploy1003: Installing scap version "4.264.0" for 163 host(s)
  • 15:12 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
  • 15:12 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
  • 15:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm
  • 14:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 14:55 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 14:54 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 14:54 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 14:53 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 14:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1057.eqiad.wmnet with OS bullseye
  • 14:47 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1056.eqiad.wmnet with OS bullseye
  • 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm
  • 14:45 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
  • 14:44 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
  • 14:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1059.eqiad.wmnet with OS bullseye
  • 14:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1058.eqiad.wmnet with OS bullseye
  • 14:36 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage
  • 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs2009 to dse-k8s-wdqs-test2001
  • 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs-test2001
  • 14:33 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test2001
  • 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test2001 on all recursors
  • 14:33 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test2001 on all recursors
  • 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003"
  • 14:33 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm
  • 14:32 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage
  • 14:31 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003"
  • 14:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs1028 to dse-k8s-wdqs-test1001
  • 14:28 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage
  • 14:27 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs-test1001
  • 14:26 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test1001
  • 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test1001 on all recursors
  • 14:26 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 14:26 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test1001 on all recursors
  • 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003"
  • 14:26 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs2009 to dse-k8s-wdqs-test2001
  • 14:26 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003"
  • 14:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage
  • 14:22 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 14:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:21 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs1028 to dse-k8s-wdqs-test1001
  • 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage
  • 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage
  • 14:20 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage
  • 14:19 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage
  • 14:17 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
  • 14:17 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
  • 14:15 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Keep all long, non-wrapping values inside parent element (T425176), Revert "page_change - add revision.revert info" (duration: 07m 02s)
  • 14:11 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment
  • 14:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:10 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for Keep all long, non-wrapping values inside parent element (T425176), Revert "page_change - add revision.revert info" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:10 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1271
  • 14:09 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:08 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Keep all long, non-wrapping values inside parent element (T425176), Revert "page_change - add revision.revert info"
  • 14:08 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
  • 14:08 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply
  • 14:08 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1059.eqiad.wmnet with OS bullseye
  • 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1058.eqiad.wmnet with OS bullseye
  • 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1057.eqiad.wmnet with OS bullseye
  • 14:07 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1056.eqiad.wmnet with OS bullseye
  • 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Keep all long, non-wrapping values inside parent element (T425176), page_change - add revision.revert info (duration: 39m 36s)
  • 14:06 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 14:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1271
  • 14:05 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:05 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Rolling back deployment
  • 14:05 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 14:04 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1272
  • 14:03 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1272
  • 14:02 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003"
  • 14:02 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003"
  • 13:57 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 13:57 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:54 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 13:54 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:51 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 13:51 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm
  • 13:50 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 13:50 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 13:49 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye
  • 13:49 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 13:49 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 13:49 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
  • 13:48 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync
  • 13:48 ottomata: roll restart eventgate main to pick up mediawiki/page/change/1.4.0 schema version for T423583
  • 13:32 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye
  • 13:29 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Backport for Keep all long, non-wrapping values inside parent element (T425176), page_change - add revision.revert info synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:27 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Keep all long, non-wrapping values inside parent element (T425176), page_change - add revision.revert info
  • 13:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2233.codfw.wmnet with reason: Reboot
  • 13:17 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2006.codfw.wmnet with reason: Reboot
  • 13:14 sbisson@deploy1003: Finished scap sync-world: Backport for ArticleGuidance: set sparql endpoint (T425389) (duration: 07m 13s)
  • 13:09 sbisson@deploy1003: sbisson: Continuing with deployment
  • 13:08 sbisson@deploy1003: sbisson: Backport for ArticleGuidance: set sparql endpoint (T425389) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:06 sbisson@deploy1003: Started scap sync-world: Backport for ArticleGuidance: set sparql endpoint (T425389)
  • 12:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 12:38 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 12:26 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
  • 12:26 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
  • 12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Make DiscussionTools not show hCaptcha initially unless configured (T425955), Show CAPTCHA if required for all edits before first edit attempt (T425955), hCaptcha: Enable for DiscussionTools on testwiki (T426039), hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T42
  • 12:20 dreamyjazz@deploy1003: dreamyjazz: Continuing with deployment
  • 12:17 dreamyjazz@deploy1003: dreamyjazz: Backport for Make DiscussionTools not show hCaptcha initially unless configured (T425955), Show CAPTCHA if required for all edits before first edit attempt (T425955), hCaptcha: Enable for DiscussionTools on testwiki (T426039), hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425940) synced
  • 12:15 dreamyjazz@deploy1003: Started scap sync-world: Backport for Make DiscussionTools not show hCaptcha initially unless configured (T425955), Show CAPTCHA if required for all edits before first edit attempt (T425955), hCaptcha: Enable for DiscussionTools on testwiki (T426039), hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425
  • 12:10 kharlan@deploy1003: Finished scap sync-world: Backport for Special:UserLogin: Instrument no-JS form submissions (T425631) (duration: 07m 45s)
  • 12:06 kharlan@deploy1003: kharlan: Continuing with deployment
  • 12:04 kharlan@deploy1003: kharlan: Backport for Special:UserLogin: Instrument no-JS form submissions (T425631) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:02 kharlan@deploy1003: Started scap sync-world: Backport for Special:UserLogin: Instrument no-JS form submissions (T425631)
  • 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003"
  • 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003"
  • 09:56 kharlan@deploy1003: Finished scap sync-world: Backport for Update UserEntitySerializer callers (T426026) (duration: 07m 43s)
  • 09:51 kharlan@deploy1003: kharlan: Continuing with deployment
  • 09:50 kharlan@deploy1003: kharlan: Backport for Update UserEntitySerializer callers (T426026) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:48 kharlan@deploy1003: Started scap sync-world: Backport for Update UserEntitySerializer callers (T426026)
  • 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 (T419961)', diff saved to https://phabricator.wikimedia.org/P92480 and previous config saved to /var/cache/conftool/dbconfig/20260512-092034-fceratto.json
  • 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92479 and previous config saved to /var/cache/conftool/dbconfig/20260512-091025-fceratto.json
  • 09:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92478 and previous config saved to /var/cache/conftool/dbconfig/20260512-090017-fceratto.json
  • 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 (T419961)', diff saved to https://phabricator.wikimedia.org/P92477 and previous config saved to /var/cache/conftool/dbconfig/20260512-085009-fceratto.json
  • 08:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1036 (T419961)', diff saved to https://phabricator.wikimedia.org/P92476 and previous config saved to /var/cache/conftool/dbconfig/20260512-083526-fceratto.json
  • 08:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance
  • 08:21 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2150: after reimage to trixie
  • 08:17 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.2 refs T423911
  • 08:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1231: after reimage to trixie
  • 08:08 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply
  • 08:07 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply
  • 08:03 dcausse@deploy1003: Finished scap sync-world: Backport for Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete" (duration: 07m 02s)
  • 08:00 dcausse@deploy1003: dcausse: Rolling back deployment
  • 08:00 dcausse@deploy1003: dcausse: Backport for Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:56 dcausse@deploy1003: Started scap sync-world: Backport for Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"
  • 07:35 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2150: after reimage to trixie
  • 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS trixie
  • 07:29 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1231: after reimage to trixie
  • 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS trixie
  • 07:08 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage
  • 07:04 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage
  • 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage
  • 06:59 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage
  • 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2142.codfw.wmnet
  • 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
  • 06:46 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
  • 06:43 jayme@deploy1003: Finished scap sync-world: update rsyslog image, T418200 (duration: 07m 56s)
  • 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS trixie
  • 06:42 marostegui@cumin1003: START - Cookbook sre.dns.netbox
  • 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS trixie
  • 06:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1231: Reimage to Trixie
  • 06:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Reimage to Trixie
  • 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1231: Reimage to Trixie
  • 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1231.eqiad.wmnet with reason: Reimage to Trixie
  • 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Reimage to Trixie
  • 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2150.codfw.wmnet with reason: Reimage to Trixie
  • 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2142.codfw.wmnet
  • 06:36 jayme@deploy1003: Started scap sync-world: update rsyslog image, T418200
  • 06:27 jayme@dns1004: END - running authdns-update
  • 06:26 jayme@dns1004: START - running authdns-update
  • 03:39 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.2 refs T423911 (duration: 36m 36s)
  • 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.2 refs T423911
  • 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 38s)
  • 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 00:37 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
  • 00:37 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply
  • 00:36 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply
  • 00:35 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply
  • 00:35 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 00:35 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply
  • 00:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 00:14 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply
  • 00:07 jdlrobson@deploy1003: Finished scap sync-world: Backport for Skin: Correct thumbnail class (T424910) (duration: 07m 24s)
  • 00:03 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
  • 00:02 jdlrobson@deploy1003: jdlrobson: Backport for Skin: Correct thumbnail class (T424910) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 00:00 jdlrobson@deploy1003: Started scap sync-world: Backport for Skin: Correct thumbnail class (T424910)

2026-05-11

  • 23:45 jdlrobson@deploy1003: Finished scap sync-world: Backport for Exclude sitesupport from button/icon treatment, remove manual styling (T425721) (duration: 06m 21s)
  • 23:41 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
  • 23:40 jdlrobson@deploy1003: jdlrobson: Backport for Exclude sitesupport from button/icon treatment, remove manual styling (T425721) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:38 jdlrobson@deploy1003: Started scap sync-world: Backport for Exclude sitesupport from button/icon treatment, remove manual styling (T425721)
  • 23:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for Add support for icons in toolbox (T424571) (duration: 06m 29s)
  • 23:20 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
  • 23:19 jdlrobson@deploy1003: jdlrobson: Backport for Add support for icons in toolbox (T424571) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:18 jdlrobson@deploy1003: Started scap sync-world: Backport for Add support for icons in toolbox (T424571)
  • 21:51 cjming@deploy1003: Finished scap sync-world: Backport for WikiLambdaApi instrument: update schema (T415254) (duration: 06m 26s)
  • 21:47 cjming@deploy1003: cjming: Continuing with deployment
  • 21:47 cjming@deploy1003: cjming: Backport for WikiLambdaApi instrument: update schema (T415254) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:45 cjming@deploy1003: Started scap sync-world: Backport for WikiLambdaApi instrument: update schema (T415254)
  • 21:29 maryum: Deployed security fix for T425406
  • 21:16 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply
  • 21:16 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply
  • 21:15 mstyles@deploy1003: Finished scap sync-world: Backport for Enable CSPUseReportURIDirective in Wikimedia production (T424058) (duration: 06m 36s)
  • 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:11 mstyles@deploy1003: sbassett, mstyles: Continuing with deployment
  • 21:10 mstyles@deploy1003: sbassett, mstyles: Backport for Enable CSPUseReportURIDirective in Wikimedia production (T424058) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:09 mstyles@deploy1003: Started scap sync-world: Backport for Enable CSPUseReportURIDirective in Wikimedia production (T424058)
  • 21:03 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply
  • 20:54 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:53 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply
  • 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003"
  • 20:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003"
  • 20:49 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1269.eqiad.wmnet with OS bookworm
  • 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 20:41 jdrewniak@deploy1003: Finished scap sync-world: Backport for Bumping portals to master (T128546) (duration: 09m 51s)
  • 20:37 jdrewniak@deploy1003: jdrewniak: Continuing with deployment
  • 20:36 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 20:33 jdrewniak@deploy1003: jdrewniak: Backport for Bumping portals to master (T128546) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:32 jdrewniak@deploy1003: Started scap sync-world: Backport for Bumping portals to master (T128546)
  • 20:19 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage
  • 20:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage
  • 20:02 zabe@deploy1003: Finished scap sync-world: Backport for Start reading from new file tables on all small and medium wikis (T416548) (duration: 06m 57s)
  • 20:00 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1269.eqiad.wmnet with OS bookworm
  • 19:58 zabe@deploy1003: zabe: Continuing with deployment
  • 19:57 zabe@deploy1003: zabe: Backport for Start reading from new file tables on all small and medium wikis (T416548) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 19:55 zabe@deploy1003: Started scap sync-world: Backport for Start reading from new file tables on all small and medium wikis (T416548)
  • 19:44 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye
  • 19:43 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:40 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bvibber out of all services on: 2453 hosts
  • 19:39 inflatador: [bking@cumin2002] ~$ sudo cumin 'A:wdqs-main and A:codfw' 'systemctl restart wdqs-blazegraph' <- restart after banning scraper
  • 19:25 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:24 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1269
  • 19:23 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1269
  • 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003"
  • 19:22 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003"
  • 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1268.eqiad.wmnet with OS bookworm
  • 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 19:16 dzahn@dns1005: END - running authdns-update
  • 19:14 dzahn@dns1005: START - running authdns-update
  • 19:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 19:11 inflatador: bking@archiva1002 `sudo rm -rfv /var/cache/archiva/temp* && sudo systemctl restart archiva` to free up disk space
  • 18:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye
  • 18:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage
  • 18:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage
  • 18:25 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye
  • 18:13 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 18:13 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 18:12 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 18:12 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 18:12 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
  • 18:12 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync
  • 18:12 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie
  • 18:12 ottomata: roll restarting eventgate-main to pick up changes for T423952
  • 18:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm
  • 17:56 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1268.eqiad.wmnet with OS bookworm
  • 17:56 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm
  • 17:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:53 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye
  • 17:52 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye
  • 17:47 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply
  • 17:43 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply
  • 17:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 (T419961)', diff saved to https://phabricator.wikimedia.org/P92464 and previous config saved to /var/cache/conftool/dbconfig/20260511-173804-fceratto.json
  • 17:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1268
  • 17:34 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1268
  • 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003"
  • 17:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003"
  • 17:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92463 and previous config saved to /var/cache/conftool/dbconfig/20260511-172756-fceratto.json
  • 17:25 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 17:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92462 and previous config saved to /var/cache/conftool/dbconfig/20260511-171747-fceratto.json
  • 17:15 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye
  • 17:12 dancy@deploy1003: Installation of scap version "4.263.0" completed for 2 hosts
  • 17:11 dancy@deploy1003: Installing scap version "4.263.0" for 2 host(s)
  • 17:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 (T419961)', diff saved to https://phabricator.wikimedia.org/P92461 and previous config saved to /var/cache/conftool/dbconfig/20260511-170739-fceratto.json
  • 17:07 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply
  • 17:07 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply
  • 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply
  • 17:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply
  • 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply
  • 17:06 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply
  • 17:05 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie
  • 17:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1047 (T419961)', diff saved to https://phabricator.wikimedia.org/P92460 and previous config saved to /var/cache/conftool/dbconfig/20260511-170024-fceratto.json
  • 17:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1047.eqiad.wmnet with reason: Maintenance
  • 16:56 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply
  • 16:51 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply
  • 16:50 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply
  • 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply
  • 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply
  • 16:40 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply
  • 16:39 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 16:39 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 16:39 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 16:38 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 16:37 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:37 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 16:36 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:36 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 16:27 zabe@deploy1003: Finished scap sync-world: Backport for Disable FlaggedRevs on wikinews (T423577) (duration: 06m 54s)
  • 16:25 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 16:25 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply
  • 16:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 16:23 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply
  • 16:23 zabe@deploy1003: zabe: Continuing with deployment
  • 16:22 zabe@deploy1003: zabe: Backport for Disable FlaggedRevs on wikinews (T423577) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:20 zabe@deploy1003: Started scap sync-world: Backport for Disable FlaggedRevs on wikinews (T423577)
  • 16:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:02 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:00 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 16:00 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 15:58 zabe@deploy1003: Finished scap sync-world: Backport for Remove custom user groups from Wikinews (T423578) (duration: 07m 48s)
  • 15:54 zabe@deploy1003: zabe: Continuing with deployment
  • 15:52 zabe@deploy1003: zabe: Backport for Remove custom user groups from Wikinews (T423578) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:50 zabe@deploy1003: Started scap sync-world: Backport for Remove custom user groups from Wikinews (T423578)
  • 15:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:46 zabe@deploy1003: Finished scap sync-world: Backport for Start reading from new file tables on testwiki (2nd try) (T416548) (duration: 06m 32s)
  • 15:42 zabe@deploy1003: zabe: Continuing with deployment
  • 15:41 zabe@deploy1003: zabe: Backport for Start reading from new file tables on testwiki (2nd try) (T416548) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:40 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:39 zabe@deploy1003: Started scap sync-world: Backport for Start reading from new file tables on testwiki (2nd try) (T416548)
  • 15:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply
  • 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply
  • 15:17 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm
  • 14:55 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2012.codfw.wmnet with reason: DIMM replacement
  • 14:54 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply
  • 14:54 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply
  • 14:47 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 14:43 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1017
  • 14:42 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host lvs1017
  • 14:42 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply
  • 14:41 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply
  • 14:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:39 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:39 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386), WikiLambdaApi: update stream configuration (T415254), WikiLambdaApi instrument: Sets the custom schemaID (T415254), editSaves: getExperiment returns a promise now (T425785) (duration: 18
  • 14:38 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 14:33 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Continuing with deployment
  • 14:26 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Backport for Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386), WikiLambdaApi: update stream configuration (T415254), WikiLambdaApi instrument: Sets the custom schemaID (T415254), editSaves: getExperiment returns a promise now (gerrit:1285406)
  • 14:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386), WikiLambdaApi: update stream configuration (T415254), WikiLambdaApi instrument: Sets the custom schemaID (T415254), editSaves: getExperiment returns a promise now (T425785)
  • 14:18 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Prevent username registration if the username previously existed (T196386), Prevent username registration if the username previously existed (v2) (T196386), API: Introduce list=globalusers (T261752), list=globalusers: Avoid querying group permissions with empty group list (gerrit:1285761)
  • 14:15 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm
  • 14:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 14:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 14:05 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with deployment
  • 14:04 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for Prevent username registration if the username previously existed (T196386), Prevent username registration if the username previously existed (v2) (T196386), API: Introduce list=globalusers (T261752), list=globalusers: Avoid querying group permissions with empty group (gerrit:1285761)
  • 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-eqiad@eqiad
  • 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
  • 13:56 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1055.eqiad.wmnet with OS bookworm
  • 13:56 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
  • 13:50 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-eqiad@eqiad
  • 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-codfw@codfw
  • 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
  • 13:49 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
  • 13:47 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-codfw@codfw
  • 13:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage
  • 13:38 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Prevent username registration if the username previously existed (T196386), Prevent username registration if the username previously existed (v2) (T196386), API: Introduce list=globalusers (T261752), list=globalusers: Avoid querying group permissions with empty group list (gerrit:1285761)
  • 13:36 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage
  • 13:34 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
  • 13:34 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply
  • 13:32 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
  • 13:32 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply
  • 13:30 btullis: restarting pybal on lvs1019 and lvs1020 for T420437
  • 13:26 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Enable and configure WikiProjects prototype on Wikidata beta (T421850) (duration: 06m 28s)
  • 13:25 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
  • 13:24 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS bookworm
  • 13:22 jiji@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc1055.eqiad.wmnet with OS trixie
  • 13:22 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment
  • 13:21 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for Enable and configure WikiProjects prototype on Wikidata beta (T421850) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:21 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply
  • 13:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Enable and configure WikiProjects prototype on Wikidata beta (T421850)
  • 13:19 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:19 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 13:18 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:16 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:15 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 13:14 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:14 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 13:07 otto@deploy1003: Finished scap sync-world: Backport for EventStreamConfig - add mediawiki.user_change.dev0 (T423952) (duration: 08m 05s)
  • 13:06 elukey: remove old discovery pki intermediate
  • 13:03 otto@deploy1003: otto: Continuing with deployment
  • 13:01 otto@deploy1003: otto: Backport for EventStreamConfig - add mediawiki.user_change.dev0 (T423952) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:59 otto@deploy1003: Started scap sync-world: Backport for EventStreamConfig - add mediawiki.user_change.dev0 (T423952)
  • 12:59 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:58 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 12:53 kharlan@deploy1003: Finished scap sync-world: Backport for hCaptcha: Enable editing on group0 wikis (T425354) (duration: 12m 07s)
  • 12:47 kharlan@deploy1003: kharlan: Continuing with deployment
  • 12:45 kharlan@deploy1003: kharlan: Backport for hCaptcha: Enable editing on group0 wikis (T425354) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:41 kharlan@deploy1003: Started scap sync-world: Backport for hCaptcha: Enable editing on group0 wikis (T425354)
  • 12:25 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage
  • 12:18 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage
  • 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS trixie
  • 12:04 topranks: push out updated ACL to Nokia switches for BGP connections (T425703) and add BFD config (T425813)
  • 11:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet with reason: Reboot
  • 11:31 moritzm: installing Linux 6.12.86 on Trixie hosts
  • 11:27 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply
  • 11:27 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply
  • 11:21 jayme@deploy1003: Finished scap sync-world: upgrade rsyslog on all deployments T418200 (duration: 13m 28s)
  • 11:21 jayme@deploy1003: Rolling back deployment
  • 11:08 jayme@deploy1003: Started scap sync-world: upgrade rsyslog on all deployments T418200
  • 11:03 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 11:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 10:59 jayme: upgrading rsyslog to 8.2504.0-1 in all mediawiki deployments - T418200
  • 10:52 taavi@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clément Goubert out of all services on: 2459 hosts
  • 10:41 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 10:26 jayme@deploy1003: Finished scap sync-world: update rsyslog image (duration: 03m 48s)
  • 10:23 jayme@deploy1003: Started scap sync-world: update rsyslog image
  • 10:22 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply
  • 10:21 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply
  • 10:21 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply
  • 10:21 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply
  • 10:16 slyngs: Migrate off lvs2012 due to hardware issues
  • 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
  • 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply
  • 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
  • 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply
  • 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
  • 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply
  • 10:11 kharlan@deploy1003: Finished scap sync-world: Backport for hCaptcha: Enable for group0 wikis (T425354) (duration: 30m 15s)
  • 10:10 moritzm: rebalance routed Ganeti cluster in eqsin T421863
  • 10:06 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 10:04 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 10:01 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 10:01 fceratto@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 09:59 kharlan@deploy1003: kharlan: Continuing with deployment
  • 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 09:58 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 09:58 kharlan@deploy1003: kharlan: Backport for hCaptcha: Enable for group0 wikis (T425354) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure
  • 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure
  • 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 09:42 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1230: T419635
  • 09:41 kharlan@deploy1003: Started scap sync-world: Backport for hCaptcha: Enable for group0 wikis (T425354)
  • 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 09:37 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 09:25 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 09:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T419961)', diff saved to https://phabricator.wikimedia.org/P92456 and previous config saved to /var/cache/conftool/dbconfig/20260511-092010-fceratto.json
  • 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92454 and previous config saved to /var/cache/conftool/dbconfig/20260511-091001-fceratto.json
  • 09:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 09:08 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 09:07 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 09:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install5004.wikimedia.org to drbd
  • 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92453 and previous config saved to /var/cache/conftool/dbconfig/20260511-085954-fceratto.json
  • 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 08:56 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1230: T419635
  • 08:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 08:50 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T419961)', diff saved to https://phabricator.wikimedia.org/P92451 and previous config saved to /var/cache/conftool/dbconfig/20260511-084945-fceratto.json
  • 08:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install5004.wikimedia.org to drbd
  • 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2218 (T419961)', diff saved to https://phabricator.wikimedia.org/P92450 and previous config saved to /var/cache/conftool/dbconfig/20260511-084236-fceratto.json
  • 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01
  • 08:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance
  • 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01
  • 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
  • 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
  • 08:10 slyngshede@dns1004: END - running authdns-update
  • 08:08 slyngshede@dns1004: START - running authdns-update
  • 08:05 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
  • 08:05 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply
  • 08:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003"
  • 08:00 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003"
  • 07:56 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 07:55 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
  • 07:50 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 07:49 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 07:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 07:47 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply
  • 07:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply
  • 07:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply
  • 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply
  • 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply
  • 07:08 elukey@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zarcillo.discovery.wmnet on all recursors
  • 07:08 elukey@cumin1003: START - Cookbook sre.dns.wipe-cache zarcillo.discovery.wmnet on all recursors
  • 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5004.eqsin.wmnet with OS bookworm
  • 06:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage
  • 06:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage
  • 06:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2002.codfw.wmnet
  • 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2002.codfw.wmnet
  • 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6003.wikimedia.org
  • 05:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6003.wikimedia.org
  • 05:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5004.eqsin.wmnet with OS bookworm
  • 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 58s)
  • 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image

2026-05-10

  • 18:25 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per phab:T425504' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # T425504
  • 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per phab:T425504' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # T425504
  • 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per phab:T425503' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # T425503
  • 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per phab:T425504' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # T425504
  • 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per phab:T425503' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # T425503
  • 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per phab:T425504' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # T425504
  • 18:09 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per phab:T425503' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # T425503
  • 02:06 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s)
  • 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image

2026-05-09

  • 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003"
  • 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003
  • 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003
  • 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003"
  • 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s)
  • 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1267.eqiad.wmnet with OS bookworm
  • 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage
  • 00:44 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage
  • 00:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1267.eqiad.wmnet with OS bookworm
  • 00:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED

2026-05-08

  • 23:55 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1267
  • 23:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1267
  • 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003"
  • 23:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003"
  • 23:26 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 23:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1266.eqiad.wmnet with OS bookworm
  • 23:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 23:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 22:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage
  • 22:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage
  • 22:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1266.eqiad.wmnet with OS bookworm
  • 22:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1266
  • 21:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1266
  • 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003"
  • 21:51 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003"
  • 21:45 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 21:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1265.eqiad.wmnet with OS bookworm
  • 21:42 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 21:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 21:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage
  • 21:19 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage
  • 20:54 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1265.eqiad.wmnet with OS bookworm
  • 20:44 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:32 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:31 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1265
  • 20:30 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1265
  • 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003"
  • 20:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003"
  • 20:24 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 20:01 ryankemper: [WDQS] Added several more requestctl rules. They've helped marginally, but not enough to restore the service. Unless we find an obvious smoking gun, expect noise to continue for the time being :/
  • 19:42 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 19:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 19:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 19:40 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 18:07 ryankemper: [WDQS] After those 2 requestctl rules, requests went down 20%, error rate decreased significantly, and p50 was cut almost in half, but the service is still unstable; we'll likely need to identify more throttle candidates to restore full health
  • 17:53 ryankemper: [WDQS] Deployed 2 new requestctl rules; we'll see if it helps
  • 16:51 topranks: enable bfd on system0.0 sub-interface ssw1-d1-eqiad
  • 15:45 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup1003.eqiad.wmnet with reason: restart
  • 15:37 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup[1006,1017-1018].eqiad.wmnet with reason: restart
  • 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet
  • 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet
  • 14:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:51 btullis: re-pooled wdqs-main in eqiad for T425758
  • 10:50 btullis@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main,name=eqiad
  • 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 10:14 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup1007.eqiad.wmnet with reason: restart
  • 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 09:44 btullis: depooled wdqs-main in eqiad for T425758
  • 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 09:40 btullis@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=wdqs-main,name=eqiad
  • 09:36 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 09:36 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 09:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 09:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T419635)', diff saved to https://phabricator.wikimedia.org/P92437 and previous config saved to /var/cache/conftool/dbconfig/20260508-093251-fceratto.json
  • 09:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92435 and previous config saved to /var/cache/conftool/dbconfig/20260508-092243-fceratto.json
  • 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92434 and previous config saved to /var/cache/conftool/dbconfig/20260508-091238-fceratto.json
  • 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T419635)', diff saved to https://phabricator.wikimedia.org/P92433 and previous config saved to /var/cache/conftool/dbconfig/20260508-090230-fceratto.json
  • 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1189 (T419635)', diff saved to https://phabricator.wikimedia.org/P92432 and previous config saved to /var/cache/conftool/dbconfig/20260508-085217-fceratto.json
  • 08:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T419635)', diff saved to https://phabricator.wikimedia.org/P92431 and previous config saved to /var/cache/conftool/dbconfig/20260508-085018-fceratto.json
  • 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92430 and previous config saved to /var/cache/conftool/dbconfig/20260508-084010-fceratto.json
  • 08:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92429 and previous config saved to /var/cache/conftool/dbconfig/20260508-083003-fceratto.json
  • 08:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T419635)', diff saved to https://phabricator.wikimedia.org/P92428 and previous config saved to /var/cache/conftool/dbconfig/20260508-081954-fceratto.json
  • 08:18 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 08:17 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 08:04 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2207 (T419635)', diff saved to https://phabricator.wikimedia.org/P92427 and previous config saved to /var/cache/conftool/dbconfig/20260508-080438-fceratto.json
  • 08:04 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance
  • 07:59 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 07:56 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5003.wikimedia.org
  • 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 07:09 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2159: after reimage to trixie
  • 06:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5003.wikimedia.org
  • 06:18 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2159: after reimage to trixie
  • 06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2159.codfw.wmnet with OS trixie
  • 06:11 moritzm: installing postorius security updates
  • 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: host reimage
  • 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2159.codfw.wmnet with reason: host reimage
  • 05:27 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2159.codfw.wmnet with OS trixie
  • 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2159: Reimage to Trixie
  • 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2159: Reimage to Trixie
  • 05:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2159.codfw.wmnet with reason: Reimage to Trixie
  • 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1024.eqiad.wmnet with OS trixie
  • 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 03:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage
  • 02:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage
  • 02:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1024.eqiad.wmnet with OS trixie
  • 02:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 02:07 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 02:07 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1024
  • 02:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1024
  • 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003"
  • 02:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003"
  • 02:01 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1023.eqiad.wmnet with OS trixie
  • 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 01:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 01:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1023.eqiad.wmnet with reason: host reimage
  • 01:11 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1023.eqiad.wmnet with reason: host reimage
  • 00:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1023.eqiad.wmnet with OS trixie
  • 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:37 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:37 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1023
  • 00:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1023
  • 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003"
  • 00:27 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003"
  • 00:20 vriley@cumin1003: START - Cookbook sre.dns.netbox

2026-05-07

  • 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1022.eqiad.wmnet with OS trixie
  • 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 23:24 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 23:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage
  • 23:05 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage
  • 22:53 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie
  • 22:25 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] (duration: 01m 53s)
  • 22:23 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19]
  • 22:23 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] (duration: 03m 52s)
  • 22:19 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19]
  • 22:18 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] (duration: 01m 55s)
  • 22:16 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19]
  • 21:27 cscott@deploy1003: Finished scap sync-world: Backport for Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3), composer.json: Update webonyx/graphql-php to ^15.32.3, Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731), Bump wikimedia/parsoid to 0.24.0-a2 (T425731) (gerrit:1284837)
  • 21:23 cscott@deploy1003: cscott: Continuing with deployment
  • 21:17 cscott@deploy1003: cscott: Backport for Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3), composer.json: Update webonyx/graphql-php to ^15.32.3, Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731), Bump wikimedia/parsoid to 0.24.0-a2 (T425731) synced to the t
  • 21:16 cscott@deploy1003: Started scap sync-world: Backport for Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3), composer.json: Update webonyx/graphql-php to ^15.32.3, Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731), Bump wikimedia/parsoid to 0.24.0-a2 (T425731) (gerrit:1284837)
  • 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1021.eqiad.wmnet with OS trixie
  • 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 20:49 kemayo@deploy1003: Finished scap sync-world: Backport for Revert "Enable mobile editor abandonment survey on enwiki" (T424102), Remove duplicate definition of EditCheckAction#isTagged (T425583), Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583) (duration: 06m 38s)
  • 20:48 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 20:45 kemayo@deploy1003: esanders, kemayo: Continuing with deployment
  • 20:44 kemayo@deploy1003: esanders, kemayo: Backport for Revert "Enable mobile editor abandonment survey on enwiki" (T424102), Remove duplicate definition of EditCheckAction#isTagged (T425583), Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be v
  • 20:42 kemayo@deploy1003: Started scap sync-world: Backport for Revert "Enable mobile editor abandonment survey on enwiki" (T424102), Remove duplicate definition of EditCheckAction#isTagged (T425583), Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)
  • 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php commonswiki
  • 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php nlwiki
  • 20:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage
  • 20:30 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage
  • 20:29 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:28 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:14 arlolra@deploy1003: Finished scap sync-world: Backport for Provide page context for LintErrorChecker (T419596), Make email confirmation banner a standalone RL module (T425677) (duration: 07m 18s)
  • 20:10 arlolra@deploy1003: arlolra, mmartorana: Continuing with deployment
  • 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie
  • 20:09 arlolra@deploy1003: arlolra, mmartorana: Backport for Provide page context for LintErrorChecker (T419596), Make email confirmation banner a standalone RL module (T425677) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:07 arlolra@deploy1003: Started scap sync-world: Backport for Provide page context for LintErrorChecker (T419596), Make email confirmation banner a standalone RL module (T425677)
  • 20:02 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1022.eqiad.wmnet with OS trixie
  • 19:59 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 19:57 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 19:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie
  • 19:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:52 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:51 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1022
  • 18:49 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1022
  • 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003"
  • 18:49 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003"
  • 18:45 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 18:26 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
  • 18:26 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
  • 18:25 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 18:24 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 18:22 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 18:22 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 18:21 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 18:21 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 18:20 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
  • 18:19 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
  • 18:19 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
  • 18:18 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
  • 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.1 refs T423910
  • 18:06 cdanis@dns1005: END - running authdns-update
  • 18:04 cdanis@dns1005: START - running authdns-update
  • 18:02 krinkle@deploy1003: Finished scap sync-world: Backport for Profiler: Set explicit "excimer-wall" redis channel instead of concat (duration: 29m 24s)
  • 18:02 brennen: 1.47.0-wmf.1 train status (T423910): blockers resolved, rolling to all wikis
  • 17:59 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 17:58 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 17:51 krinkle@deploy1003: krinkle: Continuing with deployment
  • 17:50 krinkle@deploy1003: krinkle: Backport for Profiler: Set explicit "excimer-wall" redis channel instead of concat synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 17:45 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 17:45 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 17:33 krinkle@deploy1003: Started scap sync-world: Backport for Profiler: Set explicit "excimer-wall" redis channel instead of concat
  • 17:32 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 17:32 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 17:06 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet,db1216.eqiad.wmnet with reason: restart
  • 16:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet,db1245.eqiad.wmnet with reason: restart
  • 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 16:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 16:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 16:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 16:33 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 16:32 jynus: restarting backup1-* database primary hosts
  • 16:30 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2183.codfw.wmnet,db1204.eqiad.wmnet with reason: restart
  • 16:25 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 36 hosts with reason: restart
  • 16:14 sukhe@dns1004: END - running authdns-update
  • 16:13 sukhe@dns1004: START - running authdns-update
  • 16:13 sukhe@dns1004: START - running authdns-update
  • 16:12 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 16:02 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox)
  • 16:01 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 15:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup[2003-2004].codfw.wmnet,ms-backup[1003-1004].eqiad.wmnet with reason: restart
  • 15:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox)
  • 15:32 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
  • 15:32 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
  • 15:31 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified]
  • 15:31 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 15:31 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified]
  • 15:31 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restart P{lvs4009*} and A:liberica
  • 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P{lvs4009.ulsfo.wmnet} and A:liberica
  • 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P{lvs4009.ulsfo.wmnet} and A:liberica
  • 15:23 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P{lvs4009.ulsfo.wmnet} and A:liberica
  • 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling P{lvs4009.ulsfo.wmnet} and A:liberica
  • 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restart P{lvs4009*} and A:liberica
  • 15:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 39 hosts
  • 15:22 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 39 hosts
  • 15:18 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs4009*} and A:liberica
  • 15:18 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs4009*} and A:liberica
  • 15:15 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet
  • 15:12 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified]
  • 15:12 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified]
  • 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 15:05 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 15:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:03 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 15:03 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 15:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:01 akhatun: Deployed refinery using scap, then deployed onto hdfs
  • 14:58 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad
  • 14:54 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
  • 14:54 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
  • 14:54 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
  • 14:54 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 14:53 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
  • 14:53 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply
  • 14:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 14:52 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
  • 14:52 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
  • 14:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 14:44 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] (duration: 02m 01s)
  • 14:43 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
  • 14:43 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
  • 14:42 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c]
  • 14:40 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] (duration: 04m 38s)
  • 14:40 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad
  • 14:37 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 14:36 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 14:36 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c]
  • 14:35 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 14:35 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 14:33 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] (duration: 01m 54s)
  • 14:32 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh T408892]
  • 14:32 slyngshede@dns1004: END - running authdns-update
  • 14:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 14:31 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c]
  • 14:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 14:31 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 14:30 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 14:30 slyngshede@dns1004: START - running authdns-update
  • 14:30 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
  • 14:30 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
  • 14:30 akhatun: Deploying Refinery at 4734c67 for weekly deployment train
  • 14:30 jmm@dns1004: END - running authdns-update
  • 14:29 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 14:28 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 14:28 jmm@dns1004: START - running authdns-update
  • 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003"
  • 14:28 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003"
  • 14:26 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 14:26 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 14:25 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 14:25 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 14:24 slyngshede@cumin1003: START - Cookbook sre.dns.netbox
  • 14:12 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-codfw
  • 14:12 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 14:12 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 14:10 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 14:10 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 13:53 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw
  • 13:34 stran@deploy1003: Finished scap sync-world: Backport for Enable staggered rollout for IRS on enwiki (T424008), Fix when user is considered exposed to the feature in the experiment (T424075) (duration: 09m 05s)
  • 13:30 stran@deploy1003: stran: Continuing with deployment
  • 13:27 stran@deploy1003: stran: Backport for Enable staggered rollout for IRS on enwiki (T424008), Fix when user is considered exposed to the feature in the experiment (T424075) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:25 stran@deploy1003: Started scap sync-world: Backport for Enable staggered rollout for IRS on enwiki (T424008), Fix when user is considered exposed to the feature in the experiment (T424075)
  • 13:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:10 jforrester@deploy1003: Finished scap sync-world: Backport for Remove the progress bar, mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311) (duration: 06m 55s)
  • 13:06 jforrester@deploy1003: rzl, jforrester, hartman: Continuing with deployment
  • 13:05 jforrester@deploy1003: rzl, jforrester, hartman: Backport for Remove the progress bar, mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:03 jforrester@deploy1003: Started scap sync-world: Backport for Remove the progress bar, mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)
  • 13:02 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: ulsfo switch refresh T408892]
  • 12:58 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox
  • 12:51 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 12:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 12:51 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 12:50 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 12:45 sukhe@dns1004: FAIL - running authdns-update
  • 12:44 sukhe@dns1004: START - running authdns-update
  • 12:30 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1205.eqiad.wmnet with OS trixie
  • 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5004.wikimedia.org
  • 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host install5004.wikimedia.org with OS bookworm
  • 12:23 slyngshede@dns1004: FAIL - running authdns-update
  • 12:21 slyngshede@dns1004: START - running authdns-update
  • 12:18 moritzm: installing init-system-helpers bugfix updates from Bookworm point release
  • 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003"
  • 12:17 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003"
  • 12:12 slyngshede@dns1004: FAIL - running authdns-update
  • 12:11 slyngshede@dns1004: START - running authdns-update
  • 12:11 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 12:11 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 12:11 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo,service=authdns-update [reason: ulsfo switch refresh T408892]
  • 12:08 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage
  • 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2168: after reimage to trixie
  • 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install5004.wikimedia.org with reason: host reimage
  • 12:02 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 12:02 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 12:02 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage
  • 12:00 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install5004.wikimedia.org with reason: host reimage
  • 11:58 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1227: after reimage to trixie
  • 11:47 root@cumin1003: START - Cookbook sre.hosts.reimage for host db1205.eqiad.wmnet with OS trixie
  • 11:46 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1205.eqiad.wmnet with reason: reimage
  • 11:43 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 11:43 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 11:40 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS trixie
  • 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7002.wikimedia.org
  • 11:36 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:35 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org
  • 11:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2168: after reimage to trixie
  • 11:19 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage
  • 11:17 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS trixie
  • 11:16 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:15 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:15 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage
  • 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 (T419961)', diff saved to https://phabricator.wikimedia.org/P92412 and previous config saved to /var/cache/conftool/dbconfig/20260507-111424-fceratto.json
  • 11:13 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1227: after reimage to trixie
  • 11:11 moritzm: installing modsecurity-apache security updates
  • 11:10 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS trixie
  • 11:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install5004.wikimedia.org with OS bookworm
  • 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92409 and previous config saved to /var/cache/conftool/dbconfig/20260507-110415-fceratto.json
  • 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002"
  • 11:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002"
  • 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors
  • 11:03 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors
  • 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002"
  • 11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002"
  • 10:59 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 10:59 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org
  • 10:58 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 10:58 root@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host db2184
  • 10:58 root@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184
  • 10:57 root@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2184
  • 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 10:57 root@cumin1003: START - Cookbook sre.dns.wipe-cache db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org
  • 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors
  • 10:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors
  • 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002"
  • 10:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002"
  • 10:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close Russian Wikinews (T421796) (duration: 08m 40s)
  • 10:55 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage
  • 10:54 root@cumin1003: START - Cookbook sre.dns.netbox
  • 10:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92407 and previous config saved to /var/cache/conftool/dbconfig/20260507-105407-fceratto.json
  • 10:51 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 10:51 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage
  • 10:49 ladsgroup@deploy1003: ladsgroup: Backport for Close Russian Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 10:49 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:49 root@cumin1003: START - Cookbook sre.hosts.move-vlan for host db2184
  • 10:48 root@cumin1003: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS trixie
  • 10:48 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage
  • 10:48 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 10:47 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 10:47 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 10:47 ladsgroup@deploy1003: Started scap sync-world: Backport for Close Russian Wikinews (T421796)
  • 10:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 (T419961)', diff saved to https://phabricator.wikimedia.org/P92406 and previous config saved to /var/cache/conftool/dbconfig/20260507-104359-fceratto.json
  • 10:42 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage
  • 10:40 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2184.codfw.wmnet with reason: reimage
  • 10:40 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:39 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors
  • 10:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors
  • 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002"
  • 10:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002"
  • 10:35 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org
  • 10:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 (T419961)', diff saved to https://phabricator.wikimedia.org/P92405 and previous config saved to /var/cache/conftool/dbconfig/20260507-103349-fceratto.json
  • 10:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance
  • 10:32 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS trixie
  • 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5002.wikimedia.org
  • 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2168: Reimage to Trixie
  • 10:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2168: Reimage to Trixie
  • 10:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Reimage to Trixie
  • 10:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2182: after reimage to trixie
  • 10:28 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS trixie
  • 10:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1227: Reimage to Trixie
  • 10:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1227: Reimage to Trixie
  • 10:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage to Trixie
  • 10:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1202: after reimage to trixie
  • 10:21 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:21 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 10:20 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5002.wikimedia.org
  • 10:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo [reason: New switch configuration, T408892]
  • 10:14 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: New switch configuration, T408892]
  • 10:13 moritzm: rebalance ganeti cluster in ulsfo following host reimages T424686
  • 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5001.wikimedia.org
  • 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:11 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4006.wikimedia.org with OS trixie
  • 10:10 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 10:04 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:03 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 09:59 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5001.wikimedia.org
  • 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4006.wikimedia.org with reason: host reimage
  • 09:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4006.wikimedia.org with reason: host reimage
  • 09:44 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2182: after reimage to trixie
  • 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd
  • 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2182.codfw.wmnet with OS trixie
  • 09:39 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1202: after reimage to trixie
  • 09:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS trixie
  • 09:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 09:32 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd
  • 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus4003.ulsfo.wmnet to drbd
  • 09:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1006.eqiad.wmnet
  • 09:24 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1006.eqiad.wmnet
  • 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie
  • 09:18 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage
  • 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4006.wikimedia.org
  • 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage
  • 09:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage
  • 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4006.wikimedia.org
  • 09:08 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2208: After reimage
  • 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage
  • 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS trixie
  • 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS trixie
  • 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1202: Reimage to Trixie
  • 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2182: Reimage to Trixie
  • 08:51 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2182: Reimage to Trixie
  • 08:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2182.codfw.wmnet with reason: Reimage to Trixie
  • 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1202: Reimage to Trixie
  • 08:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1202.eqiad.wmnet with reason: Reimage to Trixie
  • 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2144.codfw.wmnet
  • 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
  • 08:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
  • 08:37 marostegui@cumin1003: START - Cookbook sre.dns.netbox
  • 08:32 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2144.codfw.wmnet
  • 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus4003.ulsfo.wmnet to drbd
  • 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4004.ulsfo.wmnet to drbd
  • 08:28 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2144 T425522', diff saved to https://phabricator.wikimedia.org/P92389 and previous config saved to /var/cache/conftool/dbconfig/20260507-082822-marostegui.json
  • 08:23 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage
  • 08:23 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db2208: After reimage
  • 08:23 XioNoX: drmrs remove old v6 gateway IP
  • 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003"
  • 08:22 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage
  • 08:21 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003"
  • 08:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4004.ulsfo.wmnet to drbd
  • 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01
  • 08:12 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync
  • 08:12 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync
  • 08:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01
  • 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet
  • 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet
  • 08:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01
  • 08:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01
  • 07:54 dcausse@deploy1003: Finished scap sync-world: Backport for search: add alt. completion indices to test keyword tokenizer (2/2) (T420427) (duration: 09m 46s)
  • 07:49 dcausse@deploy1003: dcausse: Continuing with deployment
  • 07:46 dcausse@deploy1003: dcausse: Backport for search: add alt. completion indices to test keyword tokenizer (2/2) (T420427) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow4003.ulsfo.wmnet to drbd
  • 07:44 dcausse@deploy1003: Started scap sync-world: Backport for search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)
  • 07:32 moritzm: installing apache2 security updates
  • 07:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow4003.ulsfo.wmnet to drbd
  • 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet
  • 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet
  • 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4003.ulsfo.wmnet to drbd
  • 06:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd
  • 06:46 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of ncredir4003.ulsfo.wmnet to drbd
  • 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd
  • 06:42 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01
  • 06:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01
  • 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2207: after reimage to trixie
  • 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2207: after reimage to trixie
  • 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2207.codfw.wmnet with OS trixie
  • 05:33 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2208.codfw.wmnet with OS trixie
  • 05:28 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: host reimage
  • 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: host reimage
  • 05:09 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: host reimage
  • 05:04 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2208.codfw.wmnet with reason: host reimage
  • 05:03 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2207.codfw.wmnet with OS trixie
  • 05:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2207: Reimage to Trixie
  • 05:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2207: Reimage to Trixie
  • 05:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2207.codfw.wmnet with reason: Reimage to Trixie
  • 04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2207 T424848', diff saved to https://phabricator.wikimedia.org/P92383 and previous config saved to /var/cache/conftool/dbconfig/20260507-045219-marostegui.json
  • 04:51 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2204 to s2 primary T424848', diff saved to https://phabricator.wikimedia.org/P92382 and previous config saved to /var/cache/conftool/dbconfig/20260507-045141-marostegui.json
  • 04:51 marostegui: Starting s2 codfw failover from db2207 to db2204 - T424848
  • 04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 T424848
  • 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2204 with weight 0 T424848', diff saved to https://phabricator.wikimedia.org/P92381 and previous config saved to /var/cache/conftool/dbconfig/20260507-044651-marostegui.json
  • 04:46 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie
  • 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s)
  • 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 01:15 zabe@deploy1003: Finished scap sync-world: Backport for Drop some unneeded wikinews configs (T421796) (duration: 12m 57s)
  • 01:09 zabe@deploy1003: zabe: Continuing with deployment
  • 01:09 zabe@deploy1003: zabe: Backport for Drop some unneeded wikinews configs (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 01:02 zabe@deploy1003: Started scap sync-world: Backport for Drop some unneeded wikinews configs (T421796)
  • 01:01 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie
  • 00:43 zabe@deploy1003: Finished scap sync-world: Backport for Undeploy GoogleNewsSitemap (T421798) (duration: 33m 54s)
  • 00:31 zabe@deploy1003: zabe: Continuing with deployment
  • 00:29 zabe@deploy1003: zabe: Backport for Undeploy GoogleNewsSitemap (T421798) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 00:10 zabe@deploy1003: Started scap sync-world: Backport for Undeploy GoogleNewsSitemap (T421798)

2026-05-06

  • 23:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie
  • 23:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie
  • 23:14 ladsgroup@deploy1003: Synchronized portals: Sync portals for removal of Wikinews (duration: 02m 22s)
  • 23:12 ladsgroup@deploy1003: Synchronized portals/wikipedia.org/assets: Sync portals for removal of Wikinews (duration: 06m 12s)
  • 22:50 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close Spanish Wikinews (T421796) (duration: 07m 08s)
  • 22:46 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 22:45 ladsgroup@deploy1003: ladsgroup: Backport for Close Spanish Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:43 ladsgroup@deploy1003: Started scap sync-world: Backport for Close Spanish Wikinews (T421796)
  • 22:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close English Wikinews (T421796) (duration: 06m 40s)
  • 22:28 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 22:28 ladsgroup@deploy1003: ladsgroup: Backport for Close English Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:26 ladsgroup@deploy1003: Started scap sync-world: Backport for Close English Wikinews (T421796)
  • 22:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie
  • 22:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:14 cjming@deploy1003: Finished scap sync-world: Backport for UBN fix: guard entry.serverTiming before forEach (T425591) (duration: 06m 25s)
  • 22:11 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:10 cjming@deploy1003: cjming: Continuing with deployment
  • 22:10 cjming@deploy1003: cjming: Backport for UBN fix: guard entry.serverTiming before forEach (T425591) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:08 cjming@deploy1003: Started scap sync-world: Backport for UBN fix: guard entry.serverTiming before forEach (T425591)
  • 22:06 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 22:05 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021
  • 22:04 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021
  • 21:52 zabe@deploy1003: Finished scap sync-world: Backport for Disable GNSM on dewikinews (T421798) (duration: 06m 56s)
  • 21:48 zabe@deploy1003: zabe: Continuing with deployment
  • 21:47 zabe@deploy1003: zabe: Backport for Disable GNSM on dewikinews (T421798) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:45 zabe@deploy1003: Started scap sync-world: Backport for Disable GNSM on dewikinews (T421798)
  • 21:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:28 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:27 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1012.eqiad.wmnet with OS trixie
  • 21:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:24 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:22 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:17 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:15 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003"
  • 21:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003"
  • 21:07 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 21:06 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021
  • 21:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021
  • 21:04 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage
  • 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:28 catrope@deploy1003: Finished scap sync-world: Backport for Replace use of $wgRequest (T336703) (duration: 09m 12s)
  • 20:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage
  • 20:24 catrope@deploy1003: catrope, somerandomdeveloper: Continuing with deployment
  • 20:21 catrope@deploy1003: catrope, somerandomdeveloper: Backport for Replace use of $wgRequest (T336703) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:19 catrope@deploy1003: Started scap sync-world: Backport for Replace use of $wgRequest (T336703)
  • 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie
  • 20:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS trixie
  • 19:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie
  • 19:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS trixie
  • 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS trixie
  • 19:23 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie
  • 19:14 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bookworm
  • 19:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage
  • 19:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage
  • 18:59 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 18:59 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 18:59 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 18:59 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 18:59 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 18:59 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 18:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
  • 18:55 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 18:55 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 18:55 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
  • 18:54 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
  • 18:54 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 18:53 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:53 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie
  • 18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
  • 18:47 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:47 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:42 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 18:42 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.1 refs T423910
  • 18:42 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 18:41 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:40 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 18:40 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:39 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:37 dzahn@dns1005: END - running authdns-update
  • 18:35 dzahn@dns1005: START - running authdns-update
  • 18:33 brennen: 1.47.0-wmf.1 train status (T423910): blockers resolved, rolling to group1
  • 18:31 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie
  • 18:29 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bookworm
  • 18:02 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-23-ulsfo
  • 18:01 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo
  • 17:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from T425301 - bking@cumin2002
  • 17:55 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo
  • 17:55 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo
  • 17:37 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 17:36 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 17:36 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:34 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 17:33 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 17:32 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 17:32 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 17:31 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 17:28 topranks: rebooting asw1-23-ulsfo to upgrade SR-Linux OS on switch T408892
  • 17:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-23-ulsfo,asw1-23-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo
  • 17:20 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 17:17 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:17 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:16 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 17:08 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 17:06 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 17:02 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 39 hosts with reason: ulsfo depooled for switch work
  • 16:53 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-22-ulsfo,asw1-22-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo
  • 16:52 topranks: rebooting asw1-22-ulsfo to upgrade SR-Linux OS on switch T408892
  • 16:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS trixie
  • 16:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm
  • 16:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
  • 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
  • 16:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bookworm
  • 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
  • 16:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
  • 16:09 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
  • 16:04 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 15:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
  • 15:57 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 15:38 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bookworm
  • 15:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm
  • 15:30 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=0) Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution.
  • 15:08 sukhe: sudo cumin -b1 -s5 "C:bird and not dns4004*" "run-puppet-agent --enable 'merging CR 1282958'"
  • 15:08 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution.
  • 15:06 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close Chinese Wikinews (T421796) (duration: 06m 41s)
  • 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 15:01 ladsgroup@deploy1003: ladsgroup: Backport for Close Chinese Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:59 ladsgroup@deploy1003: Started scap sync-world: Backport for Close Chinese Wikinews (T421796)
  • 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5002.eqsin.wmnet
  • 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 14:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS trixie
  • 14:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 14:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:41 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:34 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: testing bird change]
  • 14:31 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing bird change]
  • 14:30 kharlan@deploy1003: Finished scap sync-world: Backport for Add user_groups to editAttemptStep schema (T424010) (duration: 11m 16s)
  • 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
  • 14:26 kharlan@deploy1003: kharlan: Continuing with deployment
  • 14:25 sukhe: sudo cumin "C:bird" "disable-puppet 'merging CR 1282958'"
  • 14:23 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
  • 14:22 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:21 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:21 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:21 kharlan@deploy1003: kharlan: Backport for Add user_groups to editAttemptStep schema (T424010) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5002.eqsin.wmnet
  • 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm
  • 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 14:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 14:20 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:19 kharlan@deploy1003: Started scap sync-world: Backport for Add user_groups to editAttemptStep schema (T424010)
  • 14:19 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:18 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5001.eqsin.wmnet
  • 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 14:15 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close German Wikinews (T421796) (duration: 06m 40s)
  • 14:13 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:13 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:12 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 14:11 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:11 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS trixie
  • 14:10 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 14:10 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:10 ladsgroup@deploy1003: ladsgroup: Backport for Close German Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:10 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:09 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 14:08 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 14:08 ladsgroup@deploy1003: Started scap sync-world: Backport for Close German Wikinews (T421796)
  • 14:08 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close French Wikinews (T421796) (duration: 11m 28s)
  • 14:02 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5001.eqsin.wmnet
  • 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage
  • 13:56 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 13:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage
  • 13:55 ladsgroup@deploy1003: ladsgroup: Backport for Close French Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:55 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie
  • 13:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from T425301 - bking@cumin2002
  • 13:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1194: after reimage to trixie
  • 13:51 ladsgroup@deploy1003: Started scap sync-world: Backport for Close French Wikinews (T421796)
  • 13:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage
  • 13:45 jgreen@dns1004: END - running authdns-update
  • 13:44 alexsanford@deploy1003: Finished scap sync-world: Backport for Add messages related to mandatory 2FA for more groups (T423119) (duration: 30m 53s)
  • 13:44 jgreen@dns1004: START - running authdns-update
  • 13:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage
  • 13:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
  • 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm
  • 13:35 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.ulsfo.wmnet on all recursors
  • 13:34 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.ulsfo.wmnet on all recursors
  • 13:32 alexsanford@deploy1003: alexsanford: Continuing with deployment
  • 13:32 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:31 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
  • 13:31 alexsanford@deploy1003: alexsanford: Backport for Add messages related to mandatory 2FA for more groups (T423119) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet']
  • 13:28 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet']
  • 13:28 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet']
  • 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet']
  • 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet']
  • 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet']
  • 13:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:26 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 13:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:20 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:19 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.mgmt.ulsfo.wmnet on all recursors
  • 13:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS trixie
  • 13:19 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.mgmt.ulsfo.wmnet on all recursors
  • 13:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003"
  • 13:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003"
  • 13:15 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 13:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
  • 13:14 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 13:13 alexsanford@deploy1003: Started scap sync-world: Backport for Add messages related to mandatory 2FA for more groups (T423119)
  • 13:12 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS trixie
  • 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:08 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
  • 13:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1194: after reimage to trixie
  • 13:05 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 13:01 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS trixie
  • 12:49 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS trixie
  • 12:45 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 12:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2012.codfw.wmnet with OS trixie
  • 12:39 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 12:38 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage
  • 12:36 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: update
  • 12:35 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage
  • 12:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage
  • 12:21 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage
  • 12:20 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS trixie
  • 12:20 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie
  • 12:16 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie
  • 12:16 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 12:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 12:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2011.codfw.wmnet with OS trixie
  • 12:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close Polish Wikinews (T421796) (duration: 06m 28s)
  • 12:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 12:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 12:07 ladsgroup@deploy1003: ladsgroup: Backport for Close Polish Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2012.codfw.wmnet with OS trixie
  • 12:05 ladsgroup@deploy1003: Started scap sync-world: Backport for Close Polish Wikinews (T421796)
  • 12:05 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 11:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage
  • 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet
  • 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage
  • 11:50 moritzm: installing openjdk-17 security updates
  • 11:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
  • 11:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 (T419961)', diff saved to https://phabricator.wikimedia.org/P92374 and previous config saved to /var/cache/conftool/dbconfig/20260506-114919-fceratto.json
  • 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet
  • 11:45 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1194: Reimage to Trixie
  • 11:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2160.codfw.wmnet with reason: Reboot
  • 11:44 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1194: Reimage to Trixie
  • 11:44 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage
  • 11:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1194.eqiad.wmnet with reason: Reimage to Trixie
  • 11:42 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage
  • 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bookworm
  • 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 11:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
  • 11:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92372 and previous config saved to /var/cache/conftool/dbconfig/20260506-113910-fceratto.json
  • 11:30 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2011.codfw.wmnet with OS trixie
  • 11:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92371 and previous config saved to /var/cache/conftool/dbconfig/20260506-112903-fceratto.json
  • 11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie
  • 11:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS trixie
  • 11:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 (T419961)', diff saved to https://phabricator.wikimedia.org/P92370 and previous config saved to /var/cache/conftool/dbconfig/20260506-111854-fceratto.json
  • 11:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS trixie
  • 11:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS trixie
  • 11:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Reboot
  • 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage
  • 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage
  • 10:48 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
  • 10:44 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
  • 10:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
  • 10:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
  • 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS bookworm
  • 10:29 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet']
  • 10:23 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet']
  • 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet']
  • 10:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet']
  • 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 (T419961)', diff saved to https://phabricator.wikimedia.org/P92369 and previous config saved to /var/cache/conftool/dbconfig/20260506-101836-fceratto.json
  • 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance
  • 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 (T419961)', diff saved to https://phabricator.wikimedia.org/P92368 and previous config saved to /var/cache/conftool/dbconfig/20260506-101808-fceratto.json
  • 10:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS trixie
  • 10:16 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS trixie
  • 10:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS trixie
  • 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92367 and previous config saved to /var/cache/conftool/dbconfig/20260506-100800-fceratto.json
  • 09:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92366 and previous config saved to /var/cache/conftool/dbconfig/20260506-095752-fceratto.json
  • 09:55 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 (T419961)', diff saved to https://phabricator.wikimedia.org/P92365 and previous config saved to /var/cache/conftool/dbconfig/20260506-094744-fceratto.json
  • 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage
  • 09:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage
  • 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:32 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003"
  • 09:27 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1040 (T419961)', diff saved to https://phabricator.wikimedia.org/P92364 and previous config saved to /var/cache/conftool/dbconfig/20260506-092414-fceratto.json
  • 09:24 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1040.eqiad.wmnet with reason: Maintenance
  • 09:23 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003"
  • 09:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 (T419961)', diff saved to https://phabricator.wikimedia.org/P92363 and previous config saved to /var/cache/conftool/dbconfig/20260506-092345-fceratto.json
  • 09:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 09:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS trixie
  • 09:16 ayounsi@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 09:15 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2005.codfw.wmnet with reason: update
  • 09:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repool ms2 T418979', diff saved to https://phabricator.wikimedia.org/P92362 and previous config saved to /var/cache/conftool/dbconfig/20260506-091513-marostegui.json
  • 09:14 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2253: Replacing HW T418979
  • 09:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99)
  • 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache
  • 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2253: Replacing HW T418979
  • 09:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92361 and previous config saved to /var/cache/conftool/dbconfig/20260506-091337-fceratto.json
  • 09:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92360 and previous config saved to /var/cache/conftool/dbconfig/20260506-090329-fceratto.json
  • 09:03 zabe@deploy1003: Finished scap sync-world: Backport for Correctly support new file tables in RevisionDeleteUser (T424553) (duration: 08m 44s)
  • 08:59 zabe@deploy1003: zabe: Continuing with deployment
  • 08:56 zabe@deploy1003: zabe: Backport for Correctly support new file tables in RevisionDeleteUser (T424553) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:54 zabe@deploy1003: Started scap sync-world: Backport for Correctly support new file tables in RevisionDeleteUser (T424553)
  • 08:53 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 (T419961)', diff saved to https://phabricator.wikimedia.org/P92359 and previous config saved to /var/cache/conftool/dbconfig/20260506-085321-fceratto.json
  • 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2253 to ms2 T418973', diff saved to https://phabricator.wikimedia.org/P92358 and previous config saved to /var/cache/conftool/dbconfig/20260506-084337-marostegui.json
  • 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1039 (T419961)', diff saved to https://phabricator.wikimedia.org/P92357 and previous config saved to /var/cache/conftool/dbconfig/20260506-083841-fceratto.json
  • 08:38 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1039.eqiad.wmnet with reason: Maintenance
  • 08:29 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie
  • 08:09 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie
  • 08:08 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie
  • 08:06 awight: EU morning deployment is done
  • 08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Replacing hw
  • 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2144: Replacing HW T418979
  • 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache
  • 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2144: Replacing HW T418979
  • 07:47 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie
  • 07:40 awight@deploy1003: Finished scap sync-world: Backport for VE: Avoid counting all refs when listIndex is undefined (T425433), search: fix alt. completion indices to test keyword tokenizer (T420427), search: enable Latin-to-Devanagari transliteration second-chance (T425018) (duration: 08m 58s)
  • 07:36 awight@deploy1003: wmde-fisch, awight, dcausse: Continuing with deployment
  • 07:33 awight@deploy1003: wmde-fisch, awight, dcausse: Backport for VE: Avoid counting all refs when listIndex is undefined (T425433), search: fix alt. completion indices to test keyword tokenizer (T420427), search: enable Latin-to-Devanagari transliteration second-chance (T425018) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can
  • 07:31 awight@deploy1003: Started scap sync-world: Backport for VE: Avoid counting all refs when listIndex is undefined (T425433), search: fix alt. completion indices to test keyword tokenizer (T420427), search: enable Latin-to-Devanagari transliteration second-chance (T425018)
  • 07:26 awight@deploy1003: Finished scap sync-world: Backport for VE: Avoid counting all refs when listIndex is undefined (T425433) (duration: 07m 37s)
  • 07:22 awight@deploy1003: awight, lilients: Continuing with deployment
  • 07:21 awight@deploy1003: awight, lilients: Backport for VE: Avoid counting all refs when listIndex is undefined (T425433) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:19 awight@deploy1003: Started scap sync-world: Backport for VE: Avoid counting all refs when listIndex is undefined (T425433)
  • 07:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4008.ulsfo.wmnet
  • 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 06:55 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 06:54 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1191: after reimage to trixie
  • 06:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1189: after reimage to trixie
  • 06:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4008.ulsfo.wmnet
  • 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4006.ulsfo.wmnet
  • 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 06:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 06:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 06:20 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4006.ulsfo.wmnet
  • 05:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2208.codfw.wmnet with reason: Idrac issues T425506
  • 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage
  • 05:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage
  • 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage
  • 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage
  • 05:26 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) depool db2208: Reimage to Trixie
  • 05:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie
  • 05:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie
  • 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2208: Reimage to Trixie
  • 05:24 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie
  • 05:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie
  • 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS trixie
  • 05:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1191: Reimage to Trixie
  • 05:21 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1191: Reimage to Trixie
  • 05:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1191.eqiad.wmnet with reason: Reimage to Trixie
  • 05:19 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS trixie
  • 05:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1189: Reimage to Trixie
  • 05:14 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1189: Reimage to Trixie
  • 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1189.eqiad.wmnet with reason: Reimage to Trixie
  • 05:11 marostegui@dns1004: END - running authdns-update
  • 05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1189 T425318', diff saved to https://phabricator.wikimedia.org/P92345 and previous config saved to /var/cache/conftool/dbconfig/20260506-050948-marostegui.json
  • 05:09 marostegui@dns1004: START - running authdns-update
  • 05:08 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write T425318', diff saved to https://phabricator.wikimedia.org/P92344 and previous config saved to /var/cache/conftool/dbconfig/20260506-050816-marostegui.json
  • 05:07 marostegui@cumin1003: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - T425318', diff saved to https://phabricator.wikimedia.org/P92343 and previous config saved to /var/cache/conftool/dbconfig/20260506-050755-marostegui.json
  • 05:06 marostegui: Starting s3 eqiad failover from db1189 to db1223 - T425318
  • 05:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 T425318
  • 05:03 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1223 with weight 0 T425318', diff saved to https://phabricator.wikimedia.org/P92342 and previous config saved to /var/cache/conftool/dbconfig/20260506-050342-marostegui.json
  • 03:28 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
  • 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
  • 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
  • 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
  • 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 37s)
  • 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 01:05 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS trixie
  • 00:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close Dutch Wikinews (T421796) (duration: 06m 26s)
  • 00:49 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 00:49 ladsgroup@deploy1003: ladsgroup: Backport for Close Dutch Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 00:47 ladsgroup@deploy1003: Started scap sync-world: Backport for Close Dutch Wikinews (T421796)
  • 00:45 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage
  • 00:41 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage
  • 00:27 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close Italian Wikinews (T421796) (duration: 07m 26s)
  • 00:25 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1001
  • 00:25 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1001
  • 00:24 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS trixie
  • 00:23 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 00:21 ladsgroup@deploy1003: ladsgroup: Backport for Close Italian Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 00:20 ladsgroup@deploy1003: Started scap sync-world: Backport for Close Italian Wikinews (T421796)

2026-05-05

  • 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002"
  • 23:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002"
  • 23:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 22:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close Arabic Wikinews (T421796) (duration: 06m 58s)
  • 22:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 22:49 ladsgroup@deploy1003: ladsgroup: Backport for Close Arabic Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:47 ladsgroup@deploy1003: Started scap sync-world: Backport for Close Arabic Wikinews (T421796)
  • 22:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close Ukrainian Wikinews (T421796) (duration: 06m 28s)
  • 22:39 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 22:39 ladsgroup@deploy1003: ladsgroup: Backport for Close Ukrainian Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:37 ladsgroup@deploy1003: Started scap sync-world: Backport for Close Ukrainian Wikinews (T421796)
  • 22:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close Romanian Wikinews (T421796) (duration: 07m 56s)
  • 22:22 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 22:20 ladsgroup@deploy1003: ladsgroup: Backport for Close Romanian Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:18 ladsgroup@deploy1003: Started scap sync-world: Backport for Close Romanian Wikinews (T421796)
  • 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close Serbian Wikinews (T421796) (duration: 06m 45s)
  • 22:12 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 22:11 ladsgroup@deploy1003: ladsgroup: Backport for Close Serbian Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:09 ladsgroup@deploy1003: Started scap sync-world: Backport for Close Serbian Wikinews (T421796)
  • 22:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close Persian Wikinews (T421796) (duration: 11m 07s)
  • 21:59 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 21:58 ladsgroup@deploy1003: ladsgroup: Backport for Close Persian Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:54 ladsgroup@deploy1003: Started scap sync-world: Backport for Close Persian Wikinews (T421796)
  • 21:49 arlolra@deploy1003: Finished scap sync-world: Backport for Email confirmation banner: Remove obsolete arm_b variant (T421366), Legacy parser no longer varies by user thumbnail size. (T417513) (duration: 32m 55s)
  • 21:36 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Continuing with deployment
  • 21:33 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Backport for Email confirmation banner: Remove obsolete arm_b variant (T421366), Legacy parser no longer varies by user thumbnail size. (T417513) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:16 arlolra@deploy1003: Started scap sync-world: Backport for Email confirmation banner: Remove obsolete arm_b variant (T421366), Legacy parser no longer varies by user thumbnail size. (T417513)
  • 20:59 dancy@deploy1003: Installation of scap version "4.262.1" completed for 2 hosts
  • 20:57 dancy@deploy1003: Installing scap version "4.262.1" for 2 host(s)
  • 20:57 arlolra@deploy1003: Finished scap sync-world: Backport for hCaptcha: Add diagnostic context to script load error logs (T424496), sectionCollapsing: Scroll to fragment target on init (T425290), Errors added below ref list dirty when not responsive (T384599) (duration: 10m 59s)
  • 20:52 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Continuing with deployment
  • 20:48 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Backport for hCaptcha: Add diagnostic context to script load error logs (T424496), sectionCollapsing: Scroll to fragment target on init (T425290), Errors added below ref list dirty when not responsive (T384599) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be ve
  • 20:46 arlolra@deploy1003: Started scap sync-world: Backport for hCaptcha: Add diagnostic context to script load error logs (T424496), sectionCollapsing: Scroll to fragment target on init (T425290), Errors added below ref list dirty when not responsive (T384599)
  • 20:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS trixie
  • 20:22 arlolra@deploy1003: Finished scap sync-world: Backport for Enable WikiLove on shwiki (T424891), Add wikibase.v1 module to the sandbox were it is present (T422403) (duration: 10m 30s)
  • 20:20 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS trixie
  • 20:18 arlolra@deploy1003: aaron, neriah, arlolra: Continuing with deployment
  • 20:14 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
  • 20:13 arlolra@deploy1003: aaron, neriah, arlolra: Backport for Enable WikiLove on shwiki (T424891), Add wikibase.v1 module to the sandbox were it is present (T422403) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:12 arlolra@deploy1003: Started scap sync-world: Backport for Enable WikiLove on shwiki (T424891), Add wikibase.v1 module to the sandbox were it is present (T422403)
  • 20:10 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 20:09 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 20:09 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 20:09 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 20:09 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 20:09 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 20:07 pt1979@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
  • 20:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage
  • 19:57 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage
  • 19:55 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 19:55 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 19:54 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 19:54 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 19:45 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie
  • 19:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1002
  • 19:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1002
  • 19:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 19:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 19:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 19:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 19:39 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1002
  • 19:39 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 19:39 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 19:39 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:39 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1002 - herron@cumin1003"
  • 19:38 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1002 - herron@cumin1003"
  • 19:32 herron@cumin1003: START - Cookbook sre.dns.netbox
  • 19:31 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 19:31 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 19:31 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 19:31 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 19:31 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 19:30 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 19:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1002
  • 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS trixie
  • 19:17 dancy@deploy1003: Installation of scap version "4.262.0" completed for 2 hosts
  • 19:15 dancy@deploy1003: Installing scap version "4.262.0" for 2 host(s)
  • 19:15 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: rebooting firewall in desperation
  • 19:14 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.1 refs T423910
  • 19:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - T408892"
  • 19:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - T408892"
  • 19:04 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie
  • 19:03 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close Swedish Wikinews (T421796) (duration: 10m 59s)
  • 18:56 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 18:55 ladsgroup@deploy1003: ladsgroup: Backport for Close Swedish Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:52 ladsgroup@deploy1003: Started scap sync-world: Backport for Close Swedish Wikinews (T421796)
  • 18:49 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie
  • 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002"
  • 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002"
  • 18:47 brennen@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.1 refs T423910 (duration: 36m 04s)
  • 18:44 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie
  • 18:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002"
  • 18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002"
  • 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 18:30 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie
  • 18:25 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie
  • 18:14 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-codfw
  • 18:13 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-codfw
  • 18:13 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie
  • 18:11 brennen@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.1 refs T423910
  • 18:10 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-eqiad
  • 18:10 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-eqiad
  • 18:06 brennen: 1.47.0-wmf.1 train status (T423910): no current blockers, rolling to group0
  • 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS trixie
  • 17:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage
  • 17:38 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage
  • 17:33 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 17:32 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 17:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 17:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 17:21 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1003
  • 17:21 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1003
  • 17:21 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1003
  • 17:20 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 17:19 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 17:16 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 17:15 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 17:15 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:15 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003"
  • 17:15 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003"
  • 17:12 herron@cumin1003: START - Cookbook sre.dns.netbox
  • 17:09 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1003
  • 17:08 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS trixie
  • 17:05 sukhe: sudo cumin -b11 "A:cp and not P{cp2041* or cp2042*} and not A:ulsfo" "run-puppet-agent --enable 'merging CR 1282979'"
  • 16:58 sbassett@deploy1003: Finished scap sync-world: Backport for Set $wgReauthenticateTime editsitejs to one hour (T197137), Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607), Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607) (duration: 07m 25s)
  • 16:53 sbassett@deploy1003: mstyles, sbassett: Continuing with deployment
  • 16:52 sbassett@deploy1003: mstyles, sbassett: Backport for Set $wgReauthenticateTime editsitejs to one hour (T197137), Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607), Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdeb
  • 16:50 sbassett@deploy1003: Started scap sync-world: Backport for Set $wgReauthenticateTime editsitejs to one hour (T197137), Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607), Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)
  • 16:38 sbassett@deploy1003: Started scap sync-world: Backport for Set $wgReauthenticateTime editsitejs to one hour (T197137), Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)
  • 16:19 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync
  • 16:19 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync
  • 16:19 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync
  • 16:18 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync
  • 16:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close Japanese Wikinews (T421796) (duration: 06m 16s)
  • 16:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 16:07 ladsgroup@deploy1003: ladsgroup: Backport for Close Japanese Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:05 ladsgroup@deploy1003: Started scap sync-world: Backport for Close Japanese Wikinews (T421796)
  • 16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close Korean Wikinews (T421796) (duration: 07m 53s)
  • 15:57 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 15:55 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync
  • 15:55 ladsgroup@deploy1003: ladsgroup: Backport for Close Korean Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:55 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync
  • 15:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync
  • 15:54 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync
  • 15:53 ladsgroup@deploy1003: Started scap sync-world: Backport for Close Korean Wikinews (T421796)
  • 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close Finnish Wikinews (T421796) (duration: 06m 12s)
  • 15:48 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 15:47 ladsgroup@deploy1003: ladsgroup: Backport for Close Finnish Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for Close Finnish Wikinews (T421796)
  • 15:42 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync
  • 15:42 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync
  • 15:39 dzahn@dns1005: END - running authdns-update
  • 15:38 mutante: deleting mwmaint.discovery.wmnet DNS entry - the hosts behind it don't exist anymore
  • 15:37 dzahn@dns1005: START - running authdns-update
  • 15:24 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:24 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:21 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:20 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:20 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:20 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close Czech Wikinews (T421796) (duration: 06m 17s)
  • 15:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 (T419961)', diff saved to https://phabricator.wikimedia.org/P92340 and previous config saved to /var/cache/conftool/dbconfig/20260505-151930-fceratto.json
  • 15:16 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 15:16 ladsgroup@deploy1003: ladsgroup: Backport for Close Czech Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:14 ladsgroup@deploy1003: Started scap sync-world: Backport for Close Czech Wikinews (T421796)
  • 15:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92339 and previous config saved to /var/cache/conftool/dbconfig/20260505-150921-fceratto.json
  • 15:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close Tamil Wikinews (T421796) (duration: 07m 06s)
  • 15:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:04 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 15:03 ladsgroup@deploy1003: ladsgroup: Backport for Close Tamil Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:01 ladsgroup@deploy1003: Started scap sync-world: Backport for Close Tamil Wikinews (T421796)
  • 14:59 urbanecm@deploy1003: Finished scap sync-world: Backport for fix: wrong property name action_data (T425425) (duration: 07m 48s)
  • 14:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92338 and previous config saved to /var/cache/conftool/dbconfig/20260505-145913-fceratto.json
  • 14:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 14:57 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 14:55 urbanecm@deploy1003: urbanecm: Continuing with deployment
  • 14:53 urbanecm@deploy1003: urbanecm: Backport for fix: wrong property name action_data (T425425) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T419635)', diff saved to https://phabricator.wikimedia.org/P92337 and previous config saved to /var/cache/conftool/dbconfig/20260505-145231-fceratto.json
  • 14:51 urbanecm@deploy1003: Started scap sync-world: Backport for fix: wrong property name action_data (T425425)
  • 14:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 (T419961)', diff saved to https://phabricator.wikimedia.org/P92336 and previous config saved to /var/cache/conftool/dbconfig/20260505-144905-fceratto.json
  • 14:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1004.eqiad.wmnet with OS trixie
  • 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92335 and previous config saved to /var/cache/conftool/dbconfig/20260505-144223-fceratto.json
  • 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 14:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2247 (T419961)', diff saved to https://phabricator.wikimedia.org/P92334 and previous config saved to /var/cache/conftool/dbconfig/20260505-144029-fceratto.json
  • 14:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2247.codfw.wmnet with reason: Maintenance
  • 14:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 (T419961)', diff saved to https://phabricator.wikimedia.org/P92333 and previous config saved to /var/cache/conftool/dbconfig/20260505-143958-fceratto.json
  • 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92332 and previous config saved to /var/cache/conftool/dbconfig/20260505-143214-fceratto.json
  • 14:30 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=eqiad
  • 14:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92331 and previous config saved to /var/cache/conftool/dbconfig/20260505-142949-fceratto.json
  • 14:28 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage
  • 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master1001.eqiad.wmnet
  • 14:25 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage
  • 14:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master1001.eqiad.wmnet
  • 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T419635)', diff saved to https://phabricator.wikimedia.org/P92329 and previous config saved to /var/cache/conftool/dbconfig/20260505-142206-fceratto.json
  • 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92328 and previous config saved to /var/cache/conftool/dbconfig/20260505-141941-fceratto.json
  • 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet
  • 14:11 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1004
  • 14:10 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1004
  • 14:10 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS trixie
  • 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1015.eqiad.wmnet
  • 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003"
  • 14:09 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003"
  • 14:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 (T419961)', diff saved to https://phabricator.wikimedia.org/P92327 and previous config saved to /var/cache/conftool/dbconfig/20260505-140933-fceratto.json
  • 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet
  • 14:07 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 14:07 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 14:07 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 14:06 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 14:05 eevans@cumin1003: START - Cookbook sre.dns.netbox
  • 14:05 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=eqiad
  • 14:05 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=codfw
  • 14:04 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync
  • 14:04 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync
  • 14:03 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 14:03 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 14:03 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 14:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 14:03 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master2001.codfw.wmnet
  • 14:02 jasmine@cumin2002: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution.
  • 14:01 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1015.eqiad.wmnet
  • 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1014.eqiad.wmnet
  • 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003"
  • 14:01 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003"
  • 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2246 (T419961)', diff saved to https://phabricator.wikimedia.org/P92326 and previous config saved to /var/cache/conftool/dbconfig/20260505-140047-fceratto.json
  • 14:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2246.codfw.wmnet with reason: Maintenance
  • 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 (T419961)', diff saved to https://phabricator.wikimedia.org/P92325 and previous config saved to /var/cache/conftool/dbconfig/20260505-140016-fceratto.json
  • 13:59 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1227: Repooling
  • 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master2001.codfw.wmnet
  • 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 13:58 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 13:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 13:55 eevans@cumin1003: START - Cookbook sre.dns.netbox
  • 13:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close Portuguese Wikinews (T421796) (duration: 06m 22s)
  • 13:50 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1014.eqiad.wmnet
  • 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92323 and previous config saved to /var/cache/conftool/dbconfig/20260505-135008-fceratto.json
  • 13:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 13:49 ladsgroup@deploy1003: ladsgroup: Backport for Close Portuguese Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:49 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=codfw
  • 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1011.eqiad.wmnet
  • 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003"
  • 13:47 ladsgroup@deploy1003: Started scap sync-world: Backport for Close Portuguese Wikinews (T421796)
  • 13:47 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003"
  • 13:45 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2209 (T419635)', diff saved to https://phabricator.wikimedia.org/P92321 and previous config saved to /var/cache/conftool/dbconfig/20260505-134522-fceratto.json
  • 13:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance
  • 13:44 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1227: Repooling
  • 13:44 eevans@cumin1003: START - Cookbook sre.dns.netbox
  • 13:43 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution.
  • 13:43 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 (T419635)', diff saved to https://phabricator.wikimedia.org/P92319 and previous config saved to /var/cache/conftool/dbconfig/20260505-134257-fceratto.json
  • 13:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92318 and previous config saved to /var/cache/conftool/dbconfig/20260505-134000-fceratto.json
  • 13:37 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1011.eqiad.wmnet
  • 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1010.eqiad.wmnet
  • 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003"
  • 13:37 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003"
  • 13:33 eevans@cumin1003: START - Cookbook sre.dns.netbox
  • 13:30 Msz2001: UTC afternoon backport window done
  • 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 (T419961)', diff saved to https://phabricator.wikimedia.org/P92317 and previous config saved to /var/cache/conftool/dbconfig/20260505-132952-fceratto.json
  • 13:27 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1010.eqiad.wmnet
  • 13:24 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 13:23 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 13:23 mszwarc@deploy1003: Finished scap sync-world: Backport for Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690), Move privileged global and local group handling to WikimediaCustomizations (T418507), Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256) (duration: 08m 37s)
  • 13:23 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 13:22 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 13:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on dborch1002.wikimedia.org with reason: T416582
  • 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2245 (T419961)', diff saved to https://phabricator.wikimedia.org/P92316 and previous config saved to /var/cache/conftool/dbconfig/20260505-132002-fceratto.json
  • 13:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2245.codfw.wmnet with reason: Maintenance
  • 13:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 (T419961)', diff saved to https://phabricator.wikimedia.org/P92315 and previous config saved to /var/cache/conftool/dbconfig/20260505-131931-fceratto.json
  • 13:19 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Continuing with deployment
  • 13:16 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Backport for Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690), Move privileged global and local group handling to WikimediaCustomizations (T418507), Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug
  • 13:15 mszwarc@deploy1003: Started scap sync-world: Backport for Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690), Move privileged global and local group handling to WikimediaCustomizations (T418507), Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)
  • 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet
  • 13:11 mszwarc@deploy1003: Finished scap sync-world: Backport for Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484) (duration: 07m 55s)
  • 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply
  • 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply
  • 13:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet
  • 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92314 and previous config saved to /var/cache/conftool/dbconfig/20260505-130923-fceratto.json
  • 13:07 mszwarc@deploy1003: mszwarc: Continuing with deployment
  • 13:05 mszwarc@deploy1003: mszwarc: Backport for Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:03 mszwarc@deploy1003: Started scap sync-world: Backport for Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)
  • 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92313 and previous config saved to /var/cache/conftool/dbconfig/20260505-125915-fceratto.json
  • 12:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close Esperanto Wikinews (T421796) (duration: 07m 23s)
  • 12:52 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 12:50 ladsgroup@deploy1003: ladsgroup: Backport for Close Esperanto Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:49 ladsgroup@deploy1003: Started scap sync-world: Backport for Close Esperanto Wikinews (T421796)
  • 12:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 (T419961)', diff saved to https://phabricator.wikimedia.org/P92312 and previous config saved to /var/cache/conftool/dbconfig/20260505-124907-fceratto.json
  • 12:44 sgimeno@deploy1003: Finished scap sync-world: Backport for loggedOutWarning: instrument browser navigation and tab close (T421518) (duration: 03m 56s)
  • 12:43 sgimeno@deploy1003: sgimeno: Continuing with deployment
  • 12:42 moritzm: installing node-tar security updates
  • 12:41 sgimeno@deploy1003: sgimeno: Backport for loggedOutWarning: instrument browser navigation and tab close (T421518) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:40 sgimeno@deploy1003: Started scap sync-world: Backport for loggedOutWarning: instrument browser navigation and tab close (T421518)
  • 12:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2240 (T419961)', diff saved to https://phabricator.wikimedia.org/P92311 and previous config saved to /var/cache/conftool/dbconfig/20260505-124041-fceratto.json
  • 12:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: Maintenance
  • 12:36 moritzm: installing imagemagick security updates
  • 12:34 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance
  • 12:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T419961)', diff saved to https://phabricator.wikimedia.org/P92310 and previous config saved to /var/cache/conftool/dbconfig/20260505-123411-fceratto.json
  • 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply
  • 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply
  • 12:31 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 12:29 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 12:28 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 12:26 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 12:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92309 and previous config saved to /var/cache/conftool/dbconfig/20260505-122404-fceratto.json
  • 12:23 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 12:23 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 12:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92308 and previous config saved to /var/cache/conftool/dbconfig/20260505-121352-fceratto.json
  • 12:04 moritzm: installing postgresql-13 security updates
  • 12:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T419961)', diff saved to https://phabricator.wikimedia.org/P92307 and previous config saved to /var/cache/conftool/dbconfig/20260505-120344-fceratto.json
  • 11:57 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close Shan Wikinews (T421796) (duration: 06m 13s)
  • 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2005.codfw.wmnet
  • 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2237 (T419961)', diff saved to https://phabricator.wikimedia.org/P92306 and previous config saved to /var/cache/conftool/dbconfig/20260505-115535-fceratto.json
  • 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance
  • 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T419961)', diff saved to https://phabricator.wikimedia.org/P92305 and previous config saved to /var/cache/conftool/dbconfig/20260505-115503-fceratto.json
  • 11:53 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 11:53 ladsgroup@deploy1003: ladsgroup: Backport for Close Shan Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:52 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2005.codfw.wmnet
  • 11:51 ladsgroup@deploy1003: Started scap sync-world: Backport for Close Shan Wikinews (T421796)
  • 11:47 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close Norwegian Wikinews (T421796) (duration: 09m 21s)
  • 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2004.codfw.wmnet
  • 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92304 and previous config saved to /var/cache/conftool/dbconfig/20260505-114455-fceratto.json
  • 11:43 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2004.codfw.wmnet
  • 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2003.codfw.wmnet
  • 11:39 ladsgroup@deploy1003: ladsgroup: Backport for Close Norwegian Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2003.codfw.wmnet
  • 11:38 ladsgroup@deploy1003: Started scap sync-world: Backport for Close Norwegian Wikinews (T421796)
  • 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92303 and previous config saved to /var/cache/conftool/dbconfig/20260505-113446-fceratto.json
  • 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T419635)', diff saved to https://phabricator.wikimedia.org/P92302 and previous config saved to /var/cache/conftool/dbconfig/20260505-112449-fceratto.json
  • 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T419961)', diff saved to https://phabricator.wikimedia.org/P92301 and previous config saved to /var/cache/conftool/dbconfig/20260505-112438-fceratto.json
  • 11:16 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2236 (T419961)', diff saved to https://phabricator.wikimedia.org/P92300 and previous config saved to /var/cache/conftool/dbconfig/20260505-111616-fceratto.json
  • 11:16 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance
  • 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T419961)', diff saved to https://phabricator.wikimedia.org/P92299 and previous config saved to /var/cache/conftool/dbconfig/20260505-111545-fceratto.json
  • 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92298 and previous config saved to /var/cache/conftool/dbconfig/20260505-111435-fceratto.json
  • 11:10 moritzm: installing ca-certificates updates from bookworm point release
  • 11:09 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2221: after reimage to trixie
  • 11:07 moritzm: installing multipart bugfix updates from bookworm point release
  • 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92296 and previous config saved to /var/cache/conftool/dbconfig/20260505-110537-fceratto.json
  • 11:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs4009*} and A:liberica
  • 11:05 ayounsi@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs4009*} and A:liberica
  • 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92295 and previous config saved to /var/cache/conftool/dbconfig/20260505-110427-fceratto.json
  • 11:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1174: after reimage to trixie
  • 10:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92293 and previous config saved to /var/cache/conftool/dbconfig/20260505-105529-fceratto.json
  • 10:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T419635)', diff saved to https://phabricator.wikimedia.org/P92291 and previous config saved to /var/cache/conftool/dbconfig/20260505-105419-fceratto.json
  • 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 10:50 elukey@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:50 elukey@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 10:49 elukey@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:49 elukey@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 10:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T419961)', diff saved to https://phabricator.wikimedia.org/P92290 and previous config saved to /var/cache/conftool/dbconfig/20260505-104521-fceratto.json
  • 10:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 (T419635)', diff saved to https://phabricator.wikimedia.org/P92288 and previous config saved to /var/cache/conftool/dbconfig/20260505-104032-fceratto.json
  • 10:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2219 (T419961)', diff saved to https://phabricator.wikimedia.org/P92286 and previous config saved to /var/cache/conftool/dbconfig/20260505-103702-fceratto.json
  • 10:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 10:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T419961)', diff saved to https://phabricator.wikimedia.org/P92285 and previous config saved to /var/cache/conftool/dbconfig/20260505-103632-fceratto.json
  • 10:32 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 10:29 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 10:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92283 and previous config saved to /var/cache/conftool/dbconfig/20260505-102623-fceratto.json
  • 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 10:24 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2221: after reimage to trixie
  • 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 10:23 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 10:22 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 10:19 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS trixie
  • 10:17 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 10:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92281 and previous config saved to /var/cache/conftool/dbconfig/20260505-101616-fceratto.json
  • 10:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1174: after reimage to trixie
  • 09:42 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 09:41 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 09:39 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 09:38 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 09:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P92271 and previous config saved to /var/cache/conftool/dbconfig/20260505-093703-fceratto.json
  • 09:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T419635)', diff saved to https://phabricator.wikimedia.org/P92270 and previous config saved to /var/cache/conftool/dbconfig/20260505-093619-fceratto.json
  • 09:36 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 09:35 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1173 (T419635)', diff saved to https://phabricator.wikimedia.org/P92269 and previous config saved to /var/cache/conftool/dbconfig/20260505-093305-fceratto.json
  • 09:32 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 09:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance
  • 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1174.eqiad.wmnet with OS trixie
  • 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS trixie
  • 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 09:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1174: Reimage to Trixie
  • 09:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2221: Reimage to Trixie
  • 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1174: Reimage to Trixie
  • 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2221: Reimage to Trixie
  • 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Reimage to Trixie
  • 09:28 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2221.codfw.wmnet with reason: Reimage to Trixie
  • 09:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T419961)', diff saved to https://phabricator.wikimedia.org/P92265 and previous config saved to /var/cache/conftool/dbconfig/20260505-092654-fceratto.json
  • 09:26 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 09:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T419635)', diff saved to https://phabricator.wikimedia.org/P92264 and previous config saved to /var/cache/conftool/dbconfig/20260505-092431-fceratto.json
  • 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2206 (T419961)', diff saved to https://phabricator.wikimedia.org/P92263 and previous config saved to /var/cache/conftool/dbconfig/20260505-091808-fceratto.json
  • 09:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 09:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92262 and previous config saved to /var/cache/conftool/dbconfig/20260505-091423-fceratto.json
  • 09:13 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance
  • 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T419961)', diff saved to https://phabricator.wikimedia.org/P92260 and previous config saved to /var/cache/conftool/dbconfig/20260505-091254-fceratto.json
  • 09:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92259 and previous config saved to /var/cache/conftool/dbconfig/20260505-090415-fceratto.json
  • 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92258 and previous config saved to /var/cache/conftool/dbconfig/20260505-090246-fceratto.json
  • 08:58 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2209: after reimage to trixie
  • 08:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T419635)', diff saved to https://phabricator.wikimedia.org/P92256 and previous config saved to /var/cache/conftool/dbconfig/20260505-085407-fceratto.json
  • 08:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2209.codfw.wmnet with OS trixie
  • 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92255 and previous config saved to /var/cache/conftool/dbconfig/20260505-085238-fceratto.json
  • 08:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org
  • 08:50 moritzm: installing augeas security updates
  • 08:49 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors
  • 08:48 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors
  • 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002"
  • 08:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002"
  • 08:46 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2213 (T419635)', diff saved to https://phabricator.wikimedia.org/P92254 and previous config saved to /var/cache/conftool/dbconfig/20260505-084616-fceratto.json
  • 08:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance
  • 08:42 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T419961)', diff saved to https://phabricator.wikimedia.org/P92253 and previous config saved to /var/cache/conftool/dbconfig/20260505-084231-fceratto.json
  • 08:41 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 08:40 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 08:38 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:37 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 08:37 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 08:35 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 08:34 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: switches replacement
  • 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2172 (T419961)', diff saved to https://phabricator.wikimedia.org/P92252 and previous config saved to /var/cache/conftool/dbconfig/20260505-083356-fceratto.json
  • 08:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T419961)', diff saved to https://phabricator.wikimedia.org/P92251 and previous config saved to /var/cache/conftool/dbconfig/20260505-083326-fceratto.json
  • 08:32 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 08:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: host reimage
  • 08:29 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors
  • 08:28 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors
  • 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002"
  • 08:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002"
  • 08:24 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92250 and previous config saved to /var/cache/conftool/dbconfig/20260505-082318-fceratto.json
  • 08:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2222: after reimage to trixie
  • 08:22 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: host reimage
  • 08:16 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/x-flac # T414641
  • 08:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1170: after reimage to trixie
  • 08:14 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org
  • 08:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92247 and previous config saved to /var/cache/conftool/dbconfig/20260505-081309-fceratto.json
  • 08:08 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/flac # T414641
  • 08:05 ayounsi@dns1004: END - running authdns-update
  • 08:03 ayounsi@dns1004: START - running authdns-update
  • 08:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T419961)', diff saved to https://phabricator.wikimedia.org/P92245 and previous config saved to /var/cache/conftool/dbconfig/20260505-080301-fceratto.json
  • 08:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2209.codfw.wmnet with OS trixie
  • 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003"
  • 08:01 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003"
  • 08:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2209: Reimage to Trixie
  • 08:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2209: Reimage to Trixie
  • 08:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2209.codfw.wmnet with reason: Reimage to Trixie
  • 07:58 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 07:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2209 T424864', diff saved to https://phabricator.wikimedia.org/P92243 and previous config saved to /var/cache/conftool/dbconfig/20260505-075746-marostegui.json
  • 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2205 to s3 primary T424864', diff saved to https://phabricator.wikimedia.org/P92242 and previous config saved to /var/cache/conftool/dbconfig/20260505-075654-marostegui.json
  • 07:55 awight: EU morning deployment was fun
  • 07:54 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2155 (T419961)', diff saved to https://phabricator.wikimedia.org/P92241 and previous config saved to /var/cache/conftool/dbconfig/20260505-075416-fceratto.json
  • 07:54 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 07:52 marostegui: Starting s3 codfw failover from db2209 to db2205 - T424864
  • 07:51 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2205 with weight 0 T424864', diff saved to https://phabricator.wikimedia.org/P92239 and previous config saved to /var/cache/conftool/dbconfig/20260505-075156-marostegui.json
  • 07:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 T424864
  • 07:50 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --force --mediatype AUDIO --mime audio/midi # T414645
  • 07:45 zabe: zabe@deploy1003:~$ mwscript namespaceDupes.php scnwiki --fix # T425378
  • 07:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2222: after reimage to trixie
  • 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS trixie
  • 07:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1170: after reimage to trixie
  • 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1170.eqiad.wmnet with OS trixie
  • 07:11 awight@deploy1003: Finished scap sync-world: Backport for zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165) (duration: 06m 43s)
  • 07:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage
  • 07:07 awight@deploy1003: awight, 1f616emo: Continuing with deployment
  • 07:06 awight@deploy1003: awight, 1f616emo: Backport for zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:05 awight@deploy1003: Started scap sync-world: Backport for zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)
  • 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage
  • 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org
  • 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage
  • 07:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1156: after reimage to trixie
  • 06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 06:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org
  • 06:58 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage
  • 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1170.eqiad.wmnet with OS trixie
  • 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS trixie
  • 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1170: Reimage to Trixie
  • 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2222: Reimage to Trixie
  • 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1170: Reimage to Trixie
  • 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Reimage to Trixie
  • 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2222: Reimage to Trixie
  • 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2222.codfw.wmnet with reason: Reimage to Trixie
  • 06:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1156: after reimage to trixie
  • 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS trixie
  • 05:49 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage
  • 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage
  • 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003"
  • 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003
  • 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003
  • 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003"
  • 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS trixie
  • 05:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1156: Reimage to Trixie
  • 05:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1156: Reimage to Trixie
  • 05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Reimage to Trixie
  • 05:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie
  • 04:03 mwpresync@deploy1003: Pruned MediaWiki: 1.46.0-wmf.23 (duration: 03m 12s)
  • 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 39s)
  • 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns info for new switches - pt1979@cumin2002"
  • 01:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns info for new switches - pt1979@cumin2002"
  • 01:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 00:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close Catalan Wikinews (T421796) (duration: 06m 50s)
  • 00:11 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 00:10 ladsgroup@deploy1003: ladsgroup: Backport for Close Catalan Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 00:09 ladsgroup@deploy1003: Started scap sync-world: Backport for Close Catalan Wikinews (T421796)

2026-05-04

  • 23:48 ladsgroup@deploy1003: ladsgroup: Backport for Close Bosnian Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:46 ladsgroup@deploy1003: Started scap sync-world: Backport for Close Bosnian Wikinews (T421796)
  • 23:14 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close Hebrew Wikinews (T421796) (duration: 06m 45s)
  • 23:10 ladsgroup@deploy1003: neriah, ladsgroup: Continuing with deployment
  • 23:09 ladsgroup@deploy1003: neriah, ladsgroup: Backport for Close Hebrew Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:07 ladsgroup@deploy1003: Started scap sync-world: Backport for Close Hebrew Wikinews (T421796)
  • 22:08 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
  • 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
  • 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
  • 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
  • 21:43 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply
  • 21:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply
  • 21:32 cwhite@deploy1003: Finished deploy [statsv/statsv@152de49]: fix logging (duration: 00m 11s)
  • 21:32 cwhite@deploy1003: Started deploy [statsv/statsv@152de49]: fix logging
  • 21:20 cjming@deploy1003: Finished scap sync-world: Backport for Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468) (duration: 11m 20s)
  • 21:16 cjming@deploy1003: cjming, neriah: Continuing with deployment
  • 21:10 cjming@deploy1003: cjming, neriah: Backport for Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:09 cjming@deploy1003: Started scap sync-world: Backport for Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)
  • 20:38 cjming@deploy1003: Finished scap sync-world: Backport for Revert^2 "Use js promise for email confirmation banner" (duration: 22m 19s)
  • 20:34 cjming@deploy1003: mmartorana, cjming: Continuing with deployment
  • 20:18 cjming@deploy1003: mmartorana, cjming: Backport for Revert^2 "Use js promise for email confirmation banner" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:16 cjming@deploy1003: Started scap sync-world: Backport for Revert^2 "Use js promise for email confirmation banner"
  • 20:11 toyofuku@deploy1003: Finished scap sync-world: Backport for Enable the reading list beta feature survey on all wikipedias (T421776) (duration: 07m 21s)
  • 20:07 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1005.eqiad.wmnet with OS trixie
  • 20:06 toyofuku@deploy1003: toyofuku: Continuing with deployment
  • 20:05 toyofuku@deploy1003: toyofuku: Backport for Enable the reading list beta feature survey on all wikipedias (T421776) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:03 toyofuku@deploy1003: Started scap sync-world: Backport for Enable the reading list beta feature survey on all wikipedias (T421776)
  • 19:51 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) asw1-22-ulsfo.wikimedia.org on all recursors
  • 19:50 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache asw1-22-ulsfo.wikimedia.org on all recursors
  • 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003"
  • 19:49 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003"
  • 19:48 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage
  • 19:44 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 19:42 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage
  • 19:40 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - T424852
  • 19:37 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - T424852
  • 19:28 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: ongoing troubleshooting
  • 19:27 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1005
  • 19:27 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1005
  • 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1005.eqiad.wmnet with OS trixie
  • 19:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - T424852
  • 19:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 19:23 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - T424852
  • 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
  • 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
  • 18:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close Limburgish Wikinews (T421796) (duration: 06m 16s)
  • 18:55 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 18:55 ladsgroup@deploy1003: ladsgroup: Backport for Close Limburgish Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:53 ladsgroup@deploy1003: Started scap sync-world: Backport for Close Limburgish Wikinews (T421796)
  • 18:31 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close Albanian Wikinews (T421796) (duration: 09m 17s)
  • 18:27 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 18:23 ladsgroup@deploy1003: ladsgroup: Backport for Close Albanian Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:22 ladsgroup@deploy1003: Started scap sync-world: Backport for Close Albanian Wikinews (T421796)
  • 18:11 dancy@deploy1003: Finished scap sync-world: testing (duration: 02m 04s)
  • 18:11 dancy@deploy1003: dancy: Rolling back deployment
  • 18:10 dancy@deploy1003: dancy: testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:09 dancy@deploy1003: Started scap sync-world: testing
  • 18:08 dancy@deploy1003: Installation of scap version "4.260.0" completed for 2 hosts
  • 18:06 dancy@deploy1003: Installing scap version "4.260.0" for 2 host(s)
  • 17:49 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:40 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
  • 16:39 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
  • 16:34 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
  • 16:33 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
  • 16:04 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close Greek Wikinews (T421796) (duration: 06m 19s)
  • 16:00 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 16:00 ladsgroup@deploy1003: ladsgroup: Backport for Close Greek Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:58 ladsgroup@deploy1003: Started scap sync-world: Backport for Close Greek Wikinews (T421796)
  • 15:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 (T419635)', diff saved to https://phabricator.wikimedia.org/P92224 and previous config saved to /var/cache/conftool/dbconfig/20260504-155514-fceratto.json
  • 15:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92223 and previous config saved to /var/cache/conftool/dbconfig/20260504-154506-fceratto.json
  • 15:38 ladsgroup@deploy1003: Finished scap sync-world: Backport for Make errorpages responsive (duration: 06m 59s)
  • 15:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92222 and previous config saved to /var/cache/conftool/dbconfig/20260504-153458-fceratto.json
  • 15:34 ladsgroup@deploy1003: ladsgroup, chlod: Continuing with deployment
  • 15:33 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 39 hosts with reason: switches replacement
  • 15:33 ladsgroup@deploy1003: ladsgroup, chlod: Backport for Make errorpages responsive synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:32 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync
  • 15:32 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync
  • 15:31 ladsgroup@deploy1003: Started scap sync-world: Backport for Make errorpages responsive
  • 15:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 (T419635)', diff saved to https://phabricator.wikimedia.org/P92221 and previous config saved to /var/cache/conftool/dbconfig/20260504-152449-fceratto.json
  • 15:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 (T419635)', diff saved to https://phabricator.wikimedia.org/P92220 and previous config saved to /var/cache/conftool/dbconfig/20260504-152238-fceratto.json
  • 15:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance
  • 15:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:17 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 15:17 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 15:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org
  • 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors
  • 15:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors
  • 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 (T419635)', diff saved to https://phabricator.wikimedia.org/P92219 and previous config saved to /var/cache/conftool/dbconfig/20260504-151238-fceratto.json
  • 15:10 papaul: ongoing switch refresh in ULSFO
  • 15:10 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 15:10 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for Close Gun Wikinews (T421796) (duration: 06m 45s)
  • 15:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92218 and previous config saved to /var/cache/conftool/dbconfig/20260504-150230-fceratto.json
  • 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
  • 15:00 ladsgroup@deploy1003: ladsgroup: Backport for Close Gun Wikinews (T421796) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:58 ladsgroup@deploy1003: Started scap sync-world: Backport for Close Gun Wikinews (T421796)
  • 14:58 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS trixie
  • 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92217 and previous config saved to /var/cache/conftool/dbconfig/20260504-145222-fceratto.json
  • 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 (T419635)', diff saved to https://phabricator.wikimedia.org/P92216 and previous config saved to /var/cache/conftool/dbconfig/20260504-144213-fceratto.json
  • 14:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts
  • 14:41 pt1979@cumin1003: START - Cookbook sre.hosts.remove-downtime for 7 hosts
  • 14:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage
  • 14:34 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage
  • 14:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 (T419635)', diff saved to https://phabricator.wikimedia.org/P92215 and previous config saved to /var/cache/conftool/dbconfig/20260504-143334-fceratto.json
  • 14:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance
  • 14:30 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[3-4]-ulsfo IPv6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPv6 with reason: switch refresh
  • 14:28 pt1979@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cr[3-4]-ulsfo IPV6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPV6 with reason: switch refresh
  • 14:25 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on asw2-ulsfo,cr[3-4]-ulsfo,mr1-ulsfo with reason: switch refresh
  • 14:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2001
  • 14:16 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2001
  • 14:13 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2001
  • 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 14:13 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:13 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003"
  • 14:13 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003"
  • 14:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T419961)', diff saved to https://phabricator.wikimedia.org/P92214 and previous config saved to /var/cache/conftool/dbconfig/20260504-141113-fceratto.json
  • 14:07 herron@cumin1003: START - Cookbook sre.dns.netbox
  • 14:04 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2001
  • 14:04 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS trixie
  • 14:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92213 and previous config saved to /var/cache/conftool/dbconfig/20260504-140105-fceratto.json
  • 14:00 slyngshede@cumin1003: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh T408892]
  • 14:00 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: New switch configuration, T408892]
  • 14:00 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: New switch configuration, T408892]
  • 13:59 sbisson@deploy1003: Finished scap sync-world: Backport for ArticleGuidance: enable on simple english (T425351) (duration: 06m 22s)
  • 13:57 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors
  • 13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors
  • 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002"
  • 13:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002"
  • 13:55 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:55 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:55 sbisson@deploy1003: sbisson: Continuing with deployment
  • 13:55 sbisson@deploy1003: sbisson: Backport for ArticleGuidance: enable on simple english (T425351) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:54 dcausse: T425301: stopping writes again on cloudelastic, cluster unstable
  • 13:53 sbisson@deploy1003: Started scap sync-world: Backport for ArticleGuidance: enable on simple english (T425351)
  • 13:52 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org
  • 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92212 and previous config saved to /var/cache/conftool/dbconfig/20260504-135056-fceratto.json
  • 13:50 sbisson@deploy1003: Finished scap sync-world: Backport for zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165) (duration: 07m 30s)
  • 13:46 sbisson@deploy1003: 1f616emo, sbisson: Continuing with deployment
  • 13:45 sbisson@deploy1003: 1f616emo, sbisson: Backport for zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:43 sbisson@deploy1003: Started scap sync-world: Backport for zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)
  • 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T419961)', diff saved to https://phabricator.wikimedia.org/P92211 and previous config saved to /var/cache/conftool/dbconfig/20260504-134048-fceratto.json
  • 13:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1226 (T419961)', diff saved to https://phabricator.wikimedia.org/P92210 and previous config saved to /var/cache/conftool/dbconfig/20260504-133039-fceratto.json
  • 13:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T419961)', diff saved to https://phabricator.wikimedia.org/P92209 and previous config saved to /var/cache/conftool/dbconfig/20260504-133010-fceratto.json
  • 13:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92208 and previous config saved to /var/cache/conftool/dbconfig/20260504-132002-fceratto.json
  • 13:13 moritzm: installing jaraco.context security updates
  • 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5004.eqsin.wmnet
  • 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5004.eqsin.wmnet with OS bookworm
  • 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92207 and previous config saved to /var/cache/conftool/dbconfig/20260504-130953-fceratto.json
  • 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T419961)', diff saved to https://phabricator.wikimedia.org/P92206 and previous config saved to /var/cache/conftool/dbconfig/20260504-125945-fceratto.json
  • 12:59 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:59 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:59 dcausse: T425301: resuming writes on cloudelastic
  • 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1214 (T419961)', diff saved to https://phabricator.wikimedia.org/P92205 and previous config saved to /var/cache/conftool/dbconfig/20260504-125247-fceratto.json
  • 12:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T419961)', diff saved to https://phabricator.wikimedia.org/P92204 and previous config saved to /var/cache/conftool/dbconfig/20260504-125219-fceratto.json
  • 12:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance
  • 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage
  • 12:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage
  • 12:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92203 and previous config saved to /var/cache/conftool/dbconfig/20260504-124210-fceratto.json
  • 12:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92202 and previous config saved to /var/cache/conftool/dbconfig/20260504-123203-fceratto.json
  • 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T419961)', diff saved to https://phabricator.wikimedia.org/P92201 and previous config saved to /var/cache/conftool/dbconfig/20260504-122155-fceratto.json
  • 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1203 (T419961)', diff saved to https://phabricator.wikimedia.org/P92200 and previous config saved to /var/cache/conftool/dbconfig/20260504-121441-fceratto.json
  • 12:14 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T419961)', diff saved to https://phabricator.wikimedia.org/P92199 and previous config saved to /var/cache/conftool/dbconfig/20260504-121424-fceratto.json
  • 12:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92198 and previous config saved to /var/cache/conftool/dbconfig/20260504-120416-fceratto.json
  • 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5004.eqsin.wmnet with OS bookworm
  • 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002"
  • 11:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002"
  • 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5004.eqsin.wmnet on all recursors
  • 11:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5004.eqsin.wmnet on all recursors
  • 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002"
  • 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92197 and previous config saved to /var/cache/conftool/dbconfig/20260504-115408-fceratto.json
  • 11:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002"
  • 11:47 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:47 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5004.eqsin.wmnet
  • 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5003.eqsin.wmnet
  • 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5003.eqsin.wmnet with OS bookworm
  • 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T419961)', diff saved to https://phabricator.wikimedia.org/P92196 and previous config saved to /var/cache/conftool/dbconfig/20260504-114400-fceratto.json
  • 11:36 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1193 (T419961)', diff saved to https://phabricator.wikimedia.org/P92195 and previous config saved to /var/cache/conftool/dbconfig/20260504-113620-fceratto.json
  • 11:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T419961)', diff saved to https://phabricator.wikimedia.org/P92194 and previous config saved to /var/cache/conftool/dbconfig/20260504-113550-fceratto.json
  • 11:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1162: after reimage to trixie
  • 11:26 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage
  • 11:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage
  • 11:25 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92192 and previous config saved to /var/cache/conftool/dbconfig/20260504-112542-fceratto.json
  • 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92191 and previous config saved to /var/cache/conftool/dbconfig/20260504-111534-fceratto.json
  • 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T419961)', diff saved to https://phabricator.wikimedia.org/P92189 and previous config saved to /var/cache/conftool/dbconfig/20260504-110526-fceratto.json
  • 11:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2187: repool after maintenance
  • 10:58 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1192 (T419961)', diff saved to https://phabricator.wikimedia.org/P92187 and previous config saved to /var/cache/conftool/dbconfig/20260504-105808-fceratto.json
  • 10:58 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 10:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T419961)', diff saved to https://phabricator.wikimedia.org/P92186 and previous config saved to /var/cache/conftool/dbconfig/20260504-105739-fceratto.json
  • 10:48 moritzm: installing bash updates from trixie point release
  • 10:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92184 and previous config saved to /var/cache/conftool/dbconfig/20260504-104731-fceratto.json
  • 10:42 moritzm: installing postgresql-17 security updates
  • 10:42 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1162: after reimage to trixie
  • 10:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1162.eqiad.wmnet with OS trixie
  • 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5003.eqsin.wmnet with OS bookworm
  • 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92181 and previous config saved to /var/cache/conftool/dbconfig/20260504-103723-fceratto.json
  • 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002"
  • 10:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002"
  • 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5003.eqsin.wmnet on all recursors
  • 10:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5003.eqsin.wmnet on all recursors
  • 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002"
  • 10:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002"
  • 10:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T419961)', diff saved to https://phabricator.wikimedia.org/P92179 and previous config saved to /var/cache/conftool/dbconfig/20260504-102715-fceratto.json
  • 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5003.eqsin.wmnet
  • 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1178 (T419961)', diff saved to https://phabricator.wikimedia.org/P92178 and previous config saved to /var/cache/conftool/dbconfig/20260504-101855-fceratto.json
  • 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T419961)', diff saved to https://phabricator.wikimedia.org/P92177 and previous config saved to /var/cache/conftool/dbconfig/20260504-101826-fceratto.json
  • 10:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2187: repool after maintenance
  • 10:16 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage
  • 10:15 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage
  • 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92174 and previous config saved to /var/cache/conftool/dbconfig/20260504-100818-fceratto.json
  • 10:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1162.eqiad.wmnet with OS trixie
  • 10:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1162: Reimage to Trixie
  • 10:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1162: Reimage to Trixie
  • 10:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1162.eqiad.wmnet with reason: Reimage to Trixie
  • 09:58 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92172 and previous config saved to /var/cache/conftool/dbconfig/20260504-095810-fceratto.json
  • 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5005.wikimedia.org
  • 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T419961)', diff saved to https://phabricator.wikimedia.org/P92171 and previous config saved to /var/cache/conftool/dbconfig/20260504-094802-fceratto.json
  • 09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5005.wikimedia.org
  • 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1177 (T419961)', diff saved to https://phabricator.wikimedia.org/P92170 and previous config saved to /var/cache/conftool/dbconfig/20260504-093938-fceratto.json
  • 09:39 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T419961)', diff saved to https://phabricator.wikimedia.org/P92169 and previous config saved to /var/cache/conftool/dbconfig/20260504-093910-fceratto.json
  • 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
  • 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
  • 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
  • 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
  • 09:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1182: after reimage to trixie
  • 09:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92167 and previous config saved to /var/cache/conftool/dbconfig/20260504-092902-fceratto.json
  • 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92165 and previous config saved to /var/cache/conftool/dbconfig/20260504-091853-fceratto.json
  • 09:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2187: Fixing events
  • 09:15 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2187: Fixing events
  • 09:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2187.codfw.wmnet with reason: Checking events
  • 09:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T419961)', diff saved to https://phabricator.wikimedia.org/P92163 and previous config saved to /var/cache/conftool/dbconfig/20260504-090845-fceratto.json
  • 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1172 (T419961)', diff saved to https://phabricator.wikimedia.org/P92161 and previous config saved to /var/cache/conftool/dbconfig/20260504-085930-fceratto.json
  • 08:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T419961)', diff saved to https://phabricator.wikimedia.org/P92160 and previous config saved to /var/cache/conftool/dbconfig/20260504-085912-fceratto.json
  • 08:56 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main'.
  • 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main'.
  • 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1182: after reimage to trixie
  • 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92158 and previous config saved to /var/cache/conftool/dbconfig/20260504-084904-fceratto.json
  • 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1008.eqiad.wmnet
  • 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1007.eqiad.wmnet
  • 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1006.eqiad.wmnet
  • 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1005.eqiad.wmnet
  • 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1004.eqiad.wmnet
  • 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1003.eqiad.wmnet
  • 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1002.eqiad.wmnet
  • 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1001.eqiad.wmnet
  • 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1002.eqiad.wmnet
  • 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1001.eqiad.wmnet
  • 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92157 and previous config saved to /var/cache/conftool/dbconfig/20260504-083857-fceratto.json
  • 08:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1182.eqiad.wmnet with OS trixie
  • 08:32 moritzm: installing Linux 5.10.251-3 on bullseye hosts
  • 08:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T419961)', diff saved to https://phabricator.wikimedia.org/P92156 and previous config saved to /var/cache/conftool/dbconfig/20260504-082849-fceratto.json
  • 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet
  • 08:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1167 (T419961)', diff saved to https://phabricator.wikimedia.org/P92155 and previous config saved to /var/cache/conftool/dbconfig/20260504-082024-fceratto.json
  • 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 08:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet
  • 08:15 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage
  • 08:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage
  • 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
  • 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
  • 08:04 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync
  • 08:04 urbanecm@deploy1003: Finished scap sync-world: Backport for Add sva to wmgExtraLanguageNames (T407106) (duration: 07m 58s)
  • 08:03 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: sync
  • 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
  • 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
  • 08:02 gkyziridis@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: sync
  • 08:02 gkyziridis@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: sync
  • 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
  • 08:01 moritzm: installing Linux 6.1.170 on bookworm hosts
  • 07:59 urbanecm@deploy1003: urbanecm, h2o: Continuing with deployment
  • 07:57 urbanecm@deploy1003: urbanecm, h2o: Backport for Add sva to wmgExtraLanguageNames (T407106) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1212: after reimage to trixie
  • 07:56 urbanecm@deploy1003: Started scap sync-world: Backport for Add sva to wmgExtraLanguageNames (T407106)
  • 07:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
  • 07:55 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1182.eqiad.wmnet with OS trixie
  • 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply
  • 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply
  • 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
  • 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
  • 07:47 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1182: Reimage to Trixie
  • 07:47 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1182: Reimage to Trixie
  • 07:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Reimage to Trixie
  • 07:44 dcausse: T425301: stopping writes on cloudelastic
  • 07:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2147.codfw.wmnet
  • 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
  • 07:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003"
  • 07:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2149: after reimage to trixie
  • 07:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1188: after reimage to trixie
  • 07:38 moritzm: installing Linux 6.12.85 on trixie hosts
  • 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo2003.codfw.wmnet
  • 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
  • 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
  • 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo2003.codfw.wmnet
  • 07:33 marostegui@cumin1003: START - Cookbook sre.dns.netbox
  • 07:28 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2147.codfw.wmnet
  • 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org
  • 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org
  • 07:11 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1212: after reimage to trixie
  • 07:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS trixie
  • 06:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2149: after reimage to trixie
  • 06:55 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1188: after reimage to trixie
  • 06:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet with OS trixie
  • 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS trixie
  • 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage
  • 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage
  • 06:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage
  • 06:25 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage
  • 06:21 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS trixie
  • 06:19 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage
  • 06:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage
  • 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1212: Reimage to Trixie
  • 06:11 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1212: Reimage to Trixie
  • 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1212.eqiad.wmnet with reason: Reimage to Trixie
  • 06:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s3 master: reimage to Debian Trixie
  • 06:09 marostegui: Reimage sanitarium master for s3, lag to be expected on wikireplicas for s3 T424792
  • 06:05 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS trixie
  • 06:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1188: Reimage to Trixie
  • 05:58 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1188: Reimage to Trixie
  • 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1188.eqiad.wmnet with reason: Reimage to Trixie
  • 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS trixie
  • 05:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2149: Reimage to Trixie
  • 05:55 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2149: Reimage to Trixie
  • 05:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2149.codfw.wmnet with reason: Reimage to Trixie
  • 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s)
  • 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image

2026-05-03

2026-05-02

  • 23:32 zabe@deploy1003: Finished scap sync-world: Backport for Uninstall DynamicPageList from wikis it's not used on (T425202) (duration: 06m 41s)
  • 23:28 zabe@deploy1003: dreamyjazz, zabe: Continuing with deployment
  • 23:27 zabe@deploy1003: dreamyjazz, zabe: Backport for Uninstall DynamicPageList from wikis it's not used on (T425202) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:26 zabe@deploy1003: Started scap sync-world: Backport for Uninstall DynamicPageList from wikis it's not used on (T425202)
  • 23:22 zabe@deploy1003: Finished scap sync-world: Backport for Uninstall DynamicPageList from officewiki (T425154) (duration: 07m 27s)
  • 23:18 zabe@deploy1003: zabe, dreamyjazz: Continuing with deployment
  • 23:17 zabe@deploy1003: zabe, dreamyjazz: Backport for Uninstall DynamicPageList from officewiki (T425154) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:15 zabe@deploy1003: Started scap sync-world: Backport for Uninstall DynamicPageList from officewiki (T425154)
  • 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2014.codfw.wmnet with OS trixie
  • 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2013.codfw.wmnet with OS trixie
  • 18:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2369.codfw.wmnet with OS trixie
  • 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage
  • 17:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage
  • 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2374.codfw.wmnet with OS trixie
  • 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2373.codfw.wmnet with OS trixie
  • 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2372.codfw.wmnet with OS trixie
  • 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2371.codfw.wmnet with OS trixie
  • 17:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2370.codfw.wmnet with OS trixie
  • 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 16:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage
  • 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage
  • 16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage
  • 16:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage
  • 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage
  • 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage
  • 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage
  • 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage
  • 16:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage
  • 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2374.codfw.wmnet with OS trixie
  • 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2373.codfw.wmnet with OS trixie
  • 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2372.codfw.wmnet with OS trixie
  • 16:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage
  • 16:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie
  • 16:16 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2371.codfw.wmnet with OS trixie
  • 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie
  • 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2370.codfw.wmnet with OS trixie
  • 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2369.codfw.wmnet with OS trixie
  • 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2366.codfw.wmnet with OS trixie
  • 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2367.codfw.wmnet with OS trixie
  • 15:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2368.codfw.wmnet with OS trixie
  • 15:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage
  • 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage
  • 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage
  • 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage
  • 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage
  • 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage
  • 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie
  • 15:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2367.codfw.wmnet with OS trixie
  • 15:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie
  • 12:02 samtar@deploy1003: Finished scap sync-world: Backport for Watchlist star: Revert popover/dialog changes (T425185) (duration: 13m 06s)
  • 11:57 samtar@deploy1003: samtar: Continuing with deployment
  • 11:50 samtar@deploy1003: samtar: Backport for Watchlist star: Revert popover/dialog changes (T425185) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:49 samtar@deploy1003: Started scap sync-world: Backport for Watchlist star: Revert popover/dialog changes (T425185)
  • 09:20 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 09:19 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 09:19 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 09:19 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 09:19 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 09:19 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2366.codfw.wmnet with OS trixie
  • 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2368.codfw.wmnet with OS trixie
  • 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2367.codfw.wmnet with OS trixie
  • 02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 31s)
  • 02:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage
  • 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage
  • 01:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage
  • 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage
  • 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage
  • 01:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage
  • 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie
  • 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2367.codfw.wmnet with OS trixie
  • 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie
  • 01:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2364.codfw.wmnet with OS trixie
  • 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2365.codfw.wmnet with OS trixie
  • 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2363.codfw.wmnet with OS trixie
  • 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2364.codfw.wmnet with reason: host reimage
  • 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage
  • 01:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage
  • 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage
  • 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2364.codfw.wmnet with reason: host reimage
  • 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage
  • 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2365.codfw.wmnet with OS trixie
  • 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2364.codfw.wmnet with OS trixie
  • 00:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2363.codfw.wmnet with OS trixie
  • 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2362.codfw.wmnet with OS trixie
  • 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 00:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2361.codfw.wmnet with OS trixie
  • 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 00:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2360.codfw.wmnet with OS trixie
  • 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 00:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"

2026-05-01

  • 23:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
  • 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
  • 23:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
  • 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
  • 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
  • 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
  • 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2362.codfw.wmnet with OS trixie
  • 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2361.codfw.wmnet with OS trixie
  • 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2360.codfw.wmnet with OS trixie
  • 23:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2357.codfw.wmnet with OS trixie
  • 23:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2359.codfw.wmnet with OS trixie
  • 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2358.codfw.wmnet with OS trixie
  • 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage
  • 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage
  • 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage
  • 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage
  • 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage
  • 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage
  • 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2359.codfw.wmnet with OS trixie
  • 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2358.codfw.wmnet with OS trixie
  • 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2357.codfw.wmnet with OS trixie
  • 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 22:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 22:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 22:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 22:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2374
  • 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2374
  • 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2373
  • 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2373
  • 20:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2372
  • 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2372
  • 20:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2371
  • 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2371
  • 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2370
  • 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2370
  • 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2369
  • 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2369
  • 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2368
  • 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2368
  • 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2367
  • 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2367
  • 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2366
  • 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2366
  • 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2365
  • 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2365
  • 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2364
  • 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2364
  • 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2363
  • 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2363
  • 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2362
  • 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2362
  • 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2361
  • 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2361
  • 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2360
  • 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2360
  • 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2359
  • 20:54 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2359
  • 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2358
  • 20:54 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2358
  • 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2357
  • 20:53 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2357
  • 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002"
  • 20:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002"
  • 20:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 20:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS trixie
  • 20:06 krinkle@deploy1003: Finished scap sync-world: Backport for Enable wgTrackMediaRequestProvenance on wikidata.org (T414338), Enable wgTrackMediaRequestProvenance on Commons (T414338) (duration: 15m 27s)
  • 20:02 krinkle@deploy1003: krinkle: Continuing with deployment
  • 19:54 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage
  • 19:52 krinkle@deploy1003: krinkle: Backport for Enable wgTrackMediaRequestProvenance on wikidata.org (T414338), Enable wgTrackMediaRequestProvenance on Commons (T414338) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 19:51 krinkle@deploy1003: Started scap sync-world: Backport for Enable wgTrackMediaRequestProvenance on wikidata.org (T414338), Enable wgTrackMediaRequestProvenance on Commons (T414338)
  • 19:49 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage
  • 19:40 dancy@deploy1003: Finished scap sync-world: testing T317405 (duration: 03m 23s)
  • 19:37 dancy@deploy1003: Started scap sync-world: testing T317405
  • 19:36 dancy@deploy1003: Installation of scap version "4.259.0" completed for 2 hosts
  • 19:34 dancy@deploy1003: Installing scap version "4.259.0" for 2 host(s)
  • 18:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 18:55 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 18:43 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Alangi Derick out of all services on: 2442 hosts
  • 18:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2002
  • 18:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2002
  • 18:41 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2002
  • 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 18:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:40 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003"
  • 18:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003"
  • 18:36 herron@cumin1003: START - Cookbook sre.dns.netbox
  • 18:33 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2002
  • 18:32 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS trixie
  • 18:26 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS trixie
  • 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage
  • 18:00 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage
  • 17:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2003
  • 17:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2003
  • 17:40 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2003
  • 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 17:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:40 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003"
  • 17:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003"
  • 17:33 herron@cumin1003: START - Cookbook sre.dns.netbox
  • 17:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2003
  • 17:28 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS trixie
  • 17:15 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS trixie
  • 16:34 cdobbins@cumin2002: conftool action : get/pooled; selector: name=cp5024.eqsin.wmnet
  • 16:30 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:30 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
  • 16:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
  • 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet
  • 15:59 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main'.
  • 15:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet
  • 15:47 dancy@deploy1003: Installation of scap version "4.258.1" completed for 2 hosts
  • 15:45 dancy@deploy1003: Installing scap version "4.258.1" for 2 host(s)
  • 15:34 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage
  • 15:30 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage
  • 15:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2004
  • 15:14 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004
  • 15:11 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004
  • 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 15:11 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:11 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003"
  • 15:11 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003"
  • 15:05 dancy@deploy1003: Installation of scap version "4.258.0" completed for 2 hosts
  • 15:03 dancy@deploy1003: Installing scap version "4.258.0" for 2 host(s)
  • 14:57 herron@cumin1003: START - Cookbook sre.dns.netbox
  • 14:47 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2004
  • 14:47 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS trixie
  • 13:45 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 13:44 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 13:24 _Gerges: WikiMonitor setup
  • 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1080
  • 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1078
  • 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1079
  • 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1077
  • 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1080
  • 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1079
  • 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1078
  • 13:08 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1077
  • 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003"
  • 13:04 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003"
  • 13:00 jclark@cumin1003: START - Cookbook sre.dns.netbox
  • 12:34 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 12:34 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 12:33 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 12:33 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 09:57 samtar@deploy1003: Finished scap sync-world: Backport for Switch watchstar from Popover to Dialog (T417847) (duration: 06m 49s)
  • 09:53 samtar@deploy1003: samtar: Continuing with deployment
  • 09:52 samtar@deploy1003: samtar: Backport for Switch watchstar from Popover to Dialog (T417847) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:50 samtar@deploy1003: Started scap sync-world: Backport for Switch watchstar from Popover to Dialog (T417847)
  • 09:38 urbanecm@deploy1003: Finished scap sync-world: Backport for Update the interwiki cache (T239173) (duration: 06m 05s)
  • 09:32 urbanecm@deploy1003: Started scap sync-world: Backport for Update the interwiki cache (T239173)
  • 08:13 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 08:13 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 08:13 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 08:13 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 08:13 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 08:12 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
  • 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
  • 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 41s)
  • 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 00:16 zabe@deploy1003: Finished scap sync-world: Backport for Add script to fix fr_deleted drifts (T424553) (duration: 07m 05s)
  • 00:13 zabe@deploy1003: zabe: Continuing with deployment
  • 00:11 zabe@deploy1003: zabe: Backport for Add script to fix fr_deleted drifts (T424553) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 00:09 zabe@deploy1003: Started scap sync-world: Backport for Add script to fix fr_deleted drifts (T424553)

Other archives

See Server Admin Log/Archives.