Server Admin Log

From Wikitech

2025-05-17

  • 17:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host pc2018.codfw.wmnet with OS bookworm
  • 17:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
  • 17:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['pc2018']
  • 17:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sretest2004']
  • 17:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest2004']
  • 17:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pc2018']
  • 17:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc2018.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2010-dev']
  • 17:08 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['es2048']
  • 17:08 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['es2047']
  • 17:08 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2048']
  • 17:08 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2047']
  • 17:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:59 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2010-dev']
  • 16:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol2010-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:58 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host pc2018.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:53 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2008.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:48 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudcontrol2010-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:47 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-be2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:55 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2008.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:47 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:25 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 15:25 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 15:24 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 15:24 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 15:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host apus-be2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:17 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2004
  • 15:17 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host sretest2004
  • 15:17 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc2018
  • 15:16 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host pc2018
  • 15:16 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es2048
  • 15:16 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host es2048
  • 15:16 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es2047
  • 15:16 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host es2047
  • 15:16 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcontrol2010-dev
  • 15:16 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcontrol2010-dev
  • 15:16 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2005
  • 15:16 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host sretest2005
  • 07:58 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:57 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.

2025-05-16

  • 23:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sretest2003']
  • 23:57 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest2003']
  • 23:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sretest2003']
  • 23:57 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest2003']
  • 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['thanos-be2006']
  • 23:56 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-be2006']
  • 23:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['apus-be2004']
  • 23:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['apus-be2004']
  • 23:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:27 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2008.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-be2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2008.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2008.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2008.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:13 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:13 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host apus-be2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:12 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2003
  • 23:12 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-be2009
  • 23:12 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-be2008
  • 23:12 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-be2007
  • 23:12 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host apus-be2004
  • 23:12 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-be2006
  • 23:12 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
  • 23:12 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-be2009
  • 23:12 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-be2008
  • 23:12 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-be2007
  • 23:12 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-be2006
  • 23:12 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host apus-be2004
  • 23:09 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding es2047 to codfw - jhancock@cumin2002"
  • 23:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding es2047 to codfw - jhancock@cumin2002"
  • 23:06 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 23:01 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding thanos-be2006 to codfw - jhancock@cumin2002"
  • 23:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding thanos-be2006 to codfw - jhancock@cumin2002"
  • 22:58 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:16 sbassett@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 21:16 sbassett@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 21:16 sbassett@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 21:16 sbassett@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 21:16 sbassett@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 21:00 sbassett@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 20:52 sbassett@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 20:42 sbassett@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 20:41 sbassett@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 20:31 sbassett@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 20:26 herron: titan100[12] systemctl restart thanos-query
  • 19:44 cstone: civicrm upgraded from 2ae29ec9 to 5b155eaa
  • 19:06 robh@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts an-coord1004.eqiad.wmnet
  • 19:06 robh@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-coord1004.eqiad.wmnet
  • 19:05 robh@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts an-coord1004.eqiad.wmnet
  • 19:05 robh@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-coord1004.eqiad.wmnet
  • 19:04 robh@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts an-coord1004.eqiad.wmnet
  • 19:04 robh@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-coord1004.eqiad.wmnet
  • 18:46 robh@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts an-coord1004.eqiad.wmnet
  • 18:34 robh@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-coord1004.eqiad.wmnet
  • 18:34 robh@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts an-coord1004.eqiad.wmnet
  • 18:26 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-coord1004.eqiad.wmnet
  • 18:25 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts an-coord1004.eqiad.wmnet
  • 18:25 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-coord1004.eqiad.wmnet
  • 18:25 robh@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts an-coord1004.eqiad.wmnet
  • 18:13 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-coord1004.eqiad.wmnet
  • 18:12 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts an-coord1004.eqiad.wmnet
  • 18:12 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-coord1004.eqiad.wmnet
  • 18:10 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1247.eqiad.wmnet with reason: To be set up in a few days
  • 18:00 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts an-coord1004.eqiad.wmnet
  • 17:59 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-coord1004.eqiad.wmnet
  • 17:57 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:57 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:49 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1238.eqiad.wmnet onto db1247.eqiad.wmnet
  • 17:49 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1238 gradually with 4 steps - Pool db1238.eqiad.wmnet in after cloning
  • 17:03 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1238 gradually with 4 steps - Pool db1238.eqiad.wmnet in after cloning
  • 16:03 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1177.eqiad.wmnet with OS bullseye
  • 15:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1177.eqiad.wmnet with reason: host reimage
  • 15:38 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:38 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1177.eqiad.wmnet with reason: host reimage
  • 15:19 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye
  • 15:19 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1177.eqiad.wmnet with OS bullseye
  • 15:16 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:10 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 14:44 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 14:44 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 14:22 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host apus-be1004.eqiad.wmnet with OS bookworm
  • 14:22 mvernon@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
  • 14:22 mvernon@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
  • 14:11 root@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-backup1002.eqiad.wmnet: Renew puppet certificate - root@cumin1002
  • 14:09 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye
  • 14:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1177.eqiad.wmnet with OS bullseye
  • 13:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Remove db2166 and db1177 from x3 (T351820)', diff saved to https://phabricator.wikimedia.org/P76270 and previous config saved to /var/cache/conftool/dbconfig/20250516-135438-ladsgroup.json
  • 13:52 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1238.eqiad.wmnet onto db1247.eqiad.wmnet
  • 13:50 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1188 gradually with 4 steps - Pooling back in
  • 13:50 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1238.eqiad.wmnet onto db1247.eqiad.wmnet
  • 13:47 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1238 - Depool db1238.eqiad.wmnet to then clone it to db1247.eqiad.wmnet - fceratto@cumin1002
  • 13:47 fceratto@cumin1002: START - Cookbook sre.mysql.depool db1238 - Depool db1238.eqiad.wmnet to then clone it to db1247.eqiad.wmnet - fceratto@cumin1002
  • 13:47 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1238.eqiad.wmnet onto db1247.eqiad.wmnet
  • 13:40 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye
  • 13:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1177.eqiad.wmnet with OS bullseye
  • 13:21 hashar@deploy1003: Finished deploy [gerrit/gerrit@fcb893c]: wm-zuul-status: do not popup when navigating changes - T394485 (duration: 00m 12s)
  • 13:21 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on apus-be1004.eqiad.wmnet with reason: host reimage
  • 13:21 hashar@deploy1003: Started deploy [gerrit/gerrit@fcb893c]: wm-zuul-status: do not popup when navigating changes - T394485
  • 13:17 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on apus-be1004.eqiad.wmnet with reason: host reimage
  • 13:13 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye
  • 13:05 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1188 gradually with 4 steps - Pooling back in
  • 13:03 joal@deploy1003: Finished deploy [airflow-dags/analytics@4ebb376]: Fix gobblin artifacts (after pulling code...) (duration: 01m 01s)
  • 13:02 joal@deploy1003: Started deploy [airflow-dags/analytics@4ebb376]: Fix gobblin artifacts (after pulling code...)
  • 13:02 joal@deploy1003: Finished deploy [airflow-dags/analytics_test@4ebb376]: Fix gobblin artifacts (duration: 00m 16s)
  • 13:01 joal@deploy1003: Started deploy [airflow-dags/analytics_test@4ebb376]: Fix gobblin artifacts
  • 13:00 joal@deploy1003: Finished deploy [airflow-dags/analytics@4351188]: Fix gobblin artifacts (duration: 00m 07s)
  • 13:00 joal@deploy1003: Started deploy [airflow-dags/analytics@4351188]: Fix gobblin artifacts
  • 12:52 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host apus-be1004.eqiad.wmnet with OS bookworm
  • 12:46 mvernon@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host apus-be1004.eqiad.wmnet with OS bookworm
  • 12:43 aqu@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 12:42 aqu@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 12:35 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) db1188 gradually with 4 steps - Pooling back in
  • 12:33 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1188 gradually with 4 steps - Pooling back in
  • 12:32 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host apus-be1004.eqiad.wmnet with OS bookworm
  • 12:28 mvernon@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host apus-be1004.eqiad.wmnet with OS bookworm
  • 12:20 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host apus-be1004.eqiad.wmnet with OS bookworm
  • 12:01 kamila@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 12:01 kamila@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 11:59 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1188.eqiad.wmnet onto db1246.eqiad.wmnet
  • 11:42 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1188 - Depool db1188.eqiad.wmnet to then clone it to db1246.eqiad.wmnet - fceratto@cumin1002
  • 11:42 fceratto@cumin1002: START - Cookbook sre.mysql.depool db1188 - Depool db1188.eqiad.wmnet to then clone it to db1246.eqiad.wmnet - fceratto@cumin1002
  • 11:42 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1188.eqiad.wmnet onto db1246.eqiad.wmnet
  • 11:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Remove db2242 from x3, remove db2154 from s8 (T351820)', diff saved to https://phabricator.wikimedia.org/P76262 and previous config saved to /var/cache/conftool/dbconfig/20250516-112345-ladsgroup.json
  • 11:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Remove db1214 from x3, remove db1257 from s8 (T351820)', diff saved to https://phabricator.wikimedia.org/P76261 and previous config saved to /var/cache/conftool/dbconfig/20250516-111952-ladsgroup.json
  • 10:44 joal@deploy1003: Finished deploy [airflow-dags/analytics@4351188]: Deploying analytics with artifact-cache warming using main folder (duration: 00m 49s)
  • 10:43 joal@deploy1003: Started deploy [airflow-dags/analytics@4351188]: Deploying analytics with artifact-cache warming using main folder
  • 10:28 joal@deploy1003: Finished deploy [airflow-dags/main@4351188]: Deploying main instead of analytics subfolder (duration: 01m 51s)
  • 10:26 joal@deploy1003: Started deploy [airflow-dags/main@4351188]: Deploying main instead of analytics subfolder
  • 10:22 jynus: upgrading db1239 MariaDB server T394487
  • 10:16 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1239.eqiad.wmnet,ms-backup1002.eqiad.wmnet with reason: Upgrade and test
  • 09:51 joal@deploy1003: Finished deploy [airflow-dags/analytics_test@4351188]: Fix slf4j artifact sync (duration: 00m 12s)
  • 09:51 joal@deploy1003: Started deploy [airflow-dags/analytics_test@4351188]: Fix slf4j artifact sync
  • 09:49 btullis@deploy1003: Finished deploy [airflow-dags/analytics_test@c2d660e]: Test (duration: 24m 55s)
  • 09:27 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 09:27 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 09:24 btullis@deploy1003: Started deploy [airflow-dags/analytics_test@c2d660e]: Test
  • 09:19 aqu@deploy1003: Finished deploy [airflow-dags/analytics_test@c2d660e]: Deploying artifacts for analytics_test manually (duration: 21m 38s)
  • 08:58 aqu@deploy1003: Started deploy [airflow-dags/analytics_test@c2d660e]: Deploying artifacts for analytics_test manually
  • 08:39 aqu@deploy1003: Finished deploy [airflow-dags/analytics_test@0b9e2aa]: Deploying artifacts for analytics_test manually (duration: 00m 51s)
  • 08:38 aqu@deploy1003: Started deploy [airflow-dags/analytics_test@0b9e2aa]: Deploying artifacts for analytics_test manually
  • 08:28 cgoubert@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 08:27 cgoubert@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 08:26 moritzm: uploaded httpbb 0.0.5-1+deb12u1 to apt.wikimedia.org T393711 T389380
  • 08:14 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76260 and previous config saved to /var/cache/conftool/dbconfig/20250516-081428-root.json
  • 08:08 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging ODimitrijevic out of all services on: 1426 hosts
  • 08:07 marostegui@cumin1002: dbctl commit (dc=all): 'es1044 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76259 and previous config saved to /var/cache/conftool/dbconfig/20250516-080752-root.json
  • 08:07 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging ODimitrijevic out of all services on: 945 hosts
  • 07:59 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76258 and previous config saved to /var/cache/conftool/dbconfig/20250516-075923-root.json
  • 07:52 marostegui@cumin1002: dbctl commit (dc=all): 'es1044 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76257 and previous config saved to /var/cache/conftool/dbconfig/20250516-075246-root.json
  • 07:50 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on build2002.codfw.wmnet with reason: busy JDK build
  • 07:44 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76256 and previous config saved to /var/cache/conftool/dbconfig/20250516-074417-root.json
  • 07:37 marostegui@cumin1002: dbctl commit (dc=all): 'es1044 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76255 and previous config saved to /var/cache/conftool/dbconfig/20250516-073741-root.json
  • 07:29 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76253 and previous config saved to /var/cache/conftool/dbconfig/20250516-072911-root.json
  • 07:22 marostegui@cumin1002: dbctl commit (dc=all): 'es1044 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76251 and previous config saved to /var/cache/conftool/dbconfig/20250516-072235-root.json
  • 07:20 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1177.eqiad.wmnet with OS bullseye
  • 07:14 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76250 and previous config saved to /var/cache/conftool/dbconfig/20250516-071406-root.json
  • 07:07 marostegui@cumin1002: dbctl commit (dc=all): 'es1044 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76249 and previous config saved to /var/cache/conftool/dbconfig/20250516-070730-root.json
  • 06:59 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P76248 and previous config saved to /var/cache/conftool/dbconfig/20250516-065901-root.json
  • 06:52 marostegui@cumin1002: dbctl commit (dc=all): 'es1044 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76247 and previous config saved to /var/cache/conftool/dbconfig/20250516-065224-root.json
  • 06:43 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76246 and previous config saved to /var/cache/conftool/dbconfig/20250516-064356-root.json
  • 06:41 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76245 and previous config saved to /var/cache/conftool/dbconfig/20250516-064153-root.json
  • 06:37 marostegui@cumin1002: dbctl commit (dc=all): 'es1044 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76244 and previous config saved to /var/cache/conftool/dbconfig/20250516-063719-root.json
  • 06:37 marostegui@dns1006: END - running authdns-update
  • 06:36 marostegui@dns1006: START - running authdns-update
  • 06:30 marostegui@cumin1002: dbctl commit (dc=all): 'es1046 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76243 and previous config saved to /var/cache/conftool/dbconfig/20250516-063009-root.json
  • 06:28 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76242 and previous config saved to /var/cache/conftool/dbconfig/20250516-062851-root.json
  • 06:27 moritzm: uploaded openjdk-21 21.0.7+6-1~deb12u1 to component/jdk21 for bookworm (latest Java 21 security release)
  • 06:26 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76241 and previous config saved to /var/cache/conftool/dbconfig/20250516-062648-root.json
  • 06:22 marostegui@cumin1002: dbctl commit (dc=all): 'es1044 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76240 and previous config saved to /var/cache/conftool/dbconfig/20250516-062213-root.json
  • 06:18 moritzm: installing Java 21 security updates on idp-test
  • 06:17 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2046.codfw.wmnet,es1044.eqiad.wmnet with reason: Maintenance
  • 06:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2046 es1044 T391921', diff saved to https://phabricator.wikimedia.org/P76239 and previous config saved to /var/cache/conftool/dbconfig/20250516-061649-marostegui.json
  • 06:15 marostegui@cumin1002: dbctl commit (dc=all): 'es1046 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76238 and previous config saved to /var/cache/conftool/dbconfig/20250516-061503-root.json
  • 06:11 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76237 and previous config saved to /var/cache/conftool/dbconfig/20250516-061142-root.json
  • 06:06 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es1045 and es2045 to es5 masters T391921', diff saved to https://phabricator.wikimedia.org/P76236 and previous config saved to /var/cache/conftool/dbconfig/20250516-060652-marostegui.json
  • 06:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
  • 06:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
  • 05:59 marostegui@cumin1002: dbctl commit (dc=all): 'es1046 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76235 and previous config saved to /var/cache/conftool/dbconfig/20250516-055958-root.json
  • 05:56 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76234 and previous config saved to /var/cache/conftool/dbconfig/20250516-055637-root.json
  • 05:53 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1086 to cirrussearch1086
  • 05:53 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1086
  • 05:51 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1086
  • 05:51 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1086 on all recursors
  • 05:51 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1086 on all recursors
  • 05:51 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 05:51 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1086 to cirrussearch1086 - ryankemper@cumin2002"
  • 05:51 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1086 to cirrussearch1086 - ryankemper@cumin2002"
  • 05:50 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1085 to cirrussearch1085
  • 05:50 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1085
  • 05:44 marostegui@cumin1002: dbctl commit (dc=all): 'es1046 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76233 and previous config saved to /var/cache/conftool/dbconfig/20250516-054452-root.json
  • 05:41 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76232 and previous config saved to /var/cache/conftool/dbconfig/20250516-054131-root.json
  • 05:35 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1085
  • 05:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1085 on all recursors
  • 05:35 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1085 on all recursors
  • 05:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 05:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1085 to cirrussearch1085 - ryankemper@cumin2002"
  • 05:33 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 05:33 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1085 to cirrussearch1085 - ryankemper@cumin2002"
  • 05:29 marostegui@cumin1002: dbctl commit (dc=all): 'es1046 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76231 and previous config saved to /var/cache/conftool/dbconfig/20250516-052947-root.json
  • 05:29 ryankemper@cumin2002: START - Cookbook sre.hosts.rename from elastic1086 to cirrussearch1086
  • 05:28 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 05:28 ryankemper@cumin2002: START - Cookbook sre.hosts.rename from elastic1085 to cirrussearch1085
  • 05:27 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1079.eqiad.wmnet with OS bullseye
  • 05:26 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76230 and previous config saved to /var/cache/conftool/dbconfig/20250516-052625-root.json
  • 05:25 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1078.eqiad.wmnet with OS bullseye
  • 05:14 marostegui@cumin1002: dbctl commit (dc=all): 'es1046 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76229 and previous config saved to /var/cache/conftool/dbconfig/20250516-051442-root.json
  • 05:07 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1079.eqiad.wmnet with reason: host reimage
  • 05:07 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2044.codfw.wmnet,es1046.eqiad.wmnet with reason: Maintenance
  • 05:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1046 es2044 T391921', diff saved to https://phabricator.wikimedia.org/P76228 and previous config saved to /var/cache/conftool/dbconfig/20250516-050707-marostegui.json
  • 05:04 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1079.eqiad.wmnet with reason: host reimage
  • 05:01 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1078.eqiad.wmnet with reason: host reimage
  • 05:01 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 05:01 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 04:56 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1078.eqiad.wmnet with reason: host reimage
  • 04:49 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1079
  • 04:49 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1079
  • 04:49 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1079.eqiad.wmnet with OS bullseye
  • 04:42 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1078
  • 04:42 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1078
  • 04:42 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1078.eqiad.wmnet with OS bullseye
  • 04:31 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1079 to cirrussearch1079
  • 04:31 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1079
  • 04:29 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1079
  • 04:29 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1079 on all recursors
  • 04:29 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1079 on all recursors
  • 04:29 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 04:29 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1079 to cirrussearch1079 - ryankemper@cumin2002"
  • 04:26 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1079 to cirrussearch1079 - ryankemper@cumin2002"
  • 04:11 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1078 to cirrussearch1078
  • 04:10 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1078
  • 04:10 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 04:10 ryankemper@cumin2002: START - Cookbook sre.hosts.rename from elastic1079 to cirrussearch1079
  • 04:02 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1078
  • 04:02 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1078 on all recursors
  • 04:02 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1078 on all recursors
  • 04:02 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 04:02 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1078 to cirrussearch1078 - ryankemper@cumin2002"
  • 03:58 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1078 to cirrussearch1078 - ryankemper@cumin2002"
  • 03:49 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 03:49 ryankemper@cumin2002: START - Cookbook sre.hosts.rename from elastic1078 to cirrussearch1078
  • 03:27 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1077.eqiad.wmnet with OS bullseye
  • 03:20 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1076.eqiad.wmnet with OS bullseye
  • 03:01 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1077.eqiad.wmnet with reason: host reimage
  • 02:58 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1077.eqiad.wmnet with reason: host reimage
  • 02:55 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1076.eqiad.wmnet with reason: host reimage
  • 02:51 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1076.eqiad.wmnet with reason: host reimage
  • 02:44 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1077
  • 02:44 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1077
  • 02:43 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1077.eqiad.wmnet with OS bullseye
  • 02:37 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1076
  • 02:37 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1076
  • 02:37 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1076.eqiad.wmnet with OS bullseye
  • 02:34 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1077 to cirrussearch1077
  • 02:34 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1077
  • 02:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1076 to cirrussearch1076
  • 02:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1076
  • 02:32 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1077
  • 02:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1077 on all recursors
  • 02:32 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1077 on all recursors
  • 02:31 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:30 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1076
  • 02:30 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1076 on all recursors
  • 02:30 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1076 on all recursors
  • 02:30 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:30 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1076 to cirrussearch1076 - ryankemper@cumin2002"
  • 02:30 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1076 to cirrussearch1076 - ryankemper@cumin2002"
  • 02:29 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 02:23 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 02:23 ryankemper@cumin2002: START - Cookbook sre.hosts.rename from elastic1077 to cirrussearch1077
  • 02:23 ryankemper@cumin2002: START - Cookbook sre.hosts.rename from elastic1076 to cirrussearch1076
  • 01:27 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2002.codfw.wmnet with OS bullseye
  • 01:16 brett: Restarting tomcat10 on idp1004
  • 01:06 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp3066.*
  • 01:04 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp3074.*
  • 00:49 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp5031.*
  • 00:48 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp5031
  • 00:18 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2002.codfw.wmnet with reason: host reimage
  • 00:15 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2002.codfw.wmnet with reason: host reimage

2025-05-15

  • 23:59 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host cassandra-dev2002.codfw.wmnet with OS bullseye
  • 22:27 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp503[1-2].eqsin.wmnet} and A:cp - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7f818c5f7df0>>
  • 22:27 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 22:23 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 22:11 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_eqsin - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7f87783bdac0>>
  • 22:00 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2002.codfw.wmnet with OS bullseye
  • 21:55 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp503[1-2].eqsin.wmnet} and A:cp - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7f818c5f7df0>>
  • 21:40 brett@cumin2002: END (FAIL) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=1) rolling upgrade of Varnish on A:cp-upload_eqsin - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7fc4014eef10>>
  • 21:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2002.codfw.wmnet with reason: host reimage
  • 21:34 dancy@deploy1003: Installation of scap version "4.169.0" completed for 2 hosts
  • 21:32 dancy@deploy1003: Installing scap version "4.169.0" for 2 host(s)
  • 21:31 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2002.codfw.wmnet with reason: host reimage
  • 21:20 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 21:19 jdrewniak@deploy1003: Finished scap sync-world: Backport for styles: Set override also to former value of `line-height-small` token (T389900 T394305) (duration: 18m 45s)
  • 21:18 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 21:16 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host cassandra-dev2002.codfw.wmnet with OS bullseye
  • 21:12 jdrewniak@deploy1003: jdrewniak: Continuing with sync
  • 21:06 jdrewniak@deploy1003: jdrewniak: Backport for styles: Set override also to former value of `line-height-small` token (T389900 T394305) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:00 jdrewniak@deploy1003: Started scap sync-world: Backport for styles: Set override also to former value of `line-height-small` token (T389900 T394305)
  • 20:53 thcipriani@deploy1003: Finished scap sync-world: Backport for frwiki: Enable the NewUserMessage extension (T382199) (duration: 14m 44s)
  • 20:47 thcipriani@deploy1003: thcipriani, wpld: Continuing with sync
  • 20:44 thcipriani@deploy1003: thcipriani, wpld: Backport for frwiki: Enable the NewUserMessage extension (T382199) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:38 thcipriani@deploy1003: Started scap sync-world: Backport for frwiki: Enable the NewUserMessage extension (T382199)
  • 20:35 thcipriani@deploy1003: Finished scap sync-world: Backport for Design Research participant recruitment survey on eswiki: Deploy (T394315) (duration: 13m 46s)
  • 20:33 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2002.codfw.wmnet with OS bullseye
  • 20:28 thcipriani@deploy1003: thcipriani, dani: Continuing with sync
  • 20:27 thcipriani@deploy1003: thcipriani, dani: Backport for Design Research participant recruitment survey on eswiki: Deploy (T394315) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:21 thcipriani@deploy1003: Started scap sync-world: Backport for Design Research participant recruitment survey on eswiki: Deploy (T394315)
  • 20:17 bvibber@deploy1003: Finished scap sync-world: Backport for Enable Chart extension on phase 2 wikis (T393518) (duration: 13m 15s)
  • 20:13 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2002.codfw.wmnet with reason: host reimage
  • 20:10 bvibber@deploy1003: bvibber: Continuing with sync
  • 20:09 bvibber@deploy1003: bvibber: Backport for Enable Chart extension on phase 2 wikis (T393518) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:09 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2002.codfw.wmnet with reason: host reimage
  • 20:04 bvibber@deploy1003: Started scap sync-world: Backport for Enable Chart extension on phase 2 wikis (T393518)
  • 19:54 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host cassandra-dev2002.codfw.wmnet with OS bullseye
  • 19:08 dancy@deploy1003: Installation of scap version "4.168.1" completed for 2 hosts
  • 19:06 dancy@deploy1003: Installing scap version "4.168.1" for 2 host(s)
  • 18:55 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_eqsin - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7fc4014eef10>>
  • 18:55 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_eqsin - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7f87783bdac0>>
  • 18:53 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_esams - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7fd386623c70>>
  • 18:49 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_esams - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7f58099c1b50>>
  • 18:41 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 18:40 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 18:36 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 18:36 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 18:35 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 18:34 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 17:46 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw
  • 17:24 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw
  • 17:23 topranks: add remaining BGP peerings from codfw row A-D switches to new spines in rows E/F T394021
  • 17:12 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:12 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:12 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:11 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:11 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:10 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:10 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:09 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:04 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad
  • 16:40 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad
  • 16:36 sbassett: helmfile [staging] HALTED helmfile.d/services/miscweb: apply
  • 16:35 topranks: add BGP peerings from codfw row A-D switches to new spines in rows E/F T394021
  • 16:27 sbassett@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 16:27 sbassett@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 16:17 sbassett@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 16:16 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp3073.esams.wmnet
  • 16:16 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp3081.esams.wmnet
  • 16:14 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp3081.esams.wmnet
  • 16:14 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp3073.esams.wmnet
  • 16:13 logmsgbot: mszabo Deployed security patch for T394393
  • 16:07 logmsgbot: mszabo Deployed security patch for T394393
  • 16:07 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough
  • 15:56 mszabo: Starting patch deployment for T394393
  • 15:55 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp3073.esams.wmnet
  • 15:55 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp3081.esams.wmnet
  • 15:50 dancy@deploy1003: Installation of scap version "4.168.0" completed for 2 hosts
  • 15:49 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough
  • 15:48 dancy@deploy1003: Installing scap version "4.168.0" for 2 host(s)
  • 15:45 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_esams - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7f58099c1b50>>
  • 15:45 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_esams - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7fd386623c70>>
  • 15:40 fabfur: reenabling puppet on A:cp (T393927)
  • 15:32 brett@cumin2002: END (FAIL) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=1) rolling upgrade of Varnish on A:cp-upload_esams - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7f1e600881c0>>
  • 15:32 brett@cumin2002: END (FAIL) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=1) rolling upgrade of Varnish on A:cp-text_esams - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7f10f2f03a00>>
  • 15:31 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_esams - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7f1e600881c0>>
  • 15:31 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_esams - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7f10f2f03a00>>
  • 15:21 jnuche@deploy1003: Finished scap sync-world: Backport for Revert "Make weighted tags no longer be WMF-specific" (duration: 11m 45s)
  • 15:15 jnuche@deploy1003: dcausse, jnuche: Continuing with sync
  • 15:14 jnuche@deploy1003: dcausse, jnuche: Backport for Revert "Make weighted tags no longer be WMF-specific" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:10 jnuche@deploy1003: Started scap sync-world: Backport for Revert "Make weighted tags no longer be WMF-specific"
  • 15:02 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp3081.esams.wmnet
  • 15:02 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp3073.esams.wmnet
  • 15:02 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1071.eqiad.wmnet with OS bookworm
  • 15:02 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 14:58 fabfur: disable puppet on A:cp to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1144620 (T393927)
  • 14:58 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 14:58 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1073.eqiad.wmnet with OS bookworm
  • 14:58 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 14:54 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 14:40 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1071.eqiad.wmnet with reason: host reimage
  • 14:37 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1073.eqiad.wmnet with reason: host reimage
  • 14:37 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1071.eqiad.wmnet with reason: host reimage
  • 14:33 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1073.eqiad.wmnet with reason: host reimage
  • 14:22 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1071.eqiad.wmnet with OS bookworm
  • 14:21 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1071.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:18 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1073.eqiad.wmnet with OS bookworm
  • 14:17 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1073.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:13 sukhe: finished lowering dyna/upload TTL to 240s: T394312
  • 14:13 sukhe@dns1004: END - running authdns-update
  • 14:12 sukhe@dns1004: START - running authdns-update
  • 14:09 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1071.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:08 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1071.eqiad.wmnet with OS bookworm
  • 14:05 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1073.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:04 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1073.eqiad.wmnet with OS bookworm
  • 14:04 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1074.eqiad.wmnet with OS bookworm
  • 14:04 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 14:03 pfischer@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:01 pfischer@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:01 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 13:58 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1072.eqiad.wmnet with OS bookworm
  • 13:58 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 13:58 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 13:57 moritzm: installing openjdk-8 security updates
  • 13:46 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1073.eqiad.wmnet with OS bookworm
  • 13:45 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1073.eqiad.wmnet with OS bookworm
  • 13:43 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1074.eqiad.wmnet with reason: host reimage
  • 13:40 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1072.eqiad.wmnet with reason: host reimage
  • 13:38 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:37 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [Growth] eswiki: Bump mentorship to 70% of users (T392869) (duration: 20m 39s)
  • 13:36 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1074.eqiad.wmnet with reason: host reimage
  • 13:36 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1072.eqiad.wmnet with reason: host reimage
  • 13:36 aokoth@dns1004: END - running authdns-update
  • 13:34 aokoth@dns1004: START - running authdns-update
  • 13:30 lucaswerkmeister-wmde@deploy1003: urbanecm, lucaswerkmeister-wmde: Continuing with sync
  • 13:26 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1073.eqiad.wmnet with OS bookworm
  • 13:22 lucaswerkmeister-wmde@deploy1003: urbanecm, lucaswerkmeister-wmde: Backport for [Growth] eswiki: Bump mentorship to 70% of users (T392869) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:22 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1071.eqiad.wmnet with OS bookworm
  • 13:20 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1074.eqiad.wmnet with OS bookworm
  • 13:19 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1072.eqiad.wmnet with OS bookworm
  • 13:16 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [Growth] eswiki: Bump mentorship to 70% of users (T392869)
  • 13:14 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye
  • 12:56 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:55 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: generic update - jhancock@cumin2002"
  • 12:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: generic update - jhancock@cumin2002"
  • 12:53 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apus-be1004.eqiad.wmnet with OS bookworm
  • 12:46 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 12:25 cgoubert@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 12:22 cgoubert@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 12:19 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:19 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:16 fabfur@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Revert last template change - fabfur@cumin1002"
  • 12:16 fabfur@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Revert last template change - fabfur@cumin1002
  • 12:16 fabfur@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Revert last template change - fabfur@cumin1002
  • 12:16 fabfur@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Revert last template change - fabfur@cumin1002"
  • 12:15 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1177.eqiad.wmnet with OS bullseye
  • 12:10 dreamyjazz@deploy1003: Finished scap sync-world: Backport for CustomBlockedDomainStorage::validateDomain: Undo hard-deprecation whilst prod callers exist (T394267) (duration: 13m 30s)
  • 12:10 cgoubert@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 12:03 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 12:03 dreamyjazz@deploy1003: dreamyjazz: Backport for CustomBlockedDomainStorage::validateDomain: Undo hard-deprecation whilst prod callers exist (T394267) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:03 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 12:03 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 12:02 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 12:02 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 11:57 dreamyjazz@deploy1003: Started scap sync-world: Backport for CustomBlockedDomainStorage::validateDomain: Undo hard-deprecation whilst prod callers exist (T394267)
  • 11:45 sukhe: removing downtime on A:ncredir
  • 11:44 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 14 hosts
  • 11:44 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for 14 hosts
  • 11:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye
  • 11:40 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host apus-be1004.eqiad.wmnet with OS bookworm
  • 11:31 gkyziridis@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 11:28 sukhe@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 14 hosts with reason: monitoring alerts
  • 11:21 sukhe: sudo cumin -b1 -s10 "A:wikidough" "run-puppet-agent": T370821
  • 11:12 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1156.eqiad.wmnet
  • 11:10 fabfur@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Minor template modification - fabfur@cumin1002"
  • 11:10 fabfur@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Minor template modification - fabfur@cumin1002
  • 11:10 fabfur@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Minor template modification - fabfur@cumin1002
  • 11:09 fabfur@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Minor template modification - fabfur@cumin1002"
  • 11:05 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1156.eqiad.wmnet
  • 11:04 stevemunene@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1177.eqiad.wmnet with OS bullseye
  • 10:53 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 10:53 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 10:49 mvernon@cumin1002: END (FAIL) - Cookbook sre.swift.check-dbs (exit_code=99) Checking container DBs of wikipedia-commons-local-public.cde
  • 10:49 mvernon@cumin1002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.cde
  • 10:49 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.cd
  • 10:48 mvernon@cumin1002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.cd
  • 10:38 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.cd
  • 10:37 mvernon@cumin1002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.cd
  • 10:36 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
  • 10:36 mvernon@cumin1002: END (FAIL) - Cookbook sre.swift.check-dbs (exit_code=99) Checking container DBs of wikipedia-commons-local-public.cd
  • 10:36 mvernon@cumin1002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.cd
  • 10:34 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.f2
  • 10:34 mvernon@cumin1002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.f2
  • 10:32 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.ad
  • 10:32 mvernon@cumin1002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.ad
  • 10:29 mvernon@cumin1002: END (FAIL) - Cookbook sre.swift.check-dbs (exit_code=99) Checking container DBs of wikipedia-commons-local-public.ad
  • 10:29 mvernon@cumin1002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.ad
  • 10:23 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye
  • 10:21 effie: mw-mcrouter minor update, memcached errors are expected
  • 10:20 mvernon@cumin1002: END (FAIL) - Cookbook sre.swift.check-dbs (exit_code=99) Checking container DBs of wikipedia-commons-local-public.ad
  • 10:20 mvernon@cumin1002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.ad
  • 10:19 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
  • 10:15 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.ad
  • 10:15 mvernon@cumin1002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.ad
  • 10:08 Emperor: depool thanos-fe100[1-3] prior to decom T391352
  • 10:07 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.1 refs T392171
  • 09:51 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76218 and previous config saved to /var/cache/conftool/dbconfig/20250515-095108-root.json
  • 09:45 dreamyjazz@deploy1003: Finished scap sync-world: Backport for FlaggablePageView: don't call getId() on null (T394381) (duration: 16m 00s)
  • 09:44 isaranto@deploy1003: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:44 isaranto@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:39 isaranto@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:37 dreamyjazz@deploy1003: dreamyjazz, zabe: Continuing with sync
  • 09:36 dreamyjazz@deploy1003: dreamyjazz, zabe: Backport for FlaggablePageView: don't call getId() on null (T394381) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:36 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76217 and previous config saved to /var/cache/conftool/dbconfig/20250515-093602-root.json
  • 09:30 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host an-worker1177.eqiad.wmnet
  • 09:29 dreamyjazz@deploy1003: Started scap sync-world: Backport for FlaggablePageView: don't call getId() on null (T394381)
  • 09:27 mvernon@cumin1002: conftool action : set/pooled=yes; selector: name=thanos-fe1007.eqiad.wmnet
  • 09:27 mvernon@cumin1002: conftool action : set/pooled=yes; selector: name=thanos-fe1006.eqiad.wmnet
  • 09:27 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76216 and previous config saved to /var/cache/conftool/dbconfig/20250515-092721-root.json
  • 09:27 mvernon@cumin1002: conftool action : set/pooled=yes; selector: name=thanos-fe1005.eqiad.wmnet
  • 09:27 mvernon@cumin1002: conftool action : set/weight=100; selector: name=thanos-fe1007.eqiad.wmnet
  • 09:27 mvernon@cumin1002: conftool action : set/weight=100; selector: name=thanos-fe1006.eqiad.wmnet
  • 09:27 mvernon@cumin1002: conftool action : set/weight=100; selector: name=thanos-fe1005.eqiad.wmnet
  • 09:23 marostegui@cumin1002: dbctl commit (dc=all): 'es1041 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76215 and previous config saved to /var/cache/conftool/dbconfig/20250515-092314-root.json
  • 09:23 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
  • 09:22 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.1 refs T392171
  • 09:20 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76214 and previous config saved to /var/cache/conftool/dbconfig/20250515-092056-root.json
  • 09:17 mvernon@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
  • 09:12 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76213 and previous config saved to /var/cache/conftool/dbconfig/20250515-091216-root.json
  • 09:08 marostegui@cumin1002: dbctl commit (dc=all): 'es1041 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76212 and previous config saved to /var/cache/conftool/dbconfig/20250515-090808-root.json
  • 09:07 Emperor: reboot thanos-fe100[5-7] prior to bringing into service T391352
  • 09:05 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76211 and previous config saved to /var/cache/conftool/dbconfig/20250515-090551-root.json
  • 08:57 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76210 and previous config saved to /var/cache/conftool/dbconfig/20250515-085710-root.json
  • 08:53 marostegui@cumin1002: dbctl commit (dc=all): 'es1041 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76209 and previous config saved to /var/cache/conftool/dbconfig/20250515-085303-root.json
  • 08:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc7 T394260', diff saved to https://phabricator.wikimedia.org/P76208 and previous config saved to /var/cache/conftool/dbconfig/20250515-085256-marostegui.json
  • 08:50 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76207 and previous config saved to /var/cache/conftool/dbconfig/20250515-085045-root.json
  • 08:50 dhinus: wikitech-static: rm -rf /srv/mediawiki/images/wikitech/archive/* (T338520)
  • 08:49 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1177.eqiad.wmnet
  • 08:42 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76206 and previous config saved to /var/cache/conftool/dbconfig/20250515-084204-root.json
  • 08:37 marostegui@cumin1002: dbctl commit (dc=all): 'es1041 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76205 and previous config saved to /var/cache/conftool/dbconfig/20250515-083744-root.json
  • 08:35 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76204 and previous config saved to /var/cache/conftool/dbconfig/20250515-083540-root.json
  • 08:26 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76203 and previous config saved to /var/cache/conftool/dbconfig/20250515-082659-root.json
  • 08:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1187 for testing T264016', diff saved to https://phabricator.wikimedia.org/P76202 and previous config saved to /var/cache/conftool/dbconfig/20250515-082333-marostegui.json
  • 08:22 marostegui@cumin1002: dbctl commit (dc=all): 'es1041 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76201 and previous config saved to /var/cache/conftool/dbconfig/20250515-082238-root.json
  • 08:20 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76200 and previous config saved to /var/cache/conftool/dbconfig/20250515-082002-root.json
  • 08:17 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.1 refs T392171
  • 08:15 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
  • 08:14 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
  • 08:12 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
  • 08:12 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
  • 08:11 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P76198 and previous config saved to /var/cache/conftool/dbconfig/20250515-081153-root.json
  • 08:11 marostegui@cumin1002: dbctl commit (dc=all): 'es1045 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76197 and previous config saved to /var/cache/conftool/dbconfig/20250515-081141-root.json
  • 08:07 marostegui@cumin1002: dbctl commit (dc=all): 'es1041 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76196 and previous config saved to /var/cache/conftool/dbconfig/20250515-080733-root.json
  • 08:04 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76195 and previous config saved to /var/cache/conftool/dbconfig/20250515-080456-root.json
  • 07:56 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76194 and previous config saved to /var/cache/conftool/dbconfig/20250515-075648-root.json
  • 07:56 marostegui@cumin1002: dbctl commit (dc=all): 'es1045 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76193 and previous config saved to /var/cache/conftool/dbconfig/20250515-075636-root.json
  • 07:52 marostegui@cumin1002: dbctl commit (dc=all): 'es1041 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76192 and previous config saved to /var/cache/conftool/dbconfig/20250515-075228-root.json
  • 07:49 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76191 and previous config saved to /var/cache/conftool/dbconfig/20250515-074950-root.json
  • 07:41 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76190 and previous config saved to /var/cache/conftool/dbconfig/20250515-074142-root.json
  • 07:41 marostegui@cumin1002: dbctl commit (dc=all): 'es1045 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76189 and previous config saved to /var/cache/conftool/dbconfig/20250515-074131-root.json
  • 07:40 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
  • 07:38 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
  • 07:37 marostegui@cumin1002: dbctl commit (dc=all): 'es1041 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76188 and previous config saved to /var/cache/conftool/dbconfig/20250515-073723-root.json
  • 07:36 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
  • 07:35 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
  • 07:34 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76187 and previous config saved to /var/cache/conftool/dbconfig/20250515-073445-root.json
  • 07:34 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 07:33 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 07:31 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2043.codfw.wmnet,es1041.eqiad.wmnet with reason: Maintenance
  • 07:30 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1041 es2043 T391921', diff saved to https://phabricator.wikimedia.org/P76186 and previous config saved to /var/cache/conftool/dbconfig/20250515-073033-marostegui.json
  • 07:26 marostegui@cumin1002: dbctl commit (dc=all): 'es1045 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76185 and previous config saved to /var/cache/conftool/dbconfig/20250515-072625-root.json
  • 07:19 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76184 and previous config saved to /var/cache/conftool/dbconfig/20250515-071939-root.json
  • 07:18 moritzm: installing nginx security updates
  • 07:11 marostegui@cumin1002: dbctl commit (dc=all): 'es1045 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76183 and previous config saved to /var/cache/conftool/dbconfig/20250515-071119-root.json
  • 07:07 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76182 and previous config saved to /var/cache/conftool/dbconfig/20250515-070706-root.json
  • 07:06 marostegui@cumin1002: dbctl commit (dc=all): 'es1043 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76181 and previous config saved to /var/cache/conftool/dbconfig/20250515-070653-root.json
  • 07:06 godog: add 70G to arclamp /srv
  • 07:04 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76180 and previous config saved to /var/cache/conftool/dbconfig/20250515-070433-root.json
  • 06:56 marostegui@cumin1002: dbctl commit (dc=all): 'es1045 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76179 and previous config saved to /var/cache/conftool/dbconfig/20250515-065613-root.json
  • 06:52 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76178 and previous config saved to /var/cache/conftool/dbconfig/20250515-065200-root.json
  • 06:51 marostegui@cumin1002: dbctl commit (dc=all): 'es1043 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76177 and previous config saved to /var/cache/conftool/dbconfig/20250515-065147-root.json
  • 06:51 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2045.codfw.wmnet,es1045.eqiad.wmnet with reason: Maintenance
  • 06:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1045 es2045 T391921', diff saved to https://phabricator.wikimedia.org/P76176 and previous config saved to /var/cache/conftool/dbconfig/20250515-065039-marostegui.json
  • 06:49 kart_: Updated cxserver to 2025-05-14-005542-production (T394008, T392499)
  • 06:46 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 06:46 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 06:43 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:43 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 06:38 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 06:38 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 06:36 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76175 and previous config saved to /var/cache/conftool/dbconfig/20250515-063655-root.json
  • 06:36 marostegui@cumin1002: dbctl commit (dc=all): 'es1043 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76174 and previous config saved to /var/cache/conftool/dbconfig/20250515-063641-root.json
  • 06:21 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76173 and previous config saved to /var/cache/conftool/dbconfig/20250515-062149-root.json
  • 06:21 marostegui@cumin1002: dbctl commit (dc=all): 'es1043 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76172 and previous config saved to /var/cache/conftool/dbconfig/20250515-062135-root.json
  • 06:06 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76171 and previous config saved to /var/cache/conftool/dbconfig/20250515-060643-root.json
  • 06:06 marostegui@cumin1002: dbctl commit (dc=all): 'es1043 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76170 and previous config saved to /var/cache/conftool/dbconfig/20250515-060629-root.json
  • 05:51 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P76169 and previous config saved to /var/cache/conftool/dbconfig/20250515-055137-root.json
  • 05:51 marostegui@cumin1002: dbctl commit (dc=all): 'es1043 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P76168 and previous config saved to /var/cache/conftool/dbconfig/20250515-055124-root.json
  • 05:43 marostegui@dns1006: END - running authdns-update
  • 05:41 marostegui@dns1006: START - running authdns-update
  • 05:40 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es1042 and es2042 to es4 masters T391921', diff saved to https://phabricator.wikimedia.org/P76167 and previous config saved to /var/cache/conftool/dbconfig/20250515-053958-marostegui.json
  • 05:36 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76166 and previous config saved to /var/cache/conftool/dbconfig/20250515-053631-root.json
  • 05:36 marostegui@cumin1002: dbctl commit (dc=all): 'es1043 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76165 and previous config saved to /var/cache/conftool/dbconfig/20250515-053618-root.json
  • 05:21 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76164 and previous config saved to /var/cache/conftool/dbconfig/20250515-052126-root.json
  • 05:21 marostegui@cumin1002: dbctl commit (dc=all): 'es1043 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76163 and previous config saved to /var/cache/conftool/dbconfig/20250515-052113-root.json
  • 05:10 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on pc2017.codfw.wmnet,pc1017.eqiad.wmnet with reason: Maintenance
  • 05:08 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on pc2017.codfw.wmnet with reason: Maintenance
  • 05:08 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on pc1017.eqiad.wmnet with reason: Maintenance
  • 05:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc7 T394260', diff saved to https://phabricator.wikimedia.org/P76162 and previous config saved to /var/cache/conftool/dbconfig/20250515-050724-marostegui.json
  • 05:06 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76161 and previous config saved to /var/cache/conftool/dbconfig/20250515-050620-root.json
  • 05:06 marostegui@cumin1002: dbctl commit (dc=all): 'es1043 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76160 and previous config saved to /var/cache/conftool/dbconfig/20250515-050607-root.json
  • 04:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1043 es2041 T391921', diff saved to https://phabricator.wikimedia.org/P76159 and previous config saved to /var/cache/conftool/dbconfig/20250515-045658-marostegui.json
  • 04:56 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2041.codfw.wmnet,es1043.eqiad.wmnet with reason: Maintenance
  • 04:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Remove db1192 from x3 (T351820)', diff saved to https://phabricator.wikimedia.org/P76158 and previous config saved to /var/cache/conftool/dbconfig/20250515-045631-ladsgroup.json
  • 04:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Remove db1256 from s8 (T351820)', diff saved to https://phabricator.wikimedia.org/P76157 and previous config saved to /var/cache/conftool/dbconfig/20250515-045345-ladsgroup.json
  • 03:35 eileen: civicrm upgraded from a8b7c589 to 5c45f41b
  • 01:08 cwhite: clear up some space on arclamp2001 to allow arclamp_compress_logs to complete

2025-05-14

  • 22:51 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_drmrs
  • 22:46 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_drmrs
  • 22:33 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1075.eqiad.wmnet with OS bullseye
  • 22:26 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1073.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:23 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1072.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:21 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1074.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:20 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1071.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:14 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1074.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:13 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1074.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:11 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1074.eqiad.wmnet with OS bullseye
  • 22:10 jhathaway@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-be2088.codfw.wmnet with reason: T381919
  • 22:09 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1075.eqiad.wmnet with reason: host reimage
  • 22:09 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1073.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:08 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1073.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:06 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1075.eqiad.wmnet with reason: host reimage
  • 22:05 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1072.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:04 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1072.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:03 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1071.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:02 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1071.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:51 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1075
  • 21:51 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1075
  • 21:51 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1075.eqiad.wmnet with OS bullseye
  • 21:50 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1075 to cirrussearch1075
  • 21:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1074.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:50 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1075
  • 21:49 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1074
  • 21:49 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1074.eqiad.wmnet with reason: host reimage
  • 21:49 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1074
  • 21:49 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1075
  • 21:49 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1075 on all recursors
  • 21:48 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1075 on all recursors
  • 21:48 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:48 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1075 to cirrussearch1075 - bking@cumin2002"
  • 21:48 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apus-be1004.eqiad.wmnet with OS bookworm
  • 21:47 jhathaway@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-be2088.codfw.wmnet with reason: T381919
  • 21:47 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1073.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:47 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1075 to cirrussearch1075 - bking@cumin2002"
  • 21:46 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1073
  • 21:46 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1073
  • 21:45 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1074.eqiad.wmnet with reason: host reimage
  • 21:43 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1072.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:43 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1072
  • 21:43 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1072
  • 21:43 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 21:42 bking@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:40 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 21:40 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1071.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:40 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1075 to cirrussearch1075
  • 21:39 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1071
  • 21:39 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1071
  • 21:37 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1069.eqiad.wmnet with OS bookworm
  • 21:37 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 21:34 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 21:31 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1074
  • 21:31 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1074
  • 21:31 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1074.eqiad.wmnet with OS bullseye
  • 21:28 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1070.eqiad.wmnet with OS bookworm
  • 21:28 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 21:27 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 21:22 brennen: end of UTC late backport & config window (and spiderpig party)
  • 21:22 brennen@deploy1003: Finished scap sync-world: Backport for Design Research participant recruitment survey on eswiki: Pre-deploy (T394315) (duration: 16m 06s)
  • 21:17 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1069.eqiad.wmnet with reason: host reimage
  • 21:15 brennen@deploy1003: brennen, dani: Continuing with sync
  • 21:14 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1069.eqiad.wmnet with reason: host reimage
  • 21:13 brennen@deploy1003: brennen, dani: Backport for Design Research participant recruitment survey on eswiki: Pre-deploy (T394315) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:10 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1070.eqiad.wmnet with reason: host reimage
  • 21:08 sukhe@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=search
  • 21:07 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1070.eqiad.wmnet with reason: host reimage
  • 21:07 sukhe@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=search-omega
  • 21:07 sukhe@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=search-psi
  • 21:06 brennen@deploy1003: Started scap sync-world: Backport for Design Research participant recruitment survey on eswiki: Pre-deploy (T394315)
  • 21:06 sukhe@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=seach-psi
  • 21:04 jgleeson: civicrm upgraded from 4607c099 to a8b7c589
  • 21:01 cscott@deploy1003: Finished scap sync-world: Backport for Remove ParserMigration configuration that matches defaults (duration: 13m 10s)
  • 21:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1074 to cirrussearch1074
  • 20:59 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1074
  • 20:59 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1069.eqiad.wmnet with OS bookworm
  • 20:59 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1069.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1074
  • 20:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1074 on all recursors
  • 20:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1074 on all recursors
  • 20:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1074 to cirrussearch1074 - bking@cumin2002"
  • 20:58 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1074 to cirrussearch1074 - bking@cumin2002"
  • 20:55 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2001.codfw.wmnet with OS bullseye
  • 20:54 cscott@deploy1003: cscott: Continuing with sync
  • 20:54 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 16 hosts
  • 20:54 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for 16 hosts
  • 20:53 sukhe: gdnsd reload issues should be fixed
  • 20:53 sukhe@dns1004: END - running authdns-update
  • 20:52 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 20:52 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1070.eqiad.wmnet with OS bookworm
  • 20:52 cscott@deploy1003: cscott: Backport for Remove ParserMigration configuration that matches defaults synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:52 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1074 to cirrussearch1074
  • 20:52 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1070.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:52 sukhe@dns1004: START - running authdns-update
  • 20:48 cscott@deploy1003: Started scap sync-world: Backport for Remove ParserMigration configuration that matches defaults
  • 20:47 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1069.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:41 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1069.eqiad.wmnet with OS bookworm
  • 20:41 sukhe@dns1004: START - running authdns-update
  • 20:40 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1070.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:40 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1075.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 20:40 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1075.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 20:36 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1070.eqiad.wmnet with OS bookworm
  • 20:36 sukhe@dns1004: START - running authdns-update
  • 20:32 jdrewniak@deploy1003: Finished scap sync-world: Backport for Add ArticleSummaries to beta cluster (T392520), Expand dark mode access for anons (May 2025 deployments) (T393386), Nearby should show file namespace on Commons (T52133) (duration: 12m 30s)
  • 20:26 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2001.codfw.wmnet with reason: host reimage
  • 20:26 bking@dns1004: START - running authdns-update
  • 20:25 jdrewniak@deploy1003: jdlrobson, jdrewniak: Continuing with sync
  • 20:25 jdrewniak@deploy1003: jdlrobson, jdrewniak: Backport for Add ArticleSummaries to beta cluster (T392520), Expand dark mode access for anons (May 2025 deployments) (T393386), Nearby should show file namespace on Commons (T52133) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:24 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=search
  • 20:24 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=search-omega
  • 20:24 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=search-psi
  • 20:23 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2001.codfw.wmnet with reason: host reimage
  • 20:20 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:20 jdrewniak@deploy1003: Started scap sync-world: Backport for Add ArticleSummaries to beta cluster (T392520), Expand dark mode access for anons (May 2025 deployments) (T393386), Nearby should show file namespace on Commons (T52133)
  • 20:19 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host apus-be1004.eqiad.wmnet with OS bookworm
  • 20:17 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 20:08 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host cassandra-dev2001.codfw.wmnet with OS bullseye
  • 20:07 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 19:50 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1068.eqiad.wmnet with reason: host reimage
  • 19:50 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1069.eqiad.wmnet with OS bookworm
  • 19:50 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1069.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:49 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch1056.eqiad.wmnet with OS bullseye
  • 19:48 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1070.eqiad.wmnet with OS bookworm
  • 19:47 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1068.eqiad.wmnet with reason: host reimage
  • 19:40 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_drmrs
  • 19:40 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_drmrs
  • 19:36 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1070.eqiad.wmnet with OS bookworm
  • 19:32 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1068.eqiad.wmnet with OS bookworm
  • 19:31 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1068.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:29 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1069.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:28 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 19:27 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 19:26 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 19:25 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 19:24 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1069.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:21 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_ulsfo
  • 19:21 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch1055.eqiad.wmnet with OS bullseye
  • 19:19 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 19:19 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1068.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:19 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1070.eqiad.wmnet with OS bookworm
  • 19:17 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 19:17 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_ulsfo
  • 19:16 jhuneidi@deploy1003: Finished scap sync-world: Backport for Stats: Add temporary deprecation for addLabel() normalization (T394053) (duration: 15m 24s)
  • 19:16 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1068.eqiad.wmnet with OS bookworm
  • 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1055.eqiad.wmnet with OS bullseye
  • 19:15 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch1055.eqiad.wmnet with OS bullseye
  • 19:13 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1056
  • 19:13 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1056
  • 19:13 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1056.eqiad.wmnet with OS bullseye
  • 19:10 jhuneidi@deploy1003: jhuneidi, krinkle: Continuing with sync
  • 19:08 jhuneidi@deploy1003: jhuneidi, krinkle: Backport for Stats: Add temporary deprecation for addLabel() normalization (T394053) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 19:05 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1069.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:05 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1070.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:04 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1056 to cirrussearch1056
  • 19:03 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1056
  • 19:02 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1069.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:02 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1056
  • 19:02 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1056 on all recursors
  • 19:02 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1056 on all recursors
  • 19:02 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:02 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1056 to cirrussearch1056 - bking@cumin2002"
  • 19:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1056 to cirrussearch1056 - bking@cumin2002"
  • 19:01 jhuneidi@deploy1003: Started scap sync-world: Backport for Stats: Add temporary deprecation for addLabel() normalization (T394053)
  • 18:56 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1068.eqiad.wmnet with OS bookworm
  • 18:56 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1055
  • 18:56 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1055
  • 18:56 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1055.eqiad.wmnet with OS bullseye
  • 18:56 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 18:56 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1056 to cirrussearch1056
  • 18:55 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1055 to cirrussearch1055
  • 18:54 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1055
  • 18:53 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1055
  • 18:53 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1055 on all recursors
  • 18:53 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1055 on all recursors
  • 18:53 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:53 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1055 to cirrussearch1055 - bking@cumin2002"
  • 18:52 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1055 to cirrussearch1055 - bking@cumin2002"
  • 18:47 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 18:47 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1055 to cirrussearch1055
  • 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1069.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:38 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1069.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:37 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1070.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:29 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1070.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:27 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1070.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:26 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1070
  • 18:26 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1070
  • 18:25 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1070
  • 18:25 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1070
  • 18:15 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1069.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:15 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1069
  • 18:15 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1069
  • 18:02 aokoth@cumin1002: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet
  • 18:00 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet
  • 17:48 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1068.eqiad.wmnet with OS bookworm
  • 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 17:30 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 17:30 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 17:12 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 17:12 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 16:56 sukhe: updating nameservers for wiki.gives in Markmonitor to set up delegation: T379318
  • 16:43 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2029.codfw.wmnet
  • 16:19 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_ulsfo
  • 16:19 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_ulsfo
  • 16:16 cgoubert@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:11 jhathaway@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-be2088.codfw.wmnet with reason: T381919
  • 16:00 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-be1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:58 jclark@cumin1002: START - Cookbook sre.hosts.provision for host apus-be1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:47 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-dev: apply
  • 15:47 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-dev: apply
  • 15:42 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host apus-be1004.eqiad.wmnet with OS bookworm
  • 15:25 fabfur: removing varnishkafka from magru (T393772)
  • 15:17 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 15:17 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 15:13 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 15:12 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 15:11 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 15:11 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 15:10 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:10 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:09 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:08 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 15:07 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 15:03 dancy@deploy1003: Installation of scap version "4.167.0" completed for 2 hosts
  • 15:01 dancy@deploy1003: Installing scap version "4.167.0" for 2 host(s)
  • 14:55 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Set $wgMediaModerationPhotoDNASubscriptionKey as empty in readme.php (T394299) (duration: 11m 20s)
  • 14:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Remove db2181 from x3 (T351820)', diff saved to https://phabricator.wikimedia.org/P76153 and previous config saved to /var/cache/conftool/dbconfig/20250514-145336-ladsgroup.json
  • 14:48 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 14:48 dreamyjazz@deploy1003: dreamyjazz: Backport for Set $wgMediaModerationPhotoDNASubscriptionKey as empty in readme.php (T394299) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:43 dreamyjazz@deploy1003: Started scap sync-world: Backport for Set $wgMediaModerationPhotoDNASubscriptionKey as empty in readme.php (T394299)
  • 14:37 moritzm: installing glib2.0 security updates
  • 14:37 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1001.eqiad.wmnet
  • 14:31 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-lab1001.eqiad.wmnet
  • 14:30 cgoubert@deploy1003: Finished scap sync-world: Deploy mediawiki: upgrade to mesh.configuration 1.13 - T391333 (duration: 12m 33s)
  • 14:18 cgoubert@deploy1003: Started scap sync-world: Deploy mediawiki: upgrade to mesh.configuration 1.13 - T391333
  • 14:16 moritzm: uploaded openjdk-8 8u452-ga-1~deb11u1 to component/jdk8 for bullseye-wikimedia
  • 14:16 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:16 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:16 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:15 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1258 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76152 and previous config saved to /var/cache/conftool/dbconfig/20250514-141532-root.json
  • 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:14 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet
  • 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:08 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet
  • 14:07 klausman@cumin1002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet
  • 14:07 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet
  • 14:07 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1001.eqiad.wmnet
  • 14:01 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-lab1001.eqiad.wmnet
  • 14:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1258 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76151 and previous config saved to /var/cache/conftool/dbconfig/20250514-140027-root.json
  • 13:51 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:50 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1258 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76150 and previous config saved to /var/cache/conftool/dbconfig/20250514-134521-root.json
  • 13:40 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1156.eqiad.wmnet
  • 13:38 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1156.eqiad.wmnet
  • 13:36 aqu@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:36 aqu@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:35 aqu@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:35 aqu@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:35 aqu@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:35 aqu@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:34 aqu@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:34 aqu@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:33 aqu@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:33 aqu@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:32 aqu@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:32 aqu@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1258 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76149 and previous config saved to /var/cache/conftool/dbconfig/20250514-133016-root.json
  • 13:20 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:19 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for manage-dblist: Rename to manage-dblist.php (T392819) (duration: 12m 48s)
  • 13:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1258 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76148 and previous config saved to /var/cache/conftool/dbconfig/20250514-131510-root.json
  • 13:13 aqu@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:13 aqu@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:12 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Continuing with sync
  • 13:11 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Backport for manage-dblist: Rename to manage-dblist.php (T392819) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:07 godog: correction, restart grafana-server on grafana1002
  • 13:06 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for manage-dblist: Rename to manage-dblist.php (T392819)
  • 13:05 godog: reboot grafana1002 - hard down
  • 13:01 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host an-worker1068.eqiad.wmnet
  • 13:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1258 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P76146 and previous config saved to /var/cache/conftool/dbconfig/20250514-130004-root.json
  • 12:55 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1258 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76145 and previous config saved to /var/cache/conftool/dbconfig/20250514-124458-root.json
  • 12:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1258 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76144 and previous config saved to /var/cache/conftool/dbconfig/20250514-122952-root.json
  • 12:28 joal@deploy1003: Finished deploy [analytics/refinery@9d620d0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@9d620d06] (duration: 00m 46s)
  • 12:28 joal@deploy1003: Started deploy [analytics/refinery@9d620d0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@9d620d06]
  • 12:27 joal@deploy1003: Finished deploy [analytics/refinery@9d620d0] (thin): Analytics webrequest migration THIN [analytics/refinery@9d620d06] (duration: 01m 35s)
  • 12:26 joal@deploy1003: Started deploy [analytics/refinery@9d620d0] (thin): Analytics webrequest migration THIN [analytics/refinery@9d620d06]
  • 12:25 joal@deploy1003: Finished deploy [analytics/refinery@9d620d0]: Regular analytics weekly train [analytics/refinery@9d620d06] (duration: 02m 17s)
  • 12:23 joal@deploy1003: Started deploy [analytics/refinery@9d620d0]: Regular analytics weekly train [analytics/refinery@9d620d06]
  • 12:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1258 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76143 and previous config saved to /var/cache/conftool/dbconfig/20250514-121446-root.json
  • 11:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Remove db2243 from s8 (T351820)', diff saved to https://phabricator.wikimedia.org/P76142 and previous config saved to /var/cache/conftool/dbconfig/20250514-114724-ladsgroup.json
  • 11:47 moritzm: installing librabbitmq security updates
  • 11:41 ladsgroup@deploy1003: Finished scap sync-world: Backport for Move production term store traffic to x3 (T351820) (duration: 20m 48s)
  • 11:41 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1068.eqiad.wmnet
  • 11:38 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad
  • 11:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 11:27 ladsgroup@deploy1003: ladsgroup: Backport for Move production term store traffic to x3 (T351820) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:21 ladsgroup@deploy1003: Started scap sync-world: Backport for Move production term store traffic to x3 (T351820)
  • 11:18 kcvelaga@deploy1003: Finished deploy [airflow-dags/analytics_product@22aa307]: T393561 (duration: 01m 10s)
  • 11:17 kcvelaga@deploy1003: Started deploy [airflow-dags/analytics_product@22aa307]: T393561
  • 11:15 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad
  • 11:15 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:14 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:13 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:12 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:12 btullis@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling restart_daemons on A:cephosd
  • 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:10 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:03 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:01 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 10:50 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 10:49 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 10:47 btullis@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling restart_daemons on A:cephosd
  • 10:44 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 10:43 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 10:41 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 10:41 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 10:40 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 10:39 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 10:28 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.1 refs T392171
  • 10:12 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Also merge fields if stemming settings empty on one side (T394274) (duration: 15m 53s)
  • 10:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76138 and previous config saved to /var/cache/conftool/dbconfig/20250514-101057-root.json
  • 10:05 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Continuing with sync
  • 10:03 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Backport for Also merge fields if stemming settings empty on one side (T394274) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:58 marostegui@dns1006: END - running authdns-update
  • 09:57 marostegui@dns1006: START - running authdns-update
  • 09:56 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Also merge fields if stemming settings empty on one side (T394274)
  • 09:55 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76137 and previous config saved to /var/cache/conftool/dbconfig/20250514-095553-root.json
  • 09:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76136 and previous config saved to /var/cache/conftool/dbconfig/20250514-095552-root.json
  • 09:50 marostegui@cumin1002: dbctl commit (dc=all): 'Add x3 codfw T390530', diff saved to https://phabricator.wikimedia.org/P76135 and previous config saved to /var/cache/conftool/dbconfig/20250514-095031-marostegui.json
  • 09:50 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site esams [reason: esams routers upgrade finished, T364092]
  • 09:49 ayounsi@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site esams [reason: esams routers upgrade finished, T364092]
  • 09:40 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76133 and previous config saved to /var/cache/conftool/dbconfig/20250514-094048-root.json
  • 09:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76132 and previous config saved to /var/cache/conftool/dbconfig/20250514-094047-root.json
  • 09:40 marostegui@cumin1002: dbctl commit (dc=all): 'Add x3 eqiad T390530', diff saved to https://phabricator.wikimedia.org/P76131 and previous config saved to /var/cache/conftool/dbconfig/20250514-094038-marostegui.json
  • 09:38 XioNoX: repool cr1-esams - T364092
  • 09:35 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 09:34 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 09:28 XioNoX: cr1-esams> request chassis routing-engine master switch - T364092
  • 09:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 09:25 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 09:25 moritzm: retry full planet import for Bookworm maps master (yesterday's import failed due to a bug that is now fixed) T381565
  • 09:21 XioNoX: re1.cr1-esams> request vmhost reboot re0 - T364092
  • 09:21 marostegui@cumin1002: dbctl commit (dc=all): 'es1042 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76129 and previous config saved to /var/cache/conftool/dbconfig/20250514-092126-root.json
  • 09:12 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync
  • 09:12 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync
  • 09:12 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync
  • 09:12 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync
  • 09:12 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync
  • 09:12 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76128 and previous config saved to /var/cache/conftool/dbconfig/20250514-091200-root.json
  • 09:12 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync
  • 09:11 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync
  • 09:11 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync
  • 09:11 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync
  • 09:11 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync
  • 09:11 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync
  • 09:11 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76127 and previous config saved to /var/cache/conftool/dbconfig/20250514-091100-root.json
  • 09:10 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync
  • 09:10 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync
  • 09:10 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync
  • 09:10 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync
  • 09:10 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync
  • 09:10 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync
  • 09:10 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync
  • 09:06 marostegui@cumin1002: dbctl commit (dc=all): 'es1042 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76126 and previous config saved to /var/cache/conftool/dbconfig/20250514-090621-root.json
  • 09:05 XioNoX: cr1-esams> request chassis routing-engine master switch - T364092
  • 08:58 XioNoX: cr1-esams request vmhost reboot re1 - T364092
  • 08:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76125 and previous config saved to /var/cache/conftool/dbconfig/20250514-085655-root.json
  • 08:55 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76124 and previous config saved to /var/cache/conftool/dbconfig/20250514-085555-root.json
  • 08:53 marostegui: Mark db2241 as x3 master in zarcillo T390530
  • 08:51 marostegui@cumin1002: dbctl commit (dc=all): 'es1042 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76123 and previous config saved to /var/cache/conftool/dbconfig/20250514-085115-root.json
  • 08:46 XioNoX: cr1-esams - Install image on backup RE - T364092
  • 08:44 XioNoX: cr1-esams - disable transit/IX BGP sessions - T364092
  • 08:43 ayounsi@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr1-esams,cr1-esams IPv6 with reason: cr1-esams upgrade
  • 08:43 ayounsi@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on re0.cr1-esams.mgmt with reason: cr1-esams upgrade
  • 08:41 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P76122 and previous config saved to /var/cache/conftool/dbconfig/20250514-084149-root.json
  • 08:41 ayounsi@cumin1002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:00:00 on cr1-esams,cr1-esams IPv6,cr1-esams.mgmt with reason: cr1-esams upgrade
  • 08:40 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76121 and previous config saved to /var/cache/conftool/dbconfig/20250514-084049-root.json
  • 08:39 XioNoX: cr1-esams# set protocols bgp graceful-shutdown sender - T364092
  • 08:36 marostegui@cumin1002: dbctl commit (dc=all): 'es1042 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76120 and previous config saved to /var/cache/conftool/dbconfig/20250514-083609-root.json
  • 08:32 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76119 and previous config saved to /var/cache/conftool/dbconfig/20250514-083233-root.json
  • 08:31 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1257.eqiad.wmnet onto db1258.eqiad.wmnet
  • 08:31 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1257 gradually with 4 steps - Pool db1257.eqiad.wmnet in after cloning
  • 08:30 marostegui@dns1006: END - running authdns-update
  • 08:30 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.1 refs T392171
  • 08:29 marostegui@dns1006: START - running authdns-update
  • 08:28 marostegui@cumin1002: dbctl commit (dc=all): 'es1034 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76117 and previous config saved to /var/cache/conftool/dbconfig/20250514-082815-root.json
  • 08:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76116 and previous config saved to /var/cache/conftool/dbconfig/20250514-082644-root.json
  • 08:25 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P76115 and previous config saved to /var/cache/conftool/dbconfig/20250514-082543-root.json
  • 08:21 marostegui@cumin1002: dbctl commit (dc=all): 'es1042 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76114 and previous config saved to /var/cache/conftool/dbconfig/20250514-082102-root.json
  • 08:17 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76113 and previous config saved to /var/cache/conftool/dbconfig/20250514-081728-root.json
  • 08:13 XioNoX: cr2-esams> request vmhost reboot - T364092
  • 08:13 marostegui@cumin1002: dbctl commit (dc=all): 'es1034 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76111 and previous config saved to /var/cache/conftool/dbconfig/20250514-081311-root.json
  • 08:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76110 and previous config saved to /var/cache/conftool/dbconfig/20250514-081139-root.json
  • 08:10 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76109 and previous config saved to /var/cache/conftool/dbconfig/20250514-081037-root.json
  • 08:07 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "T393381 - oblivian@cumin2002"
  • 08:07 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: T393381 - oblivian@cumin2002
  • 08:06 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: T393381 - oblivian@cumin2002
  • 08:06 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "T393381 - oblivian@cumin2002"
  • 08:05 marostegui@cumin1002: dbctl commit (dc=all): 'es1042 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P76108 and previous config saved to /var/cache/conftool/dbconfig/20250514-080557-root.json
  • 08:02 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76107 and previous config saved to /var/cache/conftool/dbconfig/20250514-080222-root.json
  • 08:01 marostegui@cumin1002: START - Cookbook sre.mysql.pool db1257 gradually with 4 steps - Pool db1257.eqiad.wmnet in after cloning
  • 07:59 XioNoX: cr2-esams> request vmhost software add /var/tmp/junos-vmhost-install-mx-x86-64-23.4R2-S3.9.tgz - T364092
  • 07:58 XioNoX: cr2-esams - disable transit/IX BGP sessions - T364092
  • 07:58 marostegui@cumin1002: dbctl commit (dc=all): 'es1034 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76105 and previous config saved to /var/cache/conftool/dbconfig/20250514-075805-root.json
  • 07:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76104 and previous config saved to /var/cache/conftool/dbconfig/20250514-075633-root.json
  • 07:55 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76102 and previous config saved to /var/cache/conftool/dbconfig/20250514-075532-root.json
  • 07:52 marostegui@cumin1002: dbctl commit (dc=all): 'Add db1258 to dbctl T393989', diff saved to https://phabricator.wikimedia.org/P76101 and previous config saved to /var/cache/conftool/dbconfig/20250514-075254-marostegui.json
  • 07:50 marostegui@cumin1002: dbctl commit (dc=all): 'es1042 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76099 and previous config saved to /var/cache/conftool/dbconfig/20250514-075052-root.json
  • 07:47 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76097 and previous config saved to /var/cache/conftool/dbconfig/20250514-074717-root.json
  • 07:44 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1257.eqiad.wmnet with reason: Maintenance
  • 07:44 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1256.eqiad.wmnet with reason: Maintenance
  • 07:43 XioNoX: cr2-esams# set protocols bgp graceful-shutdown sender - T364092
  • 07:43 marostegui@cumin1002: dbctl commit (dc=all): 'es1034 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76095 and previous config saved to /var/cache/conftool/dbconfig/20250514-074300-root.json
  • 07:43 ayounsi@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr2-esams,cr2-esams IPv6,cr2-esams.mgmt with reason: cr2-esams upgrade
  • 07:41 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P76094 and previous config saved to /var/cache/conftool/dbconfig/20250514-074128-root.json
  • 07:40 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76093 and previous config saved to /var/cache/conftool/dbconfig/20250514-074027-root.json
  • 07:36 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site esams [reason: esams routers upgrade, T364092]
  • 07:36 ayounsi@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site esams [reason: esams routers upgrade, T364092]
  • 07:35 marostegui@cumin1002: dbctl commit (dc=all): 'es1042 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76092 and previous config saved to /var/cache/conftool/dbconfig/20250514-073547-root.json
  • 07:34 moritzm: installing glibc security updates
  • 07:32 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2042.codfw.wmnet with reason: Maintenance
  • 07:32 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76091 and previous config saved to /var/cache/conftool/dbconfig/20250514-073211-root.json
  • 07:31 kostajh: UTC morning deploys done
  • 07:27 marostegui@cumin1002: dbctl commit (dc=all): 'es1034 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76090 and previous config saved to /var/cache/conftool/dbconfig/20250514-072755-root.json
  • 07:26 kharlan@deploy1003: Finished scap sync-world: Backport for Use anonymous user when creating named account from temp account (T393628) (duration: 19m 51s)
  • 07:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 4%: Repooling', diff saved to https://phabricator.wikimedia.org/P76089 and previous config saved to /var/cache/conftool/dbconfig/20250514-072622-root.json
  • 07:20 marostegui@cumin1002: dbctl commit (dc=all): 'es1042 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76088 and previous config saved to /var/cache/conftool/dbconfig/20250514-072042-root.json
  • 07:20 kharlan@deploy1003: kharlan: Continuing with sync
  • 07:17 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P76087 and previous config saved to /var/cache/conftool/dbconfig/20250514-071706-root.json
  • 07:12 marostegui@cumin1002: dbctl commit (dc=all): 'es1034 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P76086 and previous config saved to /var/cache/conftool/dbconfig/20250514-071250-root.json
  • 07:12 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2042.codfw.wmnet,es1042.eqiad.wmnet with reason: Maintenance
  • 07:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1042 es2042 T391921', diff saved to https://phabricator.wikimedia.org/P76085 and previous config saved to /var/cache/conftool/dbconfig/20250514-071159-marostegui.json
  • 07:11 kharlan@deploy1003: kharlan: Backport for Use anonymous user when creating named account from temp account (T393628) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P76084 and previous config saved to /var/cache/conftool/dbconfig/20250514-071117-root.json
  • 07:06 kharlan@deploy1003: Started scap sync-world: Backport for Use anonymous user when creating named account from temp account (T393628)
  • 07:02 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76083 and previous config saved to /var/cache/conftool/dbconfig/20250514-070200-root.json
  • 06:57 marostegui@cumin1002: dbctl commit (dc=all): 'es1034 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76082 and previous config saved to /var/cache/conftool/dbconfig/20250514-065744-root.json
  • 06:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P76081 and previous config saved to /var/cache/conftool/dbconfig/20250514-065611-root.json
  • 06:46 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76080 and previous config saved to /var/cache/conftool/dbconfig/20250514-064654-root.json
  • 06:42 marostegui@cumin1002: dbctl commit (dc=all): 'es1034 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76079 and previous config saved to /var/cache/conftool/dbconfig/20250514-064238-root.json
  • 06:41 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P76078 and previous config saved to /var/cache/conftool/dbconfig/20250514-064106-root.json
  • 06:31 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76077 and previous config saved to /var/cache/conftool/dbconfig/20250514-063149-root.json
  • 06:27 marostegui@cumin1002: dbctl commit (dc=all): 'es1034 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76076 and previous config saved to /var/cache/conftool/dbconfig/20250514-062733-root.json
  • 06:27 marostegui: es3 migrated to MariaDB 10.11 T391921
  • 06:17 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2034.codfw.wmnet,es1034.eqiad.wmnet with reason: Maintenance
  • 06:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1034 es2034 T391921', diff saved to https://phabricator.wikimedia.org/P76075 and previous config saved to /var/cache/conftool/dbconfig/20250514-061721-marostegui.json
  • 06:16 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es1031 and es2029 to es3 masters T391921', diff saved to https://phabricator.wikimedia.org/P76074 and previous config saved to /var/cache/conftool/dbconfig/20250514-061650-marostegui.json
  • 06:11 marostegui: Drop query killers from parsercache T387740
  • 05:49 marostegui: Mark db1255 as x3 master in zarcillo T390530
  • 05:36 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1257 - Depool db1257.eqiad.wmnet to then clone it to db1258.eqiad.wmnet - marostegui@cumin1002
  • 05:36 marostegui@cumin1002: START - Cookbook sre.mysql.depool db1257 - Depool db1257.eqiad.wmnet to then clone it to db1258.eqiad.wmnet - marostegui@cumin1002
  • 05:36 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1257.eqiad.wmnet onto db1258.eqiad.wmnet
  • 01:43 eileen: config revision changed from c4cda34a to 5c4b83ad
  • 01:38 eileen: civicrm upgraded from 18deba4c to 4607c099
  • 01:13 eileen: civicrm upgraded from 40d488b8 to 18deba4c
  • 00:53 eileen: config revision changed from ddf64519 to c4cda34a

2025-05-13

  • 23:30 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1068.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-be1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:49 jclark@cumin1002: START - Cookbook sre.hosts.provision for host apus-be1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:45 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_magru
  • 22:45 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_magru
  • 22:28 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-be1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1075.eqiad.wmnet with OS bookworm
  • 22:27 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 22:13 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe1007.eqiad.wmnet with OS bullseye
  • 22:13 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 22:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host apus-be1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:09 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-be1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:07 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 21:49 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1007.eqiad.wmnet with reason: host reimage
  • 21:49 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1068.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:48 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1068
  • 21:48 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1068
  • 21:47 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:47 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudvirt1068 - vriley@cumin1002"
  • 21:47 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudvirt1068 - vriley@cumin1002"
  • 21:47 ejegg: civicrm upgraded from 852c6ee6 to 40d488b8
  • 21:45 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1007.eqiad.wmnet with reason: host reimage
  • 21:44 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 21:43 jclark@cumin1002: START - Cookbook sre.hosts.provision for host apus-be1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:43 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:43 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for apus-be1004 - jclark@cumin1002"
  • 21:43 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 21:42 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for apus-be1004 - jclark@cumin1002"
  • 21:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 21:39 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:29 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-fe1007.eqiad.wmnet with OS bullseye
  • 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1075.eqiad.wmnet with reason: host reimage
  • 21:14 jforrester@deploy1003: Finished scap sync-world: Backport for Register our magic vars, so the parser knows to ask us what their values are (T345477), Register our magic vars, so the parser knows to ask us what their values are (T345477) (duration: 13m 13s)
  • 21:12 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1075.eqiad.wmnet with reason: host reimage
  • 21:07 jforrester@deploy1003: jforrester: Continuing with sync
  • 21:07 jforrester@deploy1003: jforrester: Backport for Register our magic vars, so the parser knows to ask us what their values are (T345477), Register our magic vars, so the parser knows to ask us what their values are (T345477) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:01 bvibber@deploy1003: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
  • 21:01 bvibber@deploy1003: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
  • 21:00 jforrester@deploy1003: Started scap sync-world: Backport for Register our magic vars, so the parser knows to ask us what their values are (T345477), Register our magic vars, so the parser knows to ask us what their values are (T345477)
  • 21:00 bvibber@deploy1003: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
  • 21:00 bvibber@deploy1003: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
  • 20:59 bvibber@deploy1003: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
  • 20:58 bvibber@deploy1003: helmfile [staging] START helmfile.d/services/chart-renderer: apply
  • 20:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1075.eqiad.wmnet with OS bookworm
  • 20:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudvirt1075']
  • 20:53 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1075']
  • 20:53 jforrester@deploy1003: Finished scap sync-world: Backport for Remove web_ab_test_enrollment schema (T386247) (duration: 13m 36s)
  • 20:46 jforrester@deploy1003: bwang, jforrester: Continuing with sync
  • 20:46 jforrester@deploy1003: bwang, jforrester: Backport for Remove web_ab_test_enrollment schema (T386247) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1075.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:40 jforrester@deploy1003: Started scap sync-world: Backport for Remove web_ab_test_enrollment schema (T386247)
  • 20:38 jforrester@deploy1003: Finished scap sync-world: Backport for Stream registration for article summaries (T389097 T387406) (duration: 13m 12s)
  • 20:31 jforrester@deploy1003: ksarabia, jforrester: Continuing with sync
  • 20:31 jforrester@deploy1003: ksarabia, jforrester: Backport for Stream registration for article summaries (T389097 T387406) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:29 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-fe1007.eqiad.wmnet with OS bullseye
  • 20:27 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-fe1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:25 vriley@cumin1002: START - Cookbook sre.hosts.provision for host thanos-fe1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:24 jforrester@deploy1003: Started scap sync-world: Backport for Stream registration for article summaries (T389097 T387406)
  • 20:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1076.eqiad.wmnet with OS bookworm
  • 20:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:23 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-fe1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:22 jforrester@deploy1003: Finished scap sync-world: Backport for Update to echarts 5.6.0 (T393377) (duration: 11m 36s)
  • 20:21 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:21 vriley@cumin1002: START - Cookbook sre.hosts.provision for host thanos-fe1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:18 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1018.eqiad.wmnet with OS bookworm
  • 20:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudvirt1075.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:16 jforrester@deploy1003: jforrester, jdlrobson: Continuing with sync
  • 20:15 jforrester@deploy1003: jforrester, jdlrobson: Backport for Update to echarts 5.6.0 (T393377) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1075
  • 20:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1075
  • 20:14 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1075 to codfw - jhancock@cumin2002"
  • 20:14 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1075 to codfw - jhancock@cumin2002"
  • 20:11 jforrester@deploy1003: Started scap sync-world: Backport for Update to echarts 5.6.0 (T393377)
  • 20:10 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 20:08 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1018.eqiad.wmnet with reason: host reimage
  • 20:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1076.eqiad.wmnet with reason: host reimage
  • 20:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1018.eqiad.wmnet with reason: host reimage
  • 20:02 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host pc1018.eqiad.wmnet with OS bookworm
  • 20:02 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1076.eqiad.wmnet with reason: host reimage
  • 20:01 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1258.eqiad.wmnet with OS bookworm
  • 20:01 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:00 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 19:56 ejegg: standalone (IPN listener) SmashPig upgraded from 4ac271dd to f96b898e
  • 19:55 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1018.eqiad.wmnet with OS bookworm
  • 19:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1076.eqiad.wmnet with OS bookworm
  • 19:46 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudvirt1076']
  • 19:45 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1076']
  • 19:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1258.eqiad.wmnet with reason: host reimage
  • 19:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1076.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1018.eqiad.wmnet with reason: host reimage
  • 19:40 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_magru
  • 19:40 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_magru
  • 19:40 jhathaway@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be2088.codfw.wmnet with reason: T381919
  • 19:37 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs3009*} and A:liberica (T393616)
  • 19:37 brett@cumin2002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs3009*} and A:liberica (T393616)
  • 19:37 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1258.eqiad.wmnet with reason: host reimage
  • 19:37 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1018.eqiad.wmnet with reason: host reimage
  • 19:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudvirt1076.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:23 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_eqiad
  • 19:21 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host db1258.eqiad.wmnet with OS bookworm
  • 19:21 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host pc1018.eqiad.wmnet with OS bookworm
  • 19:21 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1076.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:18 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_eqiad
  • 19:17 brett: Import Varnish 7.1.1-2~bpo11+wmf1 into bullseye-wikimedia (T394004)
  • 19:11 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudvirt1076.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:10 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1076
  • 19:10 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1076
  • 19:10 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:10 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1076 to codfw - jhancock@cumin2002"
  • 19:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1076 to codfw - jhancock@cumin2002"
  • 19:04 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 18:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1084.eqiad.wmnet with OS bullseye
  • 17:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1073.eqiad.wmnet with OS bullseye
  • 17:51 cstone: payments-wiki upgraded from 92a8cbb8 to 01de91b7
  • 17:43 cmooney@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 264595
  • 17:42 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 17:41 jhathaway@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be2088.codfw.wmnet with reason: T381919
  • 17:41 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 17:41 cmooney@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 264595
  • 17:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1084.eqiad.wmnet with reason: host reimage
  • 17:35 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1073.eqiad.wmnet with reason: host reimage
  • 17:34 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1084.eqiad.wmnet with reason: host reimage
  • 17:31 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 17:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1073.eqiad.wmnet with reason: host reimage
  • 17:31 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 17:20 papaul: maintenance complete on all 3 switches
  • 17:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1084
  • 17:20 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1084
  • 17:20 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1084.eqiad.wmnet with OS bullseye
  • 17:17 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 17:17 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 17:17 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1084 to cirrussearch1084
  • 17:17 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1084
  • 17:16 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1073
  • 17:16 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1073
  • 17:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1073.eqiad.wmnet with OS bullseye
  • 17:15 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1084
  • 17:15 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1084 on all recursors
  • 17:15 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1084 on all recursors
  • 17:15 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:15 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1084 to cirrussearch1084 - bking@cumin2002"
  • 17:15 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1084 to cirrussearch1084 - bking@cumin2002"
  • 17:11 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 17:11 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1073 to cirrussearch1073
  • 17:11 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1084 to cirrussearch1084
  • 17:10 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1073
  • 17:09 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 17:09 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 17:09 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1073
  • 17:09 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1073 on all recursors
  • 17:09 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1073 on all recursors
  • 17:09 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1073 to cirrussearch1073 - bking@cumin2002"
  • 17:08 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1073 to cirrussearch1073 - bking@cumin2002"
  • 17:04 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 17:04 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1073 to cirrussearch1073
  • 16:55 papaul: ongoing maintenance on msw2-codfw
  • 16:50 papaul: maintenance complete on msw2-eqiad
  • 16:48 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:48 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:48 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:47 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:47 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:47 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:47 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:46 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:46 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:46 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:46 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:45 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:34 dancy@deploy1003: Installation of scap version "4.166.0" completed for 2 hosts
  • 16:32 dancy@deploy1003: Installing scap version "4.166.0" for 2 host(s)
  • 16:28 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:28 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:28 papaul: maintenance complete on msw2-eqiad
  • 16:28 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:27 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:27 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:26 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:21 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_eqiad
  • 16:21 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_eqiad
  • 16:20 papaul: maintenance complete on msw1-eqiad
  • 16:11 dancy@deploy1003: Installation of scap version "4.165.0" completed for 2 hosts
  • 16:09 dancy@deploy1003: Installing scap version "4.165.0" for 2 host(s)
  • 16:09 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1018.eqiad.wmnet with OS bookworm
  • 16:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testreduce1002.eqiad.wmnet
  • 15:57 cgoubert@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM testreduce1002.eqiad.wmnet
  • 15:57 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.ad in eqiad
  • 15:56 claime: gnt-instance modify -B memory=10g testreduce1002.eqiad.wmnet - T393904
  • 15:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76071 and previous config saved to /var/cache/conftool/dbconfig/20250513-155547-root.json
  • 15:54 mvernon@cumin1002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.ad in eqiad
  • 15:42 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1258.eqiad.wmnet with OS bookworm
  • 15:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76070 and previous config saved to /var/cache/conftool/dbconfig/20250513-154041-root.json
  • 15:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1072.eqiad.wmnet with OS bullseye
  • 15:35 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host db1258.eqiad.wmnet with OS bookworm
  • 15:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1258.eqiad.wmnet with OS bookworm
  • 15:33 cmooney@dns2005: END - running authdns-update
  • 15:33 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:32 cmooney@dns2005: START - running authdns-update
  • 15:32 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1071.eqiad.wmnet with OS bullseye
  • 15:30 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:27 papaul: ongoing maintenance on msw1-eqiad
  • 15:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76069 and previous config saved to /var/cache/conftool/dbconfig/20250513-152631-root.json
  • 15:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76068 and previous config saved to /var/cache/conftool/dbconfig/20250513-152536-root.json
  • 15:22 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:22 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: generate dns records for new codfw switches - cmooney@cumin1002"
  • 15:22 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: generate dns records for new codfw switches - cmooney@cumin1002"
  • 15:16 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 15:13 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1072.eqiad.wmnet with reason: host reimage
  • 15:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76067 and previous config saved to /var/cache/conftool/dbconfig/20250513-151125-root.json
  • 15:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76066 and previous config saved to /var/cache/conftool/dbconfig/20250513-151031-root.json
  • 15:09 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1072.eqiad.wmnet with reason: host reimage
  • 15:08 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1071.eqiad.wmnet with reason: host reimage
  • 15:04 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1071.eqiad.wmnet with reason: host reimage
  • 15:02 tchin@deploy1003: Finished deploy [airflow-dags/analytics@0550b16]: Deploying airflow artifacts for T384962 (duration: 02m 22s)
  • 15:00 tchin@deploy1003: Started deploy [airflow-dags/analytics@0550b16]: Deploying airflow artifacts for T384962
  • 14:59 papaul: maintenance complete on msw1-codfw
  • 14:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76064 and previous config saved to /var/cache/conftool/dbconfig/20250513-145620-root.json
  • 14:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76063 and previous config saved to /var/cache/conftool/dbconfig/20250513-145525-root.json
  • 14:55 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76062 and previous config saved to /var/cache/conftool/dbconfig/20250513-145514-root.json
  • 14:55 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76061 and previous config saved to /var/cache/conftool/dbconfig/20250513-145513-root.json
  • 14:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1072
  • 14:54 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1072
  • 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1072.eqiad.wmnet with OS bullseye
  • 14:49 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1071
  • 14:49 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1071
  • 14:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1071.eqiad.wmnet with OS bullseye
  • 14:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host pc1018.eqiad.wmnet with OS bookworm
  • 14:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:41 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76060 and previous config saved to /var/cache/conftool/dbconfig/20250513-144113-root.json
  • 14:40 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host db1258.eqiad.wmnet with OS bookworm
  • 14:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P76059 and previous config saved to /var/cache/conftool/dbconfig/20250513-144019-root.json
  • 14:40 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76058 and previous config saved to /var/cache/conftool/dbconfig/20250513-144008-root.json
  • 14:40 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76057 and previous config saved to /var/cache/conftool/dbconfig/20250513-144007-root.json
  • 14:39 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1258.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:39 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1072 to cirrussearch1072
  • 14:39 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1072
  • 14:38 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:37 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:37 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1072
  • 14:37 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1072 on all recursors
  • 14:37 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1072 on all recursors
  • 14:37 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:37 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1072 to cirrussearch1072 - bking@cumin2002"
  • 14:37 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1072 to cirrussearch1072 - bking@cumin2002"
  • 14:34 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 14:33 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1072 to cirrussearch1072
  • 14:32 papaul: ongoing maintenance on msw1-codfw
  • 14:31 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1071 to cirrussearch1071
  • 14:30 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1071
  • 14:29 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1071
  • 14:29 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1071 on all recursors
  • 14:29 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1071 on all recursors
  • 14:29 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:29 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1071 to cirrussearch1071 - bking@cumin2002"
  • 14:29 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1071 to cirrussearch1071 - bking@cumin2002"
  • 14:28 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:28 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76056 and previous config saved to /var/cache/conftool/dbconfig/20250513-142608-root.json
  • 14:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76055 and previous config saved to /var/cache/conftool/dbconfig/20250513-142513-root.json
  • 14:25 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76054 and previous config saved to /var/cache/conftool/dbconfig/20250513-142503-root.json
  • 14:25 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76053 and previous config saved to /var/cache/conftool/dbconfig/20250513-142501-root.json
  • 14:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host pc1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:18 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host pc1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host db1258.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:15 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:15 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for pc1018 db1258 - jclark@cumin1002"
  • 14:15 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for pc1018 db1258 - jclark@cumin1002"
  • 14:14 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 14:11 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1071 to cirrussearch1071
  • 14:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P76052 and previous config saved to /var/cache/conftool/dbconfig/20250513-141102-root.json
  • 14:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76051 and previous config saved to /var/cache/conftool/dbconfig/20250513-141007-root.json
  • 14:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76050 and previous config saved to /var/cache/conftool/dbconfig/20250513-140958-root.json
  • 14:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76049 and previous config saved to /var/cache/conftool/dbconfig/20250513-140956-root.json
  • 14:09 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 14:00 hnowlan: finalising rollout of restbaseless enwiki PCS APIs routed via rest-gateway
  • 13:59 kamila@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 13:58 kamila@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 13:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76048 and previous config saved to /var/cache/conftool/dbconfig/20250513-135557-root.json
  • 13:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76047 and previous config saved to /var/cache/conftool/dbconfig/20250513-135502-root.json
  • 13:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76046 and previous config saved to /var/cache/conftool/dbconfig/20250513-135452-root.json
  • 13:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76045 and previous config saved to /var/cache/conftool/dbconfig/20250513-135451-root.json
  • 13:51 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:50 lucaswerkmeister-wmde@deploy1003: Sync cancelled.
  • 13:47 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqsin [reason: cr3-eqsin upgrade finished, T364092]
  • 13:47 ayounsi@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqsin [reason: cr3-eqsin upgrade finished, T364092]
  • 13:46 lucaswerkmeister-wmde@deploy1003: d3r1ck01, lucaswerkmeister-wmde: Backport for SUL3: Fix account creation by username & email (with temp password) (T390751) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76044 and previous config saved to /var/cache/conftool/dbconfig/20250513-134051-root.json
  • 13:40 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for SUL3: Fix account creation by username & email (with temp password) (T390751)
  • 13:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P76043 and previous config saved to /var/cache/conftool/dbconfig/20250513-133956-root.json
  • 13:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P76042 and previous config saved to /var/cache/conftool/dbconfig/20250513-133947-root.json
  • 13:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P76041 and previous config saved to /var/cache/conftool/dbconfig/20250513-133946-root.json
  • 13:37 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for SUL3: Fix account creation by username & email (with temp password) (T390751), SUL3: Fix account creation by username & email (with temp password) (T390751) (duration: 14m 07s)
  • 13:31 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, matmarex, d3r1ck01: Continuing with sync
  • 13:30 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, matmarex, d3r1ck01: Backport for SUL3: Fix account creation by username & email (with temp password) (T390751), SUL3: Fix account creation by username & email (with temp password) (T390751) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76039 and previous config saved to /var/cache/conftool/dbconfig/20250513-132545-root.json
  • 13:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 4%: Repooling', diff saved to https://phabricator.wikimedia.org/P76038 and previous config saved to /var/cache/conftool/dbconfig/20250513-132451-root.json
  • 13:25 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76037 and previous config saved to /var/cache/conftool/dbconfig/20250513-132442-root.json
  • 13:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76036 and previous config saved to /var/cache/conftool/dbconfig/20250513-132441-root.json
  • 13:23 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for SUL3: Fix account creation by username & email (with temp password) (T390751), SUL3: Fix account creation by username & email (with temp password) (T390751)
  • 13:22 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for CirrusSearch: weighted tags mapping (during maintenance inflicted reindexing) (T389053) (duration: 15m 19s)
  • 13:16 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, pfischer: Continuing with sync
  • 13:15 XioNoX: cr3-eqsin> request vmhost reboot - T364092
  • 13:14 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, pfischer: Backport for CirrusSearch: weighted tags mapping (during maintenance inflicted reindexing) (T389053) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P76035 and previous config saved to /var/cache/conftool/dbconfig/20250513-131040-root.json
  • 13:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P76034 and previous config saved to /var/cache/conftool/dbconfig/20250513-130945-root.json
  • 13:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76033 and previous config saved to /var/cache/conftool/dbconfig/20250513-130937-root.json
  • 13:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76032 and previous config saved to /var/cache/conftool/dbconfig/20250513-130935-root.json
  • 13:07 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for CirrusSearch: weighted tags mapping (during maintenance inflicted reindexing) (T389053)
  • 13:00 XioNoX: cr3-eqsin> request vmhost software add /var/tmp/junos-vmhost-install-mx-x86-64-23.4R2-S3.9.tgz - T364092
  • 12:59 volans: upgrading python3-wmflib fleetwide (except eqsin for now)
  • 12:57 XioNoX: cr3-eqsin - shutdown transit/peering BGP sessions - T364092
  • 12:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 4%: Repooling', diff saved to https://phabricator.wikimedia.org/P76031 and previous config saved to /var/cache/conftool/dbconfig/20250513-125535-root.json
  • 12:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P76030 and previous config saved to /var/cache/conftool/dbconfig/20250513-125440-root.json
  • 12:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76029 and previous config saved to /var/cache/conftool/dbconfig/20250513-125431-root.json
  • 12:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76028 and previous config saved to /var/cache/conftool/dbconfig/20250513-125430-root.json
  • 12:53 XioNoX: cr3-eqsin - lower vrrp priority - T364092
  • 12:50 moritzm: trigger full planet import for Bookworm maps master T381565
  • 12:49 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76027 and previous config saved to /var/cache/conftool/dbconfig/20250513-124910-root.json
  • 12:47 aokoth@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 12:46 aokoth@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 12:40 XioNoX: cr3-eqsin# set protocols bgp graceful-shutdown sender - T364092
  • 12:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P76026 and previous config saved to /var/cache/conftool/dbconfig/20250513-124029-root.json
  • 12:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P76025 and previous config saved to /var/cache/conftool/dbconfig/20250513-123935-root.json
  • 12:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P76024 and previous config saved to /var/cache/conftool/dbconfig/20250513-123926-root.json
  • 12:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P76023 and previous config saved to /var/cache/conftool/dbconfig/20250513-123925-root.json
  • 12:39 marostegui@cumin1002: dbctl commit (dc=all): 'Add db1256 future x3 hosts, to s8 T390530', diff saved to https://phabricator.wikimedia.org/P76022 and previous config saved to /var/cache/conftool/dbconfig/20250513-123917-marostegui.json
  • 12:36 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76021 and previous config saved to /var/cache/conftool/dbconfig/20250513-123631-root.json
  • 12:36 ayounsi@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr3-eqsin with reason: upgrade
  • 12:34 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76020 and previous config saved to /var/cache/conftool/dbconfig/20250513-123404-root.json
  • 12:31 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqsin [reason: cr3-eqsin upgrade, T364092]
  • 12:31 ayounsi@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqsin [reason: cr3-eqsin upgrade, T364092]
  • 12:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P76019 and previous config saved to /var/cache/conftool/dbconfig/20250513-122523-root.json
  • 12:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 4%: Repooling', diff saved to https://phabricator.wikimedia.org/P76018 and previous config saved to /var/cache/conftool/dbconfig/20250513-122407-root.json
  • 12:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 4%: Repooling', diff saved to https://phabricator.wikimedia.org/P76017 and previous config saved to /var/cache/conftool/dbconfig/20250513-122406-root.json
  • 12:22 moritzm: installing libapache2-mod-auth-openidc security updates
  • 12:21 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76016 and previous config saved to /var/cache/conftool/dbconfig/20250513-122126-root.json
  • 12:18 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76014 and previous config saved to /var/cache/conftool/dbconfig/20250513-121858-root.json
  • 12:18 volans: uploaded python3-wmflib_1.3.2 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia,bookworm-wikimedia,trixie-wikimedia
  • 12:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P76012 and previous config saved to /var/cache/conftool/dbconfig/20250513-121018-root.json
  • 12:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P76011 and previous config saved to /var/cache/conftool/dbconfig/20250513-120902-root.json
  • 12:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P76010 and previous config saved to /var/cache/conftool/dbconfig/20250513-120901-root.json
  • 12:08 marostegui@cumin1002: dbctl commit (dc=all): 'Add db1255 future x3 hosts, to s8 T390530', diff saved to https://phabricator.wikimedia.org/P76009 and previous config saved to /var/cache/conftool/dbconfig/20250513-120853-marostegui.json
  • 12:06 moritzm: installing ucf security updates
  • 12:06 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76008 and previous config saved to /var/cache/conftool/dbconfig/20250513-120621-root.json
  • 12:03 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76007 and previous config saved to /var/cache/conftool/dbconfig/20250513-120352-root.json
  • 11:53 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P76006 and previous config saved to /var/cache/conftool/dbconfig/20250513-115322-root.json
  • 11:53 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P76005 and previous config saved to /var/cache/conftool/dbconfig/20250513-115317-root.json
  • 11:51 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76003 and previous config saved to /var/cache/conftool/dbconfig/20250513-115115-root.json
  • 11:48 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76002 and previous config saved to /var/cache/conftool/dbconfig/20250513-114847-root.json
  • 11:43 tchin@deploy1003: Finished deploy [airflow-dags/analytics@146dab1]: Deploying airflow artifacts for T384962 (duration: 02m 44s)
  • 11:41 tchin@deploy1003: Started deploy [airflow-dags/analytics@146dab1]: Deploying airflow artifacts for T384962
  • 11:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P76001 and previous config saved to /var/cache/conftool/dbconfig/20250513-113816-root.json
  • 11:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P76000 and previous config saved to /var/cache/conftool/dbconfig/20250513-113810-root.json
  • 11:36 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P75999 and previous config saved to /var/cache/conftool/dbconfig/20250513-113610-root.json
  • 11:33 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P75998 and previous config saved to /var/cache/conftool/dbconfig/20250513-113342-root.json
  • 11:31 marostegui@cumin1002: dbctl commit (dc=all): 'Add db2241 and db2242 future x3 hosts, to s8 T390530', diff saved to https://phabricator.wikimedia.org/P75996 and previous config saved to /var/cache/conftool/dbconfig/20250513-113138-marostegui.json
  • 11:21 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P75995 and previous config saved to /var/cache/conftool/dbconfig/20250513-112104-root.json
  • 11:18 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P75994 and previous config saved to /var/cache/conftool/dbconfig/20250513-111836-root.json
  • 11:06 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P75993 and previous config saved to /var/cache/conftool/dbconfig/20250513-110559-root.json
  • 11:03 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P75992 and previous config saved to /var/cache/conftool/dbconfig/20250513-110330-root.json
  • 10:58 cgoubert@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:50 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P75991 and previous config saved to /var/cache/conftool/dbconfig/20250513-105053-root.json
  • 10:48 cgoubert@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:48 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P75990 and previous config saved to /var/cache/conftool/dbconfig/20250513-104825-root.json
  • 10:46 cgoubert@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:43 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1017.eqiad.wmnet
  • 10:40 jnuche: train finished
  • 10:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs1017.eqiad.wmnet
  • 10:38 jayme@cumin1002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0)
  • 10:38 jayme@cumin1002: START - Cookbook sre.discovery.datacenter
  • 10:38 jayme@cumin1002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0)
  • 10:38 jayme@cumin1002: START - Cookbook sre.discovery.datacenter
  • 10:37 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:35 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P75989 and previous config saved to /var/cache/conftool/dbconfig/20250513-103548-root.json
  • 10:35 jnuche@deploy1003: Finished scap sync-world: Backport for Update for Parsoid's rename of XMLSerializer to XHtmlSerializer (T393983) (duration: 16m 38s)
  • 10:26 jnuche@deploy1003: matmarex, jnuche: Continuing with sync
  • 10:26 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:26 jnuche@deploy1003: matmarex, jnuche: Backport for Update for Parsoid's rename of XMLSerializer to XHtmlSerializer (T393983) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:25 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2027.codfw.wmnet,es1028.eqiad.wmnet with reason: Maintenance
  • 10:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1028 es2027 T391921', diff saved to https://phabricator.wikimedia.org/P75988 and previous config saved to /var/cache/conftool/dbconfig/20250513-102455-marostegui.json
  • 10:18 jnuche@deploy1003: Started scap sync-world: Backport for Update for Parsoid's rename of XMLSerializer to XHtmlSerializer (T393983)
  • 10:14 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.1 refs T392171
  • 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1018.eqiad.wmnet
  • 10:04 vgutierrez@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs1018.eqiad.wmnet
  • 09:55 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:51 hnowlan: Route all PCS calls for enwiki articles starting with A via rest-gateway and without restbase
  • 09:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:49 moritzm: installing wget security updates
  • 09:44 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1019.eqiad.wmnet
  • 09:41 vgutierrez@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs1019.eqiad.wmnet
  • 09:38 moritzm: imported confd 0.16.0-1+deb13u0 to trixie-wikimedia T391083
  • 09:14 moritzm: installing nginx security updates
  • 09:11 jnuche@deploy1003: Started scap sync-world: testwikis to 1.45.0-wmf.1 refs T392171
  • 09:11 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1020.eqiad.wmnet
  • 09:08 vgutierrez@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs1020.eqiad.wmnet
  • 09:00 vgutierrez: rolling reboot of eqiad load balancers to add E8/F8 interfaces - T393911 | T382017
  • 08:52 zabe@deploy1003: Finished scap sync-world: Backport for expanddblist: Add missing use statement (T393992) (duration: 11m 48s)
  • 08:45 zabe@deploy1003: zabe: Continuing with sync
  • 08:45 zabe@deploy1003: zabe: Backport for expanddblist: Add missing use statement (T393992) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:40 zabe@deploy1003: Started scap sync-world: Backport for expanddblist: Add missing use statement (T393992)
  • 08:35 godog: bounce thanos-query on titan1*
  • 08:34 XioNoX: pfw1-eqiad - delete specific system-services in favor of "any-service" T390052
  • 08:31 XioNoX: pfw1-codfw - delete specific system-services in favor of "any-service" T390052
  • 08:21 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3856
  • 08:12 moritzm: copied prometheus-rsyslog-exporter 1.0.0+git20221110-1 from bookworm-wikimedia to trixie-wikimedia T391083
  • 08:11 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 3856
  • 08:04 moritzm: imported python-wmflib 1.3.1+deb13u1 to trixie-wikimedia T391083
  • 07:55 XioNoX: delete all unterminated cables - T393188
  • 07:54 moritzm: imported python-wmflib 1.3.1+deb13u1 to trixie-wikimedia T391083
  • 07:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 07:38 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 07:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 07:35 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 07:31 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 07:31 marostegui@cumin1002: dbctl commit (dc=all): 'es2029 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P75987 and previous config saved to /var/cache/conftool/dbconfig/20250513-073145-root.json
  • 07:31 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 07:29 marostegui@cumin1002: dbctl commit (dc=all): 'es1031 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P75986 and previous config saved to /var/cache/conftool/dbconfig/20250513-072956-root.json
  • 07:16 marostegui@cumin1002: dbctl commit (dc=all): 'es2029 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P75985 and previous config saved to /var/cache/conftool/dbconfig/20250513-071639-root.json
  • 07:14 marostegui@cumin1002: dbctl commit (dc=all): 'es1031 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P75984 and previous config saved to /var/cache/conftool/dbconfig/20250513-071451-root.json
  • 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'es2029 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P75983 and previous config saved to /var/cache/conftool/dbconfig/20250513-070135-root.json
  • 06:59 marostegui@cumin1002: dbctl commit (dc=all): 'es1031 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P75982 and previous config saved to /var/cache/conftool/dbconfig/20250513-065946-root.json
  • 06:46 marostegui@cumin1002: dbctl commit (dc=all): 'es2029 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P75981 and previous config saved to /var/cache/conftool/dbconfig/20250513-064629-root.json
  • 06:44 marostegui@cumin1002: dbctl commit (dc=all): 'es1031 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P75980 and previous config saved to /var/cache/conftool/dbconfig/20250513-064440-root.json
  • 06:31 marostegui@cumin1002: dbctl commit (dc=all): 'es2029 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P75979 and previous config saved to /var/cache/conftool/dbconfig/20250513-063123-root.json
  • 06:29 marostegui@cumin1002: dbctl commit (dc=all): 'es1031 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P75978 and previous config saved to /var/cache/conftool/dbconfig/20250513-062935-root.json
  • 06:16 marostegui@cumin1002: dbctl commit (dc=all): 'es2029 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P75977 and previous config saved to /var/cache/conftool/dbconfig/20250513-061618-root.json
  • 06:14 marostegui@cumin1002: dbctl commit (dc=all): 'es1031 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P75976 and previous config saved to /var/cache/conftool/dbconfig/20250513-061430-root.json
  • 06:01 marostegui@cumin1002: dbctl commit (dc=all): 'es2029 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P75975 and previous config saved to /var/cache/conftool/dbconfig/20250513-060113-root.json
  • 05:59 marostegui@cumin1002: dbctl commit (dc=all): 'es1031 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P75974 and previous config saved to /var/cache/conftool/dbconfig/20250513-055924-root.json
  • 05:46 marostegui@cumin1002: dbctl commit (dc=all): 'es2029 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P75973 and previous config saved to /var/cache/conftool/dbconfig/20250513-054607-root.json
  • 05:44 marostegui@cumin1002: dbctl commit (dc=all): 'es1031 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P75972 and previous config saved to /var/cache/conftool/dbconfig/20250513-054418-root.json
  • 05:31 marostegui@cumin1002: dbctl commit (dc=all): 'es2029 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P75971 and previous config saved to /var/cache/conftool/dbconfig/20250513-053102-root.json
  • 05:29 marostegui@cumin1002: dbctl commit (dc=all): 'es1031 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P75970 and previous config saved to /var/cache/conftool/dbconfig/20250513-052913-root.json
  • 05:23 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1031.eqiad.wmnet with reason: Maintenance
  • 05:16 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2029.codfw.wmnet with reason: Maintenance
  • 05:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1031 es2029 T391921', diff saved to https://phabricator.wikimedia.org/P75969 and previous config saved to /var/cache/conftool/dbconfig/20250513-051617-marostegui.json
  • 04:04 mwpresync@deploy1003: Pruned MediaWiki: 1.44.0-wmf.25 (duration: 04m 17s)
  • 02:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T392806)', diff saved to https://phabricator.wikimedia.org/P75968 and previous config saved to /var/cache/conftool/dbconfig/20250513-025634-fceratto.json
  • 02:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P75967 and previous config saved to /var/cache/conftool/dbconfig/20250513-024127-fceratto.json
  • 02:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P75966 and previous config saved to /var/cache/conftool/dbconfig/20250513-022619-fceratto.json
  • 02:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T392806)', diff saved to https://phabricator.wikimedia.org/P75965 and previous config saved to /var/cache/conftool/dbconfig/20250513-021112-fceratto.json
  • 02:04 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2224 (T392806)', diff saved to https://phabricator.wikimedia.org/P75964 and previous config saved to /var/cache/conftool/dbconfig/20250513-020415-fceratto.json
  • 02:04 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2224.codfw.wmnet with reason: Maintenance
  • 02:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T392806)', diff saved to https://phabricator.wikimedia.org/P75963 and previous config saved to /var/cache/conftool/dbconfig/20250513-020349-fceratto.json
  • 01:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P75962 and previous config saved to /var/cache/conftool/dbconfig/20250513-014841-fceratto.json
  • 01:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P75961 and previous config saved to /var/cache/conftool/dbconfig/20250513-013334-fceratto.json
  • 01:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T392806)', diff saved to https://phabricator.wikimedia.org/P75960 and previous config saved to /var/cache/conftool/dbconfig/20250513-011827-fceratto.json
  • 01:10 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T392806)', diff saved to https://phabricator.wikimedia.org/P75959 and previous config saved to /var/cache/conftool/dbconfig/20250513-011026-fceratto.json
  • 01:10 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 01:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T392806)', diff saved to https://phabricator.wikimedia.org/P75958 and previous config saved to /var/cache/conftool/dbconfig/20250513-010959-fceratto.json
  • 00:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P75957 and previous config saved to /var/cache/conftool/dbconfig/20250513-005451-fceratto.json
  • 00:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P75956 and previous config saved to /var/cache/conftool/dbconfig/20250513-003944-fceratto.json
  • 00:32 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_codfw
  • 00:31 sukhe: run agent on A:lvs-eqiad to re-enable puppet: T393911
  • 00:30 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_codfw
  • 00:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T392806)', diff saved to https://phabricator.wikimedia.org/P75955 and previous config saved to /var/cache/conftool/dbconfig/20250513-002436-fceratto.json
  • 00:17 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2214 (T392806)', diff saved to https://phabricator.wikimedia.org/P75954 and previous config saved to /var/cache/conftool/dbconfig/20250513-001736-fceratto.json
  • 00:17 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Maintenance
  • 00:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T392806)', diff saved to https://phabricator.wikimedia.org/P75953 and previous config saved to /var/cache/conftool/dbconfig/20250513-001704-fceratto.json
  • 00:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P75952 and previous config saved to /var/cache/conftool/dbconfig/20250513-000157-fceratto.json

2025-05-12

  • 23:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P75951 and previous config saved to /var/cache/conftool/dbconfig/20250512-234650-fceratto.json
  • 23:44 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2001.codfw.wmnet with OS bullseye
  • 23:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T392806)', diff saved to https://phabricator.wikimedia.org/P75950 and previous config saved to /var/cache/conftool/dbconfig/20250512-233142-fceratto.json
  • 23:25 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T392806)', diff saved to https://phabricator.wikimedia.org/P75949 and previous config saved to /var/cache/conftool/dbconfig/20250512-232504-fceratto.json
  • 23:24 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 23:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T392806)', diff saved to https://phabricator.wikimedia.org/P75948 and previous config saved to /var/cache/conftool/dbconfig/20250512-232437-fceratto.json
  • 23:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P75946 and previous config saved to /var/cache/conftool/dbconfig/20250512-230930-fceratto.json
  • 22:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P75945 and previous config saved to /var/cache/conftool/dbconfig/20250512-225422-fceratto.json
  • 22:51 ladsgroup@deploy1003: Finished scap sync-world: Backport for objectcache: Cast explicitly to integer (T393879) (duration: 11m 33s)
  • 22:44 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 22:44 ladsgroup@deploy1003: ladsgroup: Backport for objectcache: Cast explicitly to integer (T393879) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:39 ladsgroup@deploy1003: Started scap sync-world: Backport for objectcache: Cast explicitly to integer (T393879)
  • 22:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T392806)', diff saved to https://phabricator.wikimedia.org/P75944 and previous config saved to /var/cache/conftool/dbconfig/20250512-223915-fceratto.json
  • 22:31 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T392806)', diff saved to https://phabricator.wikimedia.org/P75943 and previous config saved to /var/cache/conftool/dbconfig/20250512-223131-fceratto.json
  • 22:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 22:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T392806)', diff saved to https://phabricator.wikimedia.org/P75942 and previous config saved to /var/cache/conftool/dbconfig/20250512-223103-fceratto.json
  • 22:16 rzl: rzl@titan1002:~$ sudo systemctl restart thanos-query
  • 22:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P75941 and previous config saved to /var/cache/conftool/dbconfig/20250512-221556-fceratto.json
  • 22:09 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts lvs3009.esams.wmnet
  • 22:08 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3009.esams.wmnet
  • 22:07 cwhite: restart thanos-query on titan1001
  • 22:02 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2001.codfw.wmnet with reason: host reimage
  • 22:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P75940 and previous config saved to /var/cache/conftool/dbconfig/20250512-220049-fceratto.json
  • 21:59 robh@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs3009.esams.wmnet
  • 21:58 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts lvs3009.esams.wmnet
  • 21:58 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2001.codfw.wmnet with reason: host reimage
  • 21:52 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts lvs3009.esams.wmnet
  • 21:48 sbassett: Deployed security fixes 03, 04 and 05 for T392341
  • 21:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T392806)', diff saved to https://phabricator.wikimedia.org/P75939 and previous config saved to /var/cache/conftool/dbconfig/20250512-214542-fceratto.json
  • 21:42 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts lvs3009.esams.wmnet
  • 21:42 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host cassandra-dev2001.codfw.wmnet with OS bullseye
  • 21:41 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on cp2029.codfw.wmnet with reason: Potential failed memory - T393968
  • 21:40 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2029.codfw.wmnet with reason: Potential failed memory - T393968
  • 21:37 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T392806)', diff saved to https://phabricator.wikimedia.org/P75938 and previous config saved to /var/cache/conftool/dbconfig/20250512-213731-fceratto.json
  • 21:37 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 21:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T392806)', diff saved to https://phabricator.wikimedia.org/P75937 and previous config saved to /var/cache/conftool/dbconfig/20250512-213704-fceratto.json
  • 21:33 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts lvs3009.esams.wmnet
  • 21:31 sbassett: Removed mitigation for T390887 and T393367
  • 21:31 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_codfw
  • 21:31 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_codfw
  • 21:31 denisse: Testing rsyslog_8.2504.0-1~bpo12+1 on centrallog1002 - T383309
  • 21:28 ryankemper@cumin2002: conftool action : set/pooled=no; selector: name=cirrussearch2091.codfw.wmnet|cirrussearch2055.codfw.wmnet|cirrussearch2113.codfw.wmnet|cirrussearch1118.eqiad.wmnet|elastic1080.eqiad.wmnet|elastic1057.eqiad.wmnet|elastic1059.eqiad.wmnet|elastic1083.eqiad.wmnet|elastic1076.eqiad.wmnet
  • 21:22 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts lvs3009.esams.wmnet
  • 21:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P75936 and previous config saved to /var/cache/conftool/dbconfig/20250512-212157-fceratto.json
  • 21:21 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts lvs3009.esams.wmnet
  • 21:21 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts lvs3009.esams.wmnet
  • 21:17 tgr@deploy1003: Finished scap sync-world: Backport for multiversion: Move remaining dblist helper to WmfConfig class (duration: 13m 25s)
  • 21:16 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2029.codfw.wmnet
  • 21:10 tgr@deploy1003: tgr, krinkle: Continuing with sync
  • 21:08 tgr@deploy1003: tgr, krinkle: Backport for multiversion: Move remaining dblist helper to WmfConfig class synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P75935 and previous config saved to /var/cache/conftool/dbconfig/20250512-210650-fceratto.json
  • 21:03 tgr@deploy1003: Started scap sync-world: Backport for multiversion: Move remaining dblist helper to WmfConfig class
  • 20:53 tgr@deploy1003: Finished scap sync-world: Backport for mc: remove unused "memcached-pecl" definition from wgObjectCaches (T371378) (duration: 17m 27s)
  • 20:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T392806)', diff saved to https://phabricator.wikimedia.org/P75934 and previous config saved to /var/cache/conftool/dbconfig/20250512-205143-fceratto.json
  • 20:46 tgr@deploy1003: tgr, krinkle: Continuing with sync
  • 20:43 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T392806)', diff saved to https://phabricator.wikimedia.org/P75933 and previous config saved to /var/cache/conftool/dbconfig/20250512-204336-fceratto.json
  • 20:43 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 20:43 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 20:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T392806)', diff saved to https://phabricator.wikimedia.org/P75932 and previous config saved to /var/cache/conftool/dbconfig/20250512-204253-fceratto.json
  • 20:40 tgr@deploy1003: tgr, krinkle: Backport for mc: remove unused "memcached-pecl" definition from wgObjectCaches (T371378) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:35 tgr@deploy1003: Started scap sync-world: Backport for mc: remove unused "memcached-pecl" definition from wgObjectCaches (T371378)
  • 20:31 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts lvs3009.esams.wmnet
  • 20:30 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts lvs3009.esams.wmnet
  • 20:30 dr0ptp4kt@deploy1003: Finished scap sync-world: Backport for Stream config for edge uniques on prod cluster (T391959) (duration: 18m 53s)
  • 20:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P75931 and previous config saved to /var/cache/conftool/dbconfig/20250512-202746-fceratto.json
  • 20:23 dr0ptp4kt@deploy1003: dr0ptp4kt: Continuing with sync
  • 20:16 dr0ptp4kt@deploy1003: dr0ptp4kt: Backport for Stream config for edge uniques on prod cluster (T391959) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:14 sukhe@dns1004: END - running authdns-update
  • 20:13 sukhe@dns1004: START - running authdns-update
  • 20:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P75930 and previous config saved to /var/cache/conftool/dbconfig/20250512-201240-fceratto.json
  • 20:11 dr0ptp4kt@deploy1003: Started scap sync-world: Backport for Stream config for edge uniques on prod cluster (T391959)
  • 20:11 bearloga@deploy1003: Finished deploy [airflow-dags/analytics_product@17f8417]: (no justification provided) (duration: 00m 53s)
  • 20:10 bearloga@deploy1003: Started deploy [airflow-dags/analytics_product@17f8417]: (no justification provided)
  • 19:58 bking@dns1004: START - running authdns-update
  • 19:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T392806)', diff saved to https://phabricator.wikimedia.org/P75929 and previous config saved to /var/cache/conftool/dbconfig/20250512-195732-fceratto.json
  • 19:50 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) search-chi.svc.eqiad.wmnet on all recursors
  • 19:49 bking@cumin2002: START - Cookbook sre.dns.wipe-cache search-chi.svc.eqiad.wmnet on all recursors
  • 19:49 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T392806)', diff saved to https://phabricator.wikimedia.org/P75928 and previous config saved to /var/cache/conftool/dbconfig/20250512-194933-fceratto.json
  • 19:49 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 19:40 bking@dns1004: START - running authdns-update
  • 19:34 bking@dns1004: START - running authdns-update
  • 19:20 jgleeson: payments-wiki upgraded from fac09775 to 92a8cbb8
  • 18:46 dwisehaupt@dns1004: END - running authdns-update
  • 18:45 dwisehaupt@dns1004: START - running authdns-update
  • 18:37 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_ulsfo
  • 18:35 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_ulsfo
  • 18:01 cmooney@dns2005: END - running authdns-update
  • 18:00 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:59 cmooney@dns2005: START - running authdns-update
  • 17:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1070.eqiad.wmnet with OS bullseye
  • 17:58 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 17:51 cmooney@dns2005: START - running authdns-update
  • 17:38 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:38 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: generate dns records for new codfw switches - cmooney@cumin1002"
  • 17:38 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: generate dns records for new codfw switches - cmooney@cumin1002"
  • 17:34 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 17:31 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1070.eqiad.wmnet with reason: host reimage
  • 17:28 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1070.eqiad.wmnet with reason: host reimage
  • 17:25 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 17:25 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 17:16 krinkle@deploy1003: Finished scap sync-world: Backport for tests: Remove one-off test-only getDblistsUsedInSettings() and isWikiFamily(), multiversion: Update readDbListFile() calls from alias to WmfConfig, tests: Replace array_keys(wikiversions.json) with all.dblist (duration: 17m 05s)
  • 17:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1070
  • 17:10 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1070
  • 17:10 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1070.eqiad.wmnet with OS bullseye
  • 17:09 krinkle@deploy1003: krinkle: Continuing with sync
  • 17:04 krinkle@deploy1003: krinkle: Backport for tests: Remove one-off test-only getDblistsUsedInSettings() and isWikiFamily(), multiversion: Update readDbListFile() calls from alias to WmfConfig, tests: Replace array_keys(wikiversions.json) with all.dblist synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:59 krinkle@deploy1003: Started scap sync-world: Backport for tests: Remove one-off test-only getDblistsUsedInSettings() and isWikiFamily(), multiversion: Update readDbListFile() calls from alias to WmfConfig, tests: Replace array_keys(wikiversions.json) with all.dblist
  • 16:52 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 16:52 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 16:43 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 16:43 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 16:34 volans@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Release v0.10.1 - volans@cumin1003
  • 16:33 volans@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Release v0.10.1 - volans@cumin1003
  • 16:32 volans@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin1002.eqiad.wmnet with reason: Release v0.10.1 - volans@cumin1003
  • 16:32 volans@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin1002.eqiad.wmnet with reason: Release v0.10.1 - volans@cumin1003
  • 16:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1070 to cirrussearch1070
  • 16:28 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1070
  • 16:27 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1070
  • 16:27 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1070 on all recursors
  • 16:27 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1070 on all recursors
  • 16:26 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:26 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1070 to cirrussearch1070 - bking@cumin2002"
  • 16:25 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1070 to cirrussearch1070 - bking@cumin2002"
  • 16:19 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1069.eqiad.wmnet with OS bullseye
  • 16:17 jelto: update helm311 and helm317 on contint1002 contint2002 - T387548
  • 16:16 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 16:16 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1070 to cirrussearch1070
  • 16:16 dwisehaupt@dns1004: END - running authdns-update
  • 16:15 ebernhardson@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:15 ebernhardson@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:14 dwisehaupt@dns1004: START - running authdns-update
  • 16:05 jelto: update helm311 and helm317 on deploy1003 - T387548
  • 16:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1069.eqiad.wmnet with reason: host reimage
  • 16:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T392806)', diff saved to https://phabricator.wikimedia.org/P75925 and previous config saved to /var/cache/conftool/dbconfig/20250512-160230-fceratto.json
  • 15:58 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1069.eqiad.wmnet with reason: host reimage
  • 15:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P75924 and previous config saved to /var/cache/conftool/dbconfig/20250512-154723-fceratto.json
  • 15:44 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1069
  • 15:44 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1069
  • 15:44 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1069.eqiad.wmnet with OS bullseye
  • 15:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1069 to cirrussearch1069
  • 15:42 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1069
  • 15:41 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1069
  • 15:41 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1069 on all recursors
  • 15:41 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1069 on all recursors
  • 15:41 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:41 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1069 to cirrussearch1069 - bking@cumin2002"
  • 15:40 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1069 to cirrussearch1069 - bking@cumin2002"
  • 15:35 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 15:35 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1069 to cirrussearch1069
  • 15:34 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_ulsfo
  • 15:34 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_ulsfo
  • 15:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P75922 and previous config saved to /var/cache/conftool/dbconfig/20250512-153216-fceratto.json
  • 15:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1068.eqiad.wmnet with OS bullseye
  • 15:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T392806)', diff saved to https://phabricator.wikimedia.org/P75921 and previous config saved to /var/cache/conftool/dbconfig/20250512-151709-fceratto.json
  • 15:13 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2001.codfw.wmnet
  • 15:13 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2001.codfw.wmnet
  • 15:12 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2001.codfw.wmnet
  • 15:12 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2001.codfw.wmnet
  • 15:10 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T392806)', diff saved to https://phabricator.wikimedia.org/P75920 and previous config saved to /var/cache/conftool/dbconfig/20250512-151020-fceratto.json
  • 15:10 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 15:05 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 15:05 volans: upgraded spicerack to v10.2.0 on cumin1002
  • 15:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T392806)', diff saved to https://phabricator.wikimedia.org/P75919 and previous config saved to /var/cache/conftool/dbconfig/20250512-150454-fceratto.json
  • 15:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1068.eqiad.wmnet with reason: host reimage
  • 14:59 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1068.eqiad.wmnet with reason: host reimage
  • 14:58 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2001.codfw.wmnet
  • 14:58 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2001.codfw.wmnet
  • 14:57 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 60 hosts
  • 14:57 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2001.codfw.wmnet
  • 14:57 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for 60 hosts
  • 14:57 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2001.codfw.wmnet
  • 14:54 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on 60 hosts with reason: suppress CirrusSearchNodeIndexingNotIncreasing alerts with CODFW is depooled
  • 14:50 dancy@deploy1003: Installation of scap version "4.163.0" completed for 2 hosts
  • 14:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P75918 and previous config saved to /var/cache/conftool/dbconfig/20250512-144948-fceratto.json
  • 14:48 dancy@deploy1003: Installing scap version "4.163.0" for 2 host(s)
  • 14:44 jelto: update helm311 and helm317 on deploy2002 - T387548
  • 14:42 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1068
  • 14:42 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1068
  • 14:42 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1068.eqiad.wmnet with OS bullseye
  • 14:41 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1068 to cirrussearch1068
  • 14:40 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1068
  • 14:39 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp7001.magru.wmnet
  • 14:39 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for cp7001.magru.wmnet
  • 14:39 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet
  • 14:35 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 14:34 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 14:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P75917 and previous config saved to /var/cache/conftool/dbconfig/20250512-143441-fceratto.json
  • 14:27 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1068
  • 14:27 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1068 on all recursors
  • 14:27 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1068 on all recursors
  • 14:27 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:27 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1068 to cirrussearch1068 - bking@cumin2002"
  • 14:27 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1068 to cirrussearch1068 - bking@cumin2002"
  • 14:23 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 14:23 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 14:23 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 14:23 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1068 to cirrussearch1068
  • 14:22 volans@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin1003.eqiad.wmnet with reason: Release v0.10.1 - volans@cumin1003
  • 14:21 volans@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin1003.eqiad.wmnet with reason: Release v0.10.1 - volans@cumin1003
  • 14:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T392806)', diff saved to https://phabricator.wikimedia.org/P75916 and previous config saved to /var/cache/conftool/dbconfig/20250512-141933-fceratto.json
  • 14:17 tgr@deploy1003: Finished scap sync-world: Backport for Improve session logging (T393038) (duration: 17m 24s)
  • 14:11 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T392806)', diff saved to https://phabricator.wikimedia.org/P75915 and previous config saved to /var/cache/conftool/dbconfig/20250512-141139-fceratto.json
  • 14:11 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 14:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T392806)', diff saved to https://phabricator.wikimedia.org/P75914 and previous config saved to /var/cache/conftool/dbconfig/20250512-141114-fceratto.json
  • 14:10 tgr@deploy1003: tgr: Continuing with sync
  • 14:04 tgr@deploy1003: tgr: Backport for Improve session logging (T393038) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:04 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:01 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 13:59 tgr@deploy1003: Started scap sync-world: Backport for Improve session logging (T393038)
  • 13:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P75913 and previous config saved to /var/cache/conftool/dbconfig/20250512-135607-fceratto.json
  • 13:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:52 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7001.magru.wmnet
  • 13:51 fabfur@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7001.magru.wmnet with reason: Testing in progress
  • 13:45 hashar@deploy1003: Finished deploy [integration/docroot@21bebf5]: build: Updating mediawiki/mediawiki-codesniffer to 47.0.0 (duration: 00m 11s)
  • 13:45 hashar@deploy1003: Started deploy [integration/docroot@21bebf5]: build: Updating mediawiki/mediawiki-codesniffer to 47.0.0
  • 13:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P75912 and previous config saved to /var/cache/conftool/dbconfig/20250512-134100-fceratto.json
  • 13:34 aokoth@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 13:34 aokoth@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 13:33 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for htmlform: Fix rendering contents for cloner fields (T393790) (duration: 14m 50s)
  • 13:29 pfischer@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:29 pfischer@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T392806)', diff saved to https://phabricator.wikimedia.org/P75911 and previous config saved to /var/cache/conftool/dbconfig/20250512-132552-fceratto.json
  • 13:25 lucaswerkmeister-wmde@deploy1003: stran, lucaswerkmeister-wmde: Continuing with sync
  • 13:22 lucaswerkmeister-wmde@deploy1003: stran, lucaswerkmeister-wmde: Backport for htmlform: Fix rendering contents for cloner fields (T393790) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:18 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for htmlform: Fix rendering contents for cloner fields (T393790)
  • 13:17 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T392806)', diff saved to https://phabricator.wikimedia.org/P75910 and previous config saved to /var/cache/conftool/dbconfig/20250512-131756-fceratto.json
  • 13:17 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 13:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T392806)', diff saved to https://phabricator.wikimedia.org/P75909 and previous config saved to /var/cache/conftool/dbconfig/20250512-131731-fceratto.json
  • 13:16 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:15 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:15 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 13:15 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 13:14 pfischer@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:14 pfischer@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:12 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 13:12 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 13:08 tgr@deploy1003: Finished scap sync-world: Backport for Get rid of ancient session_name call (T124371), Do not use $_SESSION (T29887 T124371), Set wgPHPSessionHandling to 'warn' (T362324) (duration: 32m 12s)
  • 13:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P75908 and previous config saved to /var/cache/conftool/dbconfig/20250512-130225-fceratto.json
  • 13:01 elukey: `puppet ca destroy thanos.discovery.wmnet` on puppetmaster1001 - old cert not used anymore
  • 12:59 tgr@deploy1003: tgr, mszabo: Continuing with sync
  • 12:52 tgr@deploy1003: tgr, mszabo: Backport for Get rid of ancient session_name call (T124371), Do not use $_SESSION (T29887 T124371), Set wgPHPSessionHandling to 'warn' (T362324) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P75907 and previous config saved to /var/cache/conftool/dbconfig/20250512-124718-fceratto.json
  • 12:36 tgr@deploy1003: Started scap sync-world: Backport for Get rid of ancient session_name call (T124371), Do not use $_SESSION (T29887 T124371), Set wgPHPSessionHandling to 'warn' (T362324)
  • 12:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T392806)', diff saved to https://phabricator.wikimedia.org/P75906 and previous config saved to /var/cache/conftool/dbconfig/20250512-123211-fceratto.json
  • 12:26 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T392806)', diff saved to https://phabricator.wikimedia.org/P75905 and previous config saved to /var/cache/conftool/dbconfig/20250512-122626-fceratto.json
  • 12:26 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 12:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T392806)', diff saved to https://phabricator.wikimedia.org/P75904 and previous config saved to /var/cache/conftool/dbconfig/20250512-122600-fceratto.json
  • 12:25 kamila@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 12:24 kamila@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 12:18 kamila@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 12:18 kamila@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
  • 12:18 kamila@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
  • 12:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P75903 and previous config saved to /var/cache/conftool/dbconfig/20250512-121053-fceratto.json
  • 11:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P75902 and previous config saved to /var/cache/conftool/dbconfig/20250512-115545-fceratto.json
  • 11:45 jgleeson: civicrm upgraded from dc096105 to 852c6ee6
  • 11:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T392806)', diff saved to https://phabricator.wikimedia.org/P75901 and previous config saved to /var/cache/conftool/dbconfig/20250512-114038-fceratto.json
  • 11:33 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T392806)', diff saved to https://phabricator.wikimedia.org/P75900 and previous config saved to /var/cache/conftool/dbconfig/20250512-113350-fceratto.json
  • 11:33 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 11:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T392806)', diff saved to https://phabricator.wikimedia.org/P75899 and previous config saved to /var/cache/conftool/dbconfig/20250512-113324-fceratto.json
  • 11:25 volans@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Revert to v0.9.0 - volans@cumin1003
  • 11:22 volans@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Revert to v0.9.0 - volans@cumin1003
  • 11:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P75898 and previous config saved to /var/cache/conftool/dbconfig/20250512-111817-fceratto.json
  • 11:17 volans@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Revert to v0.9.0 - volans@cumin1003
  • 11:16 volans@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Revert to v0.9.0 - volans@cumin1003
  • 11:12 volans@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Release v0.10.0 - volans@cumin1003
  • 11:11 volans@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Release v0.10.0 - volans@cumin1003
  • 11:08 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 11:08 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 11:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 11:03 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1088.eqiad.wmnet with OS bullseye
  • 11:03 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 11:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P75897 and previous config saved to /var/cache/conftool/dbconfig/20250512-110310-fceratto.json
  • 10:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T392806)', diff saved to https://phabricator.wikimedia.org/P75896 and previous config saved to /var/cache/conftool/dbconfig/20250512-104803-fceratto.json
  • 10:47 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1088.eqiad.wmnet with reason: host reimage
  • 10:44 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1088.eqiad.wmnet with reason: host reimage
  • 10:41 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T392806)', diff saved to https://phabricator.wikimedia.org/P75895 and previous config saved to /var/cache/conftool/dbconfig/20250512-104116-fceratto.json
  • 10:41 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:40 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 10:32 XioNoX: delete some exterminated cables from Netbox - T393188
  • 10:31 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be1088.eqiad.wmnet with OS bullseye
  • 10:22 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 10:22 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 10:08 Ammar: Ran fixStuckGlobalRename.php for T393877
  • 09:36 mvernon@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-fe1007.eqiad.wmnet with OS bullseye
  • 09:25 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-fe1007.eqiad.wmnet with OS bullseye
  • 09:04 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thanos-fe[2001-2003].codfw.wmnet
  • 09:04 mvernon@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:04 mvernon@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: thanos-fe[2001-2003].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin1002"
  • 09:03 mvernon@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: thanos-fe[2001-2003].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin1002"
  • 09:00 mvernon@cumin1002: START - Cookbook sre.dns.netbox
  • 08:55 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 08:55 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 08:50 mvernon@cumin1002: START - Cookbook sre.hosts.decommission for hosts thanos-fe[2001-2003].codfw.wmnet
  • 08:49 mvernon@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts thanos-be[2001-2003].codfw.wmnet
  • 08:48 mvernon@cumin1002: START - Cookbook sre.hosts.decommission for hosts thanos-be[2001-2003].codfw.wmnet
  • 08:47 mvernon@cumin1002: END (FAIL) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=1) rolling restart_daemons on P{thanos-fe200[4-7]*} or P{thanos-fe1*} and (A:thanos-fe or A:thanos-fe-codfw or A:thanos-fe-eqiad)
  • 08:43 mvernon@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on P{thanos-fe200[4-7]*} or P{thanos-fe1*} and (A:thanos-fe or A:thanos-fe-codfw or A:thanos-fe-eqiad)
  • 08:39 mvernon@cumin1002: END (FAIL) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=1) rolling restart_daemons on A:thanos-fe
  • 08:39 mvernon@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
  • 08:35 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:34 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 08:33 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 08:31 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 08:29 mvernon@cumin1002: conftool action : set/pooled=yes; selector: service=apus,name=apus-fe1003.eqiad.wmnet
  • 08:29 mvernon@cumin1002: conftool action : set/weight=40; selector: service=apus,name=apus-fe1003.eqiad.wmnet
  • 08:10 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 08:09 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 07:57 slyngshede@cumin1002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Cicalese out of all services on: 2402 hosts
  • 07:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 07:12 slyngshede@cumin1002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Debt out of all services on: 2402 hosts

2025-05-11

  • 22:55 tchin@deploy1003: Finished deploy [airflow-dags/analytics@301c74b]: Deploying airflow artifacts for T384962 (duration: 02m 01s)
  • 22:54 tchin@deploy1003: Started deploy [airflow-dags/analytics@301c74b]: Deploying airflow artifacts for T384962

2025-05-10

  • 00:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 00:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 00:41 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 00:41 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 00:41 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 00:41 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 00:23 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 00:22 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 00:22 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 00:22 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 00:22 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 00:22 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 00:16 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 00:16 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 00:16 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 00:15 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 00:15 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 00:15 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply

2025-05-09

  • 23:02 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1007.eqiad.wmnet with OS bullseye
  • 22:10 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-fe1007.eqiad.wmnet with OS bullseye
  • 22:09 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-fe1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:03 vriley@cumin1002: START - Cookbook sre.hosts.provision for host thanos-fe1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:57 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1007.eqiad.wmnet with OS bullseye
  • 21:05 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-fe1007.eqiad.wmnet with OS bullseye
  • 20:53 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=99) from elastic1068 to cirrussearch1068
  • 20:52 bking@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 20:50 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 20:49 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1068 to cirrussearch1068
  • 20:46 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on elastic1054.eqiad.wmnet with reason: downtime prior to decom
  • 20:39 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe1006.eqiad.wmnet with OS bullseye
  • 20:39 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 20:35 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 20:30 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch1053.eqiad.wmnet with OS bullseye
  • 20:23 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1053
  • 20:23 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1053
  • 20:23 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1053.eqiad.wmnet with OS bullseye
  • 20:20 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch1053.eqiad.wmnet with OS bullseye
  • 20:18 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1006.eqiad.wmnet with reason: host reimage
  • 20:15 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1006.eqiad.wmnet with reason: host reimage
  • 20:14 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1053
  • 20:14 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1053
  • 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1053.eqiad.wmnet with OS bullseye
  • 20:11 jgreen@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:09 jgreen@cumin1002: START - Cookbook sre.dns.netbox
  • 20:08 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1053 to cirrussearch1053
  • 20:07 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1053
  • 20:06 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1053
  • 20:06 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1053 on all recursors
  • 20:05 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1053 on all recursors
  • 20:05 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:05 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1053 to cirrussearch1053 - bking@cumin2002"
  • 20:04 inflatador: bking@cumin2002 removed unrelated `fran1001` DNS record during a rename
  • 20:03 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1053 to cirrussearch1053 - bking@cumin2002"
  • 20:00 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 20:00 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1053 to cirrussearch1053
  • 19:55 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-fe1006.eqiad.wmnet with OS bullseye
  • 19:50 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
  • 19:50 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
  • 19:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1006.eqiad.wmnet with OS bullseye
  • 19:45 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
  • 19:45 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
  • 19:24 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-fe1007.eqiad.wmnet with OS bullseye
  • 19:06 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-fe1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:56 ryankemper@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs1012.eqiad.wmnet|wdqs1013.eqiad.wmnet|wdqs1014.eqiad.wmnet|wdqs1015.eqiad.wmnet|wdqs2007.codfw.wmnet|wdqs2010.codfw.wmnet|wdqs2011.codfw.wmnet|wdqs2012.codfw.wmnet|wdqs2013.codfw.wmnet
  • 18:28 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-fe1006.eqiad.wmnet with OS bullseye
  • 18:25 vriley@cumin1002: START - Cookbook sre.hosts.provision for host thanos-fe1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:24 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-fe1007
  • 18:24 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-fe1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:23 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-fe1007
  • 18:23 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:23 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt thanos-fe1007 - vriley@cumin1002"
  • 18:22 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt thanos-fe1007 - vriley@cumin1002"
  • 18:19 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 18:17 vriley@cumin1002: START - Cookbook sre.hosts.provision for host thanos-fe1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:16 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-fe1006
  • 18:15 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-fe1006
  • 18:14 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:14 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt thanos-fe1006 - vriley@cumin1002"
  • 18:14 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt thanos-fe1006 - vriley@cumin1002"
  • 18:11 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 17:05 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:05 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:21 krinkle@deploy1003: Finished scap sync-world: Backport for noc: Fix "Class MWMultiVersion not found" in wiki.php (duration: 13m 42s)
  • 16:20 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@bfb9c63]: bump image suggestions to 1.6.0 (duration: 01m 49s)
  • 16:19 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@bfb9c63]: bump image suggestions to 1.6.0
  • 16:14 krinkle@deploy1003: krinkle: Continuing with sync
  • 16:14 krinkle@deploy1003: krinkle: Backport for noc: Fix "Class MWMultiVersion not found" in wiki.php synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:07 krinkle@deploy1003: Started scap sync-world: Backport for noc: Fix "Class MWMultiVersion not found" in wiki.php
  • 15:57 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=99) from elastic1053 to cirrussearch1053
  • 15:57 bking@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 15:57 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 15:57 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1053 to cirrussearch1053
  • 15:49 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.rename (exit_code=93) from elastic1053 to cirrussearch1053
  • 15:49 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1053 to cirrussearch1053
  • 15:41 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Bump wikibase-data-values-value-view to HEAD (T389633 T393641) (duration: 15m 22s)
  • 15:36 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:34 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Continuing with sync
  • 15:32 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Backport for Bump wikibase-data-values-value-view to HEAD (T389633 T393641) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:25 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Bump wikibase-data-values-value-view to HEAD (T389633 T393641)
  • 14:30 fab@deploy1003: Finished deploy [airflow-dags/research@e3ccac9]: (no justification provided) (duration: 00m 38s)
  • 14:29 fab@deploy1003: Started deploy [airflow-dags/research@e3ccac9]: (no justification provided)
  • 14:25 fab@deploy1003: Finished deploy [airflow-dags/research@e3ccac9]: (no justification provided) (duration: 00m 31s)
  • 14:24 fab@deploy1003: Started deploy [airflow-dags/research@e3ccac9]: (no justification provided)
  • 14:21 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Bump wikibase-data-values-value-view to HEAD (T389633 T393641) (duration: 14m 12s)
  • 14:15 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Continuing with sync
  • 14:14 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Backport for Bump wikibase-data-values-value-view to HEAD (T389633 T393641) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:07 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Bump wikibase-data-values-value-view to HEAD (T389633 T393641)
  • 13:36 fab@deploy1003: Finished deploy [airflow-dags/research@e3ccac9]: (no justification provided) (duration: 04m 10s)
  • 13:32 fab@deploy1003: Started deploy [airflow-dags/research@e3ccac9]: (no justification provided)
  • 12:51 godog: upload prometheus-blackbox-exporter 0.26.0-0~bpo12+1 to bookworm-wikimedia - T385022
  • 11:45 taavi: update toolforge arc-enabled exim4 packages (component/exim4-arc) to latest in debian 12 T356171
  • 11:17 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:16 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:06 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
  • 11:02 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe1005.eqiad.wmnet with OS bullseye
  • 11:02 mvernon@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
  • 10:58 mvernon@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
  • 10:40 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1005.eqiad.wmnet with reason: host reimage
  • 10:37 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1005.eqiad.wmnet with reason: host reimage
  • 10:20 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-fe1005.eqiad.wmnet with OS bullseye
  • 09:50 moritzm: imported debmonitor-client 0.4.0-3+deb13u1 for trixie-wikimedia T391083
  • 09:05 zabe: zabe@deploy1003:~$ mwscript-k8s --comment="T393761" --follow -- extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=amwiki --logwiki=metawiki 'Jeroen' 'Retireduser-vfs199s31yvbtxsfmygg'
  • 09:03 zabe: zabe@deploy1003:~$ mwscript-k8s --comment="T393372" --follow -- extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwikibooks --logwiki=metawiki 'Adityaindumdum' 'Renamed user a71c8354dc822ea0d3aab24d1ce886f02c25fe91'
  • 08:17 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2010.codfw.wmnet -> wdqs2013.codfw.wmnet w/ force delete existing files, repooling neither afterwards
  • 08:10 volans@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin1003.eqiad.wmnet with reason: Release v0.9.0 - volans@cumin2002
  • 08:09 volans@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin1003.eqiad.wmnet with reason: Release v0.9.0 - volans@cumin2002
  • 07:57 moritzm: imported puppet-agent 7.23.0-1+wmf13u1 to component/puppet7 for trixie-wikimedia T392790
  • 07:24 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2010.codfw.wmnet -> wdqs2013.codfw.wmnet w/ force delete existing files, repooling neither afterwards
  • 07:23 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2010.codfw.wmnet -> wdqs2012.codfw.wmnet w/ force delete existing files, repooling neither afterwards
  • 07:22 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2011.codfw.wmnet w/ force delete existing files, repooling neither afterwards
  • 07:22 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1013.eqiad.wmnet -> wdqs1015.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
  • 07:22 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1012.eqiad.wmnet -> wdqs1014.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
  • 07:15 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
  • 06:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1013.eqiad.wmnet -> wdqs1015.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
  • 06:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1012.eqiad.wmnet -> wdqs1014.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
  • 06:26 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2010.codfw.wmnet -> wdqs2012.codfw.wmnet w/ force delete existing files, repooling neither afterwards
  • 06:26 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2011.codfw.wmnet w/ force delete existing files, repooling neither afterwards
  • 06:26 ryankemper@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2011.codfw.wmnet w/ force delete existing files, repooling neither afterwards
  • 06:25 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2011.codfw.wmnet w/ force delete existing files, repooling neither afterwards
  • 06:10 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2010.codfw.wmnet w/ force delete existing files, repooling neither afterwards
  • 06:07 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1012.eqiad.wmnet -> wdqs1013.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
  • 05:30 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on cumin1003.eqiad.wmnet with reason: WIP new Bookworm host
  • 05:12 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1012.eqiad.wmnet -> wdqs1013.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
  • 05:12 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2010.codfw.wmnet w/ force delete existing files, repooling neither afterwards
  • 05:09 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2014.codfw.wmnet -> wdqs2007.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
  • 04:11 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2014.codfw.wmnet -> wdqs2007.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
  • 04:03 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 06s)
  • 04:03 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
  • 04:03 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 05s)
  • 04:03 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
  • 04:02 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 05s)
  • 04:02 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
  • 04:02 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 05s)
  • 04:02 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
  • 04:01 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 05s)
  • 04:01 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
  • 00:07 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_drmrs
  • 00:00 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_drmrs

2025-05-08

  • 23:37 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs2012.codfw.wmnet
  • 23:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1005.eqiad.wmnet with OS bullseye
  • 23:35 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs2011.codfw.wmnet
  • 23:35 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs1014.eqiad.wmnet
  • 23:34 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs2010.codfw.wmnet
  • 23:30 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs1015.eqiad.wmnet
  • 23:26 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs1013.eqiad.wmnet
  • 23:22 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs2013.codfw.wmnet
  • 23:19 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs2007.codfw.wmnet
  • 23:06 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs1012.eqiad.wmnet
  • 22:28 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1011.eqiad.wmnet -> wdqs1012.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
  • 22:27 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2014.codfw.wmnet -> wdqs2007.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
  • 22:17 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-fe1005.eqiad.wmnet with OS bullseye
  • 22:05 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-fe1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2047.codfw.wmnet with OS bookworm
  • 21:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2048.codfw.wmnet with OS bookworm
  • 21:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:50 tzatziki: removing 1 file for legal compliance
  • 21:48 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs2013.codfw.wmnet
  • 21:48 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs2012.codfw.wmnet
  • 21:48 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs2011.codfw.wmnet
  • 21:48 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs2010.codfw.wmnet
  • 21:47 vriley@cumin1002: START - Cookbook sre.hosts.provision for host thanos-fe1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:46 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-fe1005
  • 21:45 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-fe1005
  • 21:44 tzatziki: removing 3 files for legal compliance
  • 21:44 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs1015.eqiad.wmnet
  • 21:44 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:44 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt thanos-fe1005 - vriley@cumin1002"
  • 21:44 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs1014.eqiad.wmnet
  • 21:43 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt thanos-fe1005 - vriley@cumin1002"
  • 21:43 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs1013.eqiad.wmnet
  • 21:43 ryankemper: T388134 Cutover completed about an hour ago. Metrics look good; we're in the process of shifting over some of the old `wdqs` hosts to `wdqs-main` to increase capacity
  • 21:40 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 21:38 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on wdqs[2007,2013].codfw.wmnet,wdqs[1012-1014].eqiad.wmnet with reason: bringing hosts online with a data transfer
  • 21:35 swfrench@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1247.eqiad.wmnet with reason: Host has crashed - T393612
  • 21:34 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2014.codfw.wmnet -> wdqs2007.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
  • 21:33 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs2007.codfw.wmnet
  • 21:29 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1011.eqiad.wmnet -> wdqs1012.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
  • 21:29 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs1012.eqiad.wmnet
  • 20:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:56 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_drmrs
  • 20:56 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_drmrs
  • 20:54 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp50[19-24].eqsin.wmnet} and A:cp
  • 20:52 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:49 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp50[27-32].eqsin.wmnet} and A:cp
  • 20:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2047.codfw.wmnet with reason: host reimage
  • 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2048.codfw.wmnet with reason: host reimage
  • 20:33 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2047.codfw.wmnet with reason: host reimage
  • 20:33 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2048.codfw.wmnet with reason: host reimage
  • 20:21 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2048.codfw.wmnet with OS bookworm
  • 20:21 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2047.codfw.wmnet with OS bookworm
  • 20:16 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host apus-fe1003.eqiad.wmnet with OS bookworm
  • 20:16 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 20:14 ryankemper: T388134 Beginning cutover of query.wikidata.org from `wdqs` to `wdqs-main`. Starting to see requests increase on wdqs-main (and decrease on wdqs) as expected. Rolling change to rest of cp text hosts. Traffic should be fully moved over in ~20 mins
  • 20:03 swfrench@deploy1003: Stopping before sync operations
  • 20:03 swfrench@deploy1003: Started scap sync-world: Non-deploy scap run to switch mw-script/main to PHP 8.1 - T391057
  • 19:30 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 19:08 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on apus-fe1003.eqiad.wmnet with reason: host reimage
  • 19:04 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on apus-fe1003.eqiad.wmnet with reason: host reimage
  • 19:01 jhuneidi@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.28 refs T386223
  • 18:48 sukhe@dns1004: END - running authdns-update
  • 18:46 sukhe@dns1004: START - running authdns-update
  • 18:45 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host apus-fe1003.eqiad.wmnet with OS bookworm
  • 18:45 zabe: move all translatable subpages of "Wikimedia Foundation Board of Trustees" to subpages of "Wikimedia Foundation/Board of Trustees" on metawiki (T393619)
  • 18:43 zabe: mwscript-k8s [...]moveTranslatableBundle.php metawiki "Wikimedia Foundation Board of Trustees/Call for feedback: Board of Trustees elections" "Wikimedia Foundation/Board of Trustees/Call for feedback: Board of Trustees elections" "Zabe" --reason "per request T393619"
  • 18:42 zabe: zabe@deploy1003:~$ mwscript-k8s --attach -- extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki "Wikimedia Foundation Board of Trustees/Call for feedback: Board of Trustees elections" "Wikimedia Foundation/Board of Trustees/Call for feedback: Board of Trustees elections" "Zabe" --reason "per request
  • 18:38 zabe: zabe@deploy1003:~$ mwscript-k8s --attach -- extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki "Wikimedia Foundation Board of Trustees/Call for feedback:2022 Board of Trustees election/Upcoming Call for Feedback about the Board of Trustees elections" "Wikimedia Foundation/Board of Trustees/Call for feedback:2022 Board of
  • 18:30 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp50[27-32].eqsin.wmnet} and A:cp
  • 18:29 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp50[19-24].eqsin.wmnet} and A:cp
  • 18:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2048.codfw.wmnet with OS bookworm
  • 18:03 dancy@deploy1003: Installation of scap version "4.162.0" completed for 2 hosts
  • 18:01 dancy@deploy1003: Installing scap version "4.162.0" for 2 host(s)
  • 17:38 cdanis@dns1004: END - running authdns-update
  • 17:36 cdanis@dns1004: START - running authdns-update
  • 17:28 brett@cumin2002: END (ERROR) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=97) rolling upgrade of Varnish on A:cp-text_eqsin
  • 17:28 brett@cumin2002: END (ERROR) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=97) rolling upgrade of Varnish on A:cp-upload_eqsin
  • 17:25 cdanis@dns1004: END - running authdns-update
  • 17:23 cdanis@dns1004: START - running authdns-update
  • 17:21 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:13 jclark@cumin1002: START - Cookbook sre.hosts.provision for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2047.codfw.wmnet with OS bookworm
  • 17:12 bking@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch1112.eqiad.wmnet|cirrussearch1113.eqiad.wmnet|cirrussearch1114.eqiad.wmnet|cirrussearch1115.eqiad.wmnet|cirrussearch1116.eqiad.wmnet|cirrussearch1117.eqiad.wmnet|cirrussearch1118.eqiad.wmnet|cirrussearch1119.eqiad.wmnet|cirrussearch1120.eqiad.wmnet|cirrussearch1121.eqiad.wmnet|cirrussearch1122.eqiad.wmnet|cirrussearch1123.eqiad.wmn
  • 17:09 bking@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch1111.eqiad.wmnet|name=cirrussearch1112.eqiad.wmnet|name=cirrussearch1113.eqiad.wmnet|name=cirrussearch1114.eqiad.wmnet|name=cirrussearch1115.eqiad.wmnet|name=cirrussearch1116.eqiad.wmnet|name=cirrussearch1117.eqiad.wmnet|name=cirrussearch1118.eqiad.wmnet|name=cirrussearch1119.eqiad.wmnet|name=cirrussearch1120.eqiad.wmnet|name=cirru
  • 17:06 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:05 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:05 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:05 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:05 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:04 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 16:50 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_eqsin
  • 16:50 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_eqsin
  • 16:49 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet
  • 16:48 fabfur: repooling cp7001 (T393671)
  • 16:48 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp7001.magru.wmnet
  • 16:48 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for cp7001.magru.wmnet
  • 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2048.codfw.wmnet with OS bookworm
  • 16:29 brett@dns1005: END - running authdns-update
  • 16:28 brett@dns1005: START - running authdns-update
  • 16:27 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host apus-fe1003.eqiad.wmnet with OS bookworm
  • 16:22 swfrench@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: Host has crashed - T393296
  • 16:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2048
  • 16:13 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2048
  • 16:11 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:10 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2048 to codfw - jhancock@cumin2002"
  • 16:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2048 to codfw - jhancock@cumin2002"
  • 16:09 sukhe@dns1004: END - running authdns-update
  • 16:08 sukhe@dns1004: START - running authdns-update
  • 16:05 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:57 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2047.codfw.wmnet with OS bookworm
  • 15:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:48 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2047
  • 15:46 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2047
  • 15:39 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2047 to codfw - jhancock@cumin2002"
  • 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2047 to codfw - jhancock@cumin2002"
  • 15:32 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:31 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:13 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:10 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:07 sukhe: sudo cumin -b1 -s10 'A:dnsbox' 'run-puppet-agent'
  • 15:03 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1003.eqiad.wmnet
  • 14:57 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host sretest1003.eqiad.wmnet
  • 14:45 moritzm: imported ripe-atlas-tools 2.3.0-3+wmf12u1 to apt.wikimedia.org/bookworm T389380
  • 14:45 moritzm: imported ripe-atlas-sagan 1.3.1-1~wmf12u1 to apt.wikimedia.org/bookworm T389380
  • 14:36 pt1979@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1246.eqiad.wmnet with OS bookworm
  • 14:34 James_F: Running `foreachwiki extensions/Echo/maintenance/removeInvalidNotification.php --remove # T389673` for MatmaRex
  • 14:23 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase[1028-1030].eqiad.wmnet
  • 14:23 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:23 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase[1028-1030].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002"
  • 14:21 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 14:21 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 14:21 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 14:20 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 14:20 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 14:20 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase[1028-1030].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002"
  • 14:20 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 14:14 pt1979@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1246.eqiad.wmnet with reason: host reimage
  • 14:12 pt1979@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1246.eqiad.wmnet with reason: host reimage
  • 14:03 eevans@cumin1002: START - Cookbook sre.dns.netbox
  • 13:52 pt1979@cumin1002: START - Cookbook sre.hosts.reimage for host db1246.eqiad.wmnet with OS bookworm
  • 13:51 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts restbase[1028-1030].eqiad.wmnet
  • 13:42 volans: forced removal of db1246 from puppetdb to unblock reimage (was failing due to a puppet change in the meantime)
  • 13:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 13:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 13:34 tchanders@deploy1003: Finished scap sync-world: Backport for temp accounts: Remove AutopromoteOnce configuration (T393358) (duration: 16m 30s)
  • 13:27 tchanders@deploy1003: tchanders, kharlan: Continuing with sync
  • 13:24 tchanders@deploy1003: tchanders, kharlan: Backport for temp accounts: Remove AutopromoteOnce configuration (T393358) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:17 tchanders@deploy1003: Started scap sync-world: Backport for temp accounts: Remove AutopromoteOnce configuration (T393358)
  • 13:03 moritzm: installing jetty9 security updates
  • 12:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon1003.wikimedia.org
  • 12:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon1003.wikimedia.org
  • 12:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon2002.wikimedia.org
  • 12:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon2002.wikimedia.org
  • 11:57 moritzm: import transferpy 1.1+deb12u1 to bookworm-wikimedia T389380
  • 11:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping2004.codfw.wmnet
  • 11:44 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:43 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping2004.codfw.wmnet
  • 11:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping1004.eqiad.wmnet
  • 11:32 jmm@cumin2002: END (PASS) - Cookbook sre.netbox.restart-reboot (exit_code=0) rolling reboot on A:netbox
  • 11:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping1004.eqiad.wmnet
  • 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors
  • 11:19 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors
  • 11:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors
  • 11:15 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors
  • 11:15 jmm@cumin2002: START - Cookbook sre.netbox.restart-reboot rolling reboot on A:netbox
  • 10:45 zabe: zabe@deploy1003:~$ mwscript-k8s --attach -- extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki "Wikimedia Foundation Board of Trustees" "Wikimedia Foundation/Board of Trustees" "Zabe" --reason "per request T393619"
  • 10:31 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 10:30 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 09:45 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:45 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:26 Emperor: swift delete wikipedia-commons-local-public.e7 'e/e7/Hawkmoth_(Meganoton_nyctiphanes)_(8688240817).jpg' ms-fe1009 and ms-fe2009 T392658
  • 09:02 mvernon@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host apus-fe1003.eqiad.wmnet with OS bookworm
  • 08:53 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host apus-fe1003.eqiad.wmnet with OS bookworm
  • 08:52 mvernon@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host apus-fe1003.eqiad.wmnet with OS bookworm
  • 08:47 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host apus-fe1003.eqiad.wmnet with OS bookworm
  • 08:37 fabfur@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7001.magru.wmnet with reason: Testing in progress
  • 08:19 dcausse: closing UTC morning backport window
  • 08:12 dcausse@deploy1003: Finished scap sync-world: Backport for cirrus: explicitly route search traffic to codfw (T388610) (duration: 23m 19s)
  • 08:06 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7001.magru.wmnet
  • 08:05 fabfur: depooling and disabling puppet on cp7001 to perform tests (T393671)
  • 08:03 dcausse@deploy1003: dcausse: Continuing with sync
  • 07:56 fab@deploy1003: Finished deploy [airflow-dags/research@e3ccac9]: (no justification provided) (duration: 00m 29s)
  • 07:55 fab@deploy1003: Started deploy [airflow-dags/research@e3ccac9]: (no justification provided)
  • 07:55 dcausse@deploy1003: dcausse: Backport for cirrus: explicitly route search traffic to codfw (T388610) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:52 fab@deploy1003: Finished deploy [airflow-dags/research@e3ccac9]: (no justification provided) (duration: 00m 42s)
  • 07:51 fab@deploy1003: Started deploy [airflow-dags/research@e3ccac9]: (no justification provided)
  • 07:49 dcausse@deploy1003: Started scap sync-world: Backport for cirrus: explicitly route search traffic to codfw (T388610)
  • 07:46 fab@deploy1003: Finished deploy [airflow-dags/research@e3ccac9]: (no justification provided) (duration: 05m 42s)
  • 07:40 fab@deploy1003: Started deploy [airflow-dags/research@e3ccac9]: (no justification provided)
  • 07:40 fab@deploy1003: Finished deploy [airflow-dags/research@4367417]: (no justification provided) (duration: 00m 40s)
  • 07:39 fab@deploy1003: Started deploy [airflow-dags/research@4367417]: (no justification provided)
  • 07:06 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2034.codfw.wmnet
  • 07:06 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2034.codfw.wmnet
  • 07:04 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
  • 06:56 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
  • 06:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2032.codfw.wmnet
  • 06:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2032.codfw.wmnet
  • 06:54 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
  • 06:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2032.codfw.wmnet
  • 06:47 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
  • 06:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2032.codfw.wmnet
  • 06:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1047.eqiad.wmnet
  • 01:52 tstarling@deploy1003: Finished scap sync-world: Backport for Use CONTENTLANGUAGE rather than USERLANGUAGE (T393601), Use CONTENTLANGUAGE rather than USERLANGUAGE (T393601) (duration: 46m 12s)
  • 01:38 tstarling@deploy1003: tstarling: Continuing with sync
  • 01:37 tstarling@deploy1003: tstarling: Backport for Use CONTENTLANGUAGE rather than USERLANGUAGE (T393601), Use CONTENTLANGUAGE rather than USERLANGUAGE (T393601) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 01:06 tstarling@deploy1003: Started scap sync-world: Backport for Use CONTENTLANGUAGE rather than USERLANGUAGE (T393601), Use CONTENTLANGUAGE rather than USERLANGUAGE (T393601)
  • 00:14 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_esams
  • 00:09 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_esams

2025-05-07

  • 21:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for ApiQueryPublishedTranslations: Make `from` and `to` mandatory (T392839), ApiQueryPublishedTranslations: Make `from` and `to` mandatory (T392839) (duration: 14m 12s)
  • 21:29 ejegg: payments-wiki upgraded from 822bac34 to fac09775
  • 21:27 ladsgroup@deploy1003: ladsgroup, sbisson: Continuing with sync
  • 21:26 ladsgroup@deploy1003: ladsgroup, sbisson: Backport for ApiQueryPublishedTranslations: Make `from` and `to` mandatory (T392839), ApiQueryPublishedTranslations: Make `from` and `to` mandatory (T392839) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:21 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1124.eqiad.wmnet with OS bullseye
  • 21:19 ladsgroup@deploy1003: Started scap sync-world: Backport for ApiQueryPublishedTranslations: Make `from` and `to` mandatory (T392839), ApiQueryPublishedTranslations: Make `from` and `to` mandatory (T392839)
  • 21:06 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_esams
  • 21:06 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_esams
  • 21:05 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_magru
  • 21:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for Charts phase 1 deployment (T393517), Clear floats to avoid tall charts (T393286), Clear floats to avoid tall charts (T393286) (duration: 17m 21s)
  • 21:04 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1124.eqiad.wmnet with reason: host reimage
  • 21:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1124.eqiad.wmnet with reason: host reimage
  • 20:59 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_magru
  • 20:56 ladsgroup@deploy1003: jdlrobson, bvibber, ladsgroup: Continuing with sync
  • 20:55 ladsgroup@deploy1003: jdlrobson, bvibber, ladsgroup: Backport for Charts phase 1 deployment (T393517), Clear floats to avoid tall charts (T393286), Clear floats to avoid tall charts (T393286) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:50 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1124.eqiad.wmnet with OS bullseye
  • 20:49 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch1124.eqiad.wmnet with OS bullseye
  • 20:48 ladsgroup@deploy1003: Started scap sync-world: Backport for Charts phase 1 deployment (T393517), Clear floats to avoid tall charts (T393286), Clear floats to avoid tall charts (T393286)
  • 20:46 ladsgroup@deploy1003: Finished scap sync-world: Backport for Remove whatlinkshere hook (T393513), Improve circuit breaking error message (T360930), Remove hard-coded timestamps in SpecialGlobalContributionsTest (T393531) (duration: 41m 41s)
  • 20:33 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 20:33 ladsgroup@deploy1003: ladsgroup: Backport for Remove whatlinkshere hook (T393513), Improve circuit breaking error message (T360930), Remove hard-coded timestamps in SpecialGlobalContributionsTest (T393531) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:28 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 20:26 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs3009*} and A:liberica (T393616)
  • 20:26 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs3009*} and A:liberica (T393616)
  • 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 20:23 sukhe: depooling lvs3009 for HW maint: T393616
  • 20:04 ladsgroup@deploy1003: Started scap sync-world: Backport for Remove whatlinkshere hook (T393513), Improve circuit breaking error message (T360930), Remove hard-coded timestamps in SpecialGlobalContributionsTest (T393531)
  • 19:35 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apus-fe1003.eqiad.wmnet with OS bookworm
  • 18:55 hmonroy@deploy1003: Finished scap sync-world: Backport for Enable Codex and Multiblocks in Hebrew wiki (T377121) (duration: 17m 21s)
  • 18:49 hmonroy@deploy1003: hmonroy: Continuing with sync
  • 18:45 hmonroy@deploy1003: hmonroy: Backport for Enable Codex and Multiblocks in Hebrew wiki (T377121) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1125.eqiad.wmnet with OS bullseye
  • 18:38 hmonroy@deploy1003: Started scap sync-world: Backport for Enable Codex and Multiblocks in Hebrew wiki (T377121)
  • 18:37 jclark@cumin1002: START - Cookbook sre.hosts.provision for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:36 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:34 jclark@cumin1002: START - Cookbook sre.hosts.provision for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:31 aokoth@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 18:30 aokoth@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 18:29 dancy@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.28 refs T386223
  • 18:21 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1125.eqiad.wmnet with reason: host reimage
  • 18:21 volans: uploaded spicerack_10.2.0 to apt.wikimedia.org bullseye-wikimedia,bookworm-wikimedia
  • 18:20 aokoth@dns1004: END - running authdns-update
  • 18:19 aokoth@dns1004: START - running authdns-update
  • 18:18 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1125.eqiad.wmnet with reason: host reimage
  • 18:14 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host apus-fe1003.eqiad.wmnet with OS bookworm
  • 18:13 dancy@deploy1003: Finished scap build-images: (no justification provided) (duration: 00m 30s)
  • 18:12 dancy@deploy1003: Started scap build-images: (no justification provided)
  • 18:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1125.eqiad.wmnet with OS bullseye
  • 18:06 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1124.eqiad.wmnet with OS bullseye
  • 18:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1123.eqiad.wmnet with OS bullseye
  • 18:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1125 to cirrussearch1125
  • 18:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1122.eqiad.wmnet with OS bullseye
  • 18:01 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1125
  • 17:53 ladsgroup@deploy1003: Finished scap sync-world: Backport for Remove whatlinkshere hook (T393513) (duration: 36m 00s)
  • 17:52 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:40 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 17:37 ladsgroup@deploy1003: ladsgroup: Backport for Remove whatlinkshere hook (T393513) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:37 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1123.eqiad.wmnet with reason: host reimage
  • 17:35 swfrench-wmf: deploy1003 and deploy2002 updated to PHP 8.1 - T392938
  • 17:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1122.eqiad.wmnet with reason: host reimage
  • 17:34 vriley@cumin1002: START - Cookbook sre.hosts.provision for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:31 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:30 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1123.eqiad.wmnet with reason: host reimage
  • 17:29 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1125
  • 17:29 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1125 on all recursors
  • 17:29 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1125 on all recursors
  • 17:29 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:29 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1125 to cirrussearch1125 - bking@cumin2002"
  • 17:29 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1125 to cirrussearch1125 - bking@cumin2002"
  • 17:28 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 17:26 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1122.eqiad.wmnet with reason: host reimage
  • 17:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1124 to cirrussearch1124
  • 17:24 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_magru
  • 17:24 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1124
  • 17:23 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1124
  • 17:23 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1124 on all recursors
  • 17:23 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 17:23 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1124 on all recursors
  • 17:23 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:23 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1124 to cirrussearch1124 - bking@cumin2002"
  • 17:23 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1124 to cirrussearch1124 - bking@cumin2002"
  • 17:20 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1125 to cirrussearch1125
  • 17:19 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 17:17 ladsgroup@deploy1003: Started scap sync-world: Backport for Remove whatlinkshere hook (T393513)
  • 17:17 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1124 to cirrussearch1124
  • 17:16 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host apus-fe1003
  • 17:15 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1123.eqiad.wmnet with OS bullseye
  • 17:15 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host apus-fe1003
  • 17:14 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:13 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_magru
  • 17:13 swfrench-wmf: disable-puppet "In-place update to PHP 8.1 - T392938" on deploy1003 and deploy2002
  • 17:11 vriley@cumin1002: START - Cookbook sre.hosts.provision for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:09 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1123 to cirrussearch1123
  • 17:08 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1123
  • 17:08 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1122.eqiad.wmnet with OS bullseye
  • 17:08 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1123
  • 17:08 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1123 on all recursors
  • 17:08 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1123 on all recursors
  • 17:08 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:08 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1123 to cirrussearch1123 - bking@cumin2002"
  • 17:08 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1123 to cirrussearch1123 - bking@cumin2002"
  • 17:08 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1122 to cirrussearch1122
  • 17:07 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1122
  • 17:06 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1122
  • 17:06 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1122 on all recursors
  • 17:06 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1122 on all recursors
  • 17:06 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:06 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1122 to cirrussearch1122 - bking@cumin2002"
  • 17:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1122 to cirrussearch1122 - bking@cumin2002"
  • 17:04 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 16:58 cdanis: per dwisehaupt T196336 💙cdanis@alert1002.wikimedia.org ~ 🕐☕ sudo systemctl restart nsca.service
  • 16:58 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 16:56 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1123 to cirrussearch1123
  • 16:56 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1122 to cirrussearch1122
  • 16:43 ladsgroup@deploy1003: sync-world aborted: Backport for Remove whatlinkshere hook (T393513) (duration: 06m 07s)
  • 16:36 ladsgroup@deploy1003: Started scap sync-world: Backport for Remove whatlinkshere hook (T393513)
  • 16:36 ladsgroup@deploy1003: sync-world aborted: Backport for Remove whatlinkshere hook (T393513) (duration: 29m 10s)
  • 16:31 kamila@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
  • 16:31 kamila@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
  • 16:31 kamila@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 16:30 kamila@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 16:23 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1121.eqiad.wmnet with OS bullseye
  • 16:09 kamila@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 16:09 kamila@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 16:07 kamila@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
  • 16:07 kamila@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
  • 16:07 ladsgroup@deploy1003: Started scap sync-world: Backport for Remove whatlinkshere hook (T393513)
  • 15:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1120.eqiad.wmnet with OS bullseye
  • 15:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1121.eqiad.wmnet with reason: host reimage
  • 15:53 moritzm: uploaded python-pynetbox 7.4.1-1~wmf12u1 to bookworm-wikimedia (needed for Cumin update) T389380
  • 15:53 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1121.eqiad.wmnet with reason: host reimage
  • 15:49 zabe: zabe@mwmaint1002:~$ mwscript findBadBlobs.php enwiki --revisions 276146284,819689534,1289169661 --mark "T393237"
  • 15:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1119.eqiad.wmnet with OS bullseye
  • 15:43 swfrench@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1247.eqiad.wmnet with reason: Host has crashed - T393612
  • 15:40 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1121.eqiad.wmnet with OS bullseye
  • 15:39 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1121 to cirrussearch1121
  • 15:39 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1121
  • 15:38 mvernon@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ms-be1060.eqiad.wmnet
  • 15:38 mvernon@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:37 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1121
  • 15:37 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1121 on all recursors
  • 15:37 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1121 on all recursors
  • 15:37 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:37 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1121 to cirrussearch1121 - bking@cumin2002"
  • 15:37 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1120.eqiad.wmnet with reason: host reimage
  • 15:36 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1121 to cirrussearch1121 - bking@cumin2002"
  • 15:36 mvernon@cumin1002: START - Cookbook sre.dns.netbox
  • 15:32 cdanis@cumin1002: dbctl commit (dc=all): 'depool db1247', diff saved to https://phabricator.wikimedia.org/P75876 and previous config saved to /var/cache/conftool/dbconfig/20250507-153228-cdanis.json
  • 15:32 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 15:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1120.eqiad.wmnet with reason: host reimage
  • 15:31 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 15:31 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 15:31 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 15:30 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 15:30 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 15:30 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 15:30 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
  • 15:30 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
  • 15:30 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 15:30 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 15:29 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 15:29 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 15:29 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 15:28 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1121 to cirrussearch1121
  • 15:26 mvernon@cumin1002: START - Cookbook sre.hosts.decommission for hosts ms-be1060.eqiad.wmnet
  • 15:21 damilare: civicrm upgraded from 6ffbde61 to dc096105
  • 15:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1118.eqiad.wmnet with OS bullseye
  • 15:14 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1119.eqiad.wmnet with reason: host reimage
  • 15:13 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1120.eqiad.wmnet with OS bullseye
  • 15:10 sukhe@dns1004: END - running authdns-update
  • 15:10 sukhe: timing authdns-update for T393602
  • 15:09 sukhe@dns1004: START - running authdns-update
  • 15:09 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1119.eqiad.wmnet with reason: host reimage
  • 15:09 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1120 to cirrussearch1120
  • 15:08 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1120
  • 15:08 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1060*,elastic1081*,elastic1083* for thread pool rejections - bking@cumin2002
  • 15:08 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1060*,elastic1081*,elastic1083* for thread pool rejections - bking@cumin2002
  • 15:06 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1120
  • 15:06 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1120 on all recursors
  • 15:06 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1120 on all recursors
  • 15:06 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:06 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1120 to cirrussearch1120 - bking@cumin2002"
  • 15:06 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1120 to cirrussearch1120 - bking@cumin2002"
  • 15:06 sukhe: sudo cumin -b1 -s10 'A:dnsbox' 'sudo -u authdns git -C /srv/authdns/git maintenance run' T393602
  • 15:05 mvernon@cumin1002: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe1016.eqiad.wmnet
  • 15:04 mvernon@cumin1002: conftool action : set/weight=40; selector: service=nginx,name=ms-fe1016.eqiad.wmnet
  • 15:04 mvernon@cumin1002: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe1016.eqiad.wmnet
  • 15:04 mvernon@cumin1002: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe1016.eqiad.wmnet
  • 15:04 mvernon@cumin1002: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe1015.eqiad.wmnet
  • 15:04 mvernon@cumin1002: conftool action : set/weight=40; selector: service=nginx,name=ms-fe1015.eqiad.wmnet
  • 15:04 mvernon@cumin1002: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe1015.eqiad.wmnet
  • 15:04 mvernon@cumin1002: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe1015.eqiad.wmnet
  • 15:04 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1060*,elastic1081* for thread pool rejections - bking@cumin2002
  • 15:04 sukhe@dns1004: END - running authdns-update
  • 15:04 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1060*,elastic1081* for thread pool rejections - bking@cumin2002
  • 15:04 Emperor: pool ms-fe1015 ms-fe1016 new frontends T388886 T391354
  • 15:02 sukhe@dns1004: START - running authdns-update
  • 15:00 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 14:59 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1081* for thread pool rejections - bking@cumin2002
  • 14:59 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1081* for thread pool rejections - bking@cumin2002
  • 14:58 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1120 to cirrussearch1120
  • 14:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1119.eqiad.wmnet with OS bullseye
  • 14:48 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1119 to cirrussearch1119
  • 14:47 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1119
  • 14:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1118.eqiad.wmnet with reason: host reimage
  • 14:41 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1117.eqiad.wmnet with OS bullseye
  • 14:40 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1119
  • 14:40 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1119 on all recursors
  • 14:40 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1119 on all recursors
  • 14:40 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:40 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1119 to cirrussearch1119 - bking@cumin2002"
  • 14:39 moritzm: installing openjdk-17 security updates
  • 14:39 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1118.eqiad.wmnet with reason: host reimage
  • 14:33 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1119 to cirrussearch1119 - bking@cumin2002"
  • 14:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1116.eqiad.wmnet with OS bullseye
  • 14:26 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 14:26 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1119 to cirrussearch1119
  • 14:15 gengh@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:15 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1117.eqiad.wmnet with reason: host reimage
  • 14:15 gengh@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:14 gengh@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:14 gengh@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:13 gengh@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:12 gengh@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:12 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1116.eqiad.wmnet with reason: host reimage
  • 14:11 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 14:10 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 14:09 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1117.eqiad.wmnet with reason: host reimage
  • 14:09 sukhe@dns1004: END - running authdns-update
  • 14:09 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1118.eqiad.wmnet with OS bullseye
  • 14:09 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1116.eqiad.wmnet with reason: host reimage
  • 14:08 gengh@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:07 gengh@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:07 gengh@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:07 sukhe@dns1004: START - running authdns-update
  • 14:06 gengh@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:06 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1118 to cirrussearch1118
  • 14:05 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 14:05 gengh@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1118
  • 14:04 gengh@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:03 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1118
  • 14:03 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1118 on all recursors
  • 14:03 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1118 on all recursors
  • 14:03 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:03 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1118 to cirrussearch1118 - bking@cumin2002"
  • 14:03 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1118 to cirrussearch1118 - bking@cumin2002"
  • 14:03 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 13:59 sukhe@dns1004: END - running authdns-update
  • 13:59 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 13:58 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1118 to cirrussearch1118
  • 13:58 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1117.eqiad.wmnet with OS bullseye
  • 13:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1116.eqiad.wmnet with OS bullseye
  • 13:57 sukhe@dns1004: START - running authdns-update
  • 13:52 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1117 to cirrussearch1117
  • 13:52 moritzm: installing nginx security updates
  • 13:51 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1117
  • 13:50 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
  • 13:50 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1117
  • 13:50 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1117 on all recursors
  • 13:50 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1117 on all recursors
  • 13:50 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:50 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1117 to cirrussearch1117 - bking@cumin2002"
  • 13:50 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1117 to cirrussearch1117 - bking@cumin2002"
  • 13:47 mvernon@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
  • 13:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1047.eqiad.wmnet
  • 13:43 mvernon@cumin1002: END (ERROR) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=97) rolling restart_daemons on A:swift-fe-eqiad
  • 13:43 mvernon@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
  • 13:41 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 13:41 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1016.eqiad.wmnet
  • 13:41 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 13:37 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-fe1016.eqiad.wmnet
  • 13:36 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1015.eqiad.wmnet
  • 13:35 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1116 to cirrussearch1116
  • 13:34 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1116
  • 13:33 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1116
  • 13:33 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1116 on all recursors
  • 13:33 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1116 on all recursors
  • 13:33 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:33 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1116 to cirrussearch1116 - bking@cumin2002"
  • 13:33 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 13:32 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1116 to cirrussearch1116 - bking@cumin2002"
  • 13:31 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1117 to cirrussearch1117
  • 13:30 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-fe1015.eqiad.wmnet
  • 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1046.eqiad.wmnet
  • 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet
  • 13:28 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 13:27 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1116 to cirrussearch1116
  • 13:25 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 13:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet
  • 13:21 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 13:21 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 13:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1046.eqiad.wmnet
  • 13:15 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1045.eqiad.wmnet
  • 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet
  • 13:07 moritzm: installing poppler security updates
  • 13:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet
  • 13:07 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 13:07 hashar: Restarted Apache httpd server on Gerrit server
  • 13:07 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 13:00 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1045.eqiad.wmnet
  • 12:58 Amir1: [wikishared]> CREATE INDEX translation_last_updated_timestamp ON cx_translations (translation_last_updated_timestamp); (T392839)
  • 12:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1044.eqiad.wmnet
  • 12:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet
  • 12:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet
  • 12:38 moritzm: installing imagemagick security updates
  • 12:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1044.eqiad.wmnet
  • 12:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1043.eqiad.wmnet
  • 12:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet
  • 12:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet
  • 12:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1043.eqiad.wmnet
  • 12:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1042.eqiad.wmnet
  • 12:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet
  • 11:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet
  • 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1042.eqiad.wmnet
  • 11:12 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2203.codfw.wmnet with reason: Maintenance
  • 11:12 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 11:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet
  • 11:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet
  • 11:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2002.codfw.wmnet
  • 11:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet
  • 11:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb2002.codfw.wmnet
  • 10:57 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet
  • 10:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1040.eqiad.wmnet
  • 10:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet
  • 10:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet
  • 10:43 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1040.eqiad.wmnet
  • 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1039.eqiad.wmnet
  • 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet
  • 10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet
  • 10:31 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1039.eqiad.wmnet
  • 10:27 moritzm: upgrading krb2002 to Bookworm T390863
  • 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1038.eqiad.wmnet
  • 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1038.eqiad.wmnet
  • 10:22 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on krb2002.codfw.wmnet with reason: update to Bookworm
  • 10:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1038.eqiad.wmnet
  • 10:17 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1038.eqiad.wmnet
  • 10:14 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1037.eqiad.wmnet
  • 10:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti1037.eqiad.wmnet
  • 10:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1037.eqiad.wmnet
  • 09:55 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1037.eqiad.wmnet
  • 09:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1036.eqiad.wmnet
  • 09:54 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti1036.eqiad.wmnet
  • 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1036.eqiad.wmnet
  • 09:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1036.eqiad.wmnet
  • 08:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1035.eqiad.wmnet
  • 08:55 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti1035.eqiad.wmnet
  • 08:54 XioNoX: update `host-inbound-traffic system-services` on pfw1-eqiad - T390052
  • 08:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1035.eqiad.wmnet
  • 08:38 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1035.eqiad.wmnet
  • 08:09 zabe@deploy1003: Finished scap sync-world: Backport for SkinTemplate: Restore a string 'class' in tabAction() (T393504) (duration: 19m 01s)
  • 08:02 zabe@deploy1003: zabe: Continuing with sync
  • 07:56 zabe@deploy1003: zabe: Backport for SkinTemplate: Restore a string 'class' in tabAction() (T393504) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:50 zabe@deploy1003: Started scap sync-world: Backport for SkinTemplate: Restore a string 'class' in tabAction() (T393504)
  • 07:17 slyngshede@dns1004: END - running authdns-update
  • 07:14 slyngshede@dns1004: START - running authdns-update
  • 06:55 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61588
  • 06:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 61588
  • 06:55 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 24441
  • 06:54 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 24441
  • 06:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 268097
  • 06:53 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 268097
  • 06:53 ayounsi@cumin1002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 35847
  • 06:52 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 35847
  • 06:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 264595
  • 06:52 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 264595
  • 06:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 268517
  • 06:52 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 268517
  • 06:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 263569
  • 06:51 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 263569
  • 06:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1034.eqiad.wmnet
  • 06:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1034.eqiad.wmnet
  • 06:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1034.eqiad.wmnet
  • 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1034.eqiad.wmnet
  • 06:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1033.eqiad.wmnet
  • 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1033.eqiad.wmnet
  • 05:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1033.eqiad.wmnet
  • 05:55 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
  • 05:48 XioNoX: decom Tele2 transit in esams - T393401
  • 05:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1032.eqiad.wmnet
  • 05:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1032.eqiad.wmnet
  • 05:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1032.eqiad.wmnet
  • 05:29 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1032.eqiad.wmnet
  • 04:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T382778)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20250507-042334-ladsgroup.json
  • 04:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P75869 and previous config saved to /var/cache/conftool/dbconfig/20250507-040826-ladsgroup.json
  • 03:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P75868 and previous config saved to /var/cache/conftool/dbconfig/20250507-035319-ladsgroup.json
  • 03:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T382778)', diff saved to https://phabricator.wikimedia.org/P75867 and previous config saved to /var/cache/conftool/dbconfig/20250507-033812-ladsgroup.json
  • 03:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T382778)', diff saved to https://phabricator.wikimedia.org/P75866 and previous config saved to /var/cache/conftool/dbconfig/20250507-033518-ladsgroup.json
  • 03:35 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 03:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T382778)', diff saved to https://phabricator.wikimedia.org/P75865 and previous config saved to /var/cache/conftool/dbconfig/20250507-033455-ladsgroup.json
  • 03:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P75864 and previous config saved to /var/cache/conftool/dbconfig/20250507-031947-ladsgroup.json
  • 03:07 tstarling@deploy1003: Finished scap sync-world: Backport for Hooks: disable if content model is unset AND CodeMirror beta is set (T373711) (duration: 32m 06s)
  • 03:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P75863 and previous config saved to /var/cache/conftool/dbconfig/20250507-030440-ladsgroup.json
  • 02:58 tstarling@deploy1003: tstarling, musikanimal: Continuing with sync
  • 02:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T382778)', diff saved to https://phabricator.wikimedia.org/P75862 and previous config saved to /var/cache/conftool/dbconfig/20250507-024933-ladsgroup.json
  • 02:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2212 (T382778)', diff saved to https://phabricator.wikimedia.org/P75861 and previous config saved to /var/cache/conftool/dbconfig/20250507-024638-ladsgroup.json
  • 02:46 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2212.codfw.wmnet with reason: Maintenance
  • 02:45 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 02:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T382778)', diff saved to https://phabricator.wikimedia.org/P75860 and previous config saved to /var/cache/conftool/dbconfig/20250507-024518-ladsgroup.json
  • 02:41 tstarling@deploy1003: tstarling, musikanimal: Backport for Hooks: disable if content model is unset AND CodeMirror beta is set (T373711) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 02:34 tstarling@deploy1003: Started scap sync-world: Backport for Hooks: disable if content model is unset AND CodeMirror beta is set (T373711)
  • 02:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P75859 and previous config saved to /var/cache/conftool/dbconfig/20250507-023009-ladsgroup.json
  • 02:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P75858 and previous config saved to /var/cache/conftool/dbconfig/20250507-021502-ladsgroup.json
  • 01:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T382778)', diff saved to https://phabricator.wikimedia.org/P75857 and previous config saved to /var/cache/conftool/dbconfig/20250507-015955-ladsgroup.json
  • 01:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T382778)', diff saved to https://phabricator.wikimedia.org/P75856 and previous config saved to /var/cache/conftool/dbconfig/20250507-015658-ladsgroup.json
  • 01:56 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 01:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T382778)', diff saved to https://phabricator.wikimedia.org/P75855 and previous config saved to /var/cache/conftool/dbconfig/20250507-015636-ladsgroup.json
  • 01:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P75854 and previous config saved to /var/cache/conftool/dbconfig/20250507-014128-ladsgroup.json
  • 01:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P75853 and previous config saved to /var/cache/conftool/dbconfig/20250507-012621-ladsgroup.json
  • 01:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T382778)', diff saved to https://phabricator.wikimedia.org/P75852 and previous config saved to /var/cache/conftool/dbconfig/20250507-011114-ladsgroup.json
  • 01:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T382778)', diff saved to https://phabricator.wikimedia.org/P75851 and previous config saved to /var/cache/conftool/dbconfig/20250507-010811-ladsgroup.json
  • 01:08 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 01:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T382778)', diff saved to https://phabricator.wikimedia.org/P75850 and previous config saved to /var/cache/conftool/dbconfig/20250507-010748-ladsgroup.json
  • 00:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P75849 and previous config saved to /var/cache/conftool/dbconfig/20250507-005240-ladsgroup.json
  • 00:39 hmonroy@deploy1003: Finished scap sync-world: Backport for Revert "JavaScript: ESLint 8.57.0" (T381577) (duration: 47m 14s)
  • 00:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P75848 and previous config saved to /var/cache/conftool/dbconfig/20250507-003733-ladsgroup.json
  • 00:33 andrew@dns1004: END - running authdns-update
  • 00:30 andrew@dns1004: START - running authdns-update
  • 00:26 hmonroy@deploy1003: hmonroy, musikanimal: Continuing with sync
  • 00:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T382778)', diff saved to https://phabricator.wikimedia.org/P75847 and previous config saved to /var/cache/conftool/dbconfig/20250507-002226-ladsgroup.json
  • 00:21 hmonroy@deploy1003: hmonroy, musikanimal: Backport for Revert "JavaScript: ESLint 8.57.0" (T381577) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 00:19 andrew@dns1004: END - running authdns-update
  • 00:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T382778)', diff saved to https://phabricator.wikimedia.org/P75846 and previous config saved to /var/cache/conftool/dbconfig/20250507-001924-ladsgroup.json
  • 00:19 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 00:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T382778)', diff saved to https://phabricator.wikimedia.org/P75845 and previous config saved to /var/cache/conftool/dbconfig/20250507-001901-ladsgroup.json
  • 00:16 andrew@dns1004: START - running authdns-update
  • 00:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P75844 and previous config saved to /var/cache/conftool/dbconfig/20250507-000354-ladsgroup.json

2025-05-06

  • 23:52 hmonroy@deploy1003: Started scap sync-world: Backport for Revert "JavaScript: ESLint 8.57.0" (T381577)
  • 23:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P75843 and previous config saved to /var/cache/conftool/dbconfig/20250506-234846-ladsgroup.json
  • 23:37 hmonroy@deploy1003: Finished scap sync-world: Backport for InitialiseSettings: enable multiblocks on group0 (T377121) (duration: 14m 17s)
  • 23:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T382778)', diff saved to https://phabricator.wikimedia.org/P75842 and previous config saved to /var/cache/conftool/dbconfig/20250506-233339-ladsgroup.json
  • 23:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T382778)', diff saved to https://phabricator.wikimedia.org/P75841 and previous config saved to /var/cache/conftool/dbconfig/20250506-233041-ladsgroup.json
  • 23:30 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 23:30 hmonroy@deploy1003: musikanimal, hmonroy: Continuing with sync
  • 23:30 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 23:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T382778)', diff saved to https://phabricator.wikimedia.org/P75840 and previous config saved to /var/cache/conftool/dbconfig/20250506-233002-ladsgroup.json
  • 23:29 hmonroy@deploy1003: musikanimal, hmonroy: Backport for InitialiseSettings: enable multiblocks on group0 (T377121) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:22 hmonroy@deploy1003: Started scap sync-world: Backport for InitialiseSettings: enable multiblocks on group0 (T377121)
  • 23:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1115.eqiad.wmnet with OS bullseye
  • 23:15 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1114.eqiad.wmnet with OS bullseye
  • 23:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P75839 and previous config saved to /var/cache/conftool/dbconfig/20250506-231454-ladsgroup.json
  • 22:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P75838 and previous config saved to /var/cache/conftool/dbconfig/20250506-225947-ladsgroup.json
  • 22:51 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1115.eqiad.wmnet with reason: host reimage
  • 22:48 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1114.eqiad.wmnet with reason: host reimage
  • 22:45 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1115.eqiad.wmnet with reason: host reimage
  • 22:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T382778)', diff saved to https://phabricator.wikimedia.org/P75837 and previous config saved to /var/cache/conftool/dbconfig/20250506-224440-ladsgroup.json
  • 22:44 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1114.eqiad.wmnet with reason: host reimage
  • 22:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T382778)', diff saved to https://phabricator.wikimedia.org/P75836 and previous config saved to /var/cache/conftool/dbconfig/20250506-224132-ladsgroup.json
  • 22:41 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 22:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T382778)', diff saved to https://phabricator.wikimedia.org/P75835 and previous config saved to /var/cache/conftool/dbconfig/20250506-224110-ladsgroup.json
  • 22:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1113.eqiad.wmnet with OS bullseye
  • 22:34 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1115.eqiad.wmnet with OS bullseye
  • 22:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1114.eqiad.wmnet with OS bullseye
  • 22:29 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit2002-dev.codfw.wmnet with OS bookworm
  • 22:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P75834 and previous config saved to /var/cache/conftool/dbconfig/20250506-222603-ladsgroup.json
  • 22:25 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit2001-dev.codfw.wmnet with OS bookworm
  • 22:21 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit2003-dev.codfw.wmnet with OS bookworm
  • 22:17 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1113.eqiad.wmnet with reason: host reimage
  • 22:13 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1113.eqiad.wmnet with reason: host reimage
  • 22:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P75833 and previous config saved to /var/cache/conftool/dbconfig/20250506-221056-ladsgroup.json
  • 22:10 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2002-dev.codfw.wmnet with reason: host reimage
  • 22:05 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2001-dev.codfw.wmnet with reason: host reimage
  • 22:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1115 to cirrussearch1115
  • 22:02 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1115
  • 22:02 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1113.eqiad.wmnet with OS bullseye
  • 22:02 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2003-dev.codfw.wmnet with reason: host reimage
  • 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1115
  • 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1115 on all recursors
  • 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1115 on all recursors
  • 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1115 to cirrussearch1115 - bking@cumin2002"
  • 22:00 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2002-dev.codfw.wmnet with reason: host reimage
  • 21:59 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2001-dev.codfw.wmnet with reason: host reimage
  • 21:59 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2003-dev.codfw.wmnet with reason: host reimage
  • 21:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T382778)', diff saved to https://phabricator.wikimedia.org/P75832 and previous config saved to /var/cache/conftool/dbconfig/20250506-215549-ladsgroup.json
  • 21:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T382778)', diff saved to https://phabricator.wikimedia.org/P75831 and previous config saved to /var/cache/conftool/dbconfig/20250506-215242-ladsgroup.json
  • 21:52 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 21:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T382778)', diff saved to https://phabricator.wikimedia.org/P75830 and previous config saved to /var/cache/conftool/dbconfig/20250506-215219-ladsgroup.json
  • 21:41 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudrabbit2003-dev.codfw.wmnet with OS bookworm
  • 21:41 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudrabbit2002-dev.codfw.wmnet with OS bookworm
  • 21:41 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudrabbit2001-dev.codfw.wmnet with OS bookworm
  • 21:40 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudrabbit2001-dev.codfw.wmnet with OS bookworm
  • 21:40 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudrabbit2003-dev.codfw.wmnet with OS bookworm
  • 21:40 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudrabbit2002-dev.codfw.wmnet with OS bookworm
  • 21:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1113.eqiad.wmnet with OS bullseye
  • 21:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P75829 and previous config saved to /var/cache/conftool/dbconfig/20250506-213712-ladsgroup.json
  • 21:35 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1112.eqiad.wmnet with OS bullseye
  • 21:28 ryankemper: T388134 Seeing 502 errors; that explains why the drop in requests to wdqs-full is not matched by an increase to wdqs-main. Rolling back for now while we figure out what piece we're missing
  • 21:24 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1115 to cirrussearch1115 - bking@cumin2002"
  • 21:23 ryankemper: T388134 Cutover of query.wikidata.org to `wdqs-main` instead of `wdqs` is ongoing. We're seeing the expected drop in queries to the main cluster (https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1746565806937&to=1746566592047) but not seeing corresponding increase in wdqs-main yet
  • 21:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P75828 and previous config saved to /var/cache/conftool/dbconfig/20250506-212204-ladsgroup.json
  • 21:20 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2002-dev.codfw.wmnet with reason: host reimage
  • 21:18 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 21:18 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 21:17 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 21:17 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 21:17 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 21:16 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1115 to cirrussearch1115
  • 21:16 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 21:16 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 21:15 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1114 to cirrussearch1114
  • 21:15 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1113.eqiad.wmnet with reason: host reimage
  • 21:15 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1114
  • 21:12 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1114
  • 21:12 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1114 on all recursors
  • 21:12 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1114 on all recursors
  • 21:12 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:12 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1114 to cirrussearch1114 - bking@cumin2002"
  • 21:12 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1112.eqiad.wmnet with reason: host reimage
  • 21:12 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1113.eqiad.wmnet with reason: host reimage
  • 21:10 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1114 to cirrussearch1114 - bking@cumin2002"
  • 21:08 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2001-dev.codfw.wmnet with reason: host reimage
  • 21:07 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1112.eqiad.wmnet with reason: host reimage
  • 21:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T382778)', diff saved to https://phabricator.wikimedia.org/P75827 and previous config saved to /var/cache/conftool/dbconfig/20250506-210658-ladsgroup.json
  • 21:06 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apus-fe1003.wikimedia.org with OS bookworm
  • 21:05 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2003-dev.codfw.wmnet with reason: host reimage
  • 21:03 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2002-dev.codfw.wmnet with reason: host reimage
  • 21:03 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2001-dev.codfw.wmnet with reason: host reimage
  • 21:03 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 21:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T382778)', diff saved to https://phabricator.wikimedia.org/P75826 and previous config saved to /var/cache/conftool/dbconfig/20250506-210329-ladsgroup.json
  • 21:03 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 21:03 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1114 to cirrussearch1114
  • 21:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T382778)', diff saved to https://phabricator.wikimedia.org/P75825 and previous config saved to /var/cache/conftool/dbconfig/20250506-210307-ladsgroup.json
  • 21:02 ryankemper@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=wdqs1011.eqiad.wmnet|wdqs1016.eqiad.wmnet|wdqs1017.eqiad.wmnet|wdqs2008.codfw.wmnet|wdqs2014.codfw.wmnet|wdqs2015.codfw.wmnet
  • 21:01 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2003-dev.codfw.wmnet with reason: host reimage
  • 21:00 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1113.eqiad.wmnet with OS bullseye
  • 20:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1113 to cirrussearch1113
  • 20:58 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1113
  • 20:57 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1113
  • 20:57 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1113 on all recursors
  • 20:57 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1113 on all recursors
  • 20:57 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:57 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1113 to cirrussearch1113 - bking@cumin2002"
  • 20:56 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1113 to cirrussearch1113 - bking@cumin2002"
  • 20:56 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1112.eqiad.wmnet with OS bullseye
  • 20:52 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1017.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
  • 20:52 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2015.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
  • 20:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P75824 and previous config saved to /var/cache/conftool/dbconfig/20250506-204758-ladsgroup.json
  • 20:45 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudrabbit2001-dev.codfw.wmnet with OS bookworm
  • 20:44 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudrabbit2002-dev.codfw.wmnet with OS bookworm
  • 20:43 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudrabbit2003-dev.codfw.wmnet with OS bookworm
  • 20:43 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 20:42 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1113 to cirrussearch1113
  • 20:40 andrew@cumin1002: DONE (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for cloudrabbit2001-dev.codfw.wmnet: Renew puppet certificate - andrew@cumin1002
  • 20:40 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1017.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
  • 20:39 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2015.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
  • 20:39 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
  • 20:39 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2014.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
  • 20:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1112 to cirrussearch1112
  • 20:37 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1112
  • 20:36 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1112
  • 20:36 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1112 on all recursors
  • 20:36 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1112 on all recursors
  • 20:36 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:36 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1112 to cirrussearch1112 - bking@cumin2002"
  • 20:36 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1112 to cirrussearch1112 - bking@cumin2002"
  • 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P75823 and previous config saved to /var/cache/conftool/dbconfig/20250506-203251-ladsgroup.json
  • 20:28 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
  • 20:28 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2014.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
  • 20:28 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 20:27 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1112 to cirrussearch1112
  • 20:24 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
  • 20:18 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2008.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
  • 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T382778)', diff saved to https://phabricator.wikimedia.org/P75822 and previous config saved to /var/cache/conftool/dbconfig/20250506-201744-ladsgroup.json
  • 20:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T382778)', diff saved to https://phabricator.wikimedia.org/P75821 and previous config saved to /var/cache/conftool/dbconfig/20250506-201421-ladsgroup.json
  • 20:14 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 20:13 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1011.eqiad.wmnet -> wdqs1017.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
  • 20:13 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 20:12 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2014.codfw.wmnet -> wdqs2015.codfw.wmnet w/ force delete existing files, repooling neither afterwards
  • 20:12 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 20:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T382778)', diff saved to https://phabricator.wikimedia.org/P75820 and previous config saved to /var/cache/conftool/dbconfig/20250506-201145-ladsgroup.json
  • 19:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P75819 and previous config saved to /var/cache/conftool/dbconfig/20250506-195638-ladsgroup.json
  • 19:46 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host apus-fe1003.wikimedia.org with OS bookworm
  • 19:43 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 19:42 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 19:42 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 19:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P75818 and previous config saved to /var/cache/conftool/dbconfig/20250506-194131-ladsgroup.json
  • 19:41 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 19:38 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 19:38 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 19:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T382778)', diff saved to https://phabricator.wikimedia.org/P75817 and previous config saved to /var/cache/conftool/dbconfig/20250506-192624-ladsgroup.json
  • 19:25 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1011.eqiad.wmnet -> wdqs1017.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
  • 19:25 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
  • 19:23 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1011.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
  • 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1251 (T382778)', diff saved to https://phabricator.wikimedia.org/P75816 and previous config saved to /var/cache/conftool/dbconfig/20250506-192333-ladsgroup.json
  • 19:23 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1251.eqiad.wmnet with reason: Maintenance
  • 19:22 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 19:21 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 19:21 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2014.codfw.wmnet -> wdqs2015.codfw.wmnet w/ force delete existing files, repooling neither afterwards
  • 19:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T382778)', diff saved to https://phabricator.wikimedia.org/P75815 and previous config saved to /var/cache/conftool/dbconfig/20250506-192054-ladsgroup.json
  • 19:20 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2008.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
  • 19:18 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2014.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
  • 19:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P75814 and previous config saved to /var/cache/conftool/dbconfig/20250506-190547-ladsgroup.json
  • 18:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P75813 and previous config saved to /var/cache/conftool/dbconfig/20250506-185040-ladsgroup.json
  • 18:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T382778)', diff saved to https://phabricator.wikimedia.org/P75812 and previous config saved to /var/cache/conftool/dbconfig/20250506-183533-ladsgroup.json
  • 18:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T382778)', diff saved to https://phabricator.wikimedia.org/P75811 and previous config saved to /var/cache/conftool/dbconfig/20250506-183222-ladsgroup.json
  • 18:32 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 18:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T382778)', diff saved to https://phabricator.wikimedia.org/P75810 and previous config saved to /var/cache/conftool/dbconfig/20250506-183159-ladsgroup.json
  • 18:25 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2014.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
  • 18:25 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1011.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
  • 18:24 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2008.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
  • 18:23 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1011.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
  • 18:17 jhuneidi@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.28 refs T386223
  • 18:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P75808 and previous config saved to /var/cache/conftool/dbconfig/20250506-181652-ladsgroup.json
  • 18:13 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2008.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
  • 18:12 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1011.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
  • 18:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P75807 and previous config saved to /var/cache/conftool/dbconfig/20250506-180146-ladsgroup.json
  • 17:54 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2015.codfw.wmnet, repooling source-only afterwards
  • 17:54 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1017.eqiad.wmnet, repooling source-only afterwards
  • 17:49 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1017.eqiad.wmnet, repooling source-only afterwards
  • 17:49 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2015.codfw.wmnet, repooling source-only afterwards
  • 17:49 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
  • 17:49 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2014.codfw.wmnet, repooling source-only afterwards
  • 17:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T382778)', diff saved to https://phabricator.wikimedia.org/P75806 and previous config saved to /var/cache/conftool/dbconfig/20250506-174639-ladsgroup.json
  • 17:44 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
  • 17:44 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2014.codfw.wmnet, repooling source-only afterwards
  • 17:44 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1011.eqiad.wmnet, repooling source-only afterwards
  • 17:43 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2008.codfw.wmnet, repooling source-only afterwards
  • 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T382778)', diff saved to https://phabricator.wikimedia.org/P75805 and previous config saved to /var/cache/conftool/dbconfig/20250506-174325-ladsgroup.json
  • 17:43 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T382778)', diff saved to https://phabricator.wikimedia.org/P75804 and previous config saved to /var/cache/conftool/dbconfig/20250506-174313-ladsgroup.json
  • 17:41 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1017.eqiad.wmnet, repooling source-only afterwards
  • 17:40 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1011.eqiad.wmnet, repooling source-only afterwards
  • 17:39 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2008.codfw.wmnet, repooling source-only afterwards
  • 17:31 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1017.eqiad.wmnet, repooling source-only afterwards
  • 17:30 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 17:30 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 17:29 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 11s)
  • 17:29 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
  • 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P75803 and previous config saved to /var/cache/conftool/dbconfig/20250506-172807-ladsgroup.json
  • 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P75802 and previous config saved to /var/cache/conftool/dbconfig/20250506-171259-ladsgroup.json
  • 17:12 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 17:11 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 16:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T382778)', diff saved to https://phabricator.wikimedia.org/P75801 and previous config saved to /var/cache/conftool/dbconfig/20250506-165752-ladsgroup.json
  • 16:55 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host apus-fe1003.wikimedia.org with OS bookworm
  • 16:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T382778)', diff saved to https://phabricator.wikimedia.org/P75800 and previous config saved to /var/cache/conftool/dbconfig/20250506-165438-ladsgroup.json
  • 16:54 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 16:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T382778)', diff saved to https://phabricator.wikimedia.org/P75799 and previous config saved to /var/cache/conftool/dbconfig/20250506-165415-ladsgroup.json
  • 16:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P75798 and previous config saved to /var/cache/conftool/dbconfig/20250506-163908-ladsgroup.json
  • 16:34 denisse: enable Puppet on Grafana2001 - T384841
  • 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - cdanis@cumin1002"
  • 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - cdanis@cumin1002
  • 16:33 cdanis@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - cdanis@cumin1002
  • 16:33 cdanis@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - cdanis@cumin1002"
  • 16:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P75797 and previous config saved to /var/cache/conftool/dbconfig/20250506-162401-ladsgroup.json
  • 16:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T382778)', diff saved to https://phabricator.wikimedia.org/P75796 and previous config saved to /var/cache/conftool/dbconfig/20250506-160854-ladsgroup.json
  • 16:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T382778)', diff saved to https://phabricator.wikimedia.org/P75795 and previous config saved to /var/cache/conftool/dbconfig/20250506-160535-ladsgroup.json
  • 16:05 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 16:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T382778)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20250506-160507-ladsgroup.json
  • 16:04 isaranto@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 15:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P75793 and previous config saved to /var/cache/conftool/dbconfig/20250506-155000-ladsgroup.json
  • 15:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 15:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 15:48 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 15:48 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 15:45 swfrench@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: Host has crashed - T393296
  • 15:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P75792 and previous config saved to /var/cache/conftool/dbconfig/20250506-153453-ladsgroup.json
  • 15:28 klausman@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-codfw
  • 15:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T382778)', diff saved to https://phabricator.wikimedia.org/P75790 and previous config saved to /var/cache/conftool/dbconfig/20250506-151946-ladsgroup.json
  • 15:17 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1111.eqiad.wmnet with OS bullseye
  • 15:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T382778)', diff saved to https://phabricator.wikimedia.org/P75789 and previous config saved to /var/cache/conftool/dbconfig/20250506-151652-ladsgroup.json
  • 15:16 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 15:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T382778)', diff saved to https://phabricator.wikimedia.org/P75788 and previous config saved to /var/cache/conftool/dbconfig/20250506-151629-ladsgroup.json
  • 15:11 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 15:11 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 15:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1111.eqiad.wmnet with reason: host reimage
  • 15:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P75787 and previous config saved to /var/cache/conftool/dbconfig/20250506-150122-ladsgroup.json
  • 14:58 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1111.eqiad.wmnet with reason: host reimage
  • 14:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P75786 and previous config saved to /var/cache/conftool/dbconfig/20250506-144615-ladsgroup.json
  • 14:44 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1111.eqiad.wmnet with OS bullseye
  • 14:44 stevemunene@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-worker1177.eqiad.wmnet with reason: Harddrive replacement
  • 14:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1111 to cirrussearch1111
  • 14:43 stevemunene@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-worker1156.eqiad.wmnet with reason: Harddrive replacement
  • 14:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1111
  • 14:41 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1111
  • 14:41 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1111 on all recursors
  • 14:41 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1111 on all recursors
  • 14:41 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:41 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1111 to cirrussearch1111 - bking@cumin2002"
  • 14:41 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1111 to cirrussearch1111 - bking@cumin2002"
  • 14:37 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:37 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Updating IPs for cloudrabbit200[123]-dev - andrew@cumin1002"
  • 14:37 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 14:37 jnuche@deploy1003: Installation of scap version "4.161.0" completed for 2 hosts
  • 14:36 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Updating IPs for cloudrabbit200[123]-dev - andrew@cumin1002"
  • 14:36 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1111 to cirrussearch1111
  • 14:35 jnuche@deploy1003: Installing scap version "4.161.0" for 2 host(s)
  • 14:34 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 14:34 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 14:32 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 14:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T382778)', diff saved to https://phabricator.wikimedia.org/P75785 and previous config saved to /var/cache/conftool/dbconfig/20250506-143108-ladsgroup.json
  • 14:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T382778)', diff saved to https://phabricator.wikimedia.org/P75784 and previous config saved to /var/cache/conftool/dbconfig/20250506-142748-ladsgroup.json
  • 14:27 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T382778)', diff saved to https://phabricator.wikimedia.org/P75783 and previous config saved to /var/cache/conftool/dbconfig/20250506-142726-ladsgroup.json
  • 14:25 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Bugfixes - oblivian@cumin1002"
  • 14:25 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Bugfixes - oblivian@cumin1002
  • 14:25 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Bugfixes - oblivian@cumin1002
  • 14:25 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Bugfixes - oblivian@cumin1002"
  • 14:23 tgr_: UTC afternoon deploys done
  • 14:20 tgr@deploy1003: Finished scap sync-world: Backport for logging: Add context processor (T142313) (duration: 20m 37s)
  • 14:15 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on wdqs1017.eqiad.wmnet with reason: bringing host online after reimage
  • 14:13 tgr@deploy1003: tgr: Continuing with sync
  • 14:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P75782 and previous config saved to /var/cache/conftool/dbconfig/20250506-141220-ladsgroup.json
  • 14:06 tgr@deploy1003: tgr: Backport for logging: Add context processor (T142313) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:59 tgr@deploy1003: Started scap sync-world: Backport for logging: Add context processor (T142313)
  • 13:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P75781 and previous config saved to /var/cache/conftool/dbconfig/20250506-135713-ladsgroup.json
  • 13:53 tgr@deploy1003: Finished scap sync-world: Backport for private: Drop $wgCentralAuthSul3SharedDomainRestrictions (T390329) (duration: 16m 32s)
  • 13:44 tgr@deploy1003: tgr: Continuing with sync
  • 13:43 tgr@deploy1003: tgr: Backport for private: Drop $wgCentralAuthSul3SharedDomainRestrictions (T390329) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T382778)', diff saved to https://phabricator.wikimedia.org/P75780 and previous config saved to /var/cache/conftool/dbconfig/20250506-134207-ladsgroup.json
  • 13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T382778)', diff saved to https://phabricator.wikimedia.org/P75779 and previous config saved to /var/cache/conftool/dbconfig/20250506-133943-ladsgroup.json
  • 13:39 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T382778)', diff saved to https://phabricator.wikimedia.org/P75778 and previous config saved to /var/cache/conftool/dbconfig/20250506-133920-ladsgroup.json
  • 13:36 tgr@deploy1003: Started scap sync-world: Backport for private: Drop $wgCentralAuthSul3SharedDomainRestrictions (T390329)
  • 13:25 tgr@deploy1003: Finished scap sync-world: Backport for CommonSettings: Document wmfGetPrivilegedGroups usage, Revert "Add .well-known/matrix for wikimedia.org" (T223835 T261531), core-Permissions: add move-subpages to enwiki templateeditor user group (T393167), Growth-Beta: Configure higher Impact Module edit limits for pilot wikis (T341599), [
  • 13:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P75777 and previous config saved to /var/cache/conftool/dbconfig/20250506-132413-ladsgroup.json
  • 13:16 tgr@deploy1003: tgr, novemlinguae, cyndywikime, lucaswerkmeister-wmde: Continuing with sync
  • 13:14 tgr@deploy1003: tgr, novemlinguae, cyndywikime, lucaswerkmeister-wmde: Backport for CommonSettings: Document wmfGetPrivilegedGroups usage, Revert "Add .well-known/matrix for wikimedia.org" (T223835 T261531), core-Permissions: add move-subpages to enwiki templateeditor user group (T393167), [[gerrit:1136986|Growth-Beta: Configure higher Impact Module edit limits f
  • 13:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P75776 and previous config saved to /var/cache/conftool/dbconfig/20250506-130905-ladsgroup.json
  • 13:08 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
  • 13:07 tgr@deploy1003: Started scap sync-world: Backport for CommonSettings: Document wmfGetPrivilegedGroups usage, Revert "Add .well-known/matrix for wikimedia.org" (T223835 T261531), core-Permissions: add move-subpages to enwiki templateeditor user group (T393167), Growth-Beta: Configure higher Impact Module edit limits for pilot wikis (T341599), [[
  • 13:01 klausman@cumin2002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:ml-staging-worker
  • 12:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T382778)', diff saved to https://phabricator.wikimedia.org/P75775 and previous config saved to /var/cache/conftool/dbconfig/20250506-125358-ladsgroup.json
  • 12:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T382778)', diff saved to https://phabricator.wikimedia.org/P75774 and previous config saved to /var/cache/conftool/dbconfig/20250506-125034-ladsgroup.json
  • 12:50 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:50 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 12:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T382778)', diff saved to https://phabricator.wikimedia.org/P75773 and previous config saved to /var/cache/conftool/dbconfig/20250506-124954-ladsgroup.json
  • 12:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P75772 and previous config saved to /var/cache/conftool/dbconfig/20250506-123448-ladsgroup.json
  • 12:27 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
  • 12:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P75771 and previous config saved to /var/cache/conftool/dbconfig/20250506-121940-ladsgroup.json
  • 12:11 joal@deploy1003: Finished deploy [analytics/refinery@43a5f61] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@43a5f617] (duration: 01m 37s)
  • 12:09 joal@deploy1003: Started deploy [analytics/refinery@43a5f61] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@43a5f617]
  • 12:09 joal@deploy1003: Finished deploy [analytics/refinery@43a5f61] (thin): Regular analytics weekly train THIN [analytics/refinery@43a5f617] (duration: 01m 20s)
  • 12:08 joal@deploy1003: Started deploy [analytics/refinery@43a5f61] (thin): Regular analytics weekly train THIN [analytics/refinery@43a5f617]
  • 12:07 joal@deploy1003: Finished deploy [analytics/refinery@43a5f61]: Regular analytics weekly train [analytics/refinery@43a5f617] (duration: 02m 56s)
  • 12:04 joal@deploy1003: Started deploy [analytics/refinery@43a5f61]: Regular analytics weekly train [analytics/refinery@43a5f617]
  • 12:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T382778)', diff saved to https://phabricator.wikimedia.org/P75770 and previous config saved to /var/cache/conftool/dbconfig/20250506-120434-ladsgroup.json
  • 12:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1195 (T382778)', diff saved to https://phabricator.wikimedia.org/P75769 and previous config saved to /var/cache/conftool/dbconfig/20250506-120108-ladsgroup.json
  • 12:01 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1195.eqiad.wmnet with reason: Maintenance
  • 12:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T382778)', diff saved to https://phabricator.wikimedia.org/P75768 and previous config saved to /var/cache/conftool/dbconfig/20250506-120045-ladsgroup.json
  • 11:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P75767 and previous config saved to /var/cache/conftool/dbconfig/20250506-114538-ladsgroup.json
  • 11:43 kamila@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
  • 11:42 kamila@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
  • 11:37 jynus@cumin1002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 6:00:00 on backup[2010-2014].codfw.wmnet with reason: Upgrade and restart
  • 11:36 jynus@cumin1002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 6:00:00 on backup1013.eqiad.wmnet with reason: Upgrade and restart
  • 11:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P75766 and previous config saved to /var/cache/conftool/dbconfig/20250506-113031-ladsgroup.json
  • 11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T382778)', diff saved to https://phabricator.wikimedia.org/P75765 and previous config saved to /var/cache/conftool/dbconfig/20250506-111524-ladsgroup.json
  • 11:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T382778)', diff saved to https://phabricator.wikimedia.org/P75764 and previous config saved to /var/cache/conftool/dbconfig/20250506-111157-ladsgroup.json
  • 11:11 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T382778)', diff saved to https://phabricator.wikimedia.org/P75763 and previous config saved to /var/cache/conftool/dbconfig/20250506-111146-ladsgroup.json
  • 10:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P75762 and previous config saved to /var/cache/conftool/dbconfig/20250506-105639-ladsgroup.json
  • 10:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P75761 and previous config saved to /var/cache/conftool/dbconfig/20250506-104131-ladsgroup.json
  • 10:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T382778)', diff saved to https://phabricator.wikimedia.org/P75760 and previous config saved to /var/cache/conftool/dbconfig/20250506-102624-ladsgroup.json
  • 10:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1184 (T382778)', diff saved to https://phabricator.wikimedia.org/P75759 and previous config saved to /var/cache/conftool/dbconfig/20250506-102236-ladsgroup.json
  • 10:22 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 10:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T382778)', diff saved to https://phabricator.wikimedia.org/P75758 and previous config saved to /var/cache/conftool/dbconfig/20250506-102226-ladsgroup.json
  • 10:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P75757 and previous config saved to /var/cache/conftool/dbconfig/20250506-100719-ladsgroup.json
  • 09:57 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 09:57 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 09:56 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 09:56 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 09:56 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 09:56 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 09:55 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 09:55 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 09:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P75756 and previous config saved to /var/cache/conftool/dbconfig/20250506-095212-ladsgroup.json
  • 09:44 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 09:43 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 09:42 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 09:42 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 09:41 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 09:40 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 09:40 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 09:40 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 09:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database nupwiki (T390714)
  • 09:40 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database nupwiki (T390714)
  • 09:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T382778)', diff saved to https://phabricator.wikimedia.org/P75755 and previous config saved to /var/cache/conftool/dbconfig/20250506-093704-ladsgroup.json
  • 09:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T382778)', diff saved to https://phabricator.wikimedia.org/P75754 and previous config saved to /var/cache/conftool/dbconfig/20250506-093410-ladsgroup.json
  • 09:34 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 09:28 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 09:28 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 09:28 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 09:28 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 09:28 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 09:27 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 09:27 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 09:27 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 09:26 lucaswerkmeister-wmde@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 09:26 lucaswerkmeister-wmde@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 07:49 elukey: restart apache2 on puppetmaster1001
  • 04:07 mwpresync@deploy1003: Pruned MediaWiki: 1.44.0-wmf.24 (duration: 07m 35s)
  • 04:06 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.44.0-wmf.28 refs T386223 (duration: 62m 44s)
  • 03:52 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2015.codfw.wmnet, repooling source-only afterwards
  • 03:45 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2015.codfw.wmnet, repooling source-only afterwards
  • 03:45 ryankemper@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 15:00:00 on wdqs[2008,2014-2015].codfw.wmnet,wdqs[1011,1016].eqiad.wmnet with reason: T388134
  • 03:44 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2014.codfw.wmnet, repooling source-only afterwards
  • 03:36 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2014.codfw.wmnet, repooling source-only afterwards
  • 03:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2008.codfw.wmnet, repooling source-only afterwards
  • 03:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2008.codfw.wmnet, repooling source-only afterwards
  • 03:25 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1011.eqiad.wmnet, repooling source-only afterwards
  • 03:18 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1011.eqiad.wmnet, repooling source-only afterwards
  • 03:18 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 12s)
  • 03:17 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
  • 03:17 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 13s)
  • 03:17 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
  • 03:17 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 13s)
  • 03:16 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
  • 03:16 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 14s)
  • 03:16 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
  • 03:05 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
  • 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.44.0-wmf.28 refs T386223
  • 03:00 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
  • 02:52 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T388134, bring new main graph hosts into service) xfer wikidata from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
  • 02:49 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
  • 02:43 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T388134, bring new main graph hosts into service) xfer wikidata from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
  • 02:41 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
  • 02:32 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T388134, bring new main graph hosts into service) xfer wdqs-all from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
  • 02:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wdqs-all from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
  • 02:24 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T388134, bring new main graph hosts into service) xfer wdqs-all from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
  • 02:24 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wdqs-all from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
  • 02:24 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 12s)
  • 02:24 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
  • 02:22 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T388134, bring new main graph hosts into service) xfer wdqs-all from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
  • 02:22 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wdqs-all from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
  • 00:24 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye

2025-05-05

  • 23:32 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
  • 23:29 eileen: civicrm upgraded from 5a1f3e8e to 6ffbde61
  • 23:14 zabe: zabe@mwmaint1002:~$ mwscript extensions/WikimediaMaintenance/migrateESRefToContentTableStage2.php enwiki --delete /home/zabe/afl_text_table_deletedump/enwiki --sleep 0.3 # T381599
  • 23:04 zabe@deploy1003: Finished scap sync-world: Backport for core-Permissions: refactor enwiki wgRemoveGroups (duration: 11m 13s)
  • 23:01 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
  • 23:01 ryankemper@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
  • 23:00 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
  • 22:59 ryankemper@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
  • 22:59 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
  • 22:57 zabe@deploy1003: zabe, novemlinguae: Continuing with sync
  • 22:57 zabe@deploy1003: zabe, novemlinguae: Backport for core-Permissions: refactor enwiki wgRemoveGroups synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:52 zabe@deploy1003: Started scap sync-world: Backport for core-Permissions: refactor enwiki wgRemoveGroups
  • 22:47 ryankemper@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
  • 22:46 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
  • 22:46 ryankemper@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
  • 22:46 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
  • 22:35 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apus-fe1003.wikimedia.org with OS bookworm
  • 22:12 sbassett: Deployed security fix (2) for T392341
  • 21:57 sbassett: Deployed security fix (1) for T392341
  • 21:34 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
  • 21:15 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host apus-fe1003.wikimedia.org with OS bookworm
  • 21:14 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apus-fe1003.wikimedia.org with OS bookworm
  • 21:03 jsn@deploy1003: Finished scap sync-world: Backport for Fix link for first set of Patroller Tools surveys (T389401) (duration: 14m 43s)
  • 20:59 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host apus-fe1003.wikimedia.org with OS bookworm
  • 20:59 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
  • 20:56 jsn@deploy1003: jsn: Continuing with sync
  • 20:56 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:56 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 20:55 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 20:55 jsn@deploy1003: jsn: Backport for Fix link for first set of Patroller Tools surveys (T389401) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:51 vriley@cumin1002: START - Cookbook sre.hosts.provision for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:50 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host apus-fe1003
  • 20:49 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host apus-fe1003
  • 20:48 jsn@deploy1003: Started scap sync-world: Backport for Fix link for first set of Patroller Tools surveys (T389401)
  • 20:48 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:48 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt apus-fe1003 - vriley@cumin1002"
  • 20:48 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt apus-fe1003 - vriley@cumin1002"
  • 20:44 ryankemper@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on wdqs[2008,2014-2015].codfw.wmnet,wdqs[1011,1016].eqiad.wmnet with reason: T388134
  • 20:41 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 20:35 jsn@deploy1003: Finished scap sync-world: Backport for Design Research Participant Survey: Undeploy (T392325), Deploy first set of Patroller Tools surveys (T389401) (duration: 19m 58s)
  • 20:28 jsn@deploy1003: dani, jsn: Continuing with sync
  • 20:21 jsn@deploy1003: dani, jsn: Backport for Design Research Participant Survey: Undeploy (T392325), Deploy first set of Patroller Tools surveys (T389401) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:15 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1184.eqiad.wmnet with OS bullseye
  • 20:15 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 20:15 jsn@deploy1003: Started scap sync-world: Backport for Design Research Participant Survey: Undeploy (T392325), Deploy first set of Patroller Tools surveys (T389401)
  • 20:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 19:58 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
  • 19:46 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1184.eqiad.wmnet with reason: host reimage
  • 19:43 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1184.eqiad.wmnet with reason: host reimage
  • 19:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:35 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:27 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1184.eqiad.wmnet with OS bullseye
  • 18:12 aokoth@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 18:07 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:07 aokoth@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 18:07 aokoth@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 18:03 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:02 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:02 aokoth@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 17:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:45 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=93) for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:30 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 17:30 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 17:19 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=93) from elastic1111 to cirrussearch1111
  • 17:19 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:19 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rolling back cirrussearch1111 to elastic1111 - bking@cumin2002"
  • 17:19 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rolling back cirrussearch1111 to elastic1111 - bking@cumin2002"
  • 17:16 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 17:16 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 16:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2047
  • 16:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2047
  • 16:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:49 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2047
  • 16:49 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 16:48 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2047
  • 16:48 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:48 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1111 to cirrussearch1111 - bking@cumin2002"
  • 16:46 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:45 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1111 to cirrussearch1111 - bking@cumin2002"
  • 16:44 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 16:39 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 16:38 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1111 to cirrussearch1111
  • 16:30 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:30 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2047 to codfw - jhancock@cumin2002"
  • 16:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2047 to codfw - jhancock@cumin2002"
  • 16:24 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 16:20 aokoth@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 16:20 aokoth@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 16:09 aokoth@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 16:09 aokoth@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 16:07 aokoth@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 16:06 aokoth@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 16:03 aokoth@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 16:02 aokoth@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 15:46 hoo@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 15:46 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2047.codfw.wmnet with OS bookworm
  • 15:46 hoo@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:46 hoo@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 15:45 hoo@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 15:45 hoo@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 15:44 hoo@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 15:40 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2048.codfw.wmnet with OS bookworm
  • 15:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2047.codfw.wmnet with OS bookworm
  • 15:33 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 15:33 dancy@deploy1003: Installation of scap version "4.160.0" completed for 2 hosts
  • 15:32 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 15:32 hoo@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 15:32 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 15:32 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1031.eqiad.wmnet
  • 15:32 hoo@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 15:32 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 15:31 hoo@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 15:31 dancy@deploy1003: Installing scap version "4.160.0" for 2 host(s)
  • 15:31 hoo@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 15:30 hoo@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 15:29 hoo@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 15:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1031.eqiad.wmnet
  • 15:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1030.eqiad.wmnet
  • 15:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1030.eqiad.wmnet
  • 15:25 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Setting up permissions and view database sanitization for wikis nupwiki in section s5
  • 15:25 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Setting up permissions and view database sanitization for wikis nupwiki in section s5
  • 15:23 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Setting up permissions and view database sanitization for wikis nupwiki in section s5
  • 15:23 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Setting up permissions and view database sanitization for wikis nupwiki in section s5
  • 15:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1030.eqiad.wmnet
  • 15:17 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1030.eqiad.wmnet
  • 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1029.eqiad.wmnet
  • 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
  • 15:12 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 15:12 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 15:12 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 15:11 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 15:11 kartik@deploy1003: Finished scap sync-world: Backport for Revert "Remove links to Special:ContentTranslationStats from dashboards" (duration: 30m 27s)
  • 15:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
  • 15:03 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1029.eqiad.wmnet
  • 15:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1028.eqiad.wmnet
  • 15:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1028.eqiad.wmnet
  • 15:00 kartik@deploy1003: kartik: Continuing with sync
  • 14:58 kartik@deploy1003: kartik: Backport for Revert "Remove links to Special:ContentTranslationStats from dashboards" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1028.eqiad.wmnet
  • 14:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1028.eqiad.wmnet
  • 14:44 elukey@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:42 elukey@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:40 kartik@deploy1003: Started scap sync-world: Backport for Revert "Remove links to Special:ContentTranslationStats from dashboards"
  • 14:39 kartik@deploy1003: Finished scap sync-world: Backport for Growth: Remove GELevelingUpFeaturesEnabled and GEMentorDashboardEnabled feature flags (T379566) (duration: 19m 32s)
  • 14:38 fabfur: upgrading haproxykafka to version 0.3.10 on A:cp (T393016)
  • 14:29 kartik@deploy1003: cyndywikime, kartik: Continuing with sync
  • 14:27 fabfur: enabled puppet and repooled cp7001 (T393016)
  • 14:27 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet
  • 14:25 kartik@deploy1003: cyndywikime, kartik: Backport for Growth: Remove GELevelingUpFeaturesEnabled and GEMentorDashboardEnabled feature flags (T379566) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:23 fabfur: uploading haproxykafka 0.3.10 on apt repo (T393016)
  • 14:19 kartik@deploy1003: Started scap sync-world: Backport for Growth: Remove GELevelingUpFeaturesEnabled and GEMentorDashboardEnabled feature flags (T379566)
  • 14:14 kartik@deploy1003: Sync cancelled.
  • 14:10 kartik@deploy1003: kartik, abi: Backport for Remove links to Special:ContentTranslationStats from dashboards (T392839) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:52 kartik@deploy1003: Started scap sync-world: Backport for Remove links to Special:ContentTranslationStats from dashboards (T392839)
  • 13:47 kartik@deploy1003: Finished scap sync-world: Backport for Disable APIs used in Special:ContentTranslationStats (T392839) (duration: 13m 23s)
  • 13:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1027.eqiad.wmnet
  • 13:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1027.eqiad.wmnet
  • 13:43 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Setting up permissions and view database sanitization for wikis nupwiki in section s5
  • 13:43 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Setting up permissions and view database sanitization for wikis nupwiki in section s5
  • 13:43 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.sanitize-wiki (exit_code=1) Setting up permissions and view database sanitization for wikis nupwiki in section s5
  • 13:43 klausman@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-eqiad
  • 13:43 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Setting up permissions and view database sanitization for wikis nupwiki in section s5
  • 13:42 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.sanitize-wiki (exit_code=1) Setting up permissions and view database sanitization for wikis nupwiki in section s5
  • 13:42 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Setting up permissions and view database sanitization for wikis nupwiki in section s5
  • 13:41 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Setting up permissions and view database sanitization for wikis nupwiki in section s5
  • 13:41 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Setting up permissions and view database sanitization for wikis nupwiki in section s5
  • 13:41 kartik@deploy1003: kartik, abi: Continuing with sync
  • 13:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1027.eqiad.wmnet
  • 13:39 kartik@deploy1003: kartik, abi: Backport for Disable APIs used in Special:ContentTranslationStats (T392839) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:34 kartik@deploy1003: Started scap sync-world: Backport for Disable APIs used in Special:ContentTranslationStats (T392839)
  • 13:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1027.eqiad.wmnet
  • 13:33 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Setting up permissions and view database sanitization for wikis nupwiki in section s5
  • 13:33 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Setting up permissions and view database sanitization for wikis nupwiki in section s5
  • 13:29 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Setting up permissions and view database sanitization for wikis nupwiki in section s5
  • 13:27 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Setting up permissions and view database sanitization for wikis nupwiki in section s5
  • 13:21 kartik@deploy1003: Finished scap sync-world: Backport for Disable Special:ContentTranslationStats page (T392839 T325790) (duration: 15m 29s)
  • 13:20 fabfur: disabled puppet on cp7001 to test haproxykafka version (T393016)
  • 13:19 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7001.magru.wmnet
  • 13:18 fabfur: depooling cp7001 to test new haproxykafka version (T393016)
  • 13:14 kartik@deploy1003: kartik, abi: Continuing with sync
  • 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1026.eqiad.wmnet
  • 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1026.eqiad.wmnet
  • 13:11 elukey@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 13:11 elukey@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:10 kartik@deploy1003: kartik, abi: Backport for Disable Special:ContentTranslationStats page (T392839 T325790) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:09 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
  • 13:09 klausman@cumin2002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:ml-serve-worker-eqiad
  • 13:09 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
  • 13:09 klausman@cumin2002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:ml-serve-worker-eqiad
  • 13:08 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
  • 13:08 klausman@cumin2002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:ml-serve-worker-eqiad
  • 13:08 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
  • 13:08 klausman@cumin2002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:ml-serve-worker-eqiad
  • 13:07 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
  • 13:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1026.eqiad.wmnet
  • 13:06 klausman@cumin2002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:ml-serve-worker-eqiad
  • 13:06 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
  • 13:06 kartik@deploy1003: Started scap sync-world: Backport for Disable Special:ContentTranslationStats page (T392839 T325790)
  • 13:04 tappof: rebooting centrallog1002 to rollback the kernel
  • 13:00 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1026.eqiad.wmnet
  • 12:59 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1001.eqiad.wmnet
  • 12:56 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1001.eqiad.wmnet
  • 12:52 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1002.eqiad.wmnet
  • 12:47 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1002.eqiad.wmnet
  • 12:47 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
  • 12:43 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
  • 12:42 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
  • 12:39 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
  • 12:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet
  • 12:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet
  • 12:28 tappof: Rolling reboot of Prometheus nodes in eqiad (1005, 1006, 1008) to rollback the kernel
  • 12:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet
  • 12:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet
  • 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1024.eqiad.wmnet
  • 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1024.eqiad.wmnet
  • 12:06 aqu@deploy1003: Finished deploy [analytics/refinery@dbfa557] (thin): Deploying new refinery/source artifacts THIN [analytics/refinery@dbfa557d] (duration: 01m 07s)
  • 12:04 aqu@deploy1003: Started deploy [analytics/refinery@dbfa557] (thin): Deploying new refinery/source artifacts THIN [analytics/refinery@dbfa557d]
  • 12:04 aqu@deploy1003: Finished deploy [analytics/refinery@dbfa557]: Deploying new refinery/source artifacts [analytics/refinery@dbfa557d] (duration: 03m 17s)
  • 12:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1024.eqiad.wmnet
  • 12:01 aqu@deploy1003: Started deploy [analytics/refinery@dbfa557]: Deploying new refinery/source artifacts [analytics/refinery@dbfa557d]
  • 12:00 aqu@deploy1003: Finished deploy [analytics/refinery@dbfa557] (hadoop-test): Deploying new refinery/source artifacts TEST [analytics/refinery@dbfa557d] (duration: 00m 53s)
  • 11:59 aqu@deploy1003: Started deploy [analytics/refinery@dbfa557] (hadoop-test): Deploying new refinery/source artifacts TEST [analytics/refinery@dbfa557d]
  • 11:58 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
  • 11:58 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2008.codfw.wmnet
  • 11:56 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet
  • 11:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1024.eqiad.wmnet
  • 11:49 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
  • 11:49 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2008.codfw.wmnet
  • 11:49 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
  • 11:46 filippo@cumin1002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host prometheus2006.codfw.wmnet
  • 11:46 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
  • 11:45 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2007.codfw.wmnet
  • 11:44 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
  • 11:38 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2007.codfw.wmnet
  • 11:34 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
  • 11:12 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1003.eqiad.wmnet
  • 11:05 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet
  • 11:05 jynus@cumin1002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 6:00:00 on backup[1010-1014].eqiad.wmnet with reason: Upgrade and restart
  • 11:04 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts2002.codfw.wmnet
  • 10:57 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts2002.codfw.wmnet
  • 10:57 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts2002.codfw.wmnet
  • 10:35 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=inference,name=codfw
  • 10:32 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts2002.codfw.wmnet
  • 10:32 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts2002.codfw.wmnet
  • 10:24 tappof: rebooting prometheus1007 into linux-image-6.1.0-33-amd64
  • 10:17 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts2002.codfw.wmnet
  • 09:58 elukey@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 09:39 elukey@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 09:39 elukey@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 09:38 elukey: depool inference/codfw from DNS discovery to safely apply new pod/container security settings - T369493
  • 09:30 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [plwiki] Add 'abusefilter-view-private' to sysop (T393353) (duration: 13m 04s)
  • 09:23 dreamyjazz@deploy1003: dreamyjazz, msz2001: Continuing with sync
  • 09:21 dreamyjazz@deploy1003: dreamyjazz, msz2001: Backport for [plwiki] Add 'abusefilter-view-private' to sysop (T393353) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:17 dreamyjazz@deploy1003: Started scap sync-world: Backport for [plwiki] Add 'abusefilter-view-private' to sysop (T393353)
  • 09:03 godog: powercycle vrts1003 + vrts2002 - soft lockup T393357
  • 08:56 godog: powercycle centrallog2002 - cannot log in via ssh or console
  • 08:40 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2015.codfw.wmnet with OS bullseye
  • 08:32 tappof: rebooting prometheus2007 - no ssh, com2 via racadm hangs
  • 08:32 godog: powercycle centrallog1002 - cannot log in via ssh or console
  • 08:21 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2015.codfw.wmnet with reason: host reimage
  • 08:17 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2015.codfw.wmnet with reason: host reimage
  • 08:17 tappof: powercycle prometheus2008 - no ssh, mgmt console showing systemd units being deactivated, no root login
  • 08:15 elukey: powercycle prometheus2005 - no ssh, mgmt console showing systemd units being deactivated, no root login
  • 08:11 elukey: powercycle prometheus1008 - no ssh, mgmt console showing cpu soft lockup continuously
  • 08:05 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 08:05 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 08:02 tappof: rebooting prometheus1005 prometheus1006 and prometheus2006
  • 08:00 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs2015
  • 08:00 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs2015
  • 08:00 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs2015
  • 08:00 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs2015.codfw.wmnet 209.48.192.10.in-addr.arpa 9.0.2.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 08:00 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs2015.codfw.wmnet 209.48.192.10.in-addr.arpa 9.0.2.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 08:00 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:00 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs2015 - ryankemper@cumin2002"
  • 08:00 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs2015 - ryankemper@cumin2002"
  • 07:59 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 07:59 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 07:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 07:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 07:54 Dreamy_Jazz: UTC morning backport window finished
  • 07:54 dreamyjazz@deploy1003: Finished scap sync-world: Backport for nnwiki: enable wgCiteResponsiveReferences (T393299), ruwikibooks: enable VisualEditorAvailableNamespaces for Рецепт (recipe) namespace (T392803), Add checkuserwiki favicon (T393246), nupwiki: add timezone (T390711) (duration: 14m 11s)
  • 07:47 dreamyjazz@deploy1003: dreamyjazz, bunnypranav, anzx: Continuing with sync
  • 07:44 dreamyjazz@deploy1003: dreamyjazz, bunnypranav, anzx: Backport for nnwiki: enable wgCiteResponsiveReferences (T393299), ruwikibooks: enable VisualEditorAvailableNamespaces for Рецепт (recipe) namespace (T392803), Add checkuserwiki favicon (T393246), nupwiki: add timezone (T390711) synced to the testservers (https://wikitech.wikimedia.org
  • 07:40 dreamyjazz@deploy1003: Started scap sync-world: Backport for nnwiki: enable wgCiteResponsiveReferences (T393299), ruwikibooks: enable VisualEditorAvailableNamespaces for Рецепт (recipe) namespace (T392803), Add checkuserwiki favicon (T393246), nupwiki: add timezone (T390711)
  • 07:31 kartik@deploy1003: Finished scap sync-world: Backport for Mobile frequent languages entrypoint: Add dependency to sitemapper (T393144 T386223) (duration: 17m 27s)
  • 07:25 kartik@deploy1003: abi, kartik: Continuing with sync
  • 07:21 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 07:21 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs2015
  • 07:20 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2015.codfw.wmnet with OS bullseye
  • 07:19 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2014.codfw.wmnet with OS bullseye
  • 07:19 kartik@deploy1003: abi, kartik: Backport for Mobile frequent languages entrypoint: Add dependency to sitemapper (T393144 T386223) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:15 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:15 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 07:14 kartik@deploy1003: Started scap sync-world: Backport for Mobile frequent languages entrypoint: Add dependency to sitemapper (T393144 T386223)
  • 07:11 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 07:11 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 07:02 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2014.codfw.wmnet with reason: host reimage
  • 06:57 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2014.codfw.wmnet with reason: host reimage
  • 06:39 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs2014
  • 06:39 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs2014
  • 06:37 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs2014
  • 06:37 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs2014.codfw.wmnet 192.16.192.10.in-addr.arpa 2.9.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 06:37 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs2014.codfw.wmnet 192.16.192.10.in-addr.arpa 2.9.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 06:37 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:37 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs2014 - ryankemper@cumin2002"
  • 06:37 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs2014 - ryankemper@cumin2002"
  • 06:30 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 06:27 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs2014
  • 06:26 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2014.codfw.wmnet with OS bullseye
  • 06:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1023.eqiad.wmnet
  • 06:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1023.eqiad.wmnet
  • 06:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1023.eqiad.wmnet
  • 06:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1023.eqiad.wmnet
  • 05:49 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2008.codfw.wmnet with OS bullseye
  • 05:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2008.codfw.wmnet with reason: host reimage
  • 05:25 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2008.codfw.wmnet with reason: host reimage
  • 05:06 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs2008
  • 05:06 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs2008
  • 05:06 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs2008
  • 05:06 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs2008.codfw.wmnet 194.32.192.10.in-addr.arpa 4.9.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 05:06 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs2008.codfw.wmnet 194.32.192.10.in-addr.arpa 4.9.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 05:05 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 05:05 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs2008 - ryankemper@cumin2002"
  • 05:05 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs2008 - ryankemper@cumin2002"
  • 05:04 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
  • 05:00 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 04:58 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs2008
  • 04:58 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2008.codfw.wmnet with OS bullseye
  • 04:38 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit2002-dev.codfw.wmnet with OS bookworm
  • 04:34 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit2003-dev.codfw.wmnet with OS bookworm
  • 04:28 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit2001-dev.codfw.wmnet with OS bookworm
  • 04:19 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2002-dev.codfw.wmnet with reason: host reimage
  • 04:15 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2003-dev.codfw.wmnet with reason: host reimage
  • 04:13 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2002-dev.codfw.wmnet with reason: host reimage
  • 04:12 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2003-dev.codfw.wmnet with reason: host reimage
  • 04:08 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2001-dev.codfw.wmnet with reason: host reimage
  • 04:05 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2001-dev.codfw.wmnet with reason: host reimage
  • 03:54 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudrabbit2003-dev.codfw.wmnet with OS bookworm
  • 03:54 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudrabbit2002-dev.codfw.wmnet with OS bookworm
  • 03:53 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from cloudcontrol2009-dev to cloudrabbit2003-dev
  • 03:52 andrew@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudrabbit2003-dev
  • 03:52 andrew@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudrabbit2003-dev
  • 03:52 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 03:50 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from cloudcontrol2008-dev to cloudrabbit2002-dev
  • 03:49 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 03:49 andrew@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudrabbit2002-dev
  • 03:49 andrew@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudrabbit2002-dev
  • 03:49 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 03:49 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming cloudcontrol2008-dev to cloudrabbit2002-dev - andrew@cumin1002"
  • 03:48 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming cloudcontrol2008-dev to cloudrabbit2002-dev - andrew@cumin1002"
  • 03:46 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudrabbit2001-dev.codfw.wmnet with OS bookworm
  • 03:44 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 03:43 andrew@cumin1002: START - Cookbook sre.hosts.rename from cloudcontrol2009-dev to cloudrabbit2003-dev
  • 03:43 andrew@cumin1002: START - Cookbook sre.hosts.rename from cloudcontrol2008-dev to cloudrabbit2002-dev
  • 03:43 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
  • 03:43 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from cloudcontrol2007-dev to cloudrabbit2001-dev
  • 03:42 andrew@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudrabbit2001-dev
  • 03:42 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1016.eqiad.wmnet with OS bullseye
  • 03:42 andrew@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudrabbit2001-dev
  • 03:42 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 03:42 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming cloudcontrol2007-dev to cloudrabbit2001-dev - andrew@cumin1002"
  • 03:41 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming cloudcontrol2007-dev to cloudrabbit2001-dev - andrew@cumin1002"
  • 03:37 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 03:36 andrew@cumin1002: START - Cookbook sre.hosts.rename from cloudcontrol2007-dev to cloudrabbit2001-dev
  • 03:26 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
  • 03:24 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
  • 02:59 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS bullseye
  • 01:55 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1011.eqiad.wmnet with OS bullseye
  • 01:39 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1011.eqiad.wmnet with reason: host reimage
  • 01:36 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1011.eqiad.wmnet with reason: host reimage
  • 01:19 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1011.eqiad.wmnet with OS bullseye

2025-05-04

  • 23:27 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1003.eqiad.wmnet
  • 23:27 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:27 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 23:27 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 23:22 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 23:16 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1003.eqiad.wmnet
  • 23:15 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1002.eqiad.wmnet
  • 23:15 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:15 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 23:14 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 23:08 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 23:02 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1002.eqiad.wmnet
  • 23:02 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1001.eqiad.wmnet
  • 23:02 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:02 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 23:01 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 22:57 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 22:52 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1001.eqiad.wmnet
  • 20:29 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
  • 20:29 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
  • 20:07 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1056*,elastic1063* for host appears to have hot shards - bking@cumin2002
  • 20:06 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1056*,elastic1063* for host appears to have hot shards - bking@cumin2002
  • 19:43 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1063* for host appears to have hot shards - bking@cumin2002
  • 19:43 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1063* for host appears to have hot shards - bking@cumin2002
  • 19:35 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1062* for hosts appear to have hot shards - bking@cumin2002
  • 19:35 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1062* for hosts appear to have hot shards - bking@cumin2002
  • 19:10 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1057*,elastic1058* for hosts appear to have hot shards - bking@cumin2002
  • 19:10 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1057*,elastic1058* for hosts appear to have hot shards - bking@cumin2002
  • 19:04 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1057* for host appears to have hot shards - bking@cumin2002
  • 19:04 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1057* for host appears to have hot shards - bking@cumin2002
  • 19:04 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1064* for host appears to have hot shards - bking@cumin2002
  • 19:03 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1064* for host appears to have hot shards - bking@cumin2002
  • 10:36 krinkle@deploy1003: Finished scap sync-world: Backport for actions: Fix handling of redirects to known (non-existing) pages (duration: 30m 22s)
  • 10:26 krinkle@deploy1003: krinkle: Continuing with sync
  • 10:22 krinkle@deploy1003: krinkle: Backport for actions: Fix handling of redirects to known (non-existing) pages synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:06 krinkle@deploy1003: Started scap sync-world: Backport for actions: Fix handling of redirects to known (non-existing) pages

2025-05-03

2025-05-02

  • 21:38 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1054.eqiad.wmnet with OS bookworm
  • 21:23 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1053.eqiad.wmnet with OS bookworm
  • 20:34 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ganeti1053.eqiad.wmnet with OS bookworm
  • 20:31 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:29 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:27 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1053.eqiad.wmnet with OS bookworm
  • 20:23 tzatziki: removed 3 files for legal compliance
  • 20:18 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ganeti1054.eqiad.wmnet with OS bookworm
  • 20:16 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:15 tzatziki: removed 1 file for legal compliance
  • 20:11 tzatziki: removed 1 file for legal compliance
  • 20:09 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:09 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 19:57 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 19:41 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ganeti1053.eqiad.wmnet with OS bookworm
  • 19:38 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 19:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:35 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1168.eqiad.wmnet
  • 17:27 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1168.eqiad.wmnet
  • 17:26 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1167.eqiad.wmnet
  • 17:19 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1167.eqiad.wmnet
  • 17:17 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1166.eqiad.wmnet
  • 17:09 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1166.eqiad.wmnet
  • 16:53 sukhe@dns1004: END - running authdns-update
  • 16:51 sukhe@dns1004: START - running authdns-update
  • 16:47 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-f1-codfw.mgmt.codfw.wmnet
  • 16:28 mvernon@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 18:00:00 on ms-fe1016.eqiad.wmnet with reason: not yet in prod
  • 16:28 mvernon@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 18:00:00 on ms-fe1015.eqiad.wmnet with reason: not yet in prod
  • 16:26 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:24 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-f1-codfw.mgmt.codfw.wmnet
  • 15:45 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1166.eqiad.wmnet
  • 15:11 herron: power cycling prometheus200[78] via rac
  • 15:06 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1168.eqiad.wmnet
  • 15:05 jgleeson: SmashPig changed from 9b3c4587 to ddf64519
  • 15:04 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1168.eqiad.wmnet
  • 15:03 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1167.eqiad.wmnet
  • 15:01 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1167.eqiad.wmnet
  • 15:01 bking@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch2076.codfw.wmnet|cirrussearch2080.codfw.wmnet|cirrussearch2081.codfw.wmnet|cirrussearch2083.codfw.wmnet|cirrussearch2084.codfw.wmnet|cirrussearch2092.codfw.wmnet|cirrussearch2093.codfw.wmnet|cirrussearch2100.codfw.wmnet|cirrussearch2106.codfw.wmnet|cirrussearch2108.codfw.wmnet
  • 15:01 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1166.eqiad.wmnet
  • 14:55 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1166.eqiad.wmnet
  • 14:48 dancy@deploy1003: Installation of scap version "4.159.0" completed for 2 hosts
  • 14:46 dancy@deploy1003: Installing scap version "4.159.0" for 2 host(s)
  • 14:11 inflatador: bking@localhost set search_codfw num_concurrent_incoming_recoveries from 20 back down to 4 after migration T391350
  • 13:49 moritzm: imported ruby-defaults 1:3.3~wmf13u1 to component/puppet7 for trixie-wikimedia T392790
  • 13:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2008.wikimedia.org
  • 13:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2008.wikimedia.org
  • 13:25 urandom: invoked manual `garbagecollect`, Cassandra sessionstore — T390514
  • 13:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2007.codfw.wmnet
  • 13:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2007.codfw.wmnet
  • 12:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2006.codfw.wmnet
  • 12:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2006.codfw.wmnet
  • 10:06 moritzm: imported ruby-concurrent 1.1.6+dfsg-5~wmf13u1 to component/puppet7 for trixie-wikimedia T392790
  • 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet
  • 09:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet
  • 09:54 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99) for hosts an-worker1167.eqiad.wmnet
  • 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1003.eqiad.wmnet
  • 09:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1003.eqiad.wmnet
  • 09:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 09:31 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1167.eqiad.wmnet
  • 09:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 08:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet
  • 08:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet
  • 08:29 XioNoX: update codfw pfw NAT - T392843
  • 08:16 jmm@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast2003.wikimedia.org
  • 08:13 XioNoX: push pfw policies - T393098
  • 08:09 jmm@cumin1002: START - Cookbook sre.hosts.reboot-single for host bast2003.wikimedia.org
  • 06:46 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99) for hosts an-worker1167.eqiad.wmnet
  • 06:42 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1167.eqiad.wmnet
  • 06:30 slyngshede@cumin1002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging MarkTraceur out of all services on: 2404 hosts
  • 06:21 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99) for hosts an-worker1167.eqiad.wmnet
  • 06:18 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1167.eqiad.wmnet
  • 06:14 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99) for hosts an-worker1166.eqiad.wmnet
  • 06:09 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1166.eqiad.wmnet
  • 00:41 dwisehaupt: starting staging db refresh on frdb1006 with civicrm/drupal/fredge restores from 20250430

2025-05-01

  • 22:27 thcipriani: mwscript-k8s -- resetAuthenticationThrottle.php --wiki=aawiki --signup --ip=<istanbul ips> (x17)
  • 22:09 dzahn@deploy1003: Finished scap sync-world: Backport for Add another throttle rule for Istanbul Hackathon 2025 (T382309) (duration: 14m 32s)
  • 22:02 dzahn@deploy1003: dzahn: Continuing with sync
  • 22:00 dzahn@deploy1003: dzahn: Backport for Add another throttle rule for Istanbul Hackathon 2025 (T382309) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:54 dzahn@deploy1003: Started scap sync-world: Backport for Add another throttle rule for Istanbul Hackathon 2025 (T382309)
  • 21:40 dzahn@deploy1003: Finished scap sync-world: Backport for Add throttle rule for Istanbul Hackathon 2025 (T382309) (duration: 25m 16s)
  • 21:34 dzahn@deploy1003: dzahn: Continuing with sync
  • 21:20 dzahn@deploy1003: dzahn: Backport for Add throttle rule for Istanbul Hackathon 2025 (T382309) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:15 dzahn@deploy1003: Started scap sync-world: Backport for Add throttle rule for Istanbul Hackathon 2025 (T382309)
  • 21:03 ryankemper: T376151 [wdqs-internal lvs teardown] Declaring this officially done. No more irc log spam from me today :)
  • 21:01 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:01 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove VIPs for wdqs-internal - ryankemper@cumin2002"
  • 21:01 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove VIPs for wdqs-internal - ryankemper@cumin2002"
  • 21:01 ryankemper: T376151 [wdqs-internal lvs teardown] `sudo etcdctl -C https://conf1007.eqiad.wmnet:4001 --username root rmdir /conftool/v1/pools/codfw/wdqs-internal/wdqs` && `sudo etcdctl -C https://conf1007.eqiad.wmnet:4001 --username root rmdir /conftool/v1/pools/codfw/wdqs-internal/`
  • 21:01 ryankemper: T376151 [wdqs-internal lvs teardown] `sudo etcdctl -C https://conf1007.eqiad.wmnet:4001 --username root rmdir /conftool/v1/pools/eqiad/wdqs-internal/wdqs` && `sudo etcdctl -C https://conf1007.eqiad.wmnet:4001 --username root rmdir /conftool/v1/pools/eqiad/wdqs-internal/`
  • 20:54 ryankemper: T376151 [wdqs-internal lvs teardown] `sudo rm -fv /srv/config-master/pybal/eqiad/wdqs-internal && sudo rm -fv /srv/config-master/pybal/codfw/wdqs-internal` on `config-master[1,2]001`
  • 20:53 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 20:50 ryankemper: T376151 [wdqs-internal lvs teardown] Surrendered `10.2.2.41/32` (eqiad wdqs-internal vip) and `10.2.1.41/32` (codfw wdqs-internal vip) from netbox interface
  • 20:48 ryankemper@dns1004: END - running authdns-update
  • 20:46 ryankemper@dns1004: START - running authdns-update
  • 20:45 jhuneidi@deploy1003: Finished scap sync-world: Backport for Check for content validity before extracting license (T389125), Fix localization for validation errors checking tabular data (T389126) (duration: 30m 35s)
  • 20:40 sukhe: restart pybal on lvs1020
  • 20:35 jhuneidi@deploy1003: bvibber, jhuneidi: Continuing with sync
  • 20:33 jhuneidi@deploy1003: bvibber, jhuneidi: Backport for Check for content validity before extracting license (T389125), Fix localization for validation errors checking tabular data (T389126) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:32 sukhe: sudo cumin 'O:config_master' 'run-puppet-agent'
  • 20:14 jhuneidi@deploy1003: Started scap sync-world: Backport for Check for content validity before extracting license (T389125), Fix localization for validation errors checking tabular data (T389126)
  • 19:37 sukhe: no pending Netbox changes
  • 19:37 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:34 sukhe: [correction] running sre.dns.netbox to ensure no pending changes (NOT in dry-run)
  • 19:34 sukhe: running sre.dns.netbox to ensure no pending changes
  • 19:34 sukhe@cumin1002: START - Cookbook sre.dns.netbox
  • 19:33 dduvall: re-ran scap sync to fix mw-jobrunner codfw deployments following failed helmfile apply and verified correct image ref manually (T386222)
  • 19:30 dduvall@deploy1003: Finished scap sync-world: retrying sync-world following spurious helmfile apply error (mw-jobrunner codfw) (duration: 11m 24s)
  • 19:20 sukhe: sukhe@netbox1003:~$ sudo systemctl start uwsgi-netbox.service: service was OOM'ed, restarting
  • 19:18 dduvall@deploy1003: Started scap sync-world: retrying sync-world following spurious helmfile apply error (mw-jobrunner codfw)
  • 19:16 jhathaway@dns1004: END - running authdns-update
  • 19:14 jhathaway@dns1004: START - running authdns-update
  • 19:09 ryankemper: T376151 [wdqs-internal lvs teardown] running puppet across `A:wdqs-internal` now that pybal has been restarted
  • 19:09 dduvall: deployment of mw-jobrunner-main for codfw failed during scap train (group2) (T386222)
  • 19:09 ryankemper: T376151 [wdqs-internal lvs teardown -> pybal rolling restart] all IPVS diff check alerts have recovered, rolling restart complete
  • 19:06 dduvall: helm error during group2 deployment "Get "https://kubemaster.svc.codfw.wmnet:6443/api/v1/namespaces/mw-jobrunner/services/mediawiki-main-tls-service": dial tcp 10.2.1.8:6443: connect: no route to host - error from a previous attempt: read tcp 10.64.16.93:41894->10.2.1.8:6443: read: connection reset by peer"
  • 19:04 ryankemper: T376151 [wdqs-internal lvs teardown -> pybal rolling restart] `ipvsadm --delete-service --tcp-service 10.2.2.41:80` on `lvs1019` and `lvs1020`
  • 19:03 ryankemper: T376151 [wdqs-internal lvs teardown -> pybal rolling restart] `ipvsadm --delete-service --tcp-service 10.2.1.41:80` on `A:lvs-secondary-codfw OR A:lvs-low-traffic-codfw` (lvs2013, lvs2014)
  • 18:59 ryankemper: T376151 [wdqs-internal lvs teardown -> pybal rolling restart] Restarted pybal on `A:lvs-low-traffic-codfw` (lvs2013)
  • 18:58 ryankemper: T376151 [wdqs-internal lvs teardown -> pybal rolling restart] Restarted pybal on `A:lvs-secondary-codfw` (lvs2014), waiting 2 mins before proceeding
  • 18:55 ryankemper: T376151 [wdqs-internal lvs teardown -> pybal rolling restart] Restarted pybal on `A:lvs-low-traffic-eqiad` (lvs1019), waiting a few mins before proceeding
  • 18:48 ryankemper: T376151 [wdqs-internal lvs teardown -> pybal rolling restart] Restarted pybal on `A:lvs-secondary-eqiad`; it only restarted on `lvs1020`, but for some reason `lvs1013` doesn't have a pybal service running
  • 18:44 ryankemper: T376151 [wdqs-internal lvs teardown -> pybal rolling restart] ran puppet on `O:Lvs::balancer` after merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1136747
  • 18:32 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
  • 18:31 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply
  • 18:30 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply
  • 18:29 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply
  • 18:28 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 18:27 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply
  • 18:26 ryankemper: T376151 (wdqs-internal lvs teardown) Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1136744 to flip `wdqs-internal` service state to `lvs_setup` and running puppet across `A:dnsbox`
  • 18:24 dduvall@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.27 refs T386222
  • 18:23 ryankemper@dns1004: END - running authdns-update
  • 18:21 ryankemper@dns1004: START - running authdns-update
  • 17:31 jhathaway: testing sasl email relaying on mx-in{1001,2001}
  • 16:40 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 16:40 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 16:39 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 16:38 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 16:04 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:02 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2045.codfw.wmnet with OS bookworm
  • 16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2045.codfw.wmnet with reason: host reimage
  • 15:40 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2045.codfw.wmnet with reason: host reimage
  • 15:34 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 15:34 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 15:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2048.codfw.wmnet with OS bookworm
  • 15:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2047.codfw.wmnet with OS bookworm
  • 15:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2045.codfw.wmnet with OS bookworm
  • 15:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:55 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:54 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:54 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:50 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 13:51 TheresNoTime: ran `[samtar@deploy1003 ~]$ mwscript-k8s --comment="T393093" --follow -- extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=knwikiquote --logwiki=metawiki '~aanzx' 'A826'` for T393093
  • 13:49 samtar@deploy1003: Finished scap sync-world: Backport for mswikisource: add NamespacesToBeSearchedDefault (T392984) (duration: 12m 44s)
  • 13:42 samtar@deploy1003: anzx, samtar: Continuing with sync
  • 13:41 samtar@deploy1003: anzx, samtar: Backport for mswikisource: add NamespacesToBeSearchedDefault (T392984) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:39 urandom: invoking garbagecollect on sessionstore cluster — T390514
  • 13:36 samtar@deploy1003: Started scap sync-world: Backport for mswikisource: add NamespacesToBeSearchedDefault (T392984)
  • 13:34 urandom: lowering sessionstore gc_grace_seconds to 172800 (two days) — T390514
  • 13:31 samtar@deploy1003: Finished scap sync-world: Backport for [arwiki] Change logo and tagline with sync wordmark (T392858) (duration: 21m 53s)
  • 13:24 samtar@deploy1003: gergesshamon, samtar: Continuing with sync
  • 13:17 samtar@deploy1003: gergesshamon, samtar: Backport for [arwiki] Change logo and tagline with sync wordmark (T392858) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:09 samtar@deploy1003: Started scap sync-world: Backport for [arwiki] Change logo and tagline with sync wordmark (T392858)
  • 12:46 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 12:46 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 12:40 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 12:40 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 11:24 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 11:24 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 11:12 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 11:12 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 09:46 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 09:45 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 05:29 eileen: civicrm upgraded from 6c99f0c9 to 5a1f3e8e
  • 05:14 eileen: config revision changed from b200409c to ddf64519
  • 01:32 tstarling@deploy1003: Finished scap sync-world: Backport for testwiki: enable wgUseCodexSpecialBlock and wgEnableMultiBlocks (T377121) (duration: 13m 52s)
  • 01:25 tstarling@deploy1003: tstarling, musikanimal: Continuing with sync
  • 01:25 tstarling@deploy1003: tstarling, musikanimal: Backport for testwiki: enable wgUseCodexSpecialBlock and wgEnableMultiBlocks (T377121) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 01:18 tstarling@deploy1003: Started scap sync-world: Backport for testwiki: enable wgUseCodexSpecialBlock and wgEnableMultiBlocks (T377121)


Archives

See Server Admin Log/Archives.